Understanding communication (the movement of data, control signals, and synchronization primitives) within applications and hardware is key to achieving performance and efficiency. Through communication classification, one can measure the amount and type of data that moves within a workload. This classification does not just count the amount of data moved and computed; it also measures whether the communication is unique (the first time the data is seen) or non-unique (reuse of that data). The contextual locality of communication is just as important, that is, whether data is referenced within an entity (path, function, or thread) or between entities. Mark Hempstead and collaborators have shown that this classification can be used for a diverse array of co-design research, including hardware accelerator design, thread mapping, hardware simulation, and design space exploration. Communication classification is paramount to understanding design tradeoffs in software and hardware.
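To make the classification concrete, the following is a minimal sketch that labels each read in a small memory trace as unique or non-unique, and as local (within an entity) or inter-entity. The trace format, the function name `classify_reads`, and the category names are assumptions made for illustration; they are not taken from PRISM or from the published classification tools, and the scheme is deliberately simplified.

```python
from collections import Counter

def classify_reads(trace):
    """Classify reads in a trace of (entity, op, addr) tuples.

    `entity` is a hypothetical label for a function or thread and
    `op` is "W" (write) or "R" (read). Assumed simplification: a read
    is "unique" the first time an entity reads a given written value
    and "nonunique" on every later read of that same value.
    """
    last_writer = {}   # addr -> entity that produced the current value
    readers = {}       # addr -> entities that already read the current value
    counts = Counter()
    for entity, op, addr in trace:
        if op == "W":
            last_writer[addr] = entity
            readers[addr] = set()      # a new value resets reuse tracking
        elif op == "R":
            producer = last_writer.get(addr)
            if producer is None:
                counts["input"] += 1   # value was not produced inside the trace
                continue
            scope = "local" if producer == entity else "inter"
            kind = "nonunique" if entity in readers[addr] else "unique"
            readers[addr].add(entity)
            counts[f"{kind}_{scope}"] += 1
    return counts

# Toy trace: f produces a value, f and g consume it, then g overwrites it.
trace = [
    ("f", "W", 0x10),
    ("f", "R", 0x10),   # unique, local
    ("g", "R", 0x10),   # unique, inter-entity
    ("g", "R", 0x10),   # non-unique, inter-entity (reuse)
    ("g", "R", 0x20),   # never written in this trace: input
    ("g", "W", 0x10),
    ("f", "R", 0x10),   # unique, inter-entity (new value)
]
counts = classify_reads(trace)
for name in sorted(counts):
    print(name, counts[name])
```

Tracking only the last writer and the set of readers per address keeps the sketch short; a real tool would also need to account for access sizes and for how entities nest.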
Current workload analysis methods are often embedded within a specific instrumentation tool (e.g., PIN, LLVM, Valgrind). Thus, when developers want to target a new hardware platform or use a new instrumentation front-end, the workload analysis must be rewritten. PRISM is a framework that enables workload analysis tools to be both developed and executed agnostic of the underlying hardware platform. The benefits of such a framework are that researchers can capture salient workload data without domain-specific knowledge of each hardware platform, and then reuse analysis tools across applications that use different hardware configurations. That is, PRISM tools are created once and are valid across multiple platforms. There are four primary features of PRISM:
- Efficient, multi-platform workload characterization and trace generation supporting modern general-purpose, many-core, and heterogeneous architectures
- An architecture-agnostic representation that captures multi-threaded synchronization dependencies and heterogeneous behaviors; both control data flow graph (CDFG) and event-trace representations are supported
- A flexible representation capable of capturing both high-level workload communication and low-level communication and computation patterns
- A unified developer interface to create workload analysis tools agnostic of the underlying platform
PRISM achieves these features by decoupling the underlying workload profiling techniques from the creation of workload analysis tools. PRISM tools provide a single interface to an architecture-agnostic event stream representing the workload. Notably, this event-streaming architecture is more accessible and amenable to characterization studies than the gamut of state-of-the-art profiling techniques. PRISM does not require users to become experts in instrumentation, compilers, or machine-specific hardware trace features. Moreover, architecture-agnostic events are more representative of the intrinsic communication within a workload and are preferred for communication classification. As demonstrated in Figure 1, PRISM abstracts the underlying hardware platform and provides users with customizable workload characteristics at the desired granularity. Granularity is configurable by the user to suit the study at hand. This feature can substantially speed up workload studies by zooming out during non-salient regions and zooming in during regions of interest.
Figure 1: PRISM Multi-platform Characterization Framework
Internally, PRISM uses a modular infrastructure to select the appropriate profiling methodology, or front-end, and generate the event stream for the given hardware platform.
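The separation between front-ends and analysis tools can be illustrated with a short sketch. Everything named here (`Event`, `AnalysisTool`, `BytesMovedTool`, `toy_frontend`, `run`, and the event kinds) is hypothetical and chosen for illustration; none of it is PRISM's actual interface. The point is only that the tool sees a platform-neutral event stream, so swapping the front-end does not change the tool.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Set

@dataclass(frozen=True)
class Event:
    """Hypothetical architecture-agnostic event."""
    kind: str      # assumed kinds: "memory" or "compute"
    thread: int
    size: int      # bytes moved for memory events, 0 otherwise

class AnalysisTool:
    """Hypothetical tool interface: one callback per event."""
    def on_event(self, event: Event) -> None:
        raise NotImplementedError

class BytesMovedTool(AnalysisTool):
    """Example tool: total bytes moved per thread."""
    def __init__(self) -> None:
        self.bytes_by_thread = {}

    def on_event(self, event: Event) -> None:
        if event.kind == "memory":
            total = self.bytes_by_thread.get(event.thread, 0)
            self.bytes_by_thread[event.thread] = total + event.size

def toy_frontend(raw_records) -> Iterator[Event]:
    """Stand-in for a platform-specific profiler.

    Translates made-up raw records (thread, op, size) into events;
    a real front-end would wrap an instrumentation tool instead.
    """
    for thread, op, size in raw_records:
        kind = "memory" if op in ("load", "store") else "compute"
        yield Event(kind, thread, size)

def run(events: Iterable[Event], tools, kinds: Optional[Set[str]] = None) -> None:
    """Dispatch events to tools; `kinds` coarsely filters the stream,
    a simple stand-in for user-configurable granularity."""
    for event in events:
        if kinds is not None and event.kind not in kinds:
            continue
        for tool in tools:
            tool.on_event(event)

raw = [(0, "load", 8), (0, "add", 0), (1, "store", 4), (0, "store", 8)]
tool = BytesMovedTool()
run(toy_frontend(raw), [tool], kinds={"memory"})
print(tool.bytes_by_thread)
```

Because `BytesMovedTool` depends only on `Event`, a second front-end that emits the same events from a different profiler could be passed to `run` without touching the tool.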
- Michael Lui, Karthik Sangaiah, Mark Hempstead, Baris Taskin. Towards Cross-Framework Workload Analysis via Flexible Event-Driven Interfaces. The International Symposium on Performance Analysis of Systems and Software (ISPASS), April 2018.