Workload Characterization and Hardware-Software Partitioning for Many-Accelerator Architectures
Sigil is a binary-instrumentation-based tool that assists hardware/software partitioning by automatically capturing communication-annotated call graphs for applications. As far as we know, no freely available tool exists for this purpose.
Sigil can capture and classify hardware-independent communication in an application. Communication classification distinguishes the true input, output, and locally produced data of a thread or function, measured in bytes, from repeated reads or writes of the same data. Such information can be used for many purposes. The most important use we envision is hardware/software partitioning, i.e., determining which functions would benefit from custom hardware.
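The classification described above can be illustrated with a small sketch. This is not Sigil's implementation; it is a minimal model that assumes a hypothetical byte-level trace of `(function, operation, address)` events, and the names `classify_reads`, `unique_input`, `local`, and `repeated` are made up for illustration. It only classifies reads, and it treats a second read of an unchanged byte by the same function as a repeated read.

```python
from collections import defaultdict


def classify_reads(trace):
    """Classify each byte read in a trace of (func, op, addr) events.

    op is "W" for a write and "R" for a read; each event touches one byte.
    """
    last_writer = {}            # addr -> function that last wrote this byte
    readers = defaultdict(set)  # addr -> functions that read the current value
    counts = defaultdict(lambda: {"unique_input": 0, "local": 0, "repeated": 0})

    for func, op, addr in trace:
        if op == "W":
            last_writer[addr] = func
            readers[addr].clear()  # new value: earlier reads no longer count
            continue
        if func in readers[addr]:
            counts[func]["repeated"] += 1      # same data read again
        elif last_writer.get(addr) == func:
            counts[func]["local"] += 1         # produced and consumed locally
        else:
            counts[func]["unique_input"] += 1  # true input from elsewhere
        readers[addr].add(func)

    return {func: dict(c) for func, c in counts.items()}


# Made-up trace: a producer writes two bytes, a consumer reads them.
trace = [
    ("producer", "W", 0x10),
    ("producer", "W", 0x11),
    ("consumer", "R", 0x10),
    ("consumer", "R", 0x10),  # repeated read of the same byte
    ("consumer", "R", 0x11),
    ("producer", "R", 0x10),  # producer reads back its own data
]
result = classify_reads(trace)
print(result)
```

In this toy trace the consumer has two bytes of true input and one repeated read, while the producer's read of its own byte counts as local. Only the two unique bytes would contribute to a communication edge between the two functions.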
Task graphs have been used for hardware/software partitioning in hardware/software co-design research. A big challenge in this field is the lack of tools to automatically generate task graphs from applications; task graphs are usually constructed from prior knowledge of an application. While a task can be defined in many ways, software functions have been found to approximate modular tasks well. Threads have also been treated as tasks when using a pipelined execution model in multi-threaded programs. Treating functions or threads as tasks for partitioning is challenging because building a task graph requires data-movement information, which a call-tree does not capture. Sigil is our attempt to solve this problem.
Sigil provides the raw task graphs for hardware/software partitioning algorithms in the form of control data flow graphs, where functions are the nodes. Figure 1 shows a call-tree for an example program. To such a call-tree, Sigil adds annotations that represent the aggregate unique (true input/output) data bytes communicated between functions, as in Figure 2. The resulting call-tree is now a control data flow graph with functions as nodes. Using functions as nodes in a task graph comes with unique considerations. For example, when a function is called from different contexts, we generate a new node for each context and keep separate cost accounting, so that every node has only a single incoming control edge. This matters because the partitioning problem must also take control edges into account. The tool also comes with a post-processing script that implements one simple demonstrative algorithm for partitioning on function-based control data flow graphs. The analysis and results from using this algorithm are presented in the publication and presentation.
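To make the context-sensitive node rule concrete, here is a minimal sketch of such a graph. It is an illustration only and does not reflect Sigil's internal data structures or output format; the class `ContextNode`, the helper `build_call_tree`, the function names, and all numbers are hypothetical. It assumes every call path starts at the same root function.

```python
from collections import defaultdict


class ContextNode:
    """A function in one specific calling context (hypothetical structure)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # the single incoming control edge
        self.children = {}    # callee name -> ContextNode
        self.ops = 0          # cost accounted to this context only

    def child(self, name):
        # Reuse the node for a repeated call from the same context;
        # create a new one the first time this context calls `name`.
        if name not in self.children:
            self.children[name] = ContextNode(name, parent=self)
        return self.children[name]

    def path(self):
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


def build_call_tree(calls):
    """calls: list of (call_path, ops); call_path is a tuple from the root."""
    root = ContextNode(calls[0][0][0])
    for call_path, ops in calls:
        node = root
        for name in call_path[1:]:
            node = node.child(name)
        node.ops += ops
    return root


# Made-up example: memcpy is reached from two different contexts.
calls = [
    (("main",), 10),
    (("main", "parse"), 40),
    (("main", "parse", "memcpy"), 5),
    (("main", "render"), 70),
    (("main", "render", "memcpy"), 8),
]
root = build_call_tree(calls)
parse, render = root.children["parse"], root.children["render"]

# Data edges: (producer context, consumer context) -> unique bytes.
data_edges = defaultdict(int)
data_edges[(parse.path(), render.path())] += 256
data_edges[(parse.children["memcpy"].path(), parse.path())] += 64

for (src, dst), nbytes in sorted(data_edges.items()):
    print(f"{src} -> {dst}: {nbytes} unique bytes")
```

Here `memcpy` appears as two separate nodes, `main/parse/memcpy` and `main/render/memcpy`, each with its own cost, so every node keeps exactly one parent while the data edges are free to connect any pair of contexts.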
Sigil can also generate a dependency tree between function calls when the collection of events is enabled. Figure 4 shows an example dependency tree where A, B and C are individual function calls, with their costs listed in terms of floating-point and integer operations. The edges are weighted by bytes of communication. This data can be used to find critical paths in the code and can potentially be used to discover function-level parallelism in serial programs. Please see the publication for an analysis of results from using this mode on popular benchmark suites. This mode should already work, but we are still testing it on several benchmarks. If you use this mode and encounter any issues, please get in touch. Note: This mode is more memory- and disk-intensive. Please see the README_USERS distributed with the source code to understand how to run this mode.
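As a rough illustration of how such a dependency tree could be used for critical path analysis, the sketch below computes the longest path through a small dependency graph. It is not part of Sigil or its post-processing script. The function `critical_path`, the optional `cost_per_byte` weight, and the costs and byte counts are all assumptions chosen for the example; they are not the values from Figure 4. It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter


def critical_path(cost, edges, cost_per_byte=0.0):
    """Return (length, path) of the longest path through a dependency DAG.

    cost:  {call: operation count}
    edges: {(producer, consumer): bytes communicated}
    cost_per_byte: hypothetical weight converting bytes into operations
    """
    preds = {node: {} for node in cost}
    for (src, dst), nbytes in edges.items():
        preds[dst][src] = nbytes

    # best[node] = (length of the longest path ending at node, predecessor)
    best = {}
    order = TopologicalSorter({n: set(p) for n, p in preds.items()})
    for node in order.static_order():  # producers come before consumers
        length, prev = cost[node], None
        for src, nbytes in preds[node].items():
            candidate = best[src][0] + nbytes * cost_per_byte + cost[node]
            if candidate > length:
                length, prev = candidate, src
        best[node] = (length, prev)

    end = max(best, key=lambda n: best[n][0])
    total, path = best[end][0], []
    while end is not None:
        path.append(end)
        end = best[end][1]
    return total, path[::-1]


# Made-up numbers: A feeds both B and C, which do not depend on each other.
cost = {"A": 100, "B": 40, "C": 60}
edges = {("A", "B"): 64, ("A", "C"): 16}

total, path = critical_path(cost, edges)
print(total, path)
```

With these numbers the critical path is A followed by C. Because B and C depend only on A and not on each other, they are candidates for running in parallel, which is the kind of function-level parallelism the dependency data can expose.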
Sigil is implemented as a run-time profiler and works on the application binary directly to produce platform-independent data. It does not need any source code changes or any prior knowledge of the application. Platform-independent data is not tied to any specific architecture; it is a property of the application itself. As a result, profiling the same application on any architecture, any number of times, should produce the same output data. It has reasonably low overhead given its goals.
- Automatically generates control data flow graph data for applications (for HW/SW partitioning)
- No source code changes or knowledge of the source code is required
- Can capture reuse information along with communication if enabled
- Can generate function-call dependency trees for critical path analysis if enabled
- Post-processing script to present data in human readable format
- Demonstrative partitioning algorithm in post-processing
The Sigil source code is accessible through the following GitHub repository: https://github.com/snilakan/Sigil
Requirements to Run Sigil
Since Sigil has been integrated with the Callgrind tool, it should theoretically run on any system that Valgrind and Callgrind support. You can find a list of supported platforms here: http://valgrind.org/info/platforms.html. Sigil maintains heavy data structures and has periodic long lookups, so it can eventually be limited by memory and slowdown for large applications. If those limits are encountered, we encourage using smaller data sets for those same applications.
Sigil has been tested on Intel Xeon E5-based machines running Red Hat Enterprise Linux 5, CentOS 6, or Ubuntu 12.x. We ran the SPEC2006 and PARSEC (serial version) benchmark suites under Sigil and analyzed the data. SPEC2006 and PARSEC suite results are coming very soon.
Building and Installing Sigil
The tool can be compiled by following the instructions to make/install Valgrind and its suite of tools. Check out the source code and follow the instructions in the README_USERS. A subset of those instructions is repeated here.
Navigate into the directory for Valgrind (the “valgrind-3.7” folder) and follow the README. The current version of Sigil is built into valgrind-3.7 and needs no separate build process. When the Callgrind tool is built, Sigil is built automatically.
To summarize the Valgrind build documentation on Linux, the build simply involves the usual ‘make’ and ‘make install’ in the Valgrind directory. If you prefer not to ‘make install’ Valgrind, you can build it using just ‘make’ and run it in place using Valgrind’s in-place script. See the “README_DEVELOPERS” document in the Valgrind folder for further clarification. We recommend running all the sanity checks that come pre-packaged with Callgrind.
After building Valgrind, navigate to the Callgrind folder and run ‘make check’. This will build all the regression tests/sanity checks for Callgrind and Sigil. If any of these sanity checks fail, Sigil cannot run on that system.
If the Callgrind checks fail, please see the Valgrind notes/documentation and, if necessary, search or write to the Valgrind mailing list. If the Callgrind checks complete but the Sigil checks fail, please contact us.
For further details on how to run the tool and general clarifications, please consult the README_USERS that is distributed with the Sigil source code.
We offered tutorials on Sigil and SynchroTrace at ICCD and IISWC 2015. Please take a look at the Tutorial page for materials.
An earlier tutorial on Sigil was offered at HPCA 2015, in the tutorial for Research Infrastructures for Accelerator-Centric Architectures.
- Giordano Salvador, Siddharth Nilakantan, Ankit More, Baris Taskin, and M. Hempstead, Static Thread Mapping for NoC CMPs via Binary Instrumentation Traces, 32nd IEEE International Conference on Computer Design (ICCD), Oct. 2014.
- Siddharth Nilakantan and Mark Hempstead, Platform-independent Analysis of Function-level Communication in Workloads, IEEE International Symposium on Workload Characterization (IISWC), Portland, OR, Sep. 2013.