SynchroTrace

Architecture Agnostic Multicore Tracing and Simulation

SynchroTrace is a two-step trace-driven simulation methodology that enables efficient design space exploration of chip multiprocessors (CMPs). The first step captures synchronization-aware traces of multi-threaded applications using an extension of prior work (Sigil). The second step is a timing model that replays these synchronization-aware traces into external architecture models. Together, these two stages constitute SynchroTrace.

To use this methodology for design space exploration, we have developed a prototype of SynchroTrace integrated into the cache and NoC simulators of Gem5 (Ruby and Garnet, respectively). We call this prototype integration the SynchroTrace Simulation Framework, and its code is available below.

Tutorials:

We offered tutorials on Sigil and SynchroTrace at ICCD and IISWC 2015. Please see the Tutorial page for materials.

Overview:

The capture tool is built from Sigil, which uses the Valgrind dynamic binary instrumentation framework. The instructions executed by the native multi-threaded application are abstracted into three types of events: Computation (work performed locally within a thread), Communication (read/write dependencies between threads), and Synchronization (the pthread calls embedded in each thread). These events form a separate trace for each thread, so that the threads can progress in parallel when the traces are replayed.
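To make the three event types concrete, the following Python sketch models them as simple records and splits a captured event stream into one trace per thread. The class names, field names, and the split_per_thread helper are hypothetical illustrations chosen for this example; they do not reflect SynchroTrace's actual trace format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Computation:
    """Work local to one thread (hypothetical fields: op counts and local accesses)."""
    int_ops: int
    float_ops: int
    local_reads: int
    local_writes: int


@dataclass
class Communication:
    """A read/write dependency on data produced by another thread (hypothetical fields)."""
    producer_thread: int
    address: int
    num_bytes: int


@dataclass
class Synchronization:
    """An embedded pthread call such as a mutex lock or barrier (hypothetical fields)."""
    call: str        # e.g. "pthread_mutex_lock", "pthread_barrier_wait"
    sync_addr: int   # address of the lock or barrier object


Event = Union[Computation, Communication, Synchronization]


def split_per_thread(stream: List[Tuple[int, Event]]) -> Dict[int, List[Event]]:
    """Group (thread_id, event) pairs into one ordered trace per thread."""
    traces: Dict[int, List[Event]] = defaultdict(list)
    for thread_id, event in stream:
        traces[thread_id].append(event)  # preserves each thread's program order
    return dict(traces)


stream = [
    (0, Computation(int_ops=4, float_ops=2, local_reads=1, local_writes=0)),
    (1, Synchronization(call="pthread_barrier_wait", sync_addr=0x1000)),
    (0, Communication(producer_thread=1, address=0x2000, num_bytes=8)),
]
traces = split_per_thread(stream)
print({tid: len(t) for tid, t in traces.items()})  # {0: 2, 1: 1}
```

Keeping one ordered list per thread is what lets each thread's trace be replayed independently, with cross-thread interaction expressed only through the Communication and Synchronization events.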

Replay Model:

Traces are fed into the replay timing model, which acts as an interface into the external architecture models.

The Replay portion of SynchroTrace comprises four entities:

Trace Translator – Converts the traces into an event form to be fed into the timing model.

Event Queue Manager – A centralized event queue that manages the timing of thread progression based on the three types of events. It also determines when to send memory requests to the external cache simulator.

Thread Scheduler – Creates and maintains thread state. It includes a lightweight swapping mechanism that allows multiple threads to run on a single core, and it also handles the corresponding synchronization actions.

Memory Request Manager – Interface to the external architecture models. For the SynchroTrace Simulation Framework, the memory request manager packages the memory requests into requests for Ruby.
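The following Python sketch illustrates, under simplifying assumptions, how per-thread traces might be replayed through a centralized event queue: a priority queue orders threads by the cycle at which they are next ready, a mutex table blocks and wakes threads on synchronization events, and a fixed MEM_LATENCY stands in for the external cache model. The event encoding, the one-cycle lock and unlock costs, MEM_LATENCY, and the replay function are all hypothetical. The sketch assumes one thread per core, omits thread swapping and communication dependencies, and is not the SynchroTrace implementation, which sends memory requests to Ruby.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical event encoding for this sketch:
#   ("comp", cycles) | ("mem", address) | ("lock", mutex_id) | ("unlock", mutex_id)
Event = Tuple[str, int]

# Hypothetical fixed latency standing in for the external cache model (Ruby).
MEM_LATENCY = 10


def replay(traces: Dict[int, List[Event]]) -> Dict[int, int]:
    """Replay per-thread traces and return the cycle at which each thread finishes.

    Assumes one thread per core, so no thread swapping is modeled.
    """
    pc = {tid: 0 for tid in traces}          # index of each thread's next event
    finish: Dict[int, int] = {}
    owner: Dict[int, int] = {}               # mutex_id -> thread holding it
    waiters: Dict[int, List[int]] = {}       # mutex_id -> blocked threads, FIFO
    queue = [(0, tid) for tid in sorted(traces)]  # (ready_cycle, thread_id)
    heapq.heapify(queue)

    while queue:
        now, tid = heapq.heappop(queue)      # earliest-ready thread runs next
        trace = traces[tid]
        if pc[tid] == len(trace):
            finish[tid] = now
            continue
        kind, arg = trace[pc[tid]]
        if kind == "comp":                   # local work: advance by its cycle count
            pc[tid] += 1
            heapq.heappush(queue, (now + arg, tid))
        elif kind == "mem":                  # memory request: wait the stand-in latency
            pc[tid] += 1
            heapq.heappush(queue, (now + MEM_LATENCY, tid))
        elif kind == "lock":
            if arg in owner:                 # mutex held: block without advancing
                waiters.setdefault(arg, []).append(tid)
            else:                            # acquire, assumed to cost one cycle
                owner[arg] = tid
                pc[tid] += 1
                heapq.heappush(queue, (now + 1, tid))
        elif kind == "unlock":               # release, assumed to cost one cycle
            del owner[arg]
            pc[tid] += 1
            if waiters.get(arg):             # wake the oldest waiter to retry its lock
                heapq.heappush(queue, (now, waiters[arg].pop(0)))
            heapq.heappush(queue, (now + 1, tid))
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return finish


contended = {
    0: [("lock", 7), ("comp", 5), ("unlock", 7)],
    1: [("lock", 7), ("mem", 0x40), ("unlock", 7)],
}
print(replay(contended))  # {0: 7, 1: 18}
```

In the usage example, thread 1 blocks on mutex 7 until thread 0 releases it at cycle 6, so its finish time includes that wait as well as the stand-in memory latency.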

Getting SynchroTrace:

The SynchroTrace Simulation Framework is accessible through the VANDAL GitHub.

Currently, the only available repository is for replaying the synchronization-aware traces into the external cache and NoC models (Ruby and Garnet). We've included a few sample traces for testing and exploring this code base. We are currently preparing the capture tool for release.

The SynchroTrace publications are listed under Publications below.

Dependencies:

The SynchroTrace Simulation Framework is integrated into Gem5's cache and NoC simulators (Ruby and Garnet), so SynchroTrace's dependencies follow Gem5's. For the version of Gem5 we used, the following dependencies must be installed before compiling SynchroTrace:

  1. gcc-4.4.7
  2. gmp-5.1.1
  3. mpc-1.0
  4. mpfr-3.1.2
  5. swig-2.0.1
  6. python-2.7.6
  7. scons-2.3.0

Please refer to http://gem5.org/Dependencies for more information.

SynchroTrace has been tested on Intel Xeon E5-based machines running Red Hat Enterprise Linux 5, CentOS 6, or Ubuntu 12.x. We have generated traces for the PARSEC and Splash-2 benchmarks (reliably with up to 64 threads) and run them through the SynchroTrace Simulation Framework.

Running the first time:

Please follow the included Readme to compile SynchroTrace and run the sample traces for the first time.

SynchroTrace is a collaboration between Tufts and Drexel. More detail can be found on the VLSI Lab's SynchroTrace page.

Publications:

  • K. Sangaiah, M. Lui, R. Jagtap, S. Diestelhorst, S. Nilakantan, A. More, B. Taskin, and M. Hempstead, "SynchroTrace: Synchronization-aware Architecture-agnostic Traces for Light-Weight Multicore Simulation of CMP and HPC Workloads," ACM Transactions on Architecture and Code Optimization (TACO), Vol. 15, No. 1, Article 2, March 2018.
  • K. Sangaiah, B. Taskin, and M. Hempstead, "Fast Multicore Simulation and Performance Analysis of HPC Applications with SynchroTrace," Boston Area Architecture (BARC) Workshop, January 2016.
  • K. Sangaiah, M. Hempstead, and B. Taskin, "Uncore RPD: Rapid Design Space Exploration of the Uncore via Regression Modeling," International Conference on Computer-Aided Design (ICCAD), October 2015.
  • S. Nilakantan, K. Sangaiah, A. More, G. Salvador, B. Taskin, and M. Hempstead, "SynchroTrace: Synchronization-aware Architecture-agnostic Traces for Light-Weight Multicore Simulation," IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2015.
  • G. Salvador, S. Nilakantan, A. More, B. Taskin, and M. Hempstead, "Effects of Non-determinism in Hardware and Software Simulation with Thread Mapping," 28th International Conference on VLSI Design and 14th International Conference on Embedded System Design (VLSID ES), January 2015.
  • S. Nilakantan, S. Lerner, M. Hempstead, and B. Taskin, "Can You Trust Your Memory Trace? A Comparison of Memory Traces from Binary Instrumentation and Simulation," 28th International Conference on VLSI Design and 14th International Conference on Embedded System Design (VLSID ES), January 2015.
  • G. Salvador, S. Nilakantan, A. More, B. Taskin, and M. Hempstead, "Static Thread Mapping for NoC CMPs via Binary Instrumentation Traces," 32nd IEEE International Conference on Computer Design (ICCD), October 2014.