Building Reconfigurable Shared Accelerators through Early-stage Automated Identification of Similar Hardware Implementations with Abstract Syntax Trees.

Parnian Mokri, Tufts University
Maziar Amiraski, Tufts University
Yuelin Liu, Tufts University
Mark Hempstead, Tufts University

Poster Presentation. In Proceedings of the 28th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), February 2020.

Abstract

The resource requirements of application-specific accelerators challenge embedded system designers, who have a tight area budget but must cover a range of possible software kernels. We propose an early detection methodology (ReconfAST) that identifies computationally similar synthesizable kernels in order to build Shared Accelerators (SAs). SAs are specialized hardware accelerators that execute very different software kernels but share the hardware functions those kernels have in common. SAs increase the fraction of workloads covered by specialized hardware by detecting similarities in dataflow and control flow between seemingly very different workloads. Existing methods either use dynamic traces or analyze register-transfer-level (RTL) implementations to find these similarities, which requires deep knowledge of RTL and a time-consuming design process.

ReconfAST leverages abstract syntax trees (ASTs) generated by LLVM’s Clang to discover similar kernels among workloads. ASTs provide the right level of abstraction to detect commonalities. Unlike control and dataflow representations, ASTs are compact, but they contain extra syntax and variable node ordering that complicate workload comparison. ReconfAST transforms ASTs into a new clustered-AST (CAST) representation, removes unneeded nodes, and uses a regular expression to match common node configurations. The approach is validated using the MachSuite accelerator benchmarks.
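The pruning and matching steps described above can be illustrated with a minimal Python sketch. It is not the ReconfAST implementation and does not reproduce the CAST clustering step. The nested-tuple tree format, the `NOISE` set, the helper names, and the multiply-accumulate pattern are all assumptions made for illustration; only the node kind names (`ForStmt`, `ImplicitCastExpr`, and so on) are taken from Clang's AST.

```python
import re

# Hypothetical choice of "unneeded" node kinds; the paper's actual list is not given here.
NOISE = {"ImplicitCastExpr", "ParenExpr"}


def prune(node):
    """Drop noise nodes by splicing their children into the parent."""
    kind, children = node
    kept = [p for child in children for p in prune(child)]
    if kind in NOISE:
        return kept
    return [(kind, kept)]


def serialize(node):
    """Flatten a tree into a string such as 'A(B,C(D))'."""
    kind, children = node
    if not children:
        return kind
    return kind + "(" + ",".join(serialize(c) for c in children) + ")"


def to_string(ast):
    """Prune a toy AST, then serialize what remains."""
    return ",".join(serialize(n) for n in prune(ast))


# Assumed pattern: a loop whose body accumulates the product of two array elements.
MAC_PATTERN = re.compile(
    r"ForStmt\(.*CompoundAssignOperator\(DeclRefExpr,"
    r"BinaryOperator\(ArraySubscriptExpr,ArraySubscriptExpr\)\)"
)


def shares_pattern(ast):
    """Return True if the pruned tree contains the assumed pattern."""
    return MAC_PATTERN.search(to_string(ast)) is not None


def leaf(kind):
    return (kind, [])


def cast(child):
    return ("ImplicitCastExpr", [child])


# Toy tree for something like: for (...) { sum += (a[i] * b[i]); }
kernel_a = ("ForStmt", [("CompoundStmt", [("CompoundAssignOperator", [
    leaf("DeclRefExpr"),
    ("ParenExpr", [("BinaryOperator", [
        cast(leaf("ArraySubscriptExpr")), cast(leaf("ArraySubscriptExpr"))])]),
])])])

# Same computation written without parentheses.
kernel_b = ("ForStmt", [("CompoundStmt", [("CompoundAssignOperator", [
    leaf("DeclRefExpr"),
    ("BinaryOperator", [
        cast(leaf("ArraySubscriptExpr")), cast(leaf("ArraySubscriptExpr"))]),
])])])

# Toy tree for something like: for (...) { a[i] = 0; }
kernel_c = ("ForStmt", [("CompoundStmt", [("BinaryOperator", [
    leaf("ArraySubscriptExpr"), leaf("IntegerLiteral")])])])

for name, k in [("kernel_a", kernel_a), ("kernel_b", kernel_b), ("kernel_c", kernel_c)]:
    print(name, shares_pattern(k))
```

In this toy setting, the two accumulation loops differ only in syntactic noise, so they serialize to the same string once pruned, while the initialization loop does not contain the pattern. A real flow would start from Clang's AST output rather than hand-built tuples.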

On FPGAs, a good Shared Accelerator accelerates workloads by an average of 5x and reduces the resources required relative to a dedicated accelerator implementation: on average, 37% fewer flip-flops (FFs), 16% fewer DSPs, and 10% fewer lookup tables (LUTs).