Shared Accelerator Store: A System for True Heterogeneous Architectures with Accelerators

Overview

Designers are increasingly turning to hardware specialization through application-specific accelerators to improve energy efficiency and performance. The resource requirements (area or FPGA resources) of these accelerators challenge embedded system designers, particularly in domains such as medical devices, where a tight area budget must cover a range of possible software kernels. These challenges have led designers to rely heavily on application-specific hardware, which in turn has driven the development of heterogeneous architectures.

Many tools and methodologies exist for designing accelerators and evaluating their performance. However, most of these efforts do not consider the whole system and instead evaluate the accelerator in isolation. Recent work in the literature shows that data movement and coherence management for accelerators are significant yet often unaccounted-for components of total accelerator runtime.

We will use the oneAPI programming model to distribute work across diverse architectures, estimate the communication overhead, and provide a realistic control and scheduling methodology for true heterogeneous architectures that combine FPGA accelerators, CPUs, and GPUs.
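As a starting point for estimating communication overhead, a first-order latency-plus-bandwidth model can indicate when offloading a kernel pays off. The sketch below is illustrative only: the link names, latency, and bandwidth figures are assumptions, and in practice these numbers would be measured on the target system (e.g., with oneAPI profiling tools).

```python
# Illustrative first-order model of host-to-device communication overhead.
# All latency (seconds) and bandwidth (bytes/second) figures below are
# hypothetical placeholders, not measurements of any real system.

LINKS = {
    "cpu->gpu":  {"latency_s": 10e-6, "bandwidth_Bps": 16e9},  # assumed PCIe-class link
    "cpu->fpga": {"latency_s": 20e-6, "bandwidth_Bps": 8e9},   # assumed
    "cpu->asic": {"latency_s": 5e-6,  "bandwidth_Bps": 32e9},  # assumed
}

def transfer_time(link: str, num_bytes: int) -> float:
    """Estimate one-way transfer time as latency + size / bandwidth."""
    p = LINKS[link]
    return p["latency_s"] + num_bytes / p["bandwidth_Bps"]

def offload_worthwhile(link: str, num_bytes: int,
                       host_compute_s: float, device_compute_s: float) -> bool:
    """Offloading pays off only if device compute time plus round-trip
    transfer time beats computing on the host directly."""
    round_trip = 2 * transfer_time(link, num_bytes)
    return device_compute_s + round_trip < host_compute_s
```

For example, a 1 MB transfer to a fast accelerator is easily amortized by a large compute speedup, while moving gigabytes back and forth can make offloading slower than staying on the host; the model captures exactly this trade-off.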

In our project, we will be using the following hardware frameworks:

  1.  ASIC (application-specific integrated circuit)
  2.  CPU
  3.  FPGA (field-programmable gate array)
  4.  GPU

Our goal is to answer three main questions:

  1. What is the communication overhead between tiles of accelerators and the different processing units?
  2. What are the best practices for memory usage in heterogeneous architectures (i.e., which level of the memory hierarchy to use)?
  3. What are the best practices for scheduling software kernels on true heterogeneous architectures?
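The third question, kernel scheduling, can be framed as assigning each kernel to the device that minimizes its finish time, where a kernel's per-device cost folds in both compute and data-transfer time. The sketch below is a minimal earliest-finish-time heuristic under that assumption; the device names and cost numbers are hypothetical, and a real scheduler would derive them from profiling.

```python
# Hedged sketch of a greedy earliest-finish-time scheduler for a
# heterogeneous system. Per-kernel, per-device costs (compute + transfer)
# are hypothetical inputs supplied by the caller.

from dataclasses import dataclass

@dataclass
class Device:
    name: str
    ready_at: float = 0.0  # time at which the device becomes free

def schedule(kernels, devices):
    """Assign each kernel to the device with the earliest finish time.

    `kernels` is a list of dicts mapping device name -> estimated cost
    of that kernel on that device. Returns a list of tuples
    (kernel_index, device_name, finish_time).
    """
    plan = []
    for i, costs in enumerate(kernels):
        # Earliest-finish-time heuristic: device free time + per-device cost.
        best = min(devices, key=lambda d: d.ready_at + costs[d.name])
        finish = best.ready_at + costs[best.name]
        best.ready_at = finish  # device is busy until this kernel finishes
        plan.append((i, best.name, finish))
    return plan
```

For instance, with a CPU and a GPU and three kernels whose GPU costs alternate between cheap and expensive, the heuristic naturally interleaves work across both devices rather than serializing everything on the fastest one.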