Alexander Hankin, Tufts University
Tomer Shapira, Tufts University
Karthik Sangaiah, Drexel University
Michael Lui, Drexel University
Mark Hempstead, Tufts University
Proceedings of the 2019 IEEE International Symposium on Workload Characterization (IISWC 2019). November 2019. Orlando, Fl.
To confront the memory wall and keep up with the demands of changing use cases, Non-Volatile Memories (NVMs) have begun to be considered as a replacement for SRAM in the Last Level Cache (LLC). Recent work has shown that the small cell size of NVMs like Spin- Torque Transfer RAM (STTRAM) and Resistive RAM (RRAM) allows designers to build significantly denser LLCs than those with SRAM-based cells. In some cases, this allows for storing up to 10× more data on-chip than before. As the working set size of use cases increases with the advent of statistical inference (e.g., machine learning (ML) and artificial intelligence (AI)), more capacity close to the processor is necessary to keep up with the demand for performance and low power.
Despite the growing potential of NVM-based LLCs, there are still fundamental problems that need to be addressed. First, the research community is lacking a methodology for consistently modeling these devices, which leads to apples-to-oranges comparisons across NVM-based LLCs. Second, NVMs exhibit a key operational difference with SRAM: read and write asymmetry. The effects of this asymmetry on use case performance and power are mostly unknown with prior art relying only on total read and write counts and on limited sets of use cases.
In this work we present two novel contributions: (1) a set of heuristics for modeling emerging NVM-based LLCs, and (2) a workload characterization framework that learns how architecture-agnostic features, like entropy and working set size, affect the performance and power of a NVM-based LLC system for different use cases. In addition, with this work we release our NVM cell models and make them publicly available online. Using our NVM-based LLC models we show that NVM-based LLC energy use is up to an order of magnitude less than that of an SRAM-based LLC while ED2P is generally on par. From our workload characterization framework, we show that for the AI use cases, energy and speedup are 99% correlated with write entropy, 90% write footprint, and unique write footprint while negligibly correlated with total read and write footprint.