As transistor scaling slows and dark silicon increases, emerging Non-Volatile Memory (NVM) technologies such as Spin-Transfer Torque RAM (STT-RAM) and Resistive RAM (ReRAM) have been proposed as potential replacements for the on-chip, SRAM-based last-level cache (LLC). Their superior density, along with other benefits, makes them an attractive option for caching. Recent work in ISCA by Intel has shown active consideration of NVMs in the LLC, following the successful production of NVM-based main memories.
Figure 1: Spin-Transfer Torque RAM (STT-RAM) cell (left), Resistive RAM (ReRAM) cell (right) [1, 2]
In this work, we address the challenges and numerous implications of replacing SRAM-based LLC with NVMs. These include:
- accurately modeling NVMs in the LLC using the existing software tools
- studying the tradeoffs associated with creating ultra-dense NVM-based LLCs
- analyzing the extent to which read and write asymmetry in NVMs affects the performance and power of applications
- developing cache management techniques that maximize endurance and lifetime
- characterizing workloads to determine the optimal architecture-agnostic features (e.g., read/write entropy, read/write working set size, and read/write locality) for an NVM-based LLC, based on its unique design tradeoffs
So far, we have modeled a large set of NVMs from the literature that are promising candidates for adoption in the last-level cache. We modeled these NVMs with existing software by developing a set of heuristics that ensure accurate comparisons to an SRAM-based LLC. We have run full-system simulations of a server-class x86 architecture across SPEC CPU 2006, SPEC CPU 2017 Artificial Intelligence, PARSEC 2.1, and the NAS Parallel Benchmarks, and have studied the resulting tradeoffs in performance, energy, and area.
To understand the variation in performance, energy, and area, we have begun a workload characterization study. We have analyzed the aforementioned workloads in terms of their memory access behavior, using metrics such as read and write entropy, read and write spatial locality, read and write working set sizes, and counts of unique and total reads and writes. We have found correlations between different kinds of memory behavior and the observed performance, energy, and area results. These correlations vary with the type of application, for example, artificial intelligence versus physics simulations.
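As an illustration of how some of these metrics could be computed, the following is a minimal sketch, not the tooling used in this work. It assumes a hypothetical trace format of `(op, address)` pairs with `op` equal to `"R"` or `"W"`, an assumed 64-byte cache block, and Shannon entropy over block addresses as one possible definition of read/write entropy. The function name `characterize` and all field names are illustrative. Spatial locality is omitted for brevity.

```python
import math
from collections import Counter


def characterize(trace, block_bits=6):
    """Summarize reads and writes in a memory trace.

    trace: iterable of (op, address) pairs, op is "R" or "W" (assumed format).
    block_bits: log2 of the block size; 6 means 64-byte blocks (an assumption).
    """
    trace = list(trace)  # allow a second pass over generators
    stats = {}
    for op in ("R", "W"):
        # Map each byte address to its cache-block address.
        blocks = [addr >> block_bits for o, addr in trace if o == op]
        counts = Counter(blocks)
        total = len(blocks)
        # Shannon entropy (in bits) of the block-address distribution.
        entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
        stats[op] = {
            "total": total,                              # total accesses
            "unique": len(counts),                       # distinct blocks touched
            "working_set_bytes": len(counts) << block_bits,
            "entropy_bits": entropy,
        }
    return stats


if __name__ == "__main__":
    # Tiny synthetic trace: two read blocks touched twice each, one write.
    example = [("R", 0), ("R", 64), ("R", 0), ("R", 64), ("W", 128)]
    for op, summary in characterize(example).items():
        print(op, summary)
```

Under this definition, low entropy indicates that accesses concentrate on a few blocks, while entropy close to the base-2 logarithm of the unique block count indicates accesses spread evenly across the working set.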
Our current work is to expand our understanding of the impact of workload features, to develop a retention and endurance model for our NVM LLCs, and to leverage what we have learned to design optimal, application-domain-specific NVM-based memories for the next generation of applications.