Toward Faster and More Efficient Training on CPUs Using STT-RAM-based Last Level Cache

Alexander Hankin, Tufts University
Maziar Mehdizadehamiraski, Tufts University
Karthik Sangaiah, Drexel University
Mark Hempstead, Tufts University

12th Annual Non-Volatile Memories Workshop (NVMW), San Diego, CA, USA, 2021.



Artificial intelligence (AI), especially neural network-based AI, has become ubiquitous in modern-day computing. However, the training phase required for these networks demands
significant computational resources and is the primary bottleneck as the community scales its AI capabilities. While GPUs and AI accelerators are increasingly used to address
this problem, many of the industry’s AI models are still trained on CPUs, where performance is limited in large part by the memory system. Breakthroughs in non-volatile memory (NVM) research over the past couple
of decades have unlocked the potential to replace on-chip SRAM with an NVM-based alternative. Research into Spin-Transfer Torque RAM (STT-RAM) over the past decade has
explored the impact of trading off volatility for improved write latency as part of the trend toward bringing STT-RAM on-chip. STT-RAM is an especially attractive replacement for SRAM in the last-level cache due to its density, low leakage, and, most notably, its endurance.