The Architectural Implications of Facebook’s DNN-based Personalized Recommendation

Udit Gupta, Facebook Inc.
Carole-Jean Wu, Facebook Inc.
Xiaodong Wang, Facebook Inc.
Maxim Naumov, Facebook Inc.
Brandon Reagen, Facebook Inc.
David Brooks, Facebook Inc.
Bradford Cottel, Facebook Inc.
Kim Hazelwood, Facebook Inc.
Mark Hempstead, Tufts University
Bill Jia, Facebook Inc.
Hsien-Hsin S. Lee, Facebook Inc.
Andrey Malevich, Facebook Inc.
Dheevatsa Mudigere, Facebook Inc.
Mikhail Smelyanskiy, Facebook Inc.
Liang Xiong, Facebook Inc.
Xuan Zhang, Facebook Inc.

In Proceedings of the 26th IEEE International Symposium on High-Performance Computer Architecture (HPCA 2020). February 2020. San Diego, CA.

Abstract

The widespread application of deep learning has changed the landscape of computation in data centers. In particular, personalized recommendation for content ranking is now largely accomplished using deep neural networks. However, despite their importance and the amount of compute cycles they consume, relatively little research attention has been devoted to recommendation systems. To facilitate research and advance the understanding of these workloads, this paper presents a set of real-world, production-scale DNNs for personalized recommendation coupled with relevant performance metrics for evaluation. In addition to releasing a set of open-source workloads, we conduct in-depth analysis that underpins future system design and optimization for at-scale recommendation: Inference latency varies by 60% across three Intel server generations, batching and co-location of inference jobs can drastically improve latency-bounded throughput, and diversity across recommendation models leads to different optimization strategies.