
Neural inference at the frontier of energy, space, and time.

Dharmendra S Modha, Filipp Akopyan, Alexander Andreopoulos, Rathinakumar Appuswamy, John V Arthur, Andrew S Cassidy, Pallab Datta, Michael V DeBole, Steven K Esser, Carlos Ortega Otero, Jun Sawada, Brian Taba, Arnon Amir, Deepika Bablani, Peter J Carlson, Myron D Flickner, Rajamohan Gandhasri, Guillaume J Garreau, Megumi Ito, Jennifer L Klamo, Jeffrey A Kusnitz, Nathaniel J McClatchey, Jeffrey L McKinstry, Yutaka Nakamura, Tapan K Nayak, William P Risk, Kai Schleupen, Ben Shaw, Jay Sivagnaname, Daniel F Smith, Ignacio Terrizzano, Takanori Ueda
Published in: Science (New York, N.Y.) (2023)
Computing, since its inception, has been processor-centric, with memory separated from compute. Inspired by the organic brain and optimized for inorganic silicon, NorthPole is a neural inference architecture that blurs this boundary by eliminating off-chip memory, intertwining compute with memory on-chip, and appearing externally as an active memory chip. NorthPole is a low-precision, massively parallel, densely interconnected, energy-efficient, and spatial computing architecture with a co-optimized, high-utilization programming model. On the ResNet50 benchmark image classification network, relative to a graphics processing unit (GPU) that uses a comparable 12-nanometer technology process, NorthPole achieves a 25 times higher energy metric of frames per second (FPS) per watt, a 5 times higher space metric of FPS per transistor, and a 22 times lower time metric of latency. Similar results are reported for the Yolo-v4 detection network. NorthPole outperforms all prevalent architectures, even those that use more-advanced technology processes.
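For readers unfamiliar with the three comparison metrics cited above, the following is a minimal Python sketch of how they are defined: an energy metric (FPS per watt), a space metric (FPS per transistor), and a time metric (latency). The chip names and every numeric value below are hypothetical placeholders for illustration only, not measurements from the paper.

# Minimal sketch of the three benchmark metrics; all numbers are hypothetical.

def energy_metric(fps: float, watts: float) -> float:
    # Energy metric: throughput per unit power (FPS per watt); higher is better.
    return fps / watts

def space_metric(fps: float, transistors: float) -> float:
    # Space metric: throughput per transistor (FPS per transistor); higher is better.
    return fps / transistors

# Time metric: end-to-end latency per inference (lower is better), reported directly.

if __name__ == "__main__":
    # Hypothetical example values, not NorthPole or GPU figures.
    chips = {
        "chip_a": {"fps": 10_000.0, "watts": 50.0, "transistors": 22e9, "latency_ms": 1.0},
        "chip_b": {"fps": 4_000.0, "watts": 400.0, "transistors": 28e9, "latency_ms": 20.0},
    }
    for name, c in chips.items():
        print(name,
              "FPS/W =", energy_metric(c["fps"], c["watts"]),
              "FPS/transistor =", space_metric(c["fps"], c["transistors"]),
              "latency_ms =", c["latency_ms"])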