Neural nets were widely studied in the early days of artificial-intelligence research, but by the 1970s, they’d fallen out of favour. In the past decade, however, they’ve enjoyed a revival, under the name ‘deep learning’.
“Deep learning is useful for many applications, such as object recognition, speech and face detection,” says Vivienne Sze, an assistant professor of electrical engineering at MIT. “Right now, the networks are pretty complex and are mostly run on high-power GPUs. You can imagine that if you can bring that functionality to your cell phone or embedded devices, you could still operate even if you don’t have a Wi-Fi connection. You might also want to process locally for privacy reasons. Processing it on your phone also avoids any transmission latency, so that you can react much faster for certain applications.”
The chip, which the researchers have dubbed ‘Eyeriss’, could help usher in the Internet of Things: with powerful artificial-intelligence algorithms on board, networked devices could make important decisions locally, entrusting only their conclusions, rather than raw personal data, to the Internet. Onboard neural networks would also be useful to battery-powered autonomous robots.
The key to Eyeriss’ efficiency is to minimise the frequency with which cores need to exchange data with distant memory banks, an operation that consumes time and energy. Whereas many of the cores in a GPU share a single, large memory bank, each of the Eyeriss cores has its own memory. Moreover, the chip has a circuit that compresses data before sending it to individual cores.
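The article does not say which compression scheme the circuit uses. As a rough illustration only, the sketch below assumes a simple zero run-length code, a plausible fit because the data flowing through a neural network often contains long runs of zeros. The function names and the sample values are hypothetical and are not taken from the Eyeriss design.

```python
def compress_zero_runs(values):
    """Encode values as (zeros_before, nonzero_value) pairs.

    Assumed scheme for illustration; returns the pairs plus the number
    of zeros left over at the end of the sequence.
    """
    pairs = []
    run = 0
    for v in values:
        if v == 0:
            run += 1          # extend the current run of zeros
        else:
            pairs.append((run, v))
            run = 0
    return pairs, run


def decompress_zero_runs(pairs, trailing_zeros):
    """Rebuild the original sequence from the encoded pairs."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        out.append(v)
    out.extend([0] * trailing_zeros)
    return out


# Hypothetical data: eight values, only two of them non-zero.
activations = [0, 0, 5, 0, 0, 0, 3, 0]
pairs, tail = compress_zero_runs(activations)
restored = decompress_zero_runs(pairs, tail)
```

Here eight values shrink to two pairs and a trailing count, so fewer words have to travel to a core, and decompression restores the original sequence exactly.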
Each core is also able to communicate directly with its immediate neighbours, so that if they need to share data, they don’t have to route it through main memory. This is essential in a convolutional neural network, in which so many nodes are processing the same data.
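To see why sharing between neighbours matters, consider a toy one-dimensional convolution in which each core computes one output. This is a minimal sketch under simplifying assumptions (one output per core, data passed only from the previous core); the function name and the fetch-counting model are hypothetical and do not describe the actual Eyeriss dataflow.

```python
def convolve_1d(inputs, weights, share_with_neighbours):
    """Compute a 1-D convolution and count main-memory fetches.

    Each loop iteration stands for one core producing one output.
    """
    width = len(weights)
    num_outputs = len(inputs) - width + 1
    fetches = 0
    outputs = []
    window = []
    for core in range(num_outputs):
        if share_with_neighbours and core > 0:
            # Reuse the overlapping values held by the previous core
            # and fetch only the single new value from main memory.
            window = window[1:] + [inputs[core + width - 1]]
            fetches += 1
        else:
            # Fetch the whole window from main memory.
            window = list(inputs[core:core + width])
            fetches += width
        outputs.append(sum(w * x for w, x in zip(weights, window)))
    return outputs, fetches


signal = [1, 2, 3, 4, 5, 6]
kernel = [1, 0, -1]
out_alone, fetches_alone = convolve_1d(signal, kernel, share_with_neighbours=False)
out_shared, fetches_shared = convolve_1d(signal, kernel, share_with_neighbours=True)
```

Both runs produce the same outputs, but in this toy model the shared version makes 6 main-memory fetches where the independent version makes 12, because adjacent windows overlap in all but one value.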
The final key to the chip’s efficiency is special-purpose circuitry that allocates tasks across cores. In its local memory, a core needs to store not only the data manipulated by the nodes it’s simulating but also data describing the nodes themselves. The allocation circuit can be reconfigured for different types of networks, automatically distributing both types of data across cores in a way that maximises the amount of work that each of them can do before fetching more data from main memory.
At the conference, the MIT researchers used Eyeriss to implement a neural network that performs an image-recognition task, the first time that a state-of-the-art neural network has been demonstrated on a custom chip.