The M2000 IPU-Machine, which houses four IPU chips, delivers 1 petaflop of compute. It is available as a standalone unit or in data centre pods that connect up to 64,000 IPU chips.
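To put the pod figure in perspective, the short Python sketch below works out how many M2000 machines a maximum-size pod implies and what the aggregate peak compute would be. It uses only the numbers quoted above; the assumption that performance scales linearly with machine count is ours, not Graphcore's, and the function name `pod_scale` is purely illustrative.

```python
# Figures quoted in the article.
IPUS_PER_M2000 = 4           # IPU chips per M2000 machine
PETAFLOPS_PER_M2000 = 1.0    # peak compute per M2000 machine
MAX_IPUS_PER_POD = 64_000    # largest pod size quoted


def pod_scale(max_ipus=MAX_IPUS_PER_POD):
    """Return (machines, aggregate petaflops) for a pod.

    Assumption: peak compute adds up linearly across machines,
    which ignores any communication overhead.
    """
    machines = max_ipus // IPUS_PER_M2000
    petaflops = machines * PETAFLOPS_PER_M2000
    return machines, petaflops


machines, petaflops = pod_scale()
teraflops_per_ipu = PETAFLOPS_PER_M2000 * 1000 / IPUS_PER_M2000

print(f"{machines:,} machines, {petaflops / 1000:.0f} exaflops aggregate peak")
print(f"{teraflops_per_ipu:.0f} teraflops per IPU")
```

On these figures a full pod corresponds to 16,000 machines, and each IPU contributes roughly 250 teraflops of the quoted petaflop.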
The company said that the M2000 can be added to existing data centre infrastructure. Each machine combines the IPU Engine with an IPU Fabric chip, allowing multiple IPU-Machines to be added to a rack, and racks of IPU-Machines to be connected into large IPU pods.
“The IPU-Machine M2000 is a plug-and-play Machine Intelligence compute blade that has been designed for easy deployment and supports systems that can grow to massive scale,” explained Graphcore CEO Nigel Toon. “The slim 1U blade delivers one PetaFlop of Machine Intelligence compute and includes integrated networking technology, optimised for AI scale-out, inside the box.”
At the heart of the M2000 is the company's Colossus Mk2 GC200 IPU chip. With 59.4 billion transistors on a single 823 sq mm die, it is, Graphcore claims, the most complex processor ever made. The chip is built on TSMC’s 7nm process and integrates 1,472 IPU cores capable of executing 8,832 separate parallel computing threads.
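The chip figures above imply two ratios that the article does not state directly: the number of hardware threads per core and the transistor density of the die. The sketch below derives both from the quoted numbers; the variable names are illustrative only.

```python
# Figures quoted in the article for the Colossus Mk2 GC200.
TRANSISTORS = 59.4e9     # transistors on the die
DIE_AREA_MM2 = 823       # die area in square millimetres
CORES = 1_472            # IPU cores
THREADS = 8_832          # parallel computing threads

# Threads each core must run to reach the quoted thread count.
threads_per_core = THREADS / CORES

# Average transistor density across the whole die.
transistors_per_mm2 = TRANSISTORS / DIE_AREA_MM2

print(f"{threads_per_core:.0f} threads per core")
print(f"{transistors_per_mm2 / 1e6:.1f} million transistors per sq mm")
```

The thread count divides evenly, giving six threads per core, and the density works out to a little over 72 million transistors per square millimetre.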
Each IPU has a large amount of In-Processor Memory, with the Mk2 GC200 providing 900MB of ultra-high-speed SRAM inside the processor. This memory is spread across the IPU, sitting right next to each processor core in an IPU-Tile for the lowest energy access per bit.
According to Graphcore, that 900MB is a 3x step up in density compared with its Mk1 IPU and is enough to hold large models, prior state, or many layers of even the world’s largest models inside the chip, running at the full speed of the processor.
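As a rough illustration of what these memory figures mean per core, the sketch below divides the 900MB evenly across the 1,472 cores and back-calculates the Mk1 capacity implied by the 3x claim. Two assumptions are ours rather than the article's: that there is one IPU-Tile per core with the SRAM split evenly between them, and that "MB" means decimal megabytes.

```python
# Figures quoted in the article.
SRAM_MB = 900            # In-Processor Memory on the Mk2 GC200
CORES = 1_472            # IPU cores on the chip
DENSITY_STEP_UP = 3      # claimed improvement over the Mk1 IPU

# Assumption: one tile per core, SRAM divided evenly, decimal units.
sram_per_tile_kb = SRAM_MB * 1000 / CORES

# Mk1 capacity implied by the 3x claim (not a figure from the article).
implied_mk1_mb = SRAM_MB / DENSITY_STEP_UP

print(f"about {sram_per_tile_kb:.0f}KB of SRAM per tile")
print(f"implied Mk1 capacity: {implied_mk1_mb:.0f}MB")
```

Under those assumptions each tile holds a little over 600KB, and the 3x claim implies roughly 300MB on the earlier chip.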
The company's Poplar software also allows IPUs to access Streaming Memory through its Exchange-Memory communication. This allows large models with hundreds of billions of parameters to be supported.
Each IPU-Machine M2000 can support up to 450GB of Exchange-Memory with a bandwidth of 180TB/s. As a result, IPU Exchange-Memory delivers more than a 10x advantage in density, together with more than a 100x advantage in memory bandwidth, when compared with the very latest 7nm GPU products.
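The link between these capacities and the "hundreds of billions of parameters" claim can be checked with simple arithmetic. The sketch below estimates how many model parameters fit in each memory tier, assuming 2 bytes per parameter (16-bit floating point), decimal units, and no space reserved for activations, optimiser state or code. Those assumptions, and the helper name `params_that_fit`, are ours and not Graphcore's.

```python
# Figures quoted in the article.
IN_PROCESSOR_BYTES = 900 * 10**6      # 900MB SRAM per IPU
EXCHANGE_MEMORY_BYTES = 450 * 10**9   # 450GB per M2000 machine
IPUS_PER_M2000 = 4

# Assumption: 16-bit parameters, nothing else stored alongside them.
BYTES_PER_PARAM = 2


def params_that_fit(capacity_bytes, bytes_per_param=BYTES_PER_PARAM):
    """Upper bound on parameters that fit in a given capacity."""
    return capacity_bytes // bytes_per_param


params_per_ipu = params_that_fit(IN_PROCESSOR_BYTES)
params_per_machine = params_that_fit(EXCHANGE_MEMORY_BYTES)

# How much larger Exchange-Memory is than the machine's total SRAM.
capacity_ratio = EXCHANGE_MEMORY_BYTES / (IPUS_PER_M2000 * IN_PROCESSOR_BYTES)

print(f"{params_per_ipu / 1e6:.0f} million parameters in one IPU's SRAM")
print(f"{params_per_machine / 1e9:.0f} billion parameters in Exchange-Memory")
print(f"Exchange-Memory is {capacity_ratio:.0f}x the on-chip SRAM per machine")
```

On these assumptions a single machine's Exchange-Memory holds an upper bound of 225 billion 16-bit parameters, which is consistent with the scale of model the article describes.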