The computer system, which has been named Good after the computer pioneer Jack Good, will be capable of delivering 10 ExaFLOPS of AI floating point compute, with up to 4 Petabytes of memory and a bandwidth of 10 Petabytes/second. Costing around $120 million, the system is targeted for launch in 2024, Graphcore said.
These new processors have been designed to deliver up to 40% higher performance and 16% better power efficiency for real-world AI applications compared to their predecessors, while requiring no changes to existing software.
The flagship Bow Pod256 delivers more than 89 PetaFLOPS of AI compute, while the superscale Bow Pod1024 delivers 350 PetaFLOPS of AI compute, allowing machine learning engineers to better address the growing size of AI models.
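The Pod figures above are consistent with near-linear scaling from a single Bow IPU. A minimal sketch, assuming roughly 350 teraFLOPS of AI compute per Bow IPU (a figure inferred from the Pod numbers quoted here, not from an official spec sheet):

```python
# Assumption: ~0.35 PFLOPS (350 TFLOPS) of AI compute per Bow IPU,
# inferred from the Pod256/Pod1024 figures quoted in the article.
BOW_IPU_PFLOPS = 0.35

def pod_pflops(num_ipus: int) -> float:
    """Ideal (linear) aggregate AI compute for a Pod of `num_ipus` Bow IPUs."""
    return num_ipus * BOW_IPU_PFLOPS

print(pod_pflops(256))   # ~89.6 PFLOPS, matching "more than 89 PetaFLOPS"
print(pod_pflops(1024))  # ~358 PFLOPS, consistent with the quoted 350 PetaFLOPS
```

The close agreement between the linear estimate and the quoted Pod numbers suggests the aggregate figures are simply per-IPU compute multiplied by IPU count.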
Bow Pods are intended for a wide range of AI applications, from GPT and BERT for natural language processing to EfficientNet and ResNet for computer vision, to graph neural networks.
For the state-of-the-art computer vision model EfficientNet, Bow Pod16 delivers over 5x better performance than a comparable Nvidia DGX A100 system, according to Graphcore, at half the price, resulting in up to a 10x TCO advantage.
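The 10x figure follows directly from the two claimed ratios. A back-of-envelope check, using performance per dollar as a simple TCO proxy (the ratios are Graphcore's claims; the arithmetic is ours):

```python
# Graphcore's claimed ratios for Bow Pod16 vs a comparable Nvidia DGX A100
# running EfficientNet (as quoted in the article).
perf_ratio = 5.0    # ~5x the performance
price_ratio = 0.5   # ~half the price

# Performance per dollar as a rough TCO proxy: 5x perf / 0.5x price = 10x.
tco_advantage = perf_ratio / price_ratio
print(tco_advantage)  # 10.0
```

Note this simple proxy ignores power, cooling and hosting costs, which a full TCO comparison would include.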
As well as offering up to 40% performance gains, Bow Pod systems are also significantly more power efficient; this has been achieved by moving the 7nm IPU to TSMC’s Wafer-on-Wafer (WoW) technology.
WoW stacks two flipped wafers together, connects them with through-silicon vias (TSVs) and bonds them before dicing; the Bow IPU is the first commercial use of the technology.
Commenting, Graphcore CTO Simon Knowles described Bow, delivering 89 PetaFLOPS of AI compute, as “the highest-performing AI processor in the world”, saying, “the Bow delivers a giant performance boost and improved power efficiency thanks to the use of a world first in 3D semiconductor technology.”
TSMC’s Wafer-on-Wafer 3D technology makes it possible to deliver much higher bandwidths between silicon die and has been used to optimise power efficiency and improve power delivery to Graphcore’s Colossus architecture at the wafer level.
With Wafer-on-Wafer in the Bow IPU, one wafer is used for AI processing and a second wafer serves as the power delivery die. The processing die is architecturally compatible with the GC200 IPU processor, with 1,472 independent IPU-Core tiles capable of running more than 8,800 threads, and 900MB of In-Processor Memory.
By adding deep trench capacitors in the power delivery die, right next to the processing cores and memory, Graphcore has been able to deliver power much more efficiently.
“TSMC has worked closely with Graphcore as a leading customer for our breakthrough SoIC-WoW (Wafer-on-Wafer) solution, as their pioneering designs in cutting-edge parallel processing architectures make them an ideal match for our technology,” said Paul de Bot, general manager of TSMC Europe.
“Graphcore has fully exploited the ability to add power delivery directly connected via our WoW technology to achieve a major step up in performance, and we look forward to working with them on further evolutions of this technology.”