The San Jose start-up said that the chip, called Mozart, is a TSMC 16nm design that utilises HBM2 memory and is sampling as a standard PCIe card. Unlike other specialised AI chips, it has shown significant speedups across a broad set of AI applications in initial results. Examples include recommendation engines, speech and language processing, and image detection, all of which can run simultaneously.
Mozart is the result of 10 years of research at the University of Wisconsin-Madison that included several high-profile papers and awards. The research was led by SMI’s Founder, CEO and CTO Karu Sankaralingam, also a computer science professor at UW.
The working chip demonstrates that a clean-slate design can deliver the performance of custom-built single-purpose processors while maintaining both flexibility and programmability. Its software interface includes direct TensorFlow support as well as APIs for C/C++ and Python.
Mozart will be available via a PCIe card called Accelerando, or via the Symphony Cloud Service (SMI’s hosted cloud service with access to public clouds like Azure, Google Cloud Platform, and AWS).
“Mozart is an extremely complex chip, one of the few using HBM2 (high-bandwidth memory) for this type of application,” Sankaralingam said. “It took our silicon team under a week from having the chip in-house to running applications on the device, putting us on the fast track to take our A0 silicon to production.”
Unlike many new chips designed for a single workload such as image processing, Mozart can, according to the team at SimpleMachines, adapt on the fly and accelerate a wide variety of workloads across a diverse range of industries and applications.
“The chip’s design can support very large models today and is capable of running up to 64 different models simultaneously,” said Greg Wright, SMI’s Chief Architect. “Our next-generation 7nm design is expected to be ready to sample by the end of 2021 and will be 20x faster on a diverse set of workloads than current chips.”
Mozart’s architecture leverages the concept of Composable Computing, which abstracts any software application into a small number of defined behaviours. SMI’s compiler integrates into the backend of standard AI frameworks like TensorFlow, translates those programs into these behaviours, and reconfigures the hardware on the fly so that the chip behaves as if it had been designed for that application.
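The idea of lowering a program into a small set of behaviours and then configuring hardware around them can be sketched in a few lines of Python. This is a toy illustration only, not SMI's compiler or API: the behaviour names, the lowering table, the `lower` and `configure` functions, and the pool of 16 units are all hypothetical and were invented for this sketch.

```python
from collections import Counter
from enum import Enum


class Behaviour(Enum):
    # Hypothetical behaviour set, invented for illustration only.
    STREAM_MEMORY = "stream_memory"
    DENSE_COMPUTE = "dense_compute"
    REDUCE = "reduce"
    CONTROL = "control"


# Hypothetical lowering table: framework-level op name -> behaviours it needs.
LOWERING = {
    "matmul": [Behaviour.STREAM_MEMORY, Behaviour.DENSE_COMPUTE, Behaviour.REDUCE],
    "embedding_lookup": [Behaviour.STREAM_MEMORY, Behaviour.CONTROL],
    "softmax": [Behaviour.DENSE_COMPUTE, Behaviour.REDUCE],
    "relu": [Behaviour.DENSE_COMPUTE],
}


def lower(ops):
    """Translate a list of op names into a flat list of behaviours."""
    behaviours = []
    for op in ops:
        if op not in LOWERING:
            raise ValueError(f"no lowering rule for op {op!r}")
        behaviours.extend(LOWERING[op])
    return behaviours


def configure(behaviours, total_units=16):
    """Share a pool of units across behaviours, roughly in proportion to use.

    Integer division rounds down, so some units may be left unassigned.
    Each behaviour that appears gets at least one unit.
    """
    counts = Counter(behaviours)
    total = sum(counts.values())
    return {b: max(1, total_units * n // total) for b, n in counts.items()}


# A recommendation-style model and a classifier head yield different configurations.
recommender = configure(lower(["embedding_lookup", "matmul", "relu"]))
classifier = configure(lower(["matmul", "relu", "softmax"]))
```

The two workloads end up with different unit allocations from the same pool, which is the sense in which one reconfigurable design could behave like hardware tailored to each application.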
“SimpleMachines’s solution is a radically new software-centric approach that deploys a programmable platform with a breakthrough software stack and compiler that enables the programmer to easily optimize the hardware on the fly and get the performance of custom silicon with a platform that supports hundreds of different use cases,” Wright said.