The cost of training advanced AI models, including large language models (LLMs), has risen sharply as models grow in size and computational complexity, making cost a major bottleneck in AI development.
Creating high-quality AI models requires a substantial number of processors for parallel computing, and therefore a sizable budget. With cost-effective processors, however, better AI models can be developed for the same expenditure. Demand for AI training processors is thus shifting from sheer performance to cost-effectiveness.
In response, LeapMind has initiated the development of a new AI training and inference processor, referred to as the "AI chip."
Leveraging the company’s expertise from AI accelerator development for edge devices, this new chip targets a computing performance of 2 PFLOPS (petaflops) while aiming for a cost performance 10 times higher than that of an equivalent GPU.
According to a company spokesperson, the AI chip will be ready for shipment by the end of 2025 at the latest.
Matsuda Soichi, Chief Executive Officer of LeapMind, commented: “We have achieved a high level of success and a proven track record in the development of edge AI inference accelerators. On the server side, we will accelerate the evolution of next-generation AI by developing new AI chips that leverage our accumulated technological expertise to accelerate the computing process of AI models.”
The new AI chip has three major characteristics:
1) Designed for AI model training and inference,
2) Low-bit representation, and
3) Open-source drivers and compilers
Viewed as computational tasks, AI model training and inference exhibit the following design-relevant characteristics:
- Matrix multiplication stands out as a computational bottleneck.
- These tasks can be easily executed in parallel.
- There are very few conditional branches involved.
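The dominance of matrix multiplication can be seen with a back-of-the-envelope operation count for a single dense layer. The layer sizes below are illustrative assumptions, not figures from LeapMind:

```python
import numpy as np

# Rough operation count for one dense layer y = act(x @ W),
# illustrating why matrix multiplication dominates AI workloads.
# The shapes are illustrative, not taken from any real chip or model.
batch, d_in, d_out = 64, 4096, 4096

matmul_macs = batch * d_in * d_out   # multiply-accumulates in x @ W
activation_ops = batch * d_out       # one elementwise op per output

ratio = matmul_macs // activation_ops  # equals d_in
print(f"matmul MACs:      {matmul_macs:,}")
print(f"activation ops:   {activation_ops:,}")
print(f"matmul dominance: {ratio}x")  # → 4096x
```

Even for this modest layer, the matrix multiply performs thousands of times more arithmetic than the surrounding elementwise work, which is why a chip specialised for matrix multiplication captures most of the workload.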
Accordingly, rather than pursuing performance as a general-purpose computing machine, LeapMind's design approach exploits these features to create AI chips specialised for AI model training and inference. For example, because conditional branches are scarce in such programmes, the branch prediction unit can be eliminated, minimising the transistor count.
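As a sketch of the branch-free style of computation such a design targets, consider the ReLU activation written two ways: with a per-element conditional, and as a single data-parallel maximum. ReLU is our illustrative example here, not one LeapMind cites:

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

# Branchy formulation: one conditional per element, which is
# exactly what a general-purpose CPU's branch predictor must guess.
branchy = np.array([v if v > 0 else 0.0 for v in x])

# Branchless formulation: a single data-parallel maximum, the kind
# of straight-line code a chip without branch prediction can run.
branchless = np.maximum(x, 0.0)

assert np.array_equal(branchy, branchless)
print(branchless)  # → [0.  0.  0.  1.5 3. ]
```

Because the branchless form produces identical results with no control flow, hardware can dedicate its transistors to arithmetic units instead of prediction logic.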
The primary bottleneck in AI model calculations is matrix multiplication, which involves an enormous number of multiplications and additions. Multipliers typically require large circuits, but LeapMind aims to reduce the number of transistors needed by adopting lower bit-width data types such as fp8. Shrinking the data being processed also makes more effective use of DRAM bandwidth, which has become a bottleneck in recent years.
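NumPy has no native fp8 type, so the following sketch uses symmetric int8 quantisation as an analogous low-bit format to illustrate the four-fold memory (and hence DRAM bandwidth) saving over fp32; the numbers and method are illustrative, not LeapMind's:

```python
import numpy as np

# Simulate storing fp32 weights in an 8-bit format. NumPy lacks fp8,
# so symmetric int8 quantisation stands in as an analogous low-bit type.
rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal(1024).astype(np.float32)

# One scale factor maps the full fp32 range onto [-127, 127].
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# Dequantise to check how much precision the narrow format costs.
restored = weights_int8.astype(np.float32) * scale
max_err = np.abs(weights_fp32 - restored).max()

print(f"fp32 size: {weights_fp32.nbytes} bytes")  # 4096 bytes
print(f"int8 size: {weights_int8.nbytes} bytes")  # 1024 bytes, 4x smaller
print(f"max quantisation error: {max_err:.4f}")
```

Every byte saved is a byte that never crosses the DRAM bus, so the bandwidth benefit tracks the storage reduction directly, at the cost of a bounded per-element rounding error.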
Developing advanced software stacks is essential for AI model development, and no single company can provide all the required components. An open-source software ecosystem involving multiple companies already exists, and joining it means engaging as a member of the open-source software community.
LeapMind is to release comprehensive hardware specifications and software, including drivers and compilers, under OSI-compliant licenses to contribute to the open-source community.
These new AI chips will support training and inference for a wide range of neural networks, including generative AI models such as diffusion models, as well as training for large-scale language models.