The announcement marks a significant milestone for the company, following delivery and activation of its 7nm AI inference server-on-a-chip, the NR1 NAPU, and the successful bring-up of its entire NR1 AI hardware and software system in the first quarter.
The NR1 AI Inference Solution enables businesses and organisations to run newly trained AI models and existing AI applications without over-investing millions in scarce GPUs. Regardless of AI accelerator performance, the CPU remains the primary performance bottleneck in AI Inference, resulting in excessive power consumption and cost.
The NR1 system was said to be customer-ready in Q1 2024 after the NAPU arrived from TSMC in Taiwan in December, followed by a successful bring-up in just 90 days.
“To activate a complex silicon-to-software AI system so quickly and smoothly within a small start-up with an even smaller technical team is simply remarkable,” said Ilan Avital, Chief R&D Officer at NeuReality.
The system met 99 percent of all functionality requirements across the server-on-a-chip (SoC), IP, and software, marking its readiness for early customer pilots and laying an affordable foundation for generative AI, multi-modality, and more advanced technologies to come.
The accompanying Software Development Kit (SDK) has been designed exclusively for high-volume, high-variety AI workloads in enterprise data centres. It contains hierarchical tools for all types of compute engines and XPUs, along with optimised partitioning – making it easier to install, manage, and scale.
NeuReality’s solution gives developers significant flexibility to deploy advanced and complex AI pipelines more easily, based on the specific needs of their projects. It provides a toolchain for complete AI pipeline acceleration, orchestration, and provisioning, together with inference runtime APIs that streamline the AI deployment workflow.
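As a conceptual illustration only, the short Python sketch below models what such a pipeline-and-runtime workflow can look like in general: stages are declared, assigned to different compute engines, and executed through a simple inference call. NeuReality has not published its SDK interface, so the class names, methods, and engine labels here (Stage, Pipeline, Runtime, "accelerator") are hypothetical placeholders, not the company's actual API.

from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class Stage:
    # One step of an inference pipeline, tagged with the engine meant to run it.
    name: str
    fn: Callable[[Any], Any]
    engine: str = "cpu"          # e.g. "cpu", "dsp", "accelerator"

@dataclass
class Pipeline:
    stages: List[Stage] = field(default_factory=list)

    def add_stage(self, name: str, fn: Callable[[Any], Any], engine: str = "cpu") -> "Pipeline":
        self.stages.append(Stage(name, fn, engine))
        return self

class Runtime:
    # Toy runtime API: runs each stage in order; a real system would dispatch
    # each stage to its assigned compute engine instead of plain Python calls.
    def __init__(self, pipeline: Pipeline):
        self.pipeline = pipeline

    def infer(self, request: Any) -> Any:
        data = request
        for stage in self.pipeline.stages:
            data = stage.fn(data)
        return data

# Usage: a three-stage pipeline (pre-process, model, post-process).
pipeline = (Pipeline()
            .add_stage("preprocess", lambda text: text.lower())
            .add_stage("model", lambda text: {"label": "cat" if "cat" in text else "other"},
                       engine="accelerator")
            .add_stage("postprocess", lambda out: out["label"]))

print(Runtime(pipeline).infer("A photo of a CAT"))   # -> "cat"

The point of the sketch is only the shape of the workflow the article describes: the pipeline is declared once, stages are partitioned across engines, and a single runtime call handles end-to-end inference.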
Citing a 35% AI adoption rate globally and a rate below 25% in the US, NeuReality is focused on lowering market barriers for mainstream industries.
“It’s simply out of reach to the majority of businesses,” said Avital. “We can start changing that now by reducing high power consumption at the source - and educating customers that the ideal AI Inference servers require fundamentally different and more efficient server configurations than big supercomputers and high-end GPUs used in AI Training.”