According to the company, its SoC will help transform AI inference solutions used in a wide range of applications – from natural language processing and computer vision to speech recognition and recommendation systems.
With the mass deployment of AI as a service (AIaaS) and applications such as ChatGPT, NeuReality’s solution is set to play an important part in an industry urgently in need of affordable access to modernised AI inference infrastructure.
In trials with AI-centric server systems, NeuReality’s NR1 chip demonstrated 10 times the performance at the same cost as conventional CPU-centric systems – delivering cost-effective, highly efficient execution of AI inference.
AI inference traditionally requires significant software activity, often at eye-watering cost.
The NR1 chip represents the world’s first NAPU (Network Addressable Processing Unit) and will be seen as an antidote to an outdated CPU-centric approach to AI inference, according to Moshe Tanach, Co-Founder and CEO of NeuReality. “In order for inference-specific deep learning accelerators (DLAs) to perform at full capacity, free of existing system bottlenecks and high overheads, our solution stack, coupled with any DLA technology out there, enables AI service requests to be processed faster and more efficiently.
“Function for function, hardware runs faster and parallelizes much more than software. As an industry, we’ve proven this model, offloading the deep learning processing function from CPUs to DLAs such as the GPU or ASIC solutions. As in Amdahl’s law, it is time to shift the acceleration focus to the other functions of the system to optimise the whole AI inference processing. NR1 offers an unprecedented competitive alternative to today’s general-purpose server solutions, setting a new standard for the direction our industry must take to fully support the AI Digital Age.”
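Tanach’s Amdahl’s law point can be illustrated with a quick back-of-the-envelope calculation (the figures below are illustrative, not NeuReality’s): once the deep learning maths itself is heavily accelerated by a DLA, the remaining software functions – data movement, pre- and post-processing, request handling – cap the overall speedup, so accelerating them becomes the next lever.

```python
# Illustrative Amdahl's law calculation (hypothetical numbers, not NeuReality benchmarks).
# Overall speedup = 1 / ((1 - p) + p / s), where p is the fraction of work
# that is accelerated and s is the speedup of that fraction.

def amdahl_speedup(accelerated_fraction: float, acceleration: float) -> float:
    """Overall system speedup when only part of the pipeline is accelerated."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / acceleration)

# Suppose deep learning compute is 80% of the inference pipeline and a DLA makes it 20x faster:
print(round(amdahl_speedup(0.80, 20.0), 1))  # ~4.2x overall: the other 20% now dominates

# Offloading the remaining system functions (networking, scheduling, pre/post-processing)
# is what lifts the ceiling:
print(round(amdahl_speedup(0.95, 20.0), 1))  # ~10.3x overall once most of the pipeline is accelerated
```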
NeuReality is helping to drive the transition from a largely software-centric approach to a hardware-offloading approach in which multiple NR1 chips work in parallel to avoid system bottlenecks. Each NR1 chip is a network-attached heterogeneous compute device with multiple tiers of programmable compute engines, including a PCIe interface to host any DLA, an embedded network interface controller (NIC), and an embedded AI-hypervisor – a hardware-based sequencer that controls the compute engines and shifts data structures between them.
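To make the architectural idea concrete, the sketch below models in plain Python – purely as an illustration, not NeuReality’s software stack, and with all names hypothetical – how a network-addressable device could take an inference request straight off the wire and sequence it through its own engines, rather than routing every step through a host CPU.

```python
# Hypothetical model of a network-addressable inference device (NAPU-style flow).
# All class and function names are illustrative; they do not represent NeuReality's APIs.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class InferenceRequest:
    request_id: int
    payload: bytes  # e.g. an encoded image or token sequence arriving over the network

@dataclass
class Napu:
    """A network-attached device: embedded NIC + hardware sequencer + attached DLA."""
    # The sequencer is modelled as an ordered pipeline of compute-engine stages.
    stages: list[Callable[[bytes], bytes]] = field(default_factory=list)

    def handle(self, request: InferenceRequest) -> bytes:
        # In a CPU-centric system a host CPU would mediate each of these hops;
        # here the device itself moves data between its engines.
        data = request.payload
        for stage in self.stages:
            data = stage(data)
        return data

# Illustrative stages: pre-processing, DLA execution, post-processing.
def preprocess(data: bytes) -> bytes:
    return data.strip()

def run_on_dla(data: bytes) -> bytes:
    return b"prediction-for:" + data  # stand-in for the accelerator's output

def postprocess(data: bytes) -> bytes:
    return data.upper()

napu = Napu(stages=[preprocess, run_on_dla, postprocess])
print(napu.handle(InferenceRequest(request_id=1, payload=b"  cat.jpg  ")))
```

The point of the analogy is only the control path: request handling, scheduling and data movement stay on the device, which is what the article describes the embedded NIC and AI-hypervisor doing in hardware.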
Hardware acceleration throughout NeuReality’s automated SDK flow lowers the barrier to entry for small, medium, and large organisations that need performance, low power consumption and affordable infrastructure – as well as ease of use for AI inference services.
“It’s full steam ahead as we reach this highly anticipated manufacturing stage with our TSMC partners. Our plan remains to start shipping product directly to customers by the end of the year,” said Tanach.