Described as ‘a long-awaited cure for the ailments of big CPU-centric data centres’ that suffer from high inefficiency and expense, NeuReality will demonstrate the world’s first affordable, ultra-scalable AI-centric servers designed purely for inference, that is, the day-to-day running of a trained AI model.
Running live AI workloads in data centres is expensive, and AI inferencing remains a blind spot for the industry, according to NeuReality Co-founder and CEO Moshe Tanach.
“ChatGPT is a new and popular example, of course, but generative AI is in its infancy. Today’s businesses are already struggling to run everyday AI applications affordably - from voice recognition systems and recommendation engines to computer vision and risk management,” said Tanach. “Generative AI is on their horizon too, so it’s a compounding problem that requires an entirely new AI-centric design for inferencing. Our customers will benefit immediately from deploying our easy-to-install and easy-to-use solution with established hardware and solution providers.”
NeuReality focuses on one of the biggest problems in artificial intelligence: making the inference phase both economically sustainable and scalable enough to support consumer and enterprise demand as AI accelerates.
For every $1 spent on training an AI model today, businesses spend about $8 to run it, according to Tanach. “That astronomical energy and financial cost will only grow as AI software, applications and pipelines ramp up in the years to come on top of larger, more sophisticated AI models.”
With the NR1 system, future AI-centric data centres will see a 10x gain in performance capability, empowering financial, healthcare, government and small businesses to create better customer experiences with more AI inside their products.
"NeuReality's AI inference system comes at the right time when customers not only desire scalable performance and lower total cost of ownership, but also want open-choice, secure and seamless AI solutions that meet their unique business needs," said Scott Tease, Vice President, General Manager, Artificial Intelligence and HPC WW at Lenovo.
At SC23, NeuReality will demonstrate its easy-to-deploy software development kit, APIs, and two flavours of hardware technology: the NR1-M AI Inference Module and the NR1-S AI Inference Appliance.
Built with OEM and Deep Learning Accelerator (DLA) partners, each demo addresses specific market sectors and AI applications, showcasing the breadth of NeuReality’s technology stack and its compatibility with all DLAs.
The systems architecture will feature one-of-a-kind, patented technologies including:
NR1 AI-Hypervisor hardware IP: a hardware sequencer that offloads data movement and processing from the CPU, an architectural cornerstone for heterogeneous-compute semiconductor devices;
NR1 AI-over-Fabric network engine: an embedded NIC (Network Interface Controller) with offload capabilities for an optimised network protocol dedicated to inference. The AIoF (AI-over-Fabric) protocol optimises networking between AI clients and servers as well as between connected servers forming a large language model (LLM) cluster or other large AI pipelines;
NR1 NAPU (Network Addressable Processing Unit): a network-attached heterogeneous chip for complete AI-pipeline offloading, leveraging Arm cores to host Linux-based server applications with native Kubernetes for cloud and data centre orchestration.
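To make the idea of a “network-addressable” accelerator concrete, here is a minimal, purely illustrative sketch of what a client-side inference request to such a device might look like. The field names, function, and payload layout below are assumptions for illustration only; they are not NeuReality’s actual AIoF protocol or API.

```python
import json

def build_inference_request(model: str, payload: bytes) -> str:
    """Serialise a hypothetical inference request.

    In a network-addressable design, the client would send a message
    like this directly to the accelerator's own network address,
    rather than routing it through a host CPU process. All field
    names here are illustrative assumptions.
    """
    request = {
        "model": model,               # which deployed model to run
        "input_hex": payload.hex(),   # raw input bytes, hex-encoded for transport
        "priority": "normal",         # hypothetical quality-of-service hint
    }
    return json.dumps(request)

# Example: request inference on a small binary input
req = build_inference_request("resnet50", b"\x00\x01")
```

The point of the sketch is the addressing model: because the processing unit is itself a network endpoint, the CPU-bound dispatch step that a conventional server inserts between client and accelerator can be removed from the data path.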
“The next era of AI relies on broad deployment of ML inference in order to unlock the power of LLMs and other maturing models in new and existing applications,” said Mohamed Awad, Senior Vice President and General Manager, Infrastructure Line of Business, Arm. "Arm Neoverse delivers a versatile and flexible technology platform to enable innovative custom silicon such as NeuReality's NR1 NAPU, which brings to market a powerful and efficient form of specialized processing for the AI-centric data centre."
NeuReality is shipping by the end of 2023 with an established value chain of software partners, original equipment manufacturers (OEMs), semiconductor deep learning accelerators (DLA) suppliers, cloud service providers, and enterprise IT solution companies such as Arm, AMD, CBTS, Cirrascale, IBM, Lenovo, Qualcomm, and Supermicro.
According to Tanach, "The NAPU is the Swiss army knife of AI-inference servers – easily integrated into any existing system architecture and with any DLA. So, no one needs to wait two or three years for someone to invent the ideal AI inference chip. We already have it.”