The servers are designed for AI applications where low latency and high application performance are key requirements. The 2U NVIDIA HGX A100 4-GPU system has been designed to deploy modern AI training clusters at scale with high-speed CPU-GPU and GPU-GPU interconnects.
The 2U 2-Node system reduces energy usage and costs by sharing power supplies and cooling fans, which also lowers carbon emissions, and it supports a range of discrete GPU accelerators that can be matched to the workload. Both systems include advanced hardware security features enabled by the latest Intel Software Guard Extensions (Intel SGX).
"Supermicro engineers have created another extensive portfolio of high-performance GPU-based systems that reduce costs, space, and power consumption compared to other designs," said Charles Liang, president and CEO, Supermicro. "With our innovative design, we can offer customers NVIDIA HGX A100 (code name Redstone) 4-GPU accelerators for AI and HPC workloads in dense 2U form factors. Also, our 2U 2-Node system is designed to share power and cooling components which reduce OPEX and the impact on the environment."
The 2U NVIDIA HGX A100 server is based on 3rd Gen Intel Xeon Scalable processors with Intel Deep Learning Boost technology and is optimised for analytics, training, and inference workloads.
The system can deliver up to 2.5 petaflops of AI performance, with four A100 GPUs fully interconnected with NVIDIA NVLink, providing up to 320GB of GPU memory to speed breakthroughs in enterprise data science and AI. It is up to 4x faster than previous-generation GPUs on complex conversational AI models such as BERT-Large inference and delivers up to a 3x performance boost for BERT-Large training.
In addition, the advanced thermal and cooling designs make these systems suitable for high-performance clusters where node density and power efficiency are priorities.
Liquid cooling is available for these systems, yielding further OPEX savings. The platform also supports Intel Optane Persistent Memory (PMem), which allows significantly larger models to be held in memory, close to the CPU, before processing on the GPUs. For applications that require multi-system interaction, the system can be equipped with four NVIDIA ConnectX-6 200Gb/s InfiniBand cards to support GPUDirect RDMA with a 1:1 GPU-to-DPU ratio.
The 2U 2-Node system is an energy-efficient, resource-saving architecture in which each node supports up to three double-width GPUs. Each node also features a single 3rd Gen Intel Xeon Scalable processor with up to 40 cores and built-in AI and HPC acceleration.
A wide range of AI, rendering, and VDI applications stand to benefit from this balance of CPUs and GPUs.
Equipped with Supermicro's Advanced I/O Module (AIOM) expansion slots for fast and flexible networking, the system can also process massive data flows for demanding AI/ML applications, deep learning training, and inferencing while securing the workload and learning models. It is also suitable for multi-instance high-end cloud gaming and many other compute-intensive VDI applications.
In addition, virtual content delivery networks (vCDNs) will be able to meet increasing demand for streaming services. Power supply redundancy is built in, as either node can use the adjacent node's power supply in the event of a failure.