According to the company, this system provides expandable, faster memory capacity that frees AI from memory constraints.
Within Panmnesia's CXL system, memory is decoupled from the central processing unit via CXL 3.0. This permits large volumes of AI search data to be stored within the memory pool. In theory, CXL 3.0 enables the system to accommodate a memory capacity of up to 4 petabytes per CPU’s root complex. Furthermore, the company says its system speeds up query processing by 111.1 times compared with current approaches, including Microsoft's production service.
Search engines have changed markedly in recent times. Users who were once confined to entering a few keywords for their search queries are now benefitting from advances in AI that have expanded the capabilities of search engines.
At the heart of this progression is the vector search algorithm. It sifts through the AI-generated portrayals of objects, known as embedding vectors, to pinpoint those most aligned with the user's intentions. Each embedding vector is a composition of hundreds of numerical values. For swift search through these embedding vectors, they are catalogued by a proximity graph, designed with the vectors as its nodes.
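As a rough illustration of how a proximity graph speeds up the search, the sketch below runs a greedy best-first traversal over a tiny hand-built graph. It is a generic textbook-style routine, not Panmnesia's or Microsoft's implementation; the function names, the `ef` candidate-list size and the toy data are all assumptions made for the example.

```python
import heapq
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search(vectors, neighbors, query, entry, k=1, ef=4):
    """Best-first search over a proximity graph (illustrative sketch).

    vectors:   node id -> embedding vector
    neighbors: node id -> list of adjacent node ids
    ef:        how many closest nodes to keep while searching (assumed knob)
    """
    visited = {entry}
    d0 = l2(vectors[entry], query)
    candidates = [(d0, entry)]   # min-heap: nodes still to expand
    best = [(-d0, entry)]        # max-heap (negated): closest nodes found so far
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the nearest unexpanded node is worse than everything kept.
        if len(best) >= ef and d > -best[0][0]:
            break
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = l2(vectors[nb], query)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current farthest node
    ranked = sorted((-neg_d, n) for neg_d, n in best)
    return [n for _, n in ranked][:k]

# Toy graph: five 2-D "embeddings" connected in a chain.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0), 4: (3.0, 1.0)}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(greedy_search(vectors, neighbors, query=(3.0, 0.9), entry=0, k=2))  # [4, 3]
```

The search only computes distances for nodes it reaches by following edges, which is why the graph, and the memory holding it, sits on the critical path of every query.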
Though vector search can yield accurate results within milliseconds, it exerts considerable memory demands on the computing system. For example, Microsoft has disclosed that their search engine manages over 100 billion embedding vectors, potentially consuming more than 40 terabytes of memory space. To mitigate this memory burden, recent strategies include compressing the embedding vector to lessen memory requirements and utilising high-capacity SSDs to house all original embedding vectors along with the proximity graph.
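A back-of-the-envelope calculation shows how a figure of this size can arise. The dimension count and value widths below are illustrative assumptions, not numbers disclosed by Microsoft or Panmnesia; they were picked only because they reproduce the 40-terabyte order of magnitude.

```python
def index_memory_bytes(num_vectors, dims, bytes_per_value=4,
                       neighbors_per_node=0, bytes_per_id=4):
    """Estimate memory for raw embeddings plus an optional proximity graph.

    All parameters are hypothetical knobs for this estimate.
    """
    vector_bytes = num_vectors * dims * bytes_per_value
    graph_bytes = num_vectors * neighbors_per_node * bytes_per_id
    return vector_bytes + graph_bytes

TB = 10 ** 12  # decimal terabyte

# Assumption: 100 billion vectors, 100 values each, stored as 4-byte floats.
raw = index_memory_bytes(100 * 10 ** 9, dims=100)
print(raw / TB)  # 40.0

# Assumption: compressing each value to a single byte.
compressed = index_memory_bytes(100 * 10 ** 9, dims=100, bytes_per_value=1)
print(compressed / TB)  # 10.0
```

Under these assumptions, compression cuts the footprint to a quarter, but each stored value then carries less precision, which is where the accuracy loss of such schemes comes from.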
These tactics, however, often lead to diminished accuracy and/or performance, posing a challenge to the delivery of high-quality real-time search services.
Panmnesia’s system for vector search, known as CXL-ANNS, looks to tackle these challenges. CXL-ANNS capitalises on the CXL-based disaggregated memory pool, providing practically unlimited memory for vector search. Its architecture is scalable, linking multiple memory expanders to the CPU via a CXL switch, hence offering vast memory capacity as needed. The company has also revealed a complete system prototype based on actual hardware and software.
At the system’s heart is CXL, an open industry standard that provides a cache-coherent interconnect between the CPU and devices such as memory expanders. This interconnect enables the CPU to access the internal memory of the memory expander device through standard memory instructions, such as load and store. This compatibility at the instruction level means the application can leverage the extensive memory space of the CXL memory pool without modification. However, this comes with a drawback: each memory request to the CXL memory space requires data transfer over the interconnect, adding latency equal to or greater than that of a DRAM access itself.
Panmnesia's research team tackled this challenge by introducing a software-hardware co-design specifically crafted for vector search. The software of CXL-ANNS strategically places the frequently accessed nodes of the proximity graph in the local DRAM. This technique minimises CXL memory access during vector indexing, reducing the additional latency caused by CXL.
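The placement idea can be sketched in a few lines. The example below is a simplified model and not Panmnesia's actual policy: it assumes a recorded trace of node accesses is available, and the helper names `place_hot_nodes` and `cxl_accesses` are hypothetical.

```python
from collections import Counter

def place_hot_nodes(access_trace, local_capacity):
    """Pick the most frequently accessed graph nodes for local DRAM.

    access_trace:   sequence of node ids touched by past searches (assumed input)
    local_capacity: how many nodes fit in local DRAM (assumed budget)
    """
    counts = Counter(access_trace)
    return {node for node, _ in counts.most_common(local_capacity)}

def cxl_accesses(access_trace, local_nodes):
    """Count accesses that would still have to cross the CXL interconnect."""
    return sum(1 for node in access_trace if node not in local_nodes)

# Toy trace: node 0 (say, the search entry point) and node 1 dominate.
trace = [0, 1, 0, 2, 0, 1, 3, 0, 1, 4]
local = place_hot_nodes(trace, local_capacity=2)
print(sorted(local))               # [0, 1]
print(cxl_accesses(trace, set()))  # 10 accesses with nothing cached locally
print(cxl_accesses(trace, local))  # 3 accesses once the hot nodes are local
```

Because graph searches tend to revisit a small set of nodes, even a small local-DRAM budget can absorb a large share of the traffic in this model.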
On the hardware side, CXL-ANNS processes the embedding vectors within the memory expander device by incorporating a domain-specific accelerator (DSA) into its controller. The DSA rapidly analyses the embedding vector and generates a smaller result, indicating whether the vector aligns with user intent. By transferring this compact result to the CPU instead of the original vector, it significantly alleviates the data movement overhead.
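To make the data-movement saving concrete, the toy model below compares pulling whole vectors to the CPU with letting the device return a single distance value per vector. It is only a sketch: the `MemoryExpander` class, the 4-byte value size and the choice of a Euclidean distance as the "compact result" are assumptions, and the one-off cost of sending the query to the device is ignored.

```python
import math

class MemoryExpander:
    """Toy stand-in for a CXL memory expander (hypothetical interface)."""

    def __init__(self, vectors):
        self.vectors = vectors
        self.bytes_to_cpu = 0  # bytes sent across the interconnect to the CPU

    def load(self, idx):
        """Plain load: the full vector crosses the link (4 bytes per value assumed)."""
        vec = self.vectors[idx]
        self.bytes_to_cpu += 4 * len(vec)
        return vec

    def distance(self, idx, query):
        """Near-data compute: only one 4-byte scalar crosses the link."""
        self.bytes_to_cpu += 4
        return math.dist(self.vectors[idx], query)

def nearest_host_side(expander, query):
    # The CPU fetches every vector and computes the distances itself.
    dists = [math.dist(expander.load(i), query) for i in range(len(expander.vectors))]
    return min(range(len(dists)), key=dists.__getitem__)

def nearest_offloaded(expander, query):
    # The device computes each distance and ships back only the result.
    dists = [expander.distance(i, query) for i in range(len(expander.vectors))]
    return min(range(len(dists)), key=dists.__getitem__)

data = [[0.0] * 128, [1.0] * 128, [2.0] * 128]
query = [0.9] * 128

host = MemoryExpander(data)
dsa = MemoryExpander(data)
print(nearest_host_side(host, query), host.bytes_to_cpu)  # 1 1536
print(nearest_offloaded(dsa, query), dsa.bytes_to_cpu)    # 1 12
```

Both paths find the same nearest vector; in this model the offloaded path moves 128 times less data per vector, and the gap widens as the vectors get longer.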
The research shows that CXL-ANNS speeds up vector search by 111.1 times in comparison to leading-edge vector search platforms that utilise compression and/or SSDs. The researchers also report that CXL-ANNS outperforms a theoretical system with unlimited local DRAM resources by a factor of 3.8.
"CXL-ANNS establishes a new standard in crafting advanced systems specifically designed for representative datacentre applications, wholly harnessing the advantage of CXL technology," stated Myoungsoo Jung, CEO of Panmnesia. "We foresee our research sparking innovation within the community and stimulating the expansion of the CXL ecosystem."