The round was co-led by Kindred Capital, NATO Innovation Fund, and Oxford Science Enterprises, with participation from Cocoa and Inovia Capital, together with angel investors including Hermann Hauser (co-founder, Acorn, Amadeus Capital), Stan Boland (ex-Icera, NVIDIA, Element 14 and Five AI), and Amar Shah (co-founder, Wayve). To date, Fractile has raised $17.5m (£14m) in total funding.
Founded in 2022 by Walter Goodwin (pictured), Fractile has developed an innovative approach to the design of chips for AI inference that it claims can deliver transformational improvements in performance for frontier AI models in deployment.
AI companies are today engaged in a hypercompetitive race to build, train and deploy the best foundation models, requiring vast investments in computational resources. However, they are all reliant on very similar hardware. These chips, and the highly developed tools and libraries around them, are well optimised for training large language models (LLMs) but poorly suited to inference, which is the process of running live data (input tokens) through a specific model with learnt parameters to produce results (in LLMs, a series of output tokens).
This has several consequences:
AI models are very expensive to provision and run at scale. Issues like the time taken on conventional hardware to move model parameters from memory to processors mean that very expensive hardware is often used at a small fraction of its theoretical capability, driving up costs.
AI performance is inhibited. Ever faster compute cannot make up for the performance lag caused in inference by moving model weights from memory to the processor units, limiting real-time performance and user experience.
Potential AI performance in the future is restricted. Continual advancement of conventional computing is limited by the heat generated by these chips. There is a limit to how fast we can cool silicon chips, and this has become the new constraint on continuing to scale conventional digital processors (the end of Dennard Scaling).
With enough data, bigger AI models are predictably better, but without breakthroughs in compute systems, it will not be possible to continue to scale AI models to be orders of magnitude larger with sufficiently low latency (time per output token, for instance) to be useable.
Restricted opportunity for AI model providers to drive differentiation. Every AI model provider builds on similar infrastructure, and the balance of its use is tilting heavily towards inference. Without novel hardware, the opportunity to create long-term differentiation and competitive advantage from faster, cheaper and higher-quality token generation in inference will be severely limited.
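The memory-movement point above can be made concrete with a back-of-envelope estimate. The sketch below is illustrative only: the model size, memory bandwidth and peak compute figures are hypothetical round numbers, not figures from Fractile or any vendor, and the function name is invented for this example. It assumes every weight is read from memory once per generated token and ignores caches, activations and any overlap of compute with data movement.

```python
def decode_step_estimate(n_params, bytes_per_param, mem_bandwidth, peak_flops, batch_size=1):
    """Rough per-token timing for one decoding step (hypothetical model of the hardware).

    n_params        -- number of model weights
    bytes_per_param -- storage per weight, e.g. 2 for 16-bit values
    mem_bandwidth   -- bytes per second from memory to the processor
    peak_flops      -- peak arithmetic throughput, operations per second
    """
    # Time to stream every weight from memory once.
    t_memory = n_params * bytes_per_param / mem_bandwidth
    # Time for the arithmetic: about 2 operations (multiply + add) per weight per sequence.
    t_compute = 2 * n_params * batch_size / peak_flops
    # The step cannot finish faster than the slower of the two.
    t_step = max(t_memory, t_compute)
    return {
        "t_memory_s": t_memory,
        "t_compute_s": t_compute,
        "utilisation": t_compute / t_step,  # fraction of peak compute actually used
        "tokens_per_s": batch_size / t_step,
    }


# Assumed, illustrative numbers: 70bn weights at 2 bytes each,
# 3 TB/s of memory bandwidth, 1 PFLOP/s of peak compute.
estimate = decode_step_estimate(70e9, 2, 3e12, 1e15)
print(estimate)
```

Under these assumed numbers, moving the weights takes far longer than the arithmetic itself, so the processor sits mostly idle. That is the "small fraction of theoretical capability" effect described above, and it is why faster compute alone does not shorten the time per token.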
There are two paths available to a company attempting to build better hardware for AI inference. The first is specialisation: homing in on very specific workloads and building chips that are uniquely suited to those requirements. The second is to fundamentally change the way that computational operations themselves are performed, creating entirely different chips from these new building blocks and building massively scalable systems on top of them.
The second path is Fractile’s approach, which according to the company will help to unlock breakthrough performance across a range of AI models both present and future.
According to Fractile, its system will be able to achieve significantly improved levels of performance on AI model inference – initial targets are 100x faster and 10x cheaper – by using new circuits to execute 99.99% of the operations needed to run model inference. A key aspect is a shift to in-memory compute, which removes the need to shuttle model parameters to and from processor chips, instead baking computational operations into memory directly.
Fractile’s approach also looks to ensure that its technology is fully compatible with the leading-edge unmodified silicon foundry processes that all leading AI chips are built on.
Fractile says its approach will deliver these speed and cost advantages while also substantially reducing power consumption. Power efficiency – sometimes measured in tera operations per second per watt (TOPS/W) – is the biggest fundamental limitation when it comes to scaling up AI compute performance (see notes below for more detail).
Fractile’s system is targeting 20x the TOPS/W of any other system visible to the company today. This allows for more users to be served in parallel per inference system, with – in the case of LLMs for example – more words per second returned to those users, thereby making it possible to serve many more users for the same cost.
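To see why TOPS/W translates into users served, consider a system with a fixed power budget. The sketch below uses made-up numbers throughout (the baseline efficiency, the power budget and the operations needed per token are all assumptions for illustration, and the function names are invented here). It also assumes the hardware is fully utilised, so the results are upper bounds rather than predictions.

```python
def sustained_tops(tops_per_watt, power_budget_w):
    """Tera-operations per second available within a fixed power budget."""
    return tops_per_watt * power_budget_w


def tokens_per_second(tops, ops_per_token):
    """Upper bound on token throughput, assuming perfect utilisation."""
    return tops * 1e12 / ops_per_token


POWER_BUDGET_W = 10_000   # assumed power available to one inference system
OPS_PER_TOKEN = 140e9     # assumed arithmetic cost of producing one token
BASELINE_TOPS_PER_W = 1.0  # assumed efficiency of a conventional system

baseline = tokens_per_second(
    sustained_tops(BASELINE_TOPS_PER_W, POWER_BUDGET_W), OPS_PER_TOKEN
)
# The same power budget at 20x the efficiency, the target quoted in the article.
improved = tokens_per_second(
    sustained_tops(20 * BASELINE_TOPS_PER_W, POWER_BUDGET_W), OPS_PER_TOKEN
)

print(f"baseline: {baseline:,.0f} tokens/s, improved: {improved:,.0f} tokens/s")
print(f"ratio: {improved / baseline:.1f}x")
```

Because throughput scales linearly with efficiency when power is the binding constraint, a 20x gain in TOPS/W allows up to 20x as many tokens per second for the same electricity, which can be split between more simultaneous users and faster output for each of them.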
Such powerful inference hardware can be leveraged by AI model providers to gain a huge performance advantage from existing models. Currently, to get output from the largest models that matches human reading speed, AI companies tend to deploy systems that rely purely on ‘next token prediction’. With faster speeds, AI model providers can cost-effectively introduce recursive queries, chain-of-thought prompting and tree search, and users can get much better answers from the same models: the equivalent of transporting a foundation AI model from two years in the future into the present day.
It’s not just language models that see this sort of qualitative shift when they can be run faster and at lower cost, according to Fractile. Its performance leap on inference will accelerate AI’s ability to solve the biggest scientific and computationally heavy problems, from drug discovery to climate modelling to video generation.
Fractile has made a number of senior hires from NVIDIA, ARM and Imagination, and has filed patents protecting key circuits and its unique approach to in-memory compute. The company is already in discussions with potential partners and expects to sign partnerships ahead of production of the company’s first commercial AI accelerator hardware.
Fractile said that it will use the funding to continue to grow its team and accelerate progress towards the company’s first product.
Dr. Walter Goodwin, CEO and Founder of Fractile, said: “In today’s AI race, the limitations of existing hardware – nearly all of which is provided by a single company – represent the biggest barrier to better performance, reduced cost, and wider adoption. Fractile’s approach supercharges inference, delivering astonishing improvements in terms of speed and cost. This is more than just a speed-up – changing the performance point for inference allows us to explore completely new ways to use today’s leading AI models to solve the world’s most complex problems. We’re thrilled to have raised our funding from investors with a wealth of experience in the AI and chip industries, continue to grow our world-class team and further our technological development and partnerships.”