The announcement, made at the Hot Chips conference, described IBM's first processor to contain on-chip acceleration for AI inferencing while a transaction is taking place.
Three years in development, the new on-chip hardware acceleration is designed to help customers achieve business insights at scale across banking, finance, trading and insurance applications, as well as customer interactions. A Telum-based system is planned for the first half of 2022.
Telum has been designed to enable applications to run efficiently where the data resides, overcoming a drawback of traditional enterprise AI approaches, which tend to require significant memory capacity and data movement to handle inferencing.
With Telum, the accelerator sits in close proximity to mission-critical data and applications, meaning enterprises can conduct high-volume inferencing for time-sensitive transactions in real time without invoking off-platform AI solutions, which may impact performance. Clients can also build and train AI models off-platform, then deploy them to a Telum-enabled IBM system for inference and analysis.
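To make that train-off-platform, infer-on-platform workflow concrete, here is a minimal Python sketch using scikit-learn and the portable ONNX format as a stand-in for whatever model-building stack a client might actually use. The feature shape, file name and fraud label are hypothetical, and this is not IBM's Telum tooling, which the announcement does not detail.

```python
# Minimal sketch only: trains a toy fraud-scoring model off-platform and
# exports it to the portable ONNX format so it can later be served where
# the data resides. All names, shapes and labels here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Off-platform training on (synthetic) historical transaction features.
rng = np.random.default_rng(0)
X = rng.random((1000, 4)).astype(np.float32)
y = (X[:, 0] + X[:, 1] > 1.0).astype(np.int64)  # stand-in fraud label

model = LogisticRegression().fit(X, y)

# Export to ONNX for deployment on the transaction-processing system.
onnx_model = convert_sklearn(
    model, initial_types=[("txn_features", FloatTensorType([None, 4]))]
)
with open("fraud_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```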
According to recent Morning Consult research commissioned by IBM, 90% of respondents said that being able to build and run AI projects wherever their data resides is important.
IBM said that Telum will help users move from a fraud detection posture to a fraud prevention posture: instead of catching many cases of fraud after the fact, fraud could potentially be prevented before the transaction is completed, without impacting service level agreements (SLAs).
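As an illustration of what scoring "before the transaction is completed" implies, the sketch below checks each transaction in-line against an assumed latency budget, reusing the fraud_model.onnx file from the earlier sketch. The budget, threshold and fallback policy are invented for illustration; this is not IBM's fraud-prevention API.

```python
# Minimal sketch only: scores each transaction in-line, before it completes,
# against an assumed latency budget. The threshold, budget and fallback
# policy are invented for illustration; this is not IBM's fraud API.
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("fraud_model.onnx")  # from the earlier sketch
LATENCY_BUDGET_MS = 2.0  # assumed per-transaction inference budget
FRAUD_THRESHOLD = 0.9    # assumed score above which a transaction is blocked

def approve_transaction(features: np.ndarray) -> bool:
    """Return True to approve, False to block, within the latency budget."""
    start = time.perf_counter()
    # skl2onnx classifiers emit [labels, probabilities]; probabilities is a
    # list of {class: probability} mappings, one per input row.
    _, probabilities = session.run(None, {"txn_features": features})
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > LATENCY_BUDGET_MS:
        # Over budget: approve now and flag for post-hoc review rather
        # than delaying the transaction and breaching the SLA.
        return True
    return probabilities[0][1] < FRAUD_THRESHOLD

ok = approve_transaction(np.array([[0.2, 0.4, 0.1, 0.9]], dtype=np.float32))
```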
The chip has a centralised design, which will allow users to leverage the full power of the AI processor for AI-specific workloads, making it suitable for financial services workloads like fraud detection, loan processing, clearing and settlement of trades, anti-money laundering and risk analysis.
The chip was made on Samsung's 7nm process and contains eight processor cores with a deep super-scalar, out-of-order instruction pipeline running at a clock frequency of more than 5GHz, optimised for the demands of heterogeneous enterprise-class workloads.
The completely redesigned cache and chip-interconnection infrastructure provides 32MB of cache per core and can scale to 32 Telum chips. The dual-chip module design contains 22 billion transistors and 19 miles of wire on 17 metal layers.
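Taken together, those figures imply a substantial pool of cache at full scale, as this back-of-the-envelope calculation shows. Only the per-core cache size, core count and maximum chip count come from IBM; the per-chip and full-scale totals are derived.

```python
# Back-of-the-envelope arithmetic from the figures quoted above; only the
# per-core cache size, core count and maximum chip count come from IBM.
MB_PER_CORE = 32
CORES_PER_CHIP = 8
MAX_CHIPS = 32

cache_per_chip_mb = MB_PER_CORE * CORES_PER_CHIP    # 256MB per chip
max_scale_cache_mb = cache_per_chip_mb * MAX_CHIPS  # 8,192MB across 32 chips
print(f"{cache_per_chip_mb}MB per chip, {max_scale_cache_mb}MB at full scale")
```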