Tachyum demonstrates Instruction Profiling Unit on Prodigy FPGA


Tachyum has enabled an Instruction Profiling Unit (IPU), a low-overhead way to collect an execution profile of non-instrumented code, on its Prodigy Universal Processor.


The IPU is used by hyperscalers to profile applications in production execution and recompile code to gain better performance.

This latest enhancement is part of the company’s focus on refining and optimising the performance of the Tachyum Software Distribution Package upon its beta release. Recompiling applications with the profile data collected by the IPU can provide a 5-15% performance gain, depending on the application, which the company says would result in a substantial financial benefit to users.

Earlier this year, Tachyum added a Performance Monitoring Unit (PMU) to the emulation system, giving customers and partners the ability to address bottlenecks and better optimise Prodigy performance for all applications and workloads. The PMU’s wide range of performance counters, supported by both the software C-model and the FPGA, helps with both system debugging and performance tuning. The profiles collected by the newly added IPU can be used for Profile Directed Optimisations (PDO) and are important for Just-In-Time (JIT) compilers when optimising hotspots.
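To illustrate how a collected profile points a compiler at hotspots, the following is a minimal Python sketch that ranks functions by their share of sampled instruction addresses. It does not represent Tachyum's IPU output format or tooling: the `rank_hotspots` function, the symbol table and the sample addresses are all hypothetical, chosen only to show the idea.

```python
from collections import Counter


def rank_hotspots(samples, symbols, top_n=3):
    """Rank functions by their share of sampled instruction addresses.

    samples: iterable of sampled instruction addresses (ints).
    symbols: list of (start, end, name) address ranges, end exclusive.
    Both inputs are hypothetical stand-ins for real profile data.
    """
    counts = Counter()
    for addr in samples:
        for start, end, name in symbols:
            if start <= addr < end:
                counts[name] += 1
                break
        else:
            # Address did not fall inside any known function.
            counts["<unknown>"] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return [(name, hits / total) for name, hits in counts.most_common(top_n)]


# Hypothetical symbol table and address samples.
symbols = [
    (0x1000, 0x1100, "parse"),
    (0x1100, 0x1400, "compress"),
    (0x1400, 0x1500, "log"),
]
samples = [0x1104, 0x1200, 0x1010, 0x1300, 0x1108, 0x1450, 0x1204, 0x1110]

hotspots = rank_hotspots(samples, symbols)
for name, share in hotspots:
    print(f"{name}: {share:.1%} of samples")
```

In this made-up data, `compress` accounts for most of the samples, so it is the function a profile-directed recompile or a JIT compiler would prioritise for optimisation.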

"IPU is essential for large-scale operators and is now readily available as part of our FPGA," said Dr. Radoslav Danilak, founder and CEO of Tachyum. “We believe this technology will also be used by smaller data centre operators. This is important for meeting our goals of Prodigy supplying industry-leading performance at significantly lower power and lower cost to organisations of all sizes looking to supercharge their workloads while supporting the greatest breadth of applications.”

As a Universal Processor for all workloads, Prodigy-powered data centre servers will be able to seamlessly and dynamically switch between computational domains (such as AI/ML, HPC, and cloud) with a single homogeneous architecture.

According to Tachyum, by eliminating the need for expensive dedicated AI hardware and dramatically increasing server utilisation, Prodigy can reduce CAPEX and OPEX significantly. Prodigy integrates 192 high-performance, custom-designed 64-bit compute cores to deliver up to 4.5x the performance of the highest-performing x86 processors for cloud workloads, up to 3x that of the highest-performing GPU for HPC, and 6x for AI applications.