The aiWare4 RTL now shipping delivers up to 5x the performance of the previous-generation aiWare3 NPUs while using less than 2x the silicon area, underlining the device's scalability, PPA, and performance per mm², while at the same time extending the feature set and operational "sweet spot" for high-efficiency CNN acceleration.
"Our aiWare team has relentlessly refined our production validation processes, enabling us to deliver customer configurations at record speed and to full automotive quality for aiWare4," said Márton Fehér, SVP hardware engineering at aiMotive. "Thanks to our sophisticated wavefront processing making full use of our new WFRAM technology, plus many other architectural advances over aiWare3, we have been able to achieve exceptional PPA for our lead customers without compromising our leadership in high-efficiency execution of up to 95% of the most demanding automotive CNN inference workloads."
To meet customers' extremely demanding PPA constraints, aiMotive fine-tuned the exact feature set of the aiWare4 production RTL to each customer's requirements. Making full use of its physical tile-based layout and dataflow methodologies, the aiWare team demonstrated production-RTL clock speeds of up to 1.3 GHz across the full automotive AEC-Q100 Grade 2 temperature range on a 14 nm process.
The aiWare4 hardware IP has been externally assessed as suitable for certification to ASIL-B or higher as an SEooC (Safety Element out of Context).
The aiWare4 NPU scales from 1 to 256 TOPS and is supported by a comprehensive SDK featuring offline performance estimation, which lets customers predict and fine-tune the performance of their CNN workloads to within 5% of final silicon, before first silicon is available.