This new core has been designed to enhance the performance of deep learning by 10X and enable more neural network processing per square millimetre.
FPGAs are increasingly being used to implement AI, and more specifically machine learning, deep learning and neural networks as approaches to achieving AI. The key function needed for AI is the matrix multiplier, which consists of an array of MACs (multiplier-accumulators). In existing FPGAs and eFPGAs, the MACs are optimised for DSP, with larger multipliers, pre-adders and other logic that are overkill for AI. For AI applications, smaller multipliers such as 16 bits or 8 bits, with the ability to support both modes with accumulators, allow more neural network processing per square millimetre.
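To make the role of the MAC concrete, here is a minimal Python sketch of a matrix multiply built only from multiply-accumulate steps. The function names `mac` and `matmul_mac` are hypothetical and chosen for illustration. This is a software model of the arithmetic, not a description of how the EFLX hardware is organised.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: add the product a*b to a running sum."""
    return acc + a * b


def matmul_mac(A, B):
    """Multiply matrices A (n x k) and B (k x m) using only MAC steps.

    Illustrative model: in hardware, the inner-loop MACs would be spread
    across an array of MAC units instead of running sequentially.
    """
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0  # accumulator, wider than the 8-bit or 16-bit inputs
            for k in range(inner):
                acc = mac(acc, A[i][k], B[k][j])
            C[i][j] = acc
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_mac(A, B))  # [[19, 22], [43, 50]]
```

Each output element costs one MAC per element of the shared dimension, which is why the number of MACs available per second sets the ceiling on matrix multiply performance.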
“We’ve had significant customer interest in using eFPGA for AI applications because of the performance advantages it can deliver in these chip designs,” said Geoff Tate, CEO and co-founder of Flex Logix. “In fact, one of the first customers Flex Logix announced was Harvard, who chose our eFPGA for their deep learning design. They are already working on a follow-on 16nm AI chip that will make more extensive use of the Flex Logix EFLX eFPGA.”
According to Flex Logix, AI customers want more MACs/second and more MACs/square millimetre, but they also want the flexibility to reconfigure designs because AI algorithms are changing rapidly. They require the ability to switch between 8-bit and 16-bit modes as needed and to implement matrix multipliers of varying sizes to meet their applications’ performance and cost constraints.
The EFLX4K AI eFPGA core leverages Flex Logix’s patented XFLX interconnect to provide an area-efficient, reconfigurable AI solution. It uses a new AI-MAC architecture that can be reconfigured to implement 8-bit or 16-bit MACs (as well as 16x8 and 8x16). A single EFLX4K AI core in 16nm, for example, will be about 1.2 square millimetres with 441 8-bit MACs running at 1GHz, for a throughput of 441 GigaMACs/second under worst-case silicon conditions. The EFLX4K AI core can be arrayed up to at least 7x7, enabling performance of ~22 TeraMACs/second under worst-case silicon conditions.
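The throughput figures quoted above can be checked with a short calculation. The sketch below assumes each MAC unit completes one multiply-accumulate per clock cycle, which the article implies but does not state outright. The constant and function names are hypothetical.

```python
# Figures taken from the article
MACS_PER_CORE = 441          # 8-bit MACs in one EFLX4K AI core
CLOCK_HZ = 1_000_000_000     # 1 GHz, worst-case silicon conditions
CORE_AREA_MM2 = 1.2          # approximate core area in 16nm
ARRAY_ROWS, ARRAY_COLS = 7, 7


def gigamacs_per_second(mac_units, clock_hz):
    """Throughput in GigaMACs/second, assuming one MAC per unit per cycle."""
    return mac_units * clock_hz // 1_000_000_000


core_gmacs = gigamacs_per_second(MACS_PER_CORE, CLOCK_HZ)
array_gmacs = ARRAY_ROWS * ARRAY_COLS * core_gmacs
array_tmacs = array_gmacs / 1000

# Compute density of a single core, in GigaMACs/second per square millimetre
core_density = core_gmacs / CORE_AREA_MM2

print(core_gmacs)    # 441 GigaMACs/second per core
print(array_tmacs)   # 21.609 TeraMACs/second for a 7x7 array
```

A 7x7 array holds 49 cores, so 49 x 441 = 21,609 GigaMACs/second, which rounds to the ~22 TeraMACs/second quoted. One core works out to roughly 367 GigaMACs/second per square millimetre at the stated 1.2 square millimetres.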
The EFLX4K AI is footprint-compatible with currently available, silicon-proven EFLX4K cores, giving architects and designers flexibility in RTL reconfigurability.
The EFLX4K AI eFPGA core is fully supported by Flex Logix’s existing software flow using the EFLX Compiler.