The world is becoming smarter. From the smartphones in our pockets to today’s smart cities helping manage traffic and transportation systems, artificial intelligence (AI) is becoming pervasive across nearly every industry and impacting all of our daily lives. This infusion of AI is also producing huge amounts of unstructured data that must be managed and processed, often in real time. The demands on hardware are skyrocketing, placing increasing reliance on innovations in chip architecture to deliver the performance improvements needed to keep pace.
Just as it took an army of thousands of engineers and billions in R&D investment from hundreds of companies to deliver the continuous improvements in chip performance associated with Moore’s Law, the same will be true of the silicon powering AI workloads. It’s not a one-size-fits-all world, and no single company or chip architecture will dominate.
So, how can your hardware keep up with the ever-increasing demands of AI processing? The answer is the Domain Specific Architecture (DSA). DSAs are the future of computing: hardware customised to run a specific workload, closely matching compute, memory bandwidth, data paths and I/O to the requirements of that workload. This delivers a much higher level of processing efficiency than general-purpose CPU and GPU architectures can.
The downsides of fixed silicon
DSAs can be built using a dedicated silicon device, but that has drawbacks. First, a critical part of the landscape is the demand not only for faster, better and cheaper, but also for sooner. To keep up with the pace of innovation, manufacturers are expected to create and deliver new services in shorter timeframes than ever before; more specifically, in less time than it takes to design and build a new ASIC-based DSA. This creates a fundamental misalignment between the market’s demand for innovation and the time it takes to design and build ASICs.
Second, ASICs are hard to do. The complexity of designing advanced-node ASICs is growing exponentially, significantly increasing the risk of failure. A single, small mistake can have huge implications, not least in non-recurring engineering (NRE) costs. A complex 7nm ASIC, for example, costs several hundred million dollars in NRE alone, with costs projected to rise further as device geometries shrink to 5nm and beyond. Third, such fixed-silicon implementations are not future-proof. Changes to industry standards or other shifting requirements can quickly render the device obsolete. These are just some of the downsides of fixed silicon.
So how can the industry continue making architectural advancements and build DSAs fast enough to keep up with the pace of innovation? The solution lies in adaptive computing.
The power of adaptive computing
Built on FPGA technology, adaptive computing allows DSAs to be built dynamically in silicon. It therefore allows DSAs to be updated as requirements change, freeing designers from lengthy ASIC design cycles and exorbitant NRE costs. It enables over-the-air (OTA) updates not just for software, but also for hardware, which is especially important as processing becomes more distributed. For example, the Mars rover Curiosity, launched in 2011, and the recently launched Perseverance both contain adaptive computing.
Perseverance uses adaptive computing for its comprehensive vision processor. Built using an FPGA-based platform, it accelerates AI and non-AI visual tasks, including image rectification, filtering, detection and matching. The images that Perseverance sends back to NASA will have been processed using adaptive computing.
If a new algorithm is invented during the months it will take Perseverance to reach Mars, or a hardware bug is discovered, adaptive computing allows hardware updates to be sent remotely, over the air (or, in this case, over space). These updates can be made as quickly and easily as a software update. When the deployment is remote, such hardware updates are more than a convenience; they are a necessity.
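To make the idea concrete, here is a minimal sketch of what a remote hardware update could look like on a Linux-based system, assuming a vendor kernel (as in some Xilinx Linux builds) that exposes a writable firmware attribute under the FPGA manager class. The update URL, file name and digest are hypothetical placeholders, not a real deployment.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path

# Hypothetical endpoints: in a real flow, the URL and the expected digest
# would be published by the vendor alongside the hardware image.
UPDATE_URL = "https://updates.example.com/dsa/vision_pipeline_v2.bin"
EXPECTED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

FIRMWARE_DIR = Path("/lib/firmware")  # where the kernel firmware loader looks
# Writable "firmware" attribute exposed by some vendor (e.g. Xilinx) kernels.
FPGA_MGR = Path("/sys/class/fpga_manager/fpga0/firmware")

def fetch_and_verify(url: str, expected_sha256: str) -> Path:
    """Download a new hardware image and check its integrity before use."""
    dest = FIRMWARE_DIR / Path(url).name
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    if digest != expected_sha256:
        dest.unlink()  # never hand a corrupt image to the device
        raise ValueError("hardware image failed integrity check")
    return dest

def program_fpga(image: Path) -> None:
    """Ask the kernel's FPGA manager to load the named image into the device."""
    FPGA_MGR.write_text(image.name)

if __name__ == "__main__":
    program_fpga(fetch_and_verify(UPDATE_URL, EXPECTED_SHA256))
```

Apart from the final write to the FPGA manager, this is exactly the shape of an ordinary software update: download, verify, apply.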
Adaptive computing can be deployed from the cloud to the edge to the endpoint, bringing the latest architectural innovations to every part of end-to-end applications. This is possible thanks to a wide range of adaptive computing platforms, from large-capacity devices on PCIe accelerator cards in the data centre to small, low-power devices suited to the endpoint processing needed by IoT devices.
Adaptive computing can be used to build all manner of optimised DSAs, from latency-sensitive applications such as autonomous driving and real-time video streaming to the signal processing in 5G and the data processing of unstructured databases. And with today’s hardware abstraction tools, software and AI developers can take full advantage of it without needing to be hardware experts.
Adaptive computing accelerates the whole application
Rarely does AI inference exist in isolation. It is part of a larger chain of data analysis and processing, often with multiple pre- and post-processing stages that use a traditional (non-AI) implementation. The embedded AI parts of these systems benefit from AI acceleration, of course, but the non-AI parts benefit from acceleration too. The flexible nature of adaptive computing suits it to accelerating both the AI and the non-AI processing tasks. We call this whole-application acceleration, and it will become increasingly important as AI permeates ever more applications.
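As a rough illustration, the sketch below models a whole-application pipeline in Python. The stage bodies are CPU stand-ins; in a real deployment each stage, not just the inference step, would be dispatched to an accelerator kernel through a vendor runtime. All names and shapes here are illustrative assumptions, not any particular product’s API.

```python
import numpy as np

# Whole-application pipeline sketch: every stage, not only the AI inference,
# is a candidate for offload to an adaptive-compute device.

def preprocess(frame: np.ndarray) -> np.ndarray:
    """Non-AI stage: normalise the image before inference."""
    return (frame.astype(np.float32) - 127.5) / 127.5

def infer(tensor: np.ndarray) -> np.ndarray:
    """AI stage: stand-in for an accelerated neural-network kernel."""
    weights = np.random.default_rng(0).standard_normal((tensor.size, 10))
    return tensor.reshape(1, -1) @ weights

def postprocess(logits: np.ndarray) -> int:
    """Non-AI stage: turn raw model output into a decision."""
    return int(np.argmax(logits))

def run_pipeline(frame: np.ndarray) -> int:
    # Accelerating only infer() would leave preprocess() and postprocess()
    # as the new bottleneck; whole-application acceleration offloads all three.
    return postprocess(infer(preprocess(frame)))

if __name__ == "__main__":
    frame = np.random.default_rng(1).integers(0, 256, (8, 8), dtype=np.uint8)
    print("class:", run_pipeline(frame))
```

The point of the structure is Amdahl’s Law in miniature: once the AI stage is fast, the surrounding non-AI stages dominate the runtime unless they are accelerated as well.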
We’ve talked about the power of adaptive computing, but there are more tricks up its sleeve.
Enabling entirely new systems
Adaptive computing is enabling entirely new systems that would not be possible with other technologies. It not only enables rapid architectural innovation, it also enables swapping between architectures while the system is running. These new systems deliver significant performance, power and cost improvements that are simply not possible with fixed silicon such as CPUs, GPUs and ASICs.
For example, modern vehicles contain many cameras, each monitored by software and, increasingly, by AI. The processing of a front-facing camera and a rear-facing camera is usually mutually exclusive: only one is monitored at a time, depending on the direction of travel. Adaptive computing allows them to share processing resources. When moving forward, only the front video stream needs to be processed; when reversing, only the rear one. When the car shifts from “drive” to “reverse”, the hardware is reconfigured to implement the rear-camera processing algorithms, which may differ from the front-facing ones. This gives higher overall performance while reducing power consumption and cost. For the end consumer, that translates into more features for less money.
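The control logic for such a swap could be as simple as the following sketch, assuming the same writable FPGA manager interface as in the earlier update example; the gear states, bitstream names and the use of a single shared reconfigurable region are all hypothetical assumptions for illustration.

```python
from pathlib import Path

# Writable "firmware" attribute exposed by some vendor (e.g. Xilinx) kernels;
# on real hardware, partial reconfiguration may also require setting a flags
# attribute before loading a partial bitstream.
FPGA_MGR = Path("/sys/class/fpga_manager/fpga0/firmware")

# One partial bitstream per mutually exclusive camera pipeline (names are
# placeholders, not shipped artefacts).
BITSTREAMS = {
    "drive": "front_camera_pipeline.bin",
    "reverse": "rear_camera_pipeline.bin",
}

_current = None

def on_gear_change(gear: str) -> None:
    """Reload the shared region only when the required pipeline changes."""
    global _current
    image = BITSTREAMS[gear]
    if image != _current:
        FPGA_MGR.write_text(image)  # triggers reconfiguration of the region
        _current = image

# Shifting from drive to reverse swaps the rear-camera hardware into the
# same silicon that was just running the front-camera pipeline.
on_gear_change("drive")
on_gear_change("reverse")
```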
Conclusion
Moore’s Law, as defined, is dead. The spirit of Moore’s Law, however, is alive and kicking. Adaptive computing and architectural advances such as DSAs are helping to maintain the “faster, better, cheaper” cadence we have all become accustomed to. At the same time, adaptive computing is delivering on the market’s need for “sooner” by eliminating the lengthy design cycles required to build new ASIC-based DSAs. This trend will continue the rapid architectural advances that keep the spirit of Moore’s Law alive, long after the law itself has ceased to hold. That is the paradox of Moore’s Law.
Author details: Ivo Bolsens is Senior VP & Chief Technology Officer, Xilinx