Hardware-software codesign remains tomorrow’s technology
Around 15 years ago, Mentor Graphics made an acquisition that signalled the future for electronic design automation (EDA). The bad news is that the technology it promised remains in the future, with no immediate prospect of becoming reality.
The purchase of Microtec Research resulted in Seamless, a tool that made it possible to run software on virtual hardware; device drivers and other low-level code could be debugged before the chip returned from the fab. Such virtual platform tools have steadily become chipmakers' main defence against massive delays between first silicon and first shipments caused by software not being ready.
The Microtec acquisition was expected to form the foundation of hardware-software codesign technology, a process in which changes in one domain would influence the other. By tuning both in parallel, you could, conceivably, build a far more optimised system than by trying to define hardware before anyone has considered what the software should look like.
While codesign remains a dream, system-level design is receiving greater attention from chipmakers, although few refer to codesign explicitly. At June's Design Automation Conference, Freescale's CTO Lisa Su said the company was running into problems with software development that techniques similar to codesign could help solve. The next day, Intel's Gadi Singer said his company wants a design platform that takes system-level descriptions and turns them into hardware, pointing to research into the topic by the world's largest supplier of integrated circuits.
Singer outlined a plan to increase design productivity by moving away from an approach to chip design that has persisted for more than 20 years: using hardware description languages (HDLs) that operate at the register-transfer level (RTL). At the plan's core is a design environment in which hardware design is moved to system-level languages, after which synthesis tools generate RTL and, ultimately, logic gates. Silicon and software development would follow parallel paths. Any changes made at a low level, to fix logic bugs, would need to be reflected at the top level to ensure the system-level specification and the implementation do not go out of sync.
By synthesising a variety of implementations, engineers could conceivably pick the best for a given workload and process technology. Because today's designs are fixed at RTL, it becomes difficult to change the architecture without starting almost from scratch.
There are several problems with this ideal design environment. One comes down to financing development; a number of chipmakers want faster prototyping software to speed driver and firmware development, but none want to pay extra for the privilege.
The second problem lies in legacy. Most SoC designs use IP cores, such as off-the-shelf processors and communications interfaces, that cannot be altered easily from one generation to the next because it is too hard and too expensive to change the software that uses them. This reduces the degrees of freedom that a system architect has, even if they are armed with a new generation of high-level tools. Singer wants to 'iterate early and iterate often' to analyse many different implementation candidates. But that might not be feasible without shifting the entire design to the system level. Because of the presence of large quantities of third-party IP, major implementation changes might not be possible at all without reshaping the way in which verification works.
The constant reuse of blocks is causing verification problems as an increasing proportion of bugs moves into the cracks between functional blocks, rather than turning up inside them. These problems come as a result of concerns over power consumption, which have forced the use of low-energy design techniques. Parts of the system have to run flat out for short periods and then, as soon as they have finished their parcel of work, suddenly shut down and disconnect from the supply rails.
When a block powers down, it has no state until it fires up again. Most of the time, this does not matter. But it is easy for a piece of design written in RTL to assume the state of a connected block and to build that into its logic. It might assume, for instance, that an undriven input – an 'X' in Verilog – is the same as a 0, which could be dangerous if it propagates further into the logic block. The trouble is that a simulator can make different assumptions from those a synthesis tool makes when it reduces that construct to Boolean logic. A discrepancy between simulation and reality is well on its way to being a showstopping bug.
When such problems were confined to reset logic and bus interfaces, X propagation was relatively easy to trap by hand – by writing HDL statements that take an indeterminate input and give it a defined, safe value. Now, chips are full of potential X sources, all coming and going at different times.
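The Verilog-level shape of the problem, and of the hand-written trap, can be sketched in a few lines. This is illustrative only – the module and signal names are invented for the example, not drawn from any particular design:

```verilog
// Sketch of the dangerous assumption. 'req' arrives from a neighbouring
// block that may be power-gated, so it can be X rather than 0 or 1.
module consumer (
    input  wire clk,
    input  wire req,
    output reg  ack
);
    always @(posedge clk) begin
        // If 'req' is X, a Verilog 'if' treats the condition as not-true
        // and takes the else branch, quietly behaving as though X were 0.
        // The synthesised gates are under no obligation to do the same.
        if (req)
            ack <= 1'b1;
        else
            ack <= 1'b0;
    end
endmodule

// The hand-written trap: give the indeterminate input a defined, safe value.
module consumer_safe (
    input  wire clk,
    input  wire req,
    input  wire neighbour_on,  // power-state flag, assumed to come from the power controller
    output reg  ack
);
    wire req_safe = neighbour_on & req;  // forced to 0 while the source block is off
    always @(posedge clk)
        ack <= req_safe;
endmodule
```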
A similar problem is the creation of 'sneak paths' for current, caused by mismatches between blocks powered from different supply rails. It is easy to omit gates that prevent these paths from forming when a block on one side of a power domain goes out of action.
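The missing gate is usually an isolation cell. A minimal sketch of one follows, with invented names; in practice such cells come from a standard-cell library and are inserted under the control of a power-intent description rather than written by hand:

```verilog
// Illustrative isolation cell: clamps a signal leaving a power-gated block
// so that a floating node cannot drive logic, or leak current, in the
// always-on domain.
module iso_cell (
    input  wire from_gated_domain,  // may float when its source block is off
    input  wire iso_enable,         // asserted before the source block shuts down
    output wire to_always_on
);
    // AND-style isolation: the output is forced low whenever isolation
    // is enabled, whatever the gated domain is doing.
    assign to_always_on = from_gated_domain & ~iso_enable;
endmodule
```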
This problem is likely to get worse as designs make greater use of voltage scaling. Today, you need to alter the voltage supplied to entire blocks if you want the logic in them to run more slowly at a lower voltage – and take advantage of the quadratic relationship between voltage and power consumption. An alternative being tried is to have individual logic paths switch between two or three voltage rails. This presents a new set of verification problems when it comes to determining what happens when voltage levels change between comparatively small blocks.
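The attraction is easy to quantify from the standard dynamic-power relationship P ≈ αCV²f, where α is the switching activity, C the switched capacitance, V the supply voltage and f the clock frequency. As a worked illustration – the numbers are examples, not figures from any design – dropping a block's supply from 1.0V to 0.8V cuts its dynamic power to 0.64 of the original before any reduction in clock frequency is counted; slowing the clock as well compounds the saving.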
Even the logic itself could be obfuscated in the name of power savings. Clock-gated logic was only the start; attention is now falling on logic switching within datapaths and shutting down parts of the logic if they are not needed for a particular calculation. By reorganising the datapath, it is possible to ensure that high-activity nets are brought in towards the end of a computation, which prevents those signals rippling down a long chain of logic.
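In RTL terms, one such datapath technique is operand isolation: holding a unit's inputs stable whenever its result is not needed, so high-activity nets never reach it. A minimal, illustrative Verilog sketch, with invented names:

```verilog
// Illustrative operand isolation: the multiplier's inputs are forced to a
// constant when its result is unused, so its internal nets stop switching.
module gated_mac (
    input  wire        clk,
    input  wire        mul_needed,  // this cycle's calculation uses the multiplier
    input  wire [15:0] a, b,
    output reg  [31:0] acc
);
    wire [15:0] a_iso = mul_needed ? a : 16'd0;
    wire [15:0] b_iso = mul_needed ? b : 16'd0;

    always @(posedge clk)
        if (mul_needed)
            acc <= acc + a_iso * b_iso;  // the accumulator also only toggles when needed
endmodule
```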
This could go further for long-distance, high-capacitance nets. Techniques such as non-return-to-zero and inversion protocols, used to limit switching activity on PCB buses, may move into the chip world if the power used by logic to decide when to flip signals across a bus is less than that used to simply send data across the chip.
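One classic inversion protocol – not named in the article, but a standard example – is bus-invert coding: if more than half the wires of a bus would toggle, send the word inverted and raise an extra flag wire so the receiver can undo it. A hedged sketch for an eight-bit bus:

```verilog
// Illustrative bus-invert encoder for an 8-bit bus: at most half the data
// wires ever flip per cycle, at the cost of one extra 'invert' wire.
module bus_invert_tx (
    input  wire       clk,
    input  wire [7:0] data_in,
    output reg  [7:0] bus,     // the long, high-capacitance net
    output reg        invert   // tells the receiver to re-invert the word
);
    // Count how many wires would toggle if the data were sent as-is.
    wire [7:0] diff    = data_in ^ bus;
    wire [3:0] toggles = diff[0] + diff[1] + diff[2] + diff[3]
                       + diff[4] + diff[5] + diff[6] + diff[7];

    always @(posedge clk) begin
        if (toggles > 4) begin
            bus    <= ~data_in;  // inverting flips fewer than half the wires
            invert <= 1'b1;
        end else begin
            bus    <= data_in;
            invert <= 1'b0;
        end
    end
endmodule
```

Whether the comparator and inverters pay for themselves depends on exactly the trade the article describes: the deciding logic must burn less power than the switching it saves.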
Another way in which the assumptions of conventional synchronous RTL design are being challenged is in the clocking scheme itself. It now takes far longer than one clock cycle for signals to cross a chip. The best way to deal with this is to separate the chip into islands of synchronous design and to then link them using asynchronous protocols, another 'design technique of the future' that has resurfaced several times.
A variety of tools and techniques has appeared that attempt to simplify the problem of designing stable clockless systems, but all have met with little enthusiasm. Although greater use of asynchronous logic promised to cut power consumption, as demonstrated by experimental chips such as the University of Manchester's ARM-based Amulet processors, designers have stuck with synchronous design. They have instead embraced alternatives that are often conceptually more complex, such as clock gating, because those fit standard synchronous design methodologies.
However, the need to embrace a globally asynchronous, locally synchronous (GALS) architecture means asynchronous design tools are moving into the mainstream – even if they are sold as ways of checking for errors when signals cross between clock domains.
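The structural fix those checkers look for is well established: every single-bit signal crossing between clock islands passes through a multi-flop synchroniser. A minimal, illustrative sketch:

```verilog
// Illustrative two-flop synchroniser for a single-bit signal crossing
// into a new clock domain. The first flop may go metastable when the
// asynchronous input changes near a clock edge; the second flop gives
// it a full cycle to settle before the value is used.
module sync_2ff (
    input  wire clk_dst,   // receiving island's clock
    input  wire async_in,  // signal arriving from another clock island
    output wire sync_out
);
    reg meta, stable;
    always @(posedge clk_dst) begin
        meta   <= async_in;
        stable <= meta;
    end
    assign sync_out = stable;
endmodule
```

Multi-bit values need more than this – typically a handshake or an asynchronous FIFO – and enforcing those structures is precisely what clock-domain-crossing checkers do.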
Because of the problems created by assembling chips from many small synthesised logic blocks, attention is likely to shift towards tools that can stitch together subsystems – even complete chips – from synthesised components, perhaps using description formats such as IP-XACT that attempt to describe the inputs and outputs of IP cores and how they should be used.
System-level tools might also be used to model the interactions between blocks at the transaction level, with interface synthesis – another research topic from the past 20 years – used to create the necessary logic. A combination of formal verification and simulation would then check the implementation against the specification. As a result, designers might finally be forced to the system level simply because of the number of automated transformations that have to be done to ensure power and clock domains abut neatly.
However, not everything can be moved to the system level. Double patterning for sub-28nm chip designs raises issues that may not be automatable.
In double patterning, the design for each layer needs to be split cleanly across two masks so that adjacent tracks can be placed close together. It is, effectively, a graph-colouring problem, in which each pair of adjacent tracks must be coloured differently. However, many dense designs do not map easily onto a two-colour system but demand three colours. These cells need to be redesigned, possibly losing density, as the only way to place two traces of the same colour together is to double the distance between them.
Only a subset of the on-chip features used commonly today can be split cleanly across two masks – many others cause violations or make it impossible to squeeze gates close together. So physical-design engineers may be forced to manipulate automatically generated logic paths so they can be split cleanly.
The result is that EDA is likely to split between a move to higher-level design, with much greater use of synthesis, and the need to rework some of the output from those tools at the physical level. That will put pressure on tools suppliers to make outputs not only efficient, but also easy to read and understand. The HDLs themselves will become part of the past, in much the same way as schematic capture.