A little knowledge

5 mins read

AI is finding its way into hardware verification, but not by getting directly involved in the tests themselves.

Credit: Vectorpoint - adobe.stock.com

AI hardware verification seems like an oxymoron. The general inability of the current generation of neural networks, even the largest of large language models (LLMs), to reason clearly enough to perform basic arithmetic does not point to a good fit with the world of chip design: an environment where mistakes in the design, and the re-spins they cause, are very expensive.

Yet AI is finding its way into many parts of verification, partly by avoiding the actual circuit verification. Instead, it writes the scripts and tests that traditional EDA tools then run. At a UK DVClub meeting organised by consultancy Tessolve in September, Hemendra Talesara, an advisor to several verification startups, referred to these applications as “low-hanging automation”.

Some of it has been around for a while. At the end of the last decade both Cadence Design Systems and OneSpin added machine-learning modules to their respective formal-verification tools. Both companies found relatively simple neural-network models could learn the types of problem that suit different solvers and so relieve users of making decisions about which to use and when.

A couple of years later, Cadence bought the startup Verifyter, which developed the PinDown tool to analyse regression tests and try to isolate bugs more quickly. The underlying technology now forms part of Cadence’s Verisium portfolio and, as Paul Graykowski, product marketing director at the EDA company, explained at the DVClub meeting, it is now used to triage failures. “Was there a check-in [to the design’s source code] that broke it? This is a great place for automation using AI. It can see potentially high-risk changes and make educated guesses about where the problem might have happened.”

Coverage analysis provides a different way of sorting through the noise of the randomised strategies used throughout SoC design today. Work at Bristol University, supported by the Infineon Technologies design group based nearby, has used several common types of neural network to find ways to reduce the number of simulations needed to hit a certain level of coverage. Focusing on how much novelty each successive test brings, their recent work pitted the transformer structures that underpin LLMs against some older techniques.

Though the AI-based approaches mostly outperformed traditional random test methods, the group found the older techniques outperformed the transformer at lower coverage levels. All the methods converged as the desired coverage increased past the 98% point, where it becomes increasingly difficult to find any tests that hit the remaining holes in the coverage. The work indicates that AI could have a strong role in the early phases of verification, getting coverage into the 80 to 90% zone, ready for hand-crafted efforts to take over.
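The idea of steering test selection by novelty can be sketched in a few lines of Python. What follows is a generic illustration rather than the Bristol group’s code: the coverage bins, the toy simulator and the stand-in prediction model are all assumptions made purely for the sake of the example.

import random

NUM_BINS = 100           # coverage bins in the toy model
CANDIDATES_PER_STEP = 8  # random test configurations proposed each round

def simulate(seed: int) -> set[int]:
    """Toy stand-in for a simulation run: the coverage bins a given test hits."""
    rng = random.Random(seed)
    return {rng.randrange(NUM_BINS) for _ in range(10)}

def predicted_bins(seed: int) -> set[int]:
    """Stand-in for a learned model that predicts coverage without simulating.
    Here it simply calls the toy simulator; in the research a neural network
    plays this role so that most candidate tests never need to be simulated."""
    return simulate(seed)

covered: set[int] = set()
runs = 0
while len(covered) < 0.9 * NUM_BINS:            # stop at roughly 90% coverage
    candidates = [random.randrange(1_000_000) for _ in range(CANDIDATES_PER_STEP)]
    # Novelty score: how many not-yet-covered bins each candidate is expected to hit.
    best = max(candidates, key=lambda s: len(predicted_bins(s) - covered))
    covered |= simulate(best)                   # only the most novel test is run
    runs += 1
print(f"about 90% coverage after {runs} selected runs")

The payoff in the real flows comes from the prediction step: if a model can rank candidate tests without running them, most of the candidate simulations never need to happen.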

The big drawback

The big drawback of the LLM naturally appears when the design work needs some level of logical reasoning to work out what to do. LLMs only predict the most likely chain of words, given their training data, in response to a prompt. Graykowski uses the example of a bus arbiter used in one experiment on Verilog code generation. The model created the right inputs and outputs. “But we got this thing where all the incoming requests were granted at the same time. That’s not what we want,” he says.

Though LLMs make mistakes in design, can they provide the tools to spot those errors and fix them? At the VLSI Test Symposium in the spring, engineers from Nvidia described how they fine-tuned ChipNeMo, the LLM they originally built to speed up placement and layout, to create assertions for formal verification. But the assertions generated by LLMs on their own do not necessarily reflect design intent: they wind up trying to verify the wrong attributes, or even attributes that do not exist in the design but which the LLM “hallucinated” into its response. In another set of experiments at Nvidia, reported shortly afterwards at the Design Automation Conference (DAC), YunDa Tsai and colleagues found that more than half the errors they encountered in LLM-generated Verilog were syntax errors the AI had inserted into the code. The team put this down to hallucinations as well.

Retrieval-augmented generation (RAG) provides one way to stop the models getting tripped up by hallucinations. RAG couples the model to a database of phrases and document fragments that it can treat as legitimate facts. This, in principle, ties the LLM’s output more closely to reality. It also helps with the token limit: the model only has to compute a numeric vector, use it to search the RAG database and pull out the best-matching item to incorporate into its response.

An open question is how best to compile the RAG database. The simplest method is to have a tool ingest existing natural-language documents, remove any redundant phrases and then give each discrete phrase that remains a reference in the form of a numeric vector. This follows the same kind of process as that used to train LLMs in the first place, where each word in each context winds up with its own vector that the model then works on.
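A minimal sketch of that flow, covering both the compilation of the fragment database and the retrieval step, might look like the following. It uses crude bag-of-words vectors in place of learned embeddings, and the protocol fragments and query are illustrative rather than taken from any vendor’s tool.

import math
from collections import Counter

def embed(text: str) -> Counter:
    """Turn a phrase into a crude numeric vector (word counts)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Compile the database: ingest document fragments, drop duplicates, embed each one.
fragments = [
    "AWVALID must remain asserted until AWREADY is high",
    "AWVALID must remain asserted until AWREADY is high",   # duplicate, removed below
    "WLAST is asserted on the final transfer of a write burst",
]
database = [(embed(f), f) for f in dict.fromkeys(fragments)]

# 2. At answer time: embed the query, pull out the best-matching fragment and
#    splice it into the prompt so the model works from a stated fact.
query = "When may AWVALID be deasserted?"
q_vec = embed(query)
best = max(database, key=lambda item: cosine(q_vec, item[0]))[1]
prompt = f"Context: {best}\nQuestion: {query}\nAnswer using only the context."
print(prompt)

Real systems swap the word-count vectors for learned embeddings and a vector database, but the shape of the pipeline is the same: deduplicate, embed, search, then splice the best match into the prompt.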

At the DVCon Europe conference last month, researchers at Heilbronn University of Applied Sciences in Germany described how they used the high-level description of the Arm AXI protocol to build the RAG database of an assertion generator. This helped prevent hallucinations in their system for generating SystemVerilog assertions that correspond to a set of high-level requirements provided in a specification for the target design. The tests focused on assertions that ensure target signals are only asserted while other inputs are in certain states.
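The properties themselves are emitted as SystemVerilog assertions; as a rough Python analogue of the kind of rule involved, a check that one signal is only ever high while another condition holds might look like this, with the signal names invented for the example.

from typing import Dict, List

Trace = List[Dict[str, int]]      # one dictionary of signal values per clock cycle

def only_high_while(trace: Trace, target: str, guard: str) -> bool:
    """True if `target` is never high in a cycle where `guard` is low."""
    return all(cycle[guard] == 1 for cycle in trace if cycle[target] == 1)

# A hand-made trace with one violation in the final cycle.
trace = [
    {"REQ_VALID": 0, "CHANNEL_READY": 0},
    {"REQ_VALID": 1, "CHANNEL_READY": 1},
    {"REQ_VALID": 1, "CHANNEL_READY": 0},   # target high while the guard is low
]
print(only_high_while(trace, "REQ_VALID", "CHANNEL_READY"))   # prints False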

To stop their LLM from making syntax errors, Tsai’s group at Nvidia went for an approach that builds more hand-crafted knowledge into the RAG database. One issue they found initially was that compilers can generate ambiguous error messages that are harder to fix, requiring humans to add context for typical situations in a process not unlike building traditional expert systems.

Divide and conquer

The other route that researchers are taking to improve the quality of results is divide and conquer: break the tasks down into chunks that, hopefully, guide the LLM towards the right answers. This process is similar to the prompt engineering that people have found can improve the output from an LLM by effectively forcing it to perform tasks analogous to step-by-step reasoning.

In their work presented at DAC in the summer, Khushboo Qayyum and colleagues at the University of Bremen split the LLM’s role into three discrete parts to help define and verify core properties of HDL descriptions, such as when the carry bit in an adder should be high or low.
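As an illustration of the kind of property involved, rather than the Bremen group’s actual flow, which targets HDL descriptions, the carry-out condition of a small adder can be checked exhaustively in a few lines of Python.

WIDTH = 4                                   # a 4-bit adder keeps the search space tiny

def adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Reference adder: returns (sum, carry_out) for WIDTH-bit operands."""
    total = a + b + cin
    return total & ((1 << WIDTH) - 1), total >> WIDTH

# Property: carry_out is high exactly when a + b + cin overflows WIDTH bits.
for a in range(1 << WIDTH):
    for b in range(1 << WIDTH):
        for cin in (0, 1):
            _, cout = adder(a, b, cin)
            assert cout == (1 if a + b + cin >= (1 << WIDTH) else 0)
print("carry-out property holds for every 4-bit input combination")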

Because there are so many moving parts to making AI work in verification, some organisations are looking to industry collaborations to improve results. As part of its work on RAG-enhanced models and other aspects of generative AI, Tessolve is working with several partners and is looking for more. “Through collaboration with industry partners, we aim to build standardised databases and automate test designs for better performance evaluation,” says Tessolve senior vice president Mike Bartley. The RAG-centred work covers a number of tasks, he says, including feature extraction, specification analysis, coverage improvement, automated test-code generation, assertion writing and streamlining test plans overall.

Though AI has some way to go before it develops the kind of reasoning needed to provide consistent results in verification, LLM technology may have begun to find a home there.