Get to know your software better


Do you know what’s lurking in your software? Even if you do not think it is important, governments and standards bodies think you should find out.

Governments around the world have taken notice of the many security flaws in programs at all levels. And they want developers to take more responsibility for what goes into software-intensive products.

Take the upcoming Cyber Resilience Act from the European Union as an example. Part of the current draft of the act calls for products to be delivered with no known exploitable vulnerabilities and with a limited attack surface. Vendors may want to argue that they did not know, at the time of development, that the software was vulnerable. But that defence faces two big problems: the sheer amount of third-party code, whether closed or open source, that many connected devices now use, and the fact that known vulnerabilities are listed in public online databases.

Some of that code can wind up in many places. At the recent Secure our Streets conference on secure and safe automotive design, several speakers pointed to a problem that hit a relatively obscure piece of Apache Software Foundation code: the Log4j event-logging library for Java.

“It was a critical vulnerability that allowed remote-command execution, and which was exploited at scale. It is a dependency that you will find in virtually every Java application,” explained Florian Lukavsky, CTO of Onekey. He also pointed to the discovery in 2021, by a joint team of security researchers, of more than 30 individual vulnerabilities in the TCP/IP network stacks used by millions of IoT and embedded devices.

To make developers take more notice of these common faults, the Biden administration issued an executive order on cybersecurity in May 2021, part of which demands that products used by government agencies come with a software bill of materials (SBOM). Similar requirements have appeared in medical and automotive standards. As of October this year, the US Food and Drug Administration has stopped accepting for review medical devices that do not come with an SBOM.

Just putting together an SBOM does not solve the problem. “An SBOM is really a snapshot: the components at one particular moment. When you make changes, you will need to update that,” noted David Leichner, chief marketing officer of Cybellum, during Secure our Streets.

Management tools

The need to address regular changes has led to the creation of a variety of management tools that automate the generation of database records as new versions are created. In principle, in a flow built around continuous integration and delivery, such tools update the SBOM as soon as developers check updated software modules into the latest build.

When someone reports a vulnerability publicly, the tools check which products are affected and, in some cases, perform a triage to determine how urgently a target needs an update. Some of that information can come from published advisories, such as those in the Vulnerability Exploitability eXchange (VEX) format, which indicate whether a product is actually exposed to a bug and how risky it is to leave unfixed. Context is also important: if the vulnerable part of a code module cannot be reached, developers can reduce the priority of applying the fix. “But make sure to document also the reasons why you disregard certain vulnerabilities,” Lukavsky advises.

SBOMs and public data potentially go a long way toward identifying latent risks in a software-intensive product. But that is only one part of the problem. This is where testing for vulnerability to out-of-bounds conditions such as buffer overflows and unhandled exceptions comes in.

When it comes to testing applications for these kinds of flaws, developers have numerous tools they can turn to. Fuzzing, which involves bombarding the target with malformed and unexpected inputs in the expectation that one of them will force a failure, can be effective, as can more targeted penetration testing. But both rely on design insight for maximum effect, and it is hard to know whether they have caught all the problems.
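To make the idea concrete, here is a minimal fuzzing harness in C written against the LLVM libFuzzer interface. The parse_packet() routine and its deliberately planted flaw are invented for illustration; they are not taken from any of the products or talks mentioned in this article.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser standing in for the product's real code. The flaw is
   that it trusts a length field taken from the input itself. */
static int parse_packet(const uint8_t *buf, size_t len) {
    uint8_t header[8];
    if (len < 1)
        return -1;
    size_t claimed = buf[0];           /* attacker-controlled length byte */
    if (claimed > len - 1)
        claimed = len - 1;
    memcpy(header, buf + 1, claimed);  /* can overflow header[] when claimed > 8 */
    return header[0];
}

/* libFuzzer entry point: called over and over with mutated inputs while
   sanitizers watch for memory errors and crashes. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    parse_packet(data, size);
    return 0;   /* non-zero return values are reserved by libFuzzer */
}
```

Built with something like clang -g -fsanitize=fuzzer,address, the harness exercises the parser automatically, but it only finds the problems its generated inputs happen to reach, which is the coverage limitation described above.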

Static verification techniques, such as symbolic execution, can provide very high degrees of coverage, though they can equally generate a lot of false positives because they may not be able to determine whether a flagged error can actually be reached. Symbolic execution tests all possible branches and conditions within a module by building a mathematical model of the code statements and then using satisfiability solvers and similar tools to work out which states are possible.
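As a sketch of how that works in practice, the fragment below uses the API of the open-source KLEE engine, which appears again later in this article, to mark an input as symbolic. The off-by-one lookup() function is a made-up example rather than code from any of the vendors quoted here.

```c
#include <klee/klee.h>

/* Toy routine with an off-by-one bounds error on one specific input. */
static int lookup(int index) {
    int table[4] = {10, 20, 30, 40};
    if (index >= 0 && index <= 4)   /* bug: index 4 is one past the end */
        return table[index];
    return -1;
}

int main(void) {
    int i;

    /* Ask KLEE to treat i as "any possible value" rather than a fixed one. */
    klee_make_symbolic(&i, sizeof(i), "i");

    /* The engine explores both sides of each branch and uses its solver to
       find the single value, i == 4, that drives the out-of-bounds read. */
    return lookup(i) == 42;
}
```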

“We can do more than buffer overflows,” claims Fabrice Derepas, CEO of Trustinsoft. The aim of the company’s analyser is to find and help eliminate the undefined behaviours that a program might have. These are frequently the gateways for hackers because there are no guarantees about how the code will behave under different circumstances. They can cause a program to fail to check credentials properly, opening the door to privilege escalation, or, in the case of a buffer overflow, let an attacker plant instructions ready for the second stage of an attack. “With our approach, you can guarantee the absence of entire families of vulnerabilities in a piece of code.”
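A contrived C fragment shows the kind of undefined behaviour in question; the struct layout and field names are invented for illustration and do not come from Trustinsoft’s material.

```c
#include <string.h>

struct session {
    char name[16];
    int  is_admin;   /* happens to sit directly after the buffer in memory */
};

void set_name(struct session *s, const char *input) {
    /* Undefined behaviour: there is no length check, so a sufficiently long
       input overflows name[] and can overwrite is_admin, silently turning an
       ordinary user into an administrator with no crash to give it away. */
    strcpy(s->name, input);
}
```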

The symbolic execution technique employed by Trustinsoft focuses on memory, which makes it possible to detect more subtle vulnerabilities sometimes exploited by hackers, such as using memory contents after they have been released and supposedly made inaccessible.

“We build a symbolic representation of the memory to model the behaviour of all the possible states in the memory. This is why we can say you are using a memory location after you have already called the function free() on it,” Derepas explains.
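In code terms, the pattern being described looks something like the following deliberately minimal, hypothetical example.

```c
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *token = malloc(32);
    if (token == NULL)
        return 1;
    strcpy(token, "secret-session-token");

    free(token);    /* the memory is handed back to the allocator */

    /* Use after free: undefined behaviour. The bytes may still hold the old
       token, may already contain attacker-controlled data, or the access may
       simply crash; a memory-aware analyser flags the read on this path. */
    return token[0] == 's';
}
```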

Most symbolic-execution tools evaluate the source code. This is a compiler-independent exercise, though the tools take account of the different versions of each language published by the standards bodies, because the assumptions in those standards guide how code is generated. However, some problems can appear in the generated code even if the source’s logical behaviour is fully defined, whether because of bugs in the compiler or over-aggressive optimisations. For example, attempts to delete secret information from memory once the program has used it may fail because the compiler determines the new, overwritten value is never explicitly read. Under aggressive optimisation, it simply omits the write that would zero out the contents, in the expectation that once the variable goes out of scope nothing will find the value still lurking in memory.
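A sketch of that scrubbing problem is shown below; read_password() and check_password() are hypothetical helpers standing in for the real input and verification routines, declared but not defined to keep the example short.

```c
#include <string.h>

/* Hypothetical helpers standing in for the product's real code. */
void read_password(char *buf, size_t len);
int  check_password(const char *buf);

void handle_login(void) {
    char password[64];

    read_password(password, sizeof(password));
    (void)check_password(password);

    /* The compiler can prove that password is never read after this point,
       so an aggressive optimiser may delete the memset as a dead store,
       leaving the secret sitting in stack memory. Writing through a volatile
       pointer, or using memset_s() where the C11 Annex K functions are
       available, prevents the store from being removed. */
    memset(password, 0, sizeof(password));
}
```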

Among the efforts to address the issues of compiled code, a team working with Sébastien Bardin, senior researcher at the French research institute CEA-List, has used the open-source symbolic-execution engine KLEE, originally developed at Stanford University, to create a binary-level tool.

Working with binaries introduces issues, as there is far less semantic information for a tool to use. This can lead to many more false positives, so the team worked on an approach for the Binsec/Rel tool that constrains the number of paths it will attempt to follow. Because it operates at machine level, it can detect the conditions that enable timing side-channel attacks, in which the hacker uses differences in execution time to recover secrets such as the values of passwords and keys. The tool verifies that the compiler has not produced branching code that leads to those differences.
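The contrived comparison routines below illustrate the kind of secret-dependent branching such a tool looks for; they are not taken from the Binsec/Rel test suite.

```c
#include <stddef.h>

/* Early-exit comparison: the loop stops at the first mismatch, so the
   execution time leaks how many leading bytes of a guess are correct. */
int leaky_compare(const unsigned char *a, const unsigned char *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return 0;
    }
    return 1;
}

/* Constant-time variant: every byte is touched and no branch depends on the
   secret data, so the run time does not reveal where the inputs differ. */
int constant_time_compare(const unsigned char *a, const unsigned char *b, size_t n) {
    unsigned char diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);
    return diff == 0;
}
```

Even when the source looks like the constant-time version, a compiler is free to transform it, which is why checking the generated machine code, as Binsec/Rel does, gives the stronger guarantee.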

Static and dynamic methods can be used together. One application is, again, to reduce the number of spurious errors thrown up by static analysis: a fuzzing tool can find the more obvious issues, which are then explored more fully using a symbolic tool such as Trustinsoft’s interpreter, before the static tools are employed to hunt for more subtle problems.

How much the recent government attention translates into enforcement remains to be seen. The EU’s proposed legislation calls for a risk analysis by developers not unlike that needed for EMC and basic product safety regulations. But to keep ahead of the hackers, it is going to pay to make sure you know what is lurking in the source code, whether you or someone else wrote it.