The team reached its conclusion during research in which vulnerabilities were introduced into software using a technique called LAVA (Large-Scale Automated Vulnerability Addition). The automated system inserts a known number of synthetic vulnerabilities that nonetheless share many of the attributes of computer bugs found ‘in the wild’.
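A minimal sketch of what one such injected bug might look like in C, assuming the magic-value trigger style described in the LAVA work; the function name, buffer size and trigger constant here are hypothetical, not taken from the tool's actual output:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static char buf[16];

    /* Hypothetical example: a few attacker-controlled input bytes travel
     * unchanged to this "attack point"; only when they equal a magic value
     * does the program perform an out-of-bounds write. Ordinary inputs and
     * ordinary test suites therefore behave normally, which is what makes
     * the injected bug realistic yet precisely countable. */
    void process_record(const uint8_t *input, size_t len) {
        if (len < 8)
            return;

        uint32_t trigger;
        memcpy(&trigger, input + 4, sizeof(trigger));

        size_t off = (trigger == 0x6C617661u) ? 64 : 0;  /* "lava" bytes */
        memcpy(buf + off, input, 8);                     /* OOB write only when triggered */
    }

    int main(void) {
        uint8_t benign[8] = {1, 2, 3, 4, 0, 0, 0, 0};
        process_record(benign, sizeof(benign));          /* safe path */
        puts("processed");
        return 0;
    }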
Brendan Dolan-Gavitt, an assistant professor at NYU Tandon, said the efficacy of bug-finding programs is judged by two metrics: the false positive rate and the false negative rate, both of which are notoriously difficult to calculate. It is not unusual for a program to flag a bug that turns out not to exist (a false positive) or to miss vulnerabilities that are actually present (false negatives). Without knowing the total number of bugs, there is no way to gauge how well these tools perform. “The only way to evaluate a bug finder is to control the number of bugs in a program, which is exactly what we do with LAVA,” he said.
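Because LAVA fixes the ground truth, those rates become straightforward to compute. A small illustrative calculation in C, using made-up counts rather than figures from the research:

    #include <stdio.h>

    int main(void) {
        /* Hypothetical counts, not results from the study. */
        int injected  = 1000;  /* bugs LAVA inserted (known ground truth)  */
        int reported  = 150;   /* findings produced by a bug-finding tool  */
        int true_hits = 20;    /* reports that match an injected bug       */

        int false_pos = reported - true_hits;  /* reports with no matching bug   */
        int false_neg = injected - true_hits;  /* injected bugs that were missed */

        printf("detection rate:                 %.1f%%\n", 100.0 * true_hits / injected);
        printf("share of reports that are FPs:  %.1f%%\n", 100.0 * false_pos / reported);
        printf("false negative rate:            %.1f%%\n", 100.0 * false_neg / injected);
        return 0;
    }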
When existing bug-finding tools were run against LAVA-bugged programs, they detected just 2% of the inserted vulnerabilities.
The team now plans to launch an open competition in which developers and other researchers can request a LAVA-bugged version of a piece of software, attempt to find the bugs, and receive a score based on their accuracy.
“There has never been a performance benchmark at this scale in this area and now we have one,” Dolan-Gavitt said. “Developers can compete for bragging rights on who has the highest success rate in bug-finding and the programs that will come out of the process could be stronger.”