"In programming, as in everything else, to be in error is to be reborn."
Alan J. Perlis, first recipient of the Turing Award
On May 20, 2008, static analysis tool vendor Coverity released a report entitled "Open Source Report 2008". The report includes information gathered over the first two years of the Coverity Scan project, which was developed under a contract from the US Department of Homeland Security. Coverity provides its analysis tools to open source projects to identify quality and security flaws in their codebases. Once flaws are identified, the developers of the open source projects are given the information to help them harden their software.
The report includes information about the progress made by various projects using the Scan service. Additionally, the Scan databases constitute one of the largest and most diverse collections of source code to be built and analyzed while tracking changes to those codebases over a two-year period. This data provides a substantial set of samples for considering some questions about the nature of software. The report investigates relationships between codebase size, defect counts, defect density, function lengths, and code complexity metrics. This article highlights some of the results from the report.
Data Used in the Report
Software has become a larger part of our lives over the last few decades. Whether on a desktop computer, or in systems we use like bank machines and automobiles, there are few people left who don't interact with software on a daily basis. Flaws in software can lead systems to misbehave in ways that range from simply annoying to life-threatening. Yet, although software plays such a ubiquitous and critical role in daily life, there are still many unanswered questions about how to develop good software and how to measure the quality of software.
Coverity is a software vendor that develops tools to automatically identify flaws in source code. While normally sold to commercial software developers, the US Department of Homeland Security contracted Coverity to analyze open source software (OSS) and provide the results to open source developers so that they could fix the defects that were identified.
Coverity's "Open Source Report 2008" includes a sampling of the data collected since the launch of the project in March of 2006. The information in the report falls into a number of categories: data about the degree of improvement and regression in quality by the open source projects using the Scan site; data about the frequency of different types of defects identified by the analysis, along with the consequences of each type of defect; statistical correlations between various measurements of the software projects being tracked; and statistics about the proportion of reported defects that developers determined to be incorrect claims by the analysis tool.
The data in the report is based on open source projects which add up to 55 million lines of code. In over 14,000 build and analysis sessions over two years, almost 10 billion lines of code were run through the analysis engine. In addition to looking for defects, the analysis retains information about the code itself, such as the names and numbers of functions and their lengths, the files that comprise the various projects, and the calculated complexity metrics. Commercial software is not usually available for analysis on an ongoing basis such as that performed in the Scan project. Commercial developers often do not release their source code, and when they do, it is typically in the form of a specific release version which usually receives a thorough vetting before its public release.
In contrast, open source projects make their source code available in a public version control system. Anyone who is interested can track the changes that the developers make day by day. This provides a visibility into the software development process that would not exist without open source principles.
When a large number of projects are viewed together, the result is a sample set of data that can begin to answer many questions about the nature of software. Knowing the answers to these initial questions allows us to begin to formulate more sophisticated questions for future efforts to pursue.
Defect densities are measured in defects per 1,000 lines of code. Over the two years from March 2006 to March 2008, the average defect density in the projects being monitored dropped from 0.30 defects per thousand lines of code to 0.25, or from roughly one defect per 3,333 lines of code to one defect per 4,000 lines of code. This represents an overall improvement of 16%. Figure 1 shows the change in defect density.
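The arithmetic behind these figures is simple to reproduce. In the sketch below, the function name and the one-million-line codebase are my own illustration, not data from the report; only the 0.30 and 0.25 averages come from it.

```python
def defect_density(defect_count, lines_of_code):
    """Defects per 1,000 lines of code (KLOC)."""
    return defect_count / (lines_of_code / 1000)

# 300 defects in a million lines matches the March 2006 average of 0.30;
# 250 defects in the same codebase matches the March 2008 average of 0.25.
before = defect_density(300, 1_000_000)   # 0.30
after = defect_density(250, 1_000_000)    # 0.25
improvement = (before - after) / before   # ~0.167, the report's roughly 16%
```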
Figure 1: Change in Defect Density
A statistical correlation was performed to compare defect densities and average function length in each project. The confidence factor found was 1.49%, where confidence factors can range from 0% to 100%. In other words, the data included in the report show no correlation between defect density and average function length. Since best practices often stipulate that long functions should be refactored into multiple smaller functions, support for that practice would have appeared as a correlation between higher defect densities and longer average function lengths. While refactoring may offer advantages for code maintainability, shorter functions do not appear to guarantee improvements in defect density. Figure 2 shows the relationship between defect density and function length.
Figure 2: Static Analysis Defect Density and Function Length
An additional correlation was performed between codebase size, in lines of code, and the number of defects identified. A commonly repeated claim is that software complexity grows exponentially as project size grows: each addition of code is said to add complexity through its interactions with all of the existing code.
The codebase size and defect count correlation was 71.9%, which indicates that defect count grows largely linearly with the number of lines of code. If increasing complexity led to increasingly frequent defects, a linear correlation ought to be much lower than the nearly 72% figure. This appears to indicate that writing 1,000 additional lines of code for a large codebase (say, over 1 million lines) is no more difficult or error-prone than writing 1,000 lines of code for a small one.
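The report does not spell out its statistical procedure in the body text, so the following is only a sketch of a standard linear (Pearson) correlation of the kind described; the project sizes and defect counts below are invented for illustration.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

kloc = [50, 120, 300, 800, 1500]    # hypothetical codebase sizes (KLOC)
defects = [14, 30, 80, 190, 400]    # hypothetical defect counts
r = pearson_r(kloc, defects)        # near 1.0 when growth is roughly linear
```

A high r, as in the report's 71.9% finding, says defect counts track size proportionally; superlinear growth in defects would pull the linear correlation down.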
This finding has the potential benefit of alleviating a concern about software development. It has been speculated that software applications will become so large as to be unmanageable. While there may be other aspects and limitations to the management of large projects, there does not appear to be an upper limit on project size beyond which defects are created at an unmanageable rate.
Comparisons are made in the report between codebase size and the calculated complexity metrics for each codebase. Cyclomatic complexity is a metric that counts the number of linearly independent paths through a piece of source code. The total cyclomatic complexity of an application was found to correlate almost 92% to the number of lines of code in an application. This implies that calculating the complexity metric for a codebase may tell you more about how much code you have than about its complexity. Figure 3 shows the correlation between complexity and lines of code.
Figure 3: Cyclomatic Complexity and Lines of Code
Since the complexity metric is so strongly related to codebase size, it may be important to double-check one's assumptions about the meaning of complexity metrics and determine whether the way in which they are being used is appropriate, given the information they convey.
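To make the metric concrete, here is a toy illustration of cyclomatic complexity for a single Python function, using the common shortcut "decision points + 1". The node list and counter are my own minimal sketch; real tools (and McCabe's original edges-and-nodes formulation) count more constructs.

```python
import ast

# Node types treated as decision points in this simplified counter.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp)

def cyclomatic_complexity(source):
    """Approximate McCabe complexity: one plus the number of decision points."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(n, DECISION_NODES) for n in ast.walk(tree))

sample = """
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"
"""
# The if/elif pair parses as two ast.If nodes, giving a complexity of 3.
```

Since every branch construct also adds lines of code, it is easy to see how a total like this would track codebase size closely, as the report's 92% correlation suggests.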
When discussing the results of the report, there is a common desire to draw comparisons between open source code quality and commercial source code quality. While this issue is addressed in the report, it is not answered. The lack of availability of a wide sample set of commercial source code may make it impossible to ever perform an analysis similar to that done for open source code in the released document.
The report also includes information about the rate of false positives identified in the analysis by developers looking at the results for their codebases. The false positive rate is an important metric for any static analysis tool, because some portion of the analysis is constrained by information that will only be available at runtime. Any tool performing static analysis will identify some issues that cannot happen at runtime, and will fail to identify some issues that can. The critical factor is the degree to which a tool identifies code defects that are valuable to resolve, while not reporting so many false issues that developers become frustrated with the inaccuracy of the tool. To date, developers have identified only 13.32% of the results in the Scan project as false positives.
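The 13.32% figure is a ratio over developer-triaged results. A trivial sketch of that bookkeeping, with names and counts of my own invention chosen only to reproduce the reported rate:

```python
def false_positive_rate(marked_false, total_triaged):
    """Fraction of reported defects that developers judged not to be real."""
    return marked_false / total_triaged

# e.g. 1,332 of 10,000 triaged reports marked false -> 0.1332, i.e. 13.32%
rate = false_positive_rate(1_332, 10_000)
```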
Finally, the report concludes with appendices covering the specific statistical methods applied to the data, and additional details about the consequences of the various types of defects identified in the code that was examined.
It is expected that feedback from readers of the report will drive deeper investigations into the available data, which may uncover further interesting trends in the nature of software development.
Coverity intends to issue updated reports on an annual basis, comparing year-over-year trends, and overall progress by the projects involved in the Scan. As updated tools are made available to the open source developers, the results will include new defect types and changes in the overall distribution of defect type frequencies.