"There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies."
Professor C. A. R. Hoare
Developing reliable and secure software has become a challenging task, mainly because of the unmanageable complexity of the software systems we build today. Software flaws have many causes, but our observations show that they mostly come from two broad sources: i) design, such as a malicious or unintentional backdoor; and ii) implementation, such as a buffer overflow.
To address these problems, our research group at Defence Research and Development Canada (DRDC) Valcartier first worked on design issues. A prototype of a UML design verifier was built. Our approach was successful, but we faced two difficulties: i) specifying interesting security properties at the design level; and ii) scalability of the verification process.
Building on this experience, we studied design patterns for the implementation of security mechanisms. The output was a security design pattern catalog, available from the authors, which can help software architects choose mature and proven designs instead of constantly trying to reinvent the wheel.
This paper addresses implementation issues through our evaluation of currently available automatic source code verifiers that search for program sanity and security bugs. From this evaluation, it becomes clear that the choice of programming language when starting an open source project has important consequences for security, maintainability, reliability, speed of development, and collaboration. As a corollary, software quality depends largely on how well the programming language suits the desired properties of the system being developed. Therefore, the adoption of open source software (OSS) should take into account the programming language that was used.
Context & Terminology
The assurance level required for executing applications depends on their execution context. Our context is military, in which confidential data is processed by sensitive applications running on widespread operating systems, such as Windows and Linux, and mostly programmed in C/C++ and Java. Our primary goal was to get rid of common security problems using automated source code verification tools for C++ and Java. To do so, we first investigated errors and vulnerabilities emerging from software defects. This allowed us to create meaningful tests in order to evaluate the detection performance and usability of these tools.
In our investigation of common software security problems, we observed that most do not come from the failure of security mechanisms. Rather, they occur from failures at a lower level, which we call program sanity problems. Security mechanisms ensure high level properties, such as confidentiality, integrity, and availability, and are mostly related to design. Access control frameworks, intrusion prevention systems, and firewalls are all examples of security mechanisms. Program sanity problems are related to protected memory, valid control and data flow, and correct management of resources like memory, files, and network connections.
Because these problems come in many forms, a terminology is needed to classify them. An error is closely tied to the execution of a program and occurs when the behavior of a program diverges from "what it should be"; that is, from its specification. A defect lies in the code: it is a set of program instructions that causes an error. A defect can also be the lack of something, such as missing data validation. Finally, a vulnerability is a defect causing an error that can be deliberately triggered by a malicious user to corrupt program execution.
We focused on errors, defects, and vulnerabilities that can have an impact on security. To be as general as possible, we wanted them to be application-independent. We defined five errors, twenty-five kinds of defects across six categories, and three vulnerabilities, as shown in Figure 1.
Figure 1: Errors, Defects, and Vulnerabilities
The list of possible low-level errors that can happen when a program is executed is very long. Since we had no interest in the correctness of computations with respect to specifications, we focused on general errors that can interfere with correct memory management, control flow, and resource allocation. Types of low-level errors include:
- memory write out of bounds: a valid region of memory is overwritten, which results in serious vulnerabilities since it can allow an attacker to modify the program state
- memory read out of bounds: an invalid region of memory is read, resulting mostly in errors in computations, although sensitive values could also be read
- resource leak: a discardable resource, such as memory, a file handle, or a network connection, is not returned to the available pool, which generally leads to a slowdown or crash of the resource-starved program
- program hang: the program is stuck in an infinite loop or wait state, which generally leads to a denial of service
- program crash: an unrecoverable error condition occurs and the execution of the program stops, leading to a denial of service
Most defects will not generate errors on every execution of the program: complex conditions have to be met for the error to happen, and input values play an important role. Furthermore, many defects are composite and cannot be attributed to a single program instruction. The following is a list of the types of defects we used to create our tests:
- memory management faults: problems related to memory allocation, deallocation, and copy from one buffer to another
- overrun and underrun faults: problems related to the overrun or underrun of an array or a C++ iterator
- pointer faults: problems related to incorrect pointer usage
- cast faults: problems related to the incorrect cast of one type into another
- miscellaneous faults: problems that do not fit into any other category
Errors in general are undesirable, but the real problem is vulnerabilities, especially remotely exploitable ones. We observed that almost all dangerous vulnerabilities are associated with out-of-bounds memory reads or writes. Vulnerabilities can be classified as:
- denial of service: allows an attacker to prevent legitimate users from accessing a service
- unauthorized access: allows an attacker to access functionalities or data without the required authorization
- arbitrary code execution: allows an attacker to take control of a process by redirecting its execution to a given instruction
Problems with C/C++ Programs
Many defects and errors are possible because of bad design choices made when the C and C++ programming languages were created. These languages require micro-management of the program's behaviour through manual memory management, are error-prone due to pointer arithmetic, and give seemingly benign defects, such as buffer overflows, serious consequences. A short list of the major C/C++ design shortcomings follows:
- lack of type safety: type-safe programs are fail-fast, as their execution stops immediately when an error occurs, whereas languages without type safety, like C/C++, let erratic programs continue executing
- pointer arithmetic: allows a programmer to change the value of a pointer without restriction, making it possible to read and write anywhere in the process memory space, as well as making program verification more difficult
- static buffers: buffers in C/C++ cannot grow to accommodate data, buffer accesses are not checked for bounds, and overflows can overwrite memory
- lack of robust string type: C has no native type for character strings, meaning static buffers with potential overflow problems are used instead; while C++ programs can use a string type, our observations show that this is rarely the case
Creators of modern languages, such as Java, had these problems in mind and addressed them. Indeed, Java is immune to C/C++ program sanity problems because runtime checks throw an exception if an error occurs. However, many program sanity checks throw unchecked exceptions and these are rarely caught by programmers. Many problems become denial-of-service vulnerabilities since uncaught exceptions crash the program.
We evaluated 27 tools for C/C++ and 37 for Java. All these tools were categorized into three families: i) program conformance checkers; ii) runtime testers; and iii) advanced static analyzers.
Program conformance checkers perform a lightweight analysis based on syntax to find common defects. Because of this unsophisticated analysis, they perform poorly, except for a few defects that can be detected by simple syntax analysis. Many free tools were in this category.
Runtime testers look for errors while the program is running by instrumenting the code with various checks. This provides a fine-grained analysis with excellent scalability that can be very helpful when the program's behaviour cannot be computed statically because of values that are not known before runtime.
Advanced static analyzers work on program semantics instead of syntax. They generally use formal methods, such as abstract interpretation or model-checking, which often lead to scalability problems. The code must be compiled into a model and this is usually complex with C/C++ because of code portability problems between compilers.
Our results can be summarized as:
- for C/C++, commercial tools are by far the best
- for Java, there are many good free tools
- since Java is immune to most program sanity problems that plague C/C++, there are no exact equivalents to C/C++ tools
- the focus of Java tools is on good practices and high level design problems
Since our goal was to detect program sanity problems, we focused on tools for C/C++ during our evaluation. Our criteria were: i) precision, in terms of flaws detected vs. false positives; ii) scalability, from small to large programs; iii) coverage, or the inspection of every possible execution; and iv) the quality of the diagnostic report, in terms of its usefulness for problem correction.
Preliminary tests showed that only three tools for C/C++ had the potential to help us achieve our goal: Coverity Prevent and PolySpace for C++ for detecting defects, and Parasoft Insure++ for detecting errors. We tested these tools in two ways: i) over real code in production that, to the best of our knowledge, worked well but was a bit buggy; and ii) over many small ad-hoc pieces of code (synthetic tests) containing specific programming defects. To compare these tools, all results had to be converted to errors or defects. For synthetic tests, defects and the errors they caused were known in advance so it was easy to convert everything to defects. However, for code in production, nothing was known in advance, so we decided to use the best result as a baseline. Since Insure++ was the best performer, all results were converted to errors.
The complete results of our synthetic tests are available in the original paper. The difficulties in testing C/C++ programs can be summarized as follows:
- no tool is able to detect every kind of defect or error
- static analysis tools need good quality code to perform well
- pointer arithmetic used to read from and write to complex data structures renders static analysis extremely difficult
- makefiles are often show-stoppers: they lack granularity, their number makes debugging a tedious task, and they are often complex and pull in many dependencies
- compiler-specific extensions to C/C++ make parsing non-standard code difficult
- conditional compilation via preprocessor directives, driven by a mix of environment variables, configuration files, and make parameters, adds to the complexity of the verification process
- header files are often created or moved by the makefile while it is running
- there are often many different header files with the same name, but at different locations
We found that having the verification tool parse the program correctly is the most difficult part of the job, and this is often a show-stopper unless one has unlimited time. Java is not problematic because it has no preprocessor and no conditional compilation. It has been designed to be standard and portable.
Tool Limitations and Best Usage Scenario
We found that current static verification tools suffer from what we have called the "black box problem". Indeed, for reactive applications and heterogeneous systems, execution does not always take place in available application code. For instance, in reaction to a mouse click, a reactive application can start executing in kernel code as the event is passed through the operating system and back. This part of the execution can rarely be analyzed, so static analysis tools can hardly determine what type of data comes out of these calls. This prevents true interprocedural analysis.
Scalability is also a problem for static tools that have to consider (and abstract) all possible executions.
Dynamic tools have the opposite problem: they are very scalable but provide poor coverage with poor test cases. However, if you consider the number of tests needed to cover all possible executions with dynamic tools, scalability is still a problem.
The best usage scenario for Coverity Prevent is when the whole application needs to be analyzed and it is compiled using a working makefile. The application code size can exceed 500K lines of C++ without problems. Coverity has many good points: i) very good integration with makefiles; ii) use of the Edison Design Group (EDG) compiler front-end, which can read code containing compiler-specific extensions from almost every major compiler in the industry; iii) very good scalability; iv) excellent diagnostics, with execution traces that are easy to understand and very helpful for correcting problems; and v) an innovative, though proprietary, analysis based on statistical code analysis and heuristics.
The best usage scenario for PolySpace for C++ is to analyze small segments of critical code in applications where runtime exceptions should never happen. The application code size must stay under 20K lines of C++. It uses a very thorough analysis based on abstract interpretation, with which it can detect runtime errors statically. It has a nice graphical interface, especially the viewer module used to analyze the report and navigate the source code. However, its diagnostics are weaker: it is sometimes impossible to understand the defect it reports.
The best usage scenario for Parasoft Insure++ is to test hybrid systems based on many heterogeneous components. Since Insure++ is a dynamic tool, there is no limit on application code size, and bad quality code has no effect on detection performance. It has very good diagnostics, with call stacks and memory diagrams that show exactly what was overwritten. However, test cases have to be carefully specified with a good coverage strategy, so Insure++ should always be integrated into test harnesses that have been shown to provide good code coverage.
We have alluded to the importance of simple and unambiguously specified language constructs; standardized, portable, and type-checked language compilation; vigilant runtime monitoring; and available verification tools. We argue that it is simpler, though not simple, to produce better quality software with modern programming languages. We believe that modern programming languages should always be preferred over older ones, except when a convincing argument can be made against doing so.
Furthermore, programmers should use the verification tools that are available for their programming languages and should stay aware of the new ones. In the selection of open source products, the programming language used is, of course, not the only variable to consider in assessing software quality. But when evaluating two products that have been properly tested for appropriate and correct functionality for the task at hand, we would recommend to choose the one programmed with a modern language.
The computer industry tends to adopt new technologies very quickly. Setting human and financial resources aside, the adoption of new programming languages generally follows the laws of fashion: what is the new best thing this year? What should I use to stay cool and up-to-date? This is not necessarily a bad driver of progress. However, it conceals a pernicious habit: we have rarely observed a programmer adopting a new programming language because they knew all the pitfalls of their current language and wanted to avoid them.
The root of security problems is not the failure of security mechanisms. C/C++ programs are especially problematic because these languages enforce almost no restrictions on the execution of programs and are prone to vulnerabilities with serious consequences. Modern languages such as Java, however, are immune to C/C++ program sanity problems and are not prone to the most serious vulnerabilities, such as arbitrary code execution. Of course, just as with any language, the design must be rigorously verified and implemented correctly. The use of Java is not a panacea, and care must still be taken to implement security mechanisms correctly.
Verifying C/C++ programs is a huge challenge. These languages are very difficult to analyze because of their many undefined or non-standard semantics, pointer arithmetic, and compiler-specific extensions. We found no currently available verification tool that can reduce the risk significantly enough for sensitive applications. We highly recommend the use of modern programming languages such as Java, which eliminate program sanity problems. However, if the use of C/C++ is mandatory, we recommend restricting its usage and employing rigorous test cases and verification tools.
This article is based upon a paper originally published in the Proceedings of the Static Analysis Summit.