C secure for some time
C was never meant to take over the world, yet it has done more than any other high-level language to displace assembly language as the programmer's tool of choice. The bad news is that C's success arguably owes a lot to how close it sits to machine code in the first place.
C was originally developed to make it easier to port Unix implementations to different processor architectures. As only the processor-specific portions of code needed to be written in assembler, C could be used for everything else. Its features for implementing low-level detail were ideal for coding operating-system (OS) data structures and routines.
Unix, meanwhile, was not expected to be as successful as it has turned out to be. Its name, a pun on the ultimately unwanted Multics OS, reflected its leaner construction. Today, most OSs in widespread use can point to Unix as their forebear. Multics' most visible legacy is arguably the ring-based protection structure of Intel's x86 processors, which has largely given way to other schemes.
Many computer scientists feel C goes too far, and see it as one of the reasons for the atrocious productivity of software developers. With C comes low-level control and, with it, a responsibility that is not always exercised with care.
Many languages try to make it more difficult for the developer to bring a machine to its knees. Some, like Java, run code in a sandbox, limiting direct access to vulnerable data structures. C's pointer arithmetic, by contrast, lets the programmer direct writes to just about any memory address that the memory-management unit, if there is one, allows. Dereferencing a null or uninitialised pointer without checking it first is undefined behaviour: anything could happen. On a virtual-memory machine, a null dereference usually triggers a page fault, but a stray pointer could just as easily land on the stack of a running thread.
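As a minimal C sketch of the defensive habit this demands, the function below refuses to dereference NULL. The name `checked_copy` and its 0/-1 return convention are illustrative assumptions, not a standard API. Note that the check catches only NULL: C offers no portable way to tell whether a non-NULL pointer is dangling or uninitialised, so that part of the responsibility stays with the programmer.

```c
#include <stddef.h>

/* Copies the int that src points to into *dst.
   Returns 0 on success, or -1 if either pointer is NULL, instead of
   dereferencing it and invoking undefined behaviour.
   (Hypothetical helper for illustration.) */
int checked_copy(const int *src, int *dst)
{
    if (src == NULL || dst == NULL)
        return -1;
    *dst = *src;
    return 0;
}
```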
Through its 'casting' feature, C lets programmers convert values from one type to another almost arbitrarily. Sometimes this is benign; sometimes it leads to unexpected results, especially when the compiler performs the conversion implicitly and silently, so the problem only surfaces later as an error or a runtime failure.
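The silent-conversion problem is easy to reproduce. In this sketch (the function names are hypothetical), comparing an `int` with an `unsigned int` makes the compiler apply C's usual arithmetic conversions, turning -1 into a very large unsigned value without any cast appearing in the source.

```c
#include <stdbool.h>

/* Naive comparison: because b is unsigned, a is silently converted to
   unsigned int, so a negative a becomes a very large value and the
   result is wrong. */
bool naive_less(int a, unsigned int b)
{
    return a < b;
}

/* Safer comparison: deal with negative a before any conversion happens. */
bool safe_less(int a, unsigned int b)
{
    if (a < 0)
        return true;              /* any negative int is below any unsigned */
    return (unsigned int)a < b;   /* a is non-negative, so the cast is exact */
}
```

`naive_less(-1, 1u)` returns false, while `safe_less(-1, 1u)` returns true. GCC and Clang can warn about the mixed-sign comparison via `-Wsign-compare`, but the language itself accepts it.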
Despite its problems, C has seen off challenges from other languages, although Java has made some inroads. In embedded systems, C and its object-oriented successor C++ remain supreme, despite their reliability issues.
One option is to extend C with desirable, safety-enhancing features borrowed from other languages, but this has had mixed success. Microsoft, for example, convinced a large number of desktop application programmers to move to C# on the basis that programs developed in C# were less prone to the errors commonly found in C++ code.
In C#, global variables, which can potentially be altered by any function in a program, cannot be used. Pointers to memory addresses are allowed only in blocks explicitly marked 'unsafe', and the compiler will not silently perform conversions that could lose information: such conversions between data types have to be written explicitly.
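C will narrow a wider integer into a narrower one without complaint. One way to impose a C#-style discipline by convention is to route every narrowing conversion through a checked helper. `ll_to_int` below is a hypothetical name for such a helper, sketched on the assumption that failing loudly is preferable to silently truncating.

```c
#include <limits.h>
#include <stdbool.h>

/* Converts a long long to int only if the value fits.
   Returns true and stores the result on success; returns false and
   leaves *out untouched if the value is out of range.
   (Hypothetical helper for illustration.) */
bool ll_to_int(long long value, int *out)
{
    if (value < INT_MIN || value > INT_MAX)
        return false;
    *out = (int)value;   /* explicit cast, now known to be exact */
    return true;
}
```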
However, it is possible to guard against the more dangerous features of C through the use of coding standards. This is particularly common in safety-critical systems.
Automotive software developers formed the Motor Industry Software Reliability Association (MISRA), which created a set of guidelines to remove the most common sources of errors found in C programs, and the costly rework that goes with them. The full set of restrictions in MISRA C is stringent: a fully compliant program cannot allocate memory dynamically, for example.
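A common way to live without dynamic allocation is to reserve all storage at compile time and hand it out from a fixed pool. The sketch below is illustrative only: the `buffer` type, the pool size and the function names are assumptions, and it is not claimed to satisfy any particular MISRA rule.

```c
#include <stddef.h>

#define POOL_SIZE 4            /* illustrative capacity, fixed at compile time */

typedef struct {
    int id;
    char data[32];
} buffer;                      /* hypothetical message buffer type */

/* All storage is reserved statically; nothing comes from the heap. */
static buffer pool[POOL_SIZE];
static unsigned char in_use[POOL_SIZE];

/* Returns a free buffer from the pool, or NULL if all are taken. */
buffer *pool_acquire(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            return &pool[i];
        }
    }
    return NULL;
}

/* Returns a buffer to the pool; ignores pointers the pool does not own. */
void pool_release(buffer *b)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (b == &pool[i]) {
            in_use[i] = 0;
            return;
        }
    }
}
```

Because the worst-case memory use is fixed at build time, running out of buffers becomes an explicit, testable condition (a NULL return) rather than a heap failure at some unpredictable point.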
Less severe restrictions can also enhance software reliability. In the 1980s, Bertrand Meyer developed the concept of design by contract and incorporated it into his Eiffel language. The idea is that, all too often, programmers pass nonsensical data to functions and then spend hours poring over debug traces trying to work out why the program failed. The function that receives the bad data might not fail itself, but simply respond with more garbage that trips up the calling function later on.
Design by contract is about giving functions clearly defined interfaces and then enforcing them in the software. Developers using Eiffel insert into the routine header the variable ranges their functions will accept, and the compiler generates the necessary runtime checks. In C, the same idea can be approximated with coding standards that require programmers to put pre- and post-conditions into their functions, with static tools such as eCv used to check them.
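Even without Eiffel or a dedicated tool, the pre- and post-condition idea can be sketched in plain C with the standard `assert` macro. This is not eCv's annotation syntax, just a minimal illustration; `isqrt` is a hypothetical example function. Bear in mind that `assert` checks vanish when `NDEBUG` is defined, so they document and test the contract rather than enforce it in release builds.

```c
#include <assert.h>

/* Integer square root with an explicit contract.
   Precondition:  n >= 0
   Postcondition: r * r <= n < (r + 1) * (r + 1) */
int isqrt(int n)
{
    assert(n >= 0);                                   /* precondition */

    int r = 0;
    /* Simple linear search; widened to long long so (r+1)^2 cannot overflow. */
    while ((long long)(r + 1) * (r + 1) <= n)
        r++;

    assert((long long)r * r <= n &&
           n < (long long)(r + 1) * (r + 1));         /* postcondition */
    return r;
}
```

A caller passing a negative value now fails at the point of the bad call, not several functions later.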
C has its biggest inherent problems with concurrency: it assumes sequential operation, and the mechanisms used to express concurrency, such as multi-threading, can be clunky and error-prone. C's ability to share any memory anywhere is a huge weakness when it comes to exploiting concurrency. Threads have to coordinate explicitly whenever they change an object in shared memory, and multi-threaded development is often fraught with difficulty because it is easy to make false assumptions about when threads can run.
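The explicit coordination required for shared memory looks like this with POSIX threads, assuming a platform that provides `<pthread.h>`. Several threads increment one shared counter; without the mutex, concurrent `counter++` operations could interleave and lose updates. The function names and the cap of 16 threads are arbitrary choices for the sketch.

```c
#include <pthread.h>
#include <stddef.h>

/* Shared state: every thread can see it, so every access must be synchronised. */
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    long iterations = *(const long *)arg;
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);   /* without this, increments can be lost */
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts up to 16 threads, each adding `iterations` to the shared counter,
   waits for all of them, and returns the final total. */
long run_counter(int nthreads, long iterations)
{
    pthread_t threads[16];
    int created = 0;

    if (nthreads > 16)
        nthreads = 16;
    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&threads[created], NULL, worker, &iterations) == 0)
            created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

`run_counter(4, 10000)` should return 40000 when all four threads start. Remove the lock and the total may come out lower, varying from run to run.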
Computer scientists have proposed concurrency extensions to C, and some environments, such as the programming framework for picoChip's arrays, combine C with other languages to try to ease the problem, with the alternative language defining the connections between the C functions used to define the processor's sequential operations.
C's concurrency problems convinced some engineers to move away from the language completely. Ericsson Radio Systems decided, in the late 1980s, that it needed a better language to program its telecom switches, which were expected to employ a large number of processors in parallel. A research team developed Erlang in response.
Erlang was designed to work as a concurrent language and, in doing so, restricts what it is possible to share. Broadly speaking, Erlang is based on the communicating sequential processes (CSP) model developed by Tony Hoare and used in Inmos' Occam language.
Erlang avoids a lot of the lock overhead of C or C++ by not allowing threads to share memory: they have to send data to each other. Threads are also designed to have such a low overhead that it becomes feasible to allocate a thread to each little job. For example, in the case of an SMS message switch, a dedicated thread handles each incoming message, avoiding the need for a thread to explicitly manage its own message queues.
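C has no built-in equivalent of an Erlang mailbox, so a C programmer who wants the no-sharing discipline has to build one. The following is a rough analogy, not a model of how Erlang is implemented: a bounded queue, guarded by a POSIX mutex and condition variables, into which the sender copies each message and from which the receiver copies it out, so the two never share the payload. The `message` type and the capacity of 8 are assumptions.

```c
#include <pthread.h>

#define MAILBOX_CAP 8

typedef struct { int payload; } message;   /* hypothetical message type */

typedef struct {
    message slots[MAILBOX_CAP];
    int head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} mailbox;

void mailbox_init(mailbox *mb)
{
    mb->head = mb->count = 0;
    pthread_mutex_init(&mb->lock, NULL);
    pthread_cond_init(&mb->not_empty, NULL);
    pthread_cond_init(&mb->not_full, NULL);
}

/* Copies *m into the mailbox; blocks while the mailbox is full. */
void mailbox_send(mailbox *mb, const message *m)
{
    pthread_mutex_lock(&mb->lock);
    while (mb->count == MAILBOX_CAP)
        pthread_cond_wait(&mb->not_full, &mb->lock);
    mb->slots[(mb->head + mb->count) % MAILBOX_CAP] = *m;   /* copy, not share */
    mb->count++;
    pthread_cond_signal(&mb->not_empty);
    pthread_mutex_unlock(&mb->lock);
}

/* Copies the oldest message out; blocks while the mailbox is empty. */
message mailbox_receive(mailbox *mb)
{
    pthread_mutex_lock(&mb->lock);
    while (mb->count == 0)
        pthread_cond_wait(&mb->not_empty, &mb->lock);
    message m = mb->slots[mb->head];
    mb->head = (mb->head + 1) % MAILBOX_CAP;
    mb->count--;
    pthread_cond_signal(&mb->not_full);
    pthread_mutex_unlock(&mb->lock);
    return m;
}
```

The locking has not disappeared; it has been confined to the mailbox, which is roughly the division of labour Erlang's runtime provides for free.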
Error checking is also handled outside the threads themselves. If one thread crashes, it simply exits and sends a message to the controlling process, which can then decide what action to take, such as spawning a new thread to take over the work. If a thread is restarted too many times, the controlling process can terminate itself, alerting a higher-level process to a larger potential problem and stopping that part of the system from getting stuck in an endless cycle of restarts. Erlang users believe concentrating error handling in this way improves robustness.
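The restart-limit policy can be sketched in C as a plain loop. This is a loose, single-threaded analogy to an Erlang supervisor (real Erlang supervisors watch separate processes, and their restart limits are counted within a time window); `supervise`, `flaky_worker` and the result codes are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical worker signature: returns true on success, false if it "crashed". */
typedef bool (*worker_fn)(void *state);

typedef enum { SUPERVISOR_OK, SUPERVISOR_ESCALATE } supervisor_result;

/* Runs the worker, restarting it after each failure. After max_restarts
   restarts, stop retrying and report the problem to the caller (the next
   level up) instead of looping forever. */
supervisor_result supervise(worker_fn worker, void *state, int max_restarts)
{
    for (int restarts = 0; ; restarts++) {
        if (worker(state))
            return SUPERVISOR_OK;
        if (restarts >= max_restarts)
            return SUPERVISOR_ESCALATE;
    }
}

/* Example worker: fails until its failure budget is used up, then succeeds. */
bool flaky_worker(void *state)
{
    int *failures_left = state;
    if (*failures_left > 0) {
        (*failures_left)--;
        return false;
    }
    return true;
}
```

With a limit of three restarts, a worker that fails twice is recovered, while one that keeps failing is escalated after its fourth attempt.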
Erlang is intended to cope with systems where there can be thousands or even millions of threads running in parallel on one system.
The fact that Erlang is a functional language distinguishes it from procedural or imperative languages, such as C. Rather than progressing procedurally through a set of state changes, functional languages treat all operations as the application of mathematical functions.
Unlike C, where procedures can be used to edit source data directly, the source data in a functional language can never be changed. Instead, functions take the source data and produce a new output that can be used in the next function. So, where a C programmer might use an iterative loop, a functional programmer will use recursion. Compilers will usually recognise when tail recursion, where the recursive call is the last operation a function performs, can be turned into iteration in the assembly code they generate. Similarly, the compiler decides how a function should be implemented, rather than leaving the order of operations to the programmer, as with C and other imperative languages.
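The contrast between an iterative loop and recursion can be shown in C itself. Both functions below compute a factorial; the second is written in a functional style, passing a new accumulator to each tail call instead of updating a variable in place. The function names are illustrative. The C standard does not require tail-call elimination, although optimising compilers such as GCC and Clang commonly perform it at higher optimisation levels.

```c
/* Loop version: the accumulator is updated in place on each iteration. */
unsigned long long factorial_iter(unsigned int n)
{
    unsigned long long acc = 1;
    for (unsigned int i = 2; i <= n; i++)
        acc *= i;
    return acc;
}

/* Tail-recursive helper: nothing is modified after it is set; each call
   receives a new accumulator value. The recursive call is the last thing
   the function does, so a compiler can turn it into a jump. */
static unsigned long long fact_acc(unsigned int n, unsigned long long acc)
{
    return n <= 1 ? acc : fact_acc(n - 1, acc * n);
}

unsigned long long factorial_rec(unsigned int n)
{
    return fact_acc(n, 1);
}
```

Both return correct results up to n = 20; 21! no longer fits in a 64-bit unsigned integer.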
By removing the need for programmers to decide when and how things need to happen, advocates of functional languages claim software line counts can be reduced by up to a factor of ten. A study by Ericsson found productivity improvements of 9 to 24 times with Erlang, yet the company banned the language for new products for a while, favouring non-proprietary alternatives; Erlang was released as open source shortly afterwards.
Erlang is far from being the only functional language, although, by virtue of its heritage, it's the one most used for developing embedded systems. It has also seen an increase in use in recent years in web applications because of its concurrency support.
Erlang's main competition comes from Haskell, which is called a lazy functional language because expressions are evaluated only when their results are actually needed: if the result of a computation is never used, that computation is never carried out.
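C evaluates eagerly, but Haskell's call-by-need behaviour can be imitated by hand with a "thunk": a deferred computation that runs only when first forced and then caches its result. This is a minimal sketch with hypothetical names (`lazy_long`, `lazy_force`, `expensive_square`); Haskell does this automatically for every expression, whereas in C each deferred value has to be wrapped explicitly.

```c
#include <stdbool.h>

/* A deferred computation that runs at most once, and only on demand. */
typedef struct {
    long (*compute)(void *env);   /* the deferred computation */
    void *env;                    /* its captured argument */
    bool evaluated;
    long value;                   /* cached result once evaluated */
} lazy_long;

lazy_long lazy_make(long (*compute)(void *), void *env)
{
    lazy_long t = { compute, env, false, 0 };
    return t;
}

/* Forces the thunk: computes on first use, returns the cached value after. */
long lazy_force(lazy_long *t)
{
    if (!t->evaluated) {
        t->value = t->compute(t->env);
        t->evaluated = true;
    }
    return t->value;
}

/* Example computation, instrumented to count how often it actually runs. */
int square_calls = 0;

long expensive_square(void *env)
{
    square_calls++;
    long x = *(long *)env;
    return x * x;
}
```

A thunk that is created but never forced costs nothing beyond its small struct; one that is forced twice computes only once.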
Productivity improvements come at a cost. Haskell programs may have to allocate large amounts of memory to carry out operations, whereas a skilled C programmer can, by carefully ordering statements, reduce overhead massively. Speed is also hard to predict: on common benchmarks in the Great Language Shootout, a Haskell compiler is currently rated at about half the speed of C, but five times faster than Erlang and only narrowly behind C#. These are not good attributes for memory-limited embedded applications. However, a mixture of C and Haskell or Erlang could potentially speed development without too much runtime cost, by passing complex, less timing-sensitive management code to the functional domain and reserving C for time-critical functions.
However, the Haskell FAQ (haskell.org) reveals the Achilles heel of its approach to programming: 'Haskell is very different from traditional mainstream languages, it's like learning programming anew'.
This is not good news in an environment where time for retraining is rarely available. Despite the productivity improvements promised by other languages, C and its extensions are likely to stay with us for a long time, even if other dialects nibble at the edges.
Erlang code example
This Erlang code fragment shows how a factorial function might be defined recursively. Clauses are tried in order, so factorial(0) matches the first one; the 'when' keyword introduces a guard, so the second clause matches only positive integers. The number after the '/' in the -export statement gives the function's arity, the number of arguments it takes: in this case, 1.
```erlang
-module(factorialCalc).
-export([factorial/1]).

factorial(0) -> 1;
factorial(N) when N > 0 ->
    N * factorial(N - 1).
```