Datatypes based on 8bit chunks cut cost, but what about analysis?
Prior to IBM's launch of the System/360, there was no consensus on what the basic word-length of a computer should be.
According to Werner Buchholz – credited with coining the word 'byte' with a 'y' to prevent accidental mutation to 'bit' – the plan was to make what is now a basic unit of computing much more flexible. During the development of Stretch – System/360's predecessor – a byte could be any packet of bits from one to eight. The Stretch architecture made it possible to address memory down to the bit level.
But Stretch was expensive and IBM needed a cheaper way to build a mainframe. One of the casualties was the programmable byte: the engineers settled on 8bits as the best compromise. Since then, we have constructed data types on a byte-by-byte basis, not because they were the best options, but because they offered the greatest memory efficiency.
From that basic word, we obtained the core datatypes of C, such as the short, int and long, together with a not entirely consistent interpretation of how they should be implemented. Typically, an int is the natural word-length of a processor: for example, 4bytes on a 32bit machine. However, this is not generally the case on an 8bit microcontroller, as an int restricted to the natural 8bit word could hold only 256 distinct values, which has limited usefulness.
Instead, an int will often be defined to be 2bytes long. This can lead to performance problems if int variables are used indiscriminately in a program, because 2byte operations are often much slower than native 8bit instructions. For loop counters and values known to have a small range, it often makes sense to declare them as char, but not short, variables. Somewhat confusingly, short and int datatypes are generally both 2byte.
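A minimal sketch, using the fixed-width types from C99's stdint.h, of how an 8bit counter keeps a small loop in native single-byte instructions on such a microcontroller; the function and buffer names are purely illustrative.

#include <stdint.h>

/* On an 8bit microcontroller, a 16bit int counter forces multi-byte
   arithmetic on every pass round the loop. Where the range is known
   to fit in a byte, an 8bit counter keeps the compare and increment
   in native single-byte instructions. */
void scale_samples(uint8_t *buf, uint8_t len, uint8_t gain)
{
    for (uint8_t i = 0; i < len; i++) {
        buf[i] = (uint8_t)(buf[i] * gain); /* result deliberately truncated to one byte */
    }
}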
A long is typically twice the size of an int. The C99 standard also has the concept of the long long: as you would expect, it usually doubles the bytes of a long again. Because of the ease with which you can implement addition and subtraction for signed numbers, two's complement is the method generally used to encode negative numerals. A value is negated by inverting each bit position and then adding one; the addition of a pair of two's complement variables then produces the same bit pattern as the addition of two unsigned numbers with the equivalent bit patterns.
For example, 1 plus -2 gives the value -1 in two's complement, or 255 in unsigned form, if you assume the use of byte-wide variables. The data in an int does not have to be an integer. The radix point can be anywhere inside the variable – the programmer just needs to understand where it is and scale values as needed. The two's complement scheme still works for these formats.
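A short sketch of both points, assuming byte-wide two's complement variables and a hypothetical Q4.4 fixed-point format (four integer bits, four fraction bits) chosen purely for illustration.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* 1 plus -2 gives -1 in two's complement; the stored bit pattern,
       read as an unsigned byte, is 255. */
    int8_t  a = 1, b = -2;
    uint8_t bits = (uint8_t)(a + b);
    printf("signed %d, unsigned view %u\n", (int)(int8_t)bits, (unsigned)bits);

    /* A fixed-point value is just an integer with an implied radix point.
       In Q4.4, the low four bits are the fraction, so 2.5 is stored as 40.
       After a multiply, the programmer must shift to put the radix point
       back where it belongs. */
    int8_t  x = 2.5f * 16;              /* 2.5 in Q4.4 */
    int8_t  y = 1.5f * 16;              /* 1.5 in Q4.4 */
    int16_t wide = (int16_t)x * y;      /* Q8.8 intermediate result */
    int8_t  z = (int8_t)(wide >> 4);    /* rescaled to Q4.4 */
    printf("2.5 x 1.5 = %.2f\n", z / 16.0);  /* prints 3.75 */
    return 0;
}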
Despite its efficiency, the widespread use of fixed-point two's complement representations followed some years after the development and use of its main numeric counterpart: floating-point. It was not until System/360 arrived that two's complement began to spread across the industry.
More than 20 years earlier, in the late 1930s in Berlin, Konrad Zuse built the Z1, a binary mechanical computer that used floating-point arithmetic, a format that splits the datatype into three parts to allow a much wider range of numeric values to be stored, albeit with less overall precision – at very high and low values, the gap between neighbouring numbers becomes somewhat larger than it would be in an equivalent, very long fixed-point representation.
Of the three parts of a floating-point number, the first – a single bit – holds the sign. The second is reserved for the exponent. In today's machines, a single-precision, 32bit floating-point number – a float in C – stores the exponent as an unsigned 8bit value. There also has to be a way of representing negative exponents. Instead of using a value based on complements, the exponent is biased.
In today's floating-point formats, the stored exponent has to have 127 subtracted from it to come up with the right value. So, a stored value of 129 gives you an exponent of two. As a result, it is positive exponents that carry a leading 1 in their stored form, rather than negative numbers as in two's complement. The remaining 23bits of a single-precision number are used for the fraction, or mantissa: the value that is multiplied by two raised to the exponent to give the actual number.
But there is a further complication: numbers are normalised, so they are always represented as 1.f, where f is the fraction. This makes it possible to leave out the leading one and, in effect, store 24bits of data in a 23bit field. Normalisation introduces an overhead, although this is only apparent in software-emulation libraries, as hardware can perform the search for the leading bit very quickly.
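A sketch that pulls those three fields out of a single-precision value (the helper name is illustrative). Note how 4.0, which is 1.0 x 2^2, stores an exponent byte of 129, matching the bias example above, and how the leading 1 of 1.5 x 2^2 never appears in the stored fraction.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Separate the sign, biased exponent and fraction fields of an IEEE754
   single-precision value. Normalised numbers carry an implicit leading 1
   in front of the 23bit fraction. */
static void dump_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);               /* view the stored bit pattern */

    unsigned sign     = bits >> 31;
    unsigned exp_bits = (bits >> 23) & 0xFFu;     /* stored exponent, biased by 127 */
    unsigned fraction = bits & 0x7FFFFFu;         /* 23bit fraction field */

    printf("%g: sign %u, stored exponent %u (actual %d), fraction 0x%06X\n",
           f, sign, exp_bits, (int)exp_bits - 127, fraction);
}

int main(void)
{
    dump_float(4.0f);   /* 1.0 x 2^2: stored exponent 129, fraction 0 */
    dump_float(6.0f);   /* 1.5 x 2^2: the implicit 1 is not stored */
    return 0;
}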
Because of normalisation, it is also necessary to compare and align the exponents, not just operate on the mantissas, to ensure the right result, even when the two values are close to each other. Although floating-point numbers are stored differently to fixed-point, many processors will convert them on the fly so they can use two's complement arithmetic to process them. Andrew Donald Booth developed an efficient way of multiplying numbers represented in two's complement in 1951.
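A minimal, radix-2 sketch of Booth's recoding for 8bit two's complement operands; hardware units typically use the faster radix-4 'modified Booth' variant, and the wide software register here simply stands in for the hardware shift register.

#include <stdint.h>
#include <stdio.h>

/* Radix-2 Booth multiplication: the combined register holds the
   accumulator A (8 bits), the multiplier Q (8 bits) and the extra
   Q-1 bit examined alongside Q0. A wide signed integer stands in for
   the hardware register; right shifts of negative values are
   implementation-defined in C, but arithmetic on mainstream compilers. */
static int16_t booth_multiply(int8_t multiplicand, int8_t multiplier)
{
    int32_t p = (int32_t)(uint8_t)multiplier << 1;  /* A = 0, Q = multiplier, Q-1 = 0 */
    int32_t m = (int32_t)multiplicand << 9;         /* multiplicand aligned with A */

    for (int i = 0; i < 8; i++) {
        switch (p & 3) {             /* inspect the (Q0, Q-1) bit pair */
        case 1:  p += m; break;      /* 01: add the multiplicand to A */
        case 2:  p -= m; break;      /* 10: subtract the multiplicand from A */
        default: break;              /* 00 or 11: shift only */
        }
        p >>= 1;                     /* arithmetic right shift of the whole register */
    }
    return (int16_t)(p >> 1);        /* drop Q-1; A:Q holds the 16bit product */
}

int main(void)
{
    printf("%d\n", booth_multiply(-7, 13));   /* prints -91 */
    return 0;
}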
Since then, many presentations at the International Solid State Circuits Conference and other seminars have covered high-speed floating-point units that use variants of Booth's algorithm. For years, computer makers developed and used their own floating-point representations, which made it difficult to port code between architectures, particularly as certain combinations of mantissa and exponent are often used as special numbers, such as infinity, and error codes to flag up potential problems.
In the late 1970s, while developing the 8087 coprocessor, Intel proposed the idea of developing a standard for floating-point numbers. This turned into IEEE754, ratified in 1985 and now used by practically all microprocessor-based implementations. The standard defines several precision classes, such as the 32bit single-precision and the 64bit double-precision formats, as well as infinity, positive and negative zero and the not-a-number (NaN) code.
The NaN can be used by programs to catch errors using a simple trap handler in place of more extensive checking code. In IEEE754, special numbers sit at either end of the exponent scale. For example, in a single-precision float, the maximum usable exponent is limited to +127 – using a bias value of +127 – so that 128 can be reserved to encode the infinity and NaN codes.
At the other end, -127 (encoded as zero in the exponent field) is used to denote zeroes and numbers that do not have normalised mantissa values. These subnormal or denormalised numbers – often called 'denormals' by programmers – help to handle underflow. Without denormals, you cannot represent a number smaller than 1.0 x 2^-126. Any result smaller than that would be converted to zero during the normalisation stage used to keep the significand in its 1.f form.
Gradual underflow loosens this restriction and makes it possible to represent significands in the form 0.f when the exponent is zero. There is a catch with gradual underflow: most hardware floating-point units do not support it directly. Instead, they trap to much slower software handlers. Even in architectures with hardware support, gradual underflow can be slow. Very often, denormals will be seen during software profiling as sudden dips in performance.
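A sketch, using C99's classification macro from math.h, of how a result that has slipped into the denormal range can be spotted; the values are illustrative, and hardware running in flush-to-zero mode would report zero instead.

#include <math.h>
#include <stdio.h>

int main(void)
{
    float tiny   = 1.0e-37f;          /* small, but still a normal single-precision value */
    float shrunk = tiny / 1000.0f;    /* roughly 1e-40: below the normal range */

    /* fpclassify reports FP_SUBNORMAL for results that gradual underflow
       has kept alive; on some hardware every operation on such a value
       is trapped to a slow software handler. */
    if (fpclassify(shrunk) == FP_SUBNORMAL)
        printf("%g is a denormal: expect a performance dip\n", shrunk);
    return 0;
}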
As a result, they are very unwelcome in most embedded-control and signal-processing situations. Because high-level software tools, such as Matlab and Simulink, generate code intended to run on IEEE754-compliant hardware, embedded coprocessors generally implement practically all of the standard. However, NaN and underflow should only arise as errors in control algorithms, so it makes sense to perform bounds checks on values in the loop, rather than risk a trap to underflow handling that may disrupt the system's performance.
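One way to apply that advice is to clamp each value to a known-safe range inside the loop, so neither a NaN nor a denormal ever reaches the arithmetic that follows; the limits and names here are assumptions made purely for illustration.

#include <math.h>

/* Bounds check for a control loop: anything outside the expected range,
   or not a number at all, is forced back to a safe value before use. */
static float clamp_command(float x)
{
    const float MIN_CMD = 1.0e-6f;    /* smallest magnitude worth acting on */
    const float MAX_CMD = 100.0f;     /* actuator limit */

    if (isnan(x))
        return 0.0f;                  /* treat NaN as a fault and fail safe */
    if (fabsf(x) < MIN_CMD)
        return 0.0f;                  /* too small to matter: stops values creeping towards denormals */
    if (x > MAX_CMD)
        return MAX_CMD;
    if (x < -MAX_CMD)
        return -MAX_CMD;
    return x;
}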
Another trick is to use a 'denormal killer': a constant that, when added to and then subtracted from a value, will, because of the constant's limited resolution, leave the value at zero if it has underflowed (a sketch appears after this paragraph). Custom processors provide the opportunity to streamline the formats if gates are scarce, even offering scope for breaking with byte-oriented datatypes. Because of its relatively poor density – circuits often take up 20 times more die space than hardwired implementations – programmable logic is a good candidate for custom datatypes.
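The denormal-killer trick mentioned above might be sketched like this for single precision. The constant is an assumption: it has to be chosen so that its rounding resolution swallows the denormal range, and an optimiser running under relaxed floating-point rules could fold the whole expression away.

/* Add and then subtract a small normal constant: any denormal input is
   lost in the rounding of the addition, so the subtraction leaves exactly
   zero. Values only just above the denormal range are flushed too. The
   volatile qualifier discourages the compiler from removing the pair of
   operations. */
static float kill_denormal(float x)
{
    volatile float sum = x + 1.0e-30f;   /* denormals fall below this constant's resolution */
    return sum - 1.0e-30f;
}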
Low-energy processors will also benefit from less overhead in numeric processing. At Imperial College, Dr George Constantinides has been working on numerical analysis to focus high precision only where it is needed. Large savings in die space can be made by operating in the fixed-point space and tuning the word width at each point in the algorithm to use only as many bits as are necessary.
The downside is that this puts the onus of numerical analysis – working out exactly how much precision is needed – on the system developer. Given that the trend is towards automatically generated code – and floating-point formats are easier to deal with, as they do not need the radix point to be adjusted manually in the way that fixed-point formats do – only systems that need high die or energy efficiency will go down this road.
However, more automated analysis of numeric formats should be achievable. Dr Constantinides has used linear system theory to analyse the impact of datapath error as noise and calculate the error that is injected by quantising to different levels of precision. An alternative is to move into different number systems, such as logarithms.
Code that needs a lot of sequential multiplies will benefit from a move into the log space, although errors introduced by converting from linear to logs have to be taken into account. Despite the long history of standard, byte-oriented formats, there is still plenty of space to be explored in numeric formats, particularly in embedded sensor-based systems that have to take energy and die cost into account, rather than the design considerations of a 50-year-old mainframe architecture.