Standard methods for representing commonly used numeric and non-numeric data have been developed and are widely used. While a knowledge of internal binary representation is not required for programming in C, an understanding of internal data representation is certainly helpful.
As mentioned above, all data, including programs, in a computer system is represented in terms of groups of binary digits. A single bit can represent one of two values, 0 or 1. A group of two bits can be used to represent one of four values:
00 - 0
01 - 1
10 - 2
11 - 3

If we have only four symbols to represent, we can make a one-to-one correspondence between the patterns and the symbols, i.e., one and only one symbol is associated with each binary pattern. For example, the numbers 0, 1, 2, and 3 are mapped to the patterns above.
Such a correspondence is called a code and the process of representing a symbol by the corresponding binary pattern is called coding or encoding. Three binary digits can be used to represent eight possible distinct values using the patterns:
000   001   010   011   100   101   110   111

In general, a group of $n$ bits can be used to represent one of $2^n$ distinct values.
We will first present a standard code for natural numbers, i.e., unsigned integers 0, 1, 2, 3, 4, etc. There are several ways to represent these numbers as groups of bits; the most natural way is analogous to the method we use to represent decimal numbers. Recall that a decimal (or base 10) representation uses exactly ten digit symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. Any decimal number is represented using a weighted positional notation.
For example, a single digit number, say 9, represents just nine, because the weight of the rightmost position is 1. A two digit number, say 39, represents thirty plus nine. The rightmost digit has a weight of 1, and the next digit to the left has a weight of 10. So, we multiply 3 by 10, and add 9 multiplied by 1. Thus, for decimal notation the weights for the digits, starting from the rightmost digit and moving to the left, are 1, 10, 100, and so on, as shown below.
weights:   ...   10000   1000   100   10   1

Thus,

$39 = 3 \times 10 + 9 \times 1$

The positional weights are the powers of the base value 10, with the rightmost position having the weight of $10^0 = 1$, and the next positions to the left having in succession the weights of $10^1$, $10^2$, $10^3$, and so on.
Such an expression is commonly written as a sum of the contribution of each digit, starting with the lowest order digit and working toward the largest weight; that is, as sums of contributions of digits starting from the rightmost position and working toward the left.
Thus, if $N$ is an integer written in decimal form with digits $d_{n-1} d_{n-2} \ldots d_1 d_0$:

$N = d_{n-1} d_{n-2} \ldots d_1 d_0$

then $N$ represents the sum:

$N = d_{n-1} \times 10^{n-1} + d_{n-2} \times 10^{n-2} + \cdots + d_1 \times 10^1 + d_0 \times 10^0$

where $n$ is the total number of digits, and $d_i$ is the digit in position $i$, counting from the rightmost position (position 0) of the decimal number.
Binary representation of numbers is written in exactly the same manner.
The base is 2, and a number is written using just two digit symbols, 0 and 1.
The positional weights, starting from the right, are now powers of the base 2.
The weight for the rightmost digit is $2^0 = 1$, the next digit has the weight of $2^1 = 2$, the next digit has the weight of $2^2 = 4$, and so on. Thus, the weights for the first ten positions from the right are as follows:

512   256   128   64   32   16   8   4   2   1

A natural binary number is written using these weights.
For example, the binary number 1101 represents the number whose decimal equivalent is

$1 \times 8 + 1 \times 4 + 0 \times 2 + 1 \times 1 = 13$

and the binary number 10101000 represents the number whose decimal equivalent is

$1 \times 128 + 0 \times 64 + 1 \times 32 + 0 \times 16 + 1 \times 8 + 0 \times 4 + 0 \times 2 + 0 \times 1 = 168$
When a binary number is stored in a computer word with a fixed number of
bits, unused bits to the left (leading bits) are set to 0.
For example, with a 16 bit word, the binary equivalent of 168 is

0000 0000 1010 1000

We have shown the bits in groups of four to make it easier to read.
In general, if $N$ is an integer written in binary form with digits $b_{n-1} b_{n-2} \ldots b_1 b_0$:

$N = b_{n-1} b_{n-2} \ldots b_1 b_0$

then its decimal equivalent is:

$N = b_{n-1} \times 2^{n-1} + b_{n-2} \times 2^{n-2} + \cdots + b_1 \times 2^1 + b_0 \times 2^0$
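This weighted sum translates directly into a short C program. The following is a minimal sketch (the function name binary_to_unsigned is our own, chosen only for illustration); it accumulates the value of a string of 0's and 1's:

    #include <stdio.h>

    /* Convert a string of '0' and '1' characters to its unsigned value
       by accumulating the positional weights (powers of 2). */
    unsigned int binary_to_unsigned(const char *bits)
    {
        unsigned int value = 0;
        int i;

        for (i = 0; bits[i] == '0' || bits[i] == '1'; i++)
            value = value * 2 + (bits[i] - '0');   /* shift in the next bit */
        return value;
    }

    int main(void)
    {
        printf("%u\n", binary_to_unsigned("10101000"));   /* prints 168 */
        return 0;
    }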
As we said, a word size of $n$ bits can represent $2^n$ distinct patterns. We use these patterns to represent the unsigned integers from 0 to $2^n - 1$. For example, 4 bits have 16 distinct patterns, representing the equivalent decimal unsigned integers 0 to 15; 8 bits represent the decimal numbers 0 through 255; and so forth.
Given this representation, we can perform operations on these unsigned integers. Addition of two binary numbers is straightforward. The following examples illustrate additions of two single bit binary numbers.
  0      0      1      1
 +0     +1     +0     +1
 --     --     --     --
  0      1      1     10
The last addition, $1 + 1$, results in a sum digit of 0 and a carry forward of 1.
Similarly, we can add two arbitrary binary numbers, b1 and b2:

         011100    (carry forward)
  b1     101110    (base 10 value: 46)
 +b2    +001011    (base 10 value: 11)
        -------
 sum     111001    (base 10 value: 57)
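Since the hardware adds the stored bit patterns directly, the same sum can be checked with a few lines of C (the values are written in decimal below because binary constants are not part of standard C):

    #include <stdio.h>

    int main(void)
    {
        unsigned int b1 = 46;            /* 101110 in binary */
        unsigned int b2 = 11;            /* 001011 in binary */

        printf("%u\n", b1 + b2);         /* prints 57, i.e. 111001 in binary */
        return 0;
    }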
We have seen how, given a binary representation of a number, we can determine the decimal equivalent. We would also like to go the other way: given a decimal number, find the corresponding binary bit pattern representing this number. In general, there are two approaches: one generates the bits from the most significant (the leftmost) to the least significant; the other begins with the rightmost bit and proceeds to the leftmost.
In the first case, to convert a decimal number, $x$, to a binary number, we determine the highest power of 2, say $2^k$, that can be subtracted from $x$:

$r = x - 2^k$

and place a 1 in the $k^{th}$ binary digit position (counting from 0 at the rightmost position). The process is repeated for the remainder $r$, and so forth, until the remainder is zero. All other binary digit positions have a zero. For example, consider the decimal number 103. The largest power of 2 less than 103 is 64 ($2^6$):

$103 - 64 = 39, \quad 39 - 32 = 7, \quad 7 - 4 = 3, \quad 3 - 2 = 1, \quad 1 - 1 = 0$

So we get:

weights   128   64   32   16    8    4    2    1
                 1    1              1    1    1

which, for an 8 bit representation, gives:

0110 0111
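As a rough sketch, this subtraction-of-weights method might be coded in C as follows (a simple illustration for non-negative int values, not a definitive routine):

    #include <stdio.h>

    int main(void)
    {
        int x = 103;
        int weight;

        /* find the largest power of 2 that does not exceed x */
        for (weight = 1; weight * 2 <= x; weight = weight * 2)
            ;                                    /* weight is 64 when x is 103 */

        /* work down through the weights, printing a 1 where the
           weight can be subtracted and a 0 otherwise */
        for ( ; weight > 0; weight = weight / 2) {
            if (x >= weight) {
                putchar('1');
                x = x - weight;
            } else {
                putchar('0');
            }
        }
        putchar('\n');                           /* prints 1100111 */
        return 0;
    }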
In the alternate method, we repeatedly divide by 2, using integer division (discarding any fractional part), and the remainder at each step is the next binary digit, moving from the least significant to the most significant. In the example below, the % operation is called mod and gives the remainder from integer division:

103 / 2 = 51      103 % 2 = 1
 51 / 2 = 25       51 % 2 = 1
 25 / 2 = 12       25 % 2 = 1
 12 / 2 =  6       12 % 2 = 0
  6 / 2 =  3        6 % 2 = 0
  3 / 2 =  1        3 % 2 = 1
  1 / 2 =  0        1 % 2 = 1

Reading the remainder bits from top to bottom, and filling in the result from right to left, the number is 1100111.
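The repeated-division method maps directly onto the C / and % operators. Here is a minimal sketch (the buffer size of 32 is an assumption, large enough for the bits of an int on most machines):

    #include <stdio.h>

    int main(void)
    {
        int x = 103;
        char bits[32];                    /* holds the remainder digits */
        int n = 0, i;

        /* repeatedly divide by 2; each remainder is the next bit,
           least significant first */
        while (x > 0) {
            bits[n++] = (x % 2) ? '1' : '0';
            x = x / 2;
        }

        /* print the bits most significant first */
        for (i = n - 1; i >= 0; i--)
            putchar(bits[i]);
        putchar('\n');                    /* prints 1100111 */
        return 0;
    }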
The binary representation discussed above is a standard code for storing unsigned integer numbers. However, most computer applications use signed integers as well; i.e. integers that may be either positive or negative. There are several methods used for representing signed numbers.
The first, and most obvious, is to represent signed numbers as we do
in decimal, with an indicator for the sign followed by the magnitude
of the number as an unsigned quantity.
For example, we write +100 and -100. In binary we can use one bit within a representation (usually the most significant, or leading, bit) to indicate the sign, either positive (0) or negative (1), and store the unsigned binary representation of the magnitude in the remaining bits. So for a 16 bit word, we can represent the above numbers as:

+100:   0000 0000 0110 0100
-100:   1000 0000 0110 0100
However, for reasons of ease of designing circuits to do arithmetic on signed binary numbers (e.g. addition and subtraction), a more common representation scheme, called two's complement, is used. In this scheme, positive numbers are represented in binary, the same as for unsigned numbers. A negative number, on the other hand, is represented by taking the binary representation of its magnitude, complementing all bits (changing 0's to 1's and 1's to 0's), and adding 1 to the result.
Let us examine the 2's complement representation of +100 and -100 using 16 bits. For +100, the result is the same as for unsigned numbers:

0000 0000 0110 0100

For -100, we begin with the unsigned representation of 100:

0000 0000 0110 0100

complement each bit:

1111 1111 1001 1011

and add 1 to the above to get the 2's complement representation of -100:

1111 1111 1001 1100
This operation is reversible; that is, the magnitude (or absolute value) of a two's complement representation of a negative number can be obtained with the same procedure. Complement all bits:

0000 0000 0110 0011

and add 1:

0000 0000 0110 0100

which is again the unsigned representation of 100.
In a two's complement representation, we can still use the most significant bit to determine the sign of the number: 0 for positive, and 1 for negative. Let us determine the decimal value of a negative 2's complement number, for example:

1111 1111 1111 1000

This is a negative integer since the leading bit is 1, so to find its magnitude we complement all bits:

0000 0000 0000 0111

and add 1:

0000 0000 0000 1000

The decimal magnitude is 8, and the sign is negative, so the original integer represents decimal -8.
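On a machine that uses two's complement and a 16 bit short (an assumption; both are common but not universal), the patterns worked out above for -100 can be displayed with a short C sketch; the helper print_bits16 is our own:

    #include <stdio.h>

    /* print the low 16 bits of n, most significant bit first */
    void print_bits16(unsigned int n)
    {
        int i;

        for (i = 15; i >= 0; i--)
            printf("%d", (n >> i) & 1);
        printf("\n");
    }

    int main(void)
    {
        short x = -100;

        print_bits16((unsigned short) x);          /* 1111111110011100 */
        print_bits16((unsigned short) (~x + 1));   /* 0000000001100100, the magnitude 100 */
        return 0;
    }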
In determining the range of integers that can be represented by $n$ bits, we must allow for the sign bit. Only $n - 1$ bits are available for positive integers, and the range for them is 0 through $2^{n-1} - 1$. The range of negative integers representable by $n$ bits is $-1$ through $-2^{n-1}$. Thus, the range of integers representable by $n$ bits is $-2^{n-1}$ through $2^{n-1} - 1$. For example, for 8 bits, the range of signed integers is $-2^{7}$ through $2^{7} - 1$, or $-128$ to $127$.
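In C, the limits implied by these ranges are available in the standard header limits.h. The following sketch prints a few of them (the comments assume the common two's complement sizes, e.g. an 8 bit signed char and a 16 bit short):

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        printf("signed char: %d to %d\n", SCHAR_MIN, SCHAR_MAX);   /* typically -128 to 127 */
        printf("short      : %d to %d\n", SHRT_MIN, SHRT_MAX);     /* typically -32768 to 32767 */
        printf("int        : %d to %d\n", INT_MIN, INT_MAX);
        return 0;
    }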
It can be seen from the above analysis that,
due to a finite number of bits used to represent numbers,
there are limits to the largest and/or smallest numbers that can be represented
in the computer.
We will discuss this further in a later chapter.
One important thing to keep in mind at this point is that we have been discussing different representations for numbers. Whether a number is expressed in binary, e.g. 010011, or decimal, 19, it is still the same number, namely nineteen. It is simply more convenient for people to think in decimal and for the computer to use binary. However, converting the computer's binary representation to human decimal notation is somewhat tedious, and at the same time writing long strings of bits is inconvenient and error prone. So two other representation schemes are commonly used when working with binary representations. These schemes are called octal and hexadecimal (sometimes called hex) representations and are simply positional number systems using base 8 and 16, respectively.
In general, an unsigned integer, $N$, consisting of $n$ digits and written in some base $B$ as:

$N = d_{n-1} d_{n-2} \ldots d_1 d_0$

is interpreted as the sum:

$N = d_{n-1} \times B^{n-1} + d_{n-2} \times B^{n-2} + \cdots + d_1 \times B^1 + d_0 \times B^0$
If the base is 2 (binary), the symbols which may be used for the digits
are [0, 1].
If the base is 10 (decimal) the digit symbols are
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9].
Likewise, for base 8 (octal) the digit symbols are [0, 1, 2, 3, 4, 5, 6, 7];
and for hexadecimal (base 16) they are
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f].
The letter symbols, [a, b, c, d, e, f] (upper case [A, B, C, D, E, F]
may also be used)
give us the required 16 symbols and
correspond to decimal values [10, 11, 12, 13, 14, 15] respectively.
Using the above sum, it should be clear that the following are all representations of the same number, one hundred sixty-eight:

$10101000_2 = 250_8 = 168_{10} = A8_{16}$
For hexadecimal numbers, the positional weights are, starting from the right, 1, 16, 256, etc. Here are a few examples of converting hex to decimal:

$A8_{16} = 10 \times 16 + 8 \times 1 = 168$
$2C_{16} = 2 \times 16 + 12 \times 1 = 44$
$1F4_{16} = 1 \times 256 + 15 \times 16 + 4 \times 1 = 500$
Similarly, octal numbers are base 8 numbers with weights 1, 8, 64, etc. The following are some examples of converting octal to decimal:

$250_8 = 2 \times 64 + 5 \times 8 + 0 \times 1 = 168$
$37_8 = 3 \times 8 + 7 \times 1 = 31$
$777_8 = 7 \times 64 + 7 \times 8 + 7 \times 1 = 511$
The reason octal and hex are so common in programming is the ease of converting between these representations and binary, and vice versa. For hexadecimal numbers, exactly four bits are needed to represent the symbols 0 through F. Thus, segmenting any binary number into 4 bit groups starting from the right, and converting each group to its hexadecimal equivalent gives the hexadecimal representation.
As a side effect, conversion from binary to decimal is much easier by first converting to hex and then to decimal, as shown above.
Similarly, segmenting a binary number into three bit groups starting from the right gives us the octal representation. Thus, the same number can be expressed in octal as:

$10\,101\,000_2 \rightarrow 250_8$
Conversion of base 8 or base 16 numbers to binary is very simple: for each digit, its binary representation (three bits for an octal digit, four bits for a hexadecimal digit) is written down. Conversion between octal and hex is most easily done by converting to binary first:

$250_8 = 010\,101\,000_2 = 1010\,1000_2 = A8_{16}$
Some additional examples of equivalent hexadecimal, octal, binary, and decimal numbers are shown in Table 1.1. In a programming language we need a way to distinguish numbers written in different bases (base 8, 16, 10, or 2). In C source programs, a simple convention is used to write constants in different bases. Decimal numbers are written without leading zeros. Octal numbers are written with a leading zero, e.g. 0250 is octal 250. Hexadecimal numbers are written with a leading zero followed by an x or X, followed by the hexadecimal digits. Thus, 0xA8 means hexadecimal A8. (Binary numbers are not used in source programs.)
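As a brief illustration of this convention, the following C fragment initializes three variables with the same value written in each base and prints the value back in decimal, octal, and hexadecimal:

    #include <stdio.h>

    int main(void)
    {
        int d = 168;       /* decimal constant                  */
        int o = 0250;      /* octal constant (leading zero)     */
        int h = 0xA8;      /* hexadecimal constant (leading 0x) */

        /* all three variables hold the same value */
        printf("%d %o %x\n", d, o, h);   /* prints: 168 250 a8 */
        return 0;
    }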
So far we have discussed representations of integers, signed and unsigned; however, many applications make use of other types of data in their processing. In addition, some applications using integers require numbers larger than can be stored in the available number of bits. To address these problems, another representation scheme, called floating point is used. This scheme allows representation of numbers with fractional parts (real numbers) as well as numbers that may be very large or very small.
Representation of floating point numbers is analogous to decimal scientific notation. For example:

$123000.0 = 1.23 \times 10^{5} = 0.123 \times 10^{6}$
By adjusting the decimal place, as in the last case above,
a number of this form consists of just three parts:
a fractional part called the mantissa, a base, and an exponent.
Over the years,
several schemes have been devised for representing each of these parts
and storing them as bits in a computer.
However,
in recent years a standard has been defined by the Institute of Electrical and Electronics Engineers (IEEE Standard 754)
which is gaining in acceptance and use in many computers.
A detailed description of these schemes,
and their relative tradeoffs,
is beyond the scope of this text;
however, as programmers, it is sufficient that we realize that
we can express floating point numbers either in decimal with a fractional part:

123.456

or using exponential form:

1.23456e2    (or 1.23456E2)

where E or e introduces the exponent of the base (10 in this case), i.e. $1.23456 \times 10^{2}$.
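Both forms may be used for floating point constants in C; a minimal sketch (the particular value is only an illustration):

    #include <stdio.h>

    int main(void)
    {
        double x = 123.456;       /* decimal form with a fractional part */
        double y = 1.23456e2;     /* the same value in exponential form  */

        printf("%f\n", x);        /* prints 123.456000 */
        printf("%e\n", y);        /* prints 1.234560e+02 */
        return 0;
    }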
As with integers,
due to the fixed number of bits used in the representation,
there are limits to the range (largest and smallest numbers)
and the precision (number of digits of accuracy) for the numbers
represented.
Another widely used data type is character data, which is non-numeric and includes all the symbols normally used to represent textual material, such as letters, digits, and punctuation. A later chapter discusses the representation of character data in detail; however, the principle is the same: some pattern of bits is selected to represent each symbol to be stored.
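As a small preview, a character and the numeric code of its bit pattern can be printed in C as follows (the value 65 shown in the comment assumes the ASCII character set):

    #include <stdio.h>

    int main(void)
    {
        char c = 'A';                 /* one character, stored as a pattern of bits */

        printf("%c %d\n", c, c);      /* prints the symbol and its code, e.g. A 65 in ASCII */
        return 0;
    }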
These are the principal data types provided by programming languages, but as we will see in future chapters, languages also provide a means for programmers to define their own data types and storage schemes.