As mentioned earlier, memory holds not only data but also the program to be executed, stored as a sequence of instructions. The CPU shown in Figure 1.1 is responsible for fetching instructions from memory, one at a time, and performing the specified operation on data. A more detailed picture of the CPU with its memory is shown in Figure 1.3. Within the CPU are several key components: the ALU, a set of registers, and a Control Unit.
The ALU (Arithmetic Logic Unit) is a digital circuit designed to perform arithmetic operations (add, subtract) as well as logic operations (AND, OR) on data. The registers in the CPU form a small scratchpad memory that temporarily holds data while it is in use. The Control Unit is another circuit, which determines what operation an instruction is requesting and controls the other circuitry to carry out that operation; that is, the Control Unit directs all operations within the machine.
Also shown in the figure are the connections between the CPU and Memory. They consist of an address bus, as mentioned in the previous Section, and a data bus, over which all information (data and program) passes between the CPU and Memory.
This Section describes how programs are stored in the machine as a sequence of instructions coded in binary. Such an encoding is called the machine language of the computer and is described below.
The basic operations that the CPU is capable of performing are usually quite simple, and the set of these operations provided on a particular computer is called the instruction set. Within this set are instructions which move data from one place to another, for example from memory to a CPU register, an operation called load. Similarly, there are store instructions for moving data from the CPU to a location in memory. In addition, there are instructions directing arithmetic operations, such as add, on data values. There are also instructions which control the flow of the program; i.e., they determine from where in memory the next instruction should be fetched. Normally instructions are fetched sequentially: the next instruction is fetched from the next memory address. Control instructions, however, may test a condition and direct that the next instruction be fetched from somewhere else in memory instead. Finally, there may also be instructions in the set for "housekeeping" operations within the machine, such as controlling external I/O devices.
To encode these instructions in binary form for storage in memory, some convention must be adopted to describe the meaning of the bits in the instruction. Most of the instructions described above require at least two pieces of information: a specification of what particular instruction this is, called the opcode or operation code, and the address of the data item on which to operate. These parts can be seen in Figure 1.3 in the block labeled instruction.
Instructions coded in binary form are called machine language instructions, and the collection of these instructions that makes up a program is called a machine language program. Such a program is very difficult for a person to understand or to write. Just imagine thinking in terms of binary codes for very low-level instructions and in terms of binary memory addresses for data items. It is not practical to do so except for very trivial programs. People need higher-level programming languages that are better suited to the way we think and communicate. One step above machine language is assembly language, a programming language which is very close to machine language: each assembly instruction translates to one machine language instruction. Its main advantage is that instructions and memory cells are not written in binary form; they have names. Assembly instructions include operation codes (i.e., mnemonics, or memory-aiding names for instructions), and they may also include addresses of data. An example of a very simple program fragment for the machine described above is shown in Figure 1.4. The figure shows the machine language code and its corresponding assembly language code. Definitions of memory cells are shown below the program fragment.
The machine language code is shown in binary. It consists of 8 bits of opcode and 16 bits of address for each instruction. From the assembly language code it is a little easier to see what this program does. The first instruction loads the data stored in memory at a location known as "Y" into the CPU register (for CPUs with only one register, this is often called the accumulator). The second instruction adds the data stored in memory at location "X" to the data in the accumulator, and stores the sum back in the accumulator. Finally, the value in the accumulator is stored back to memory at location "Y". With the data values shown in memory in the figure, at the end of this program fragment, the location known as "Y" will contain the value 48.
A utility program is provided to translate the assembly language code, which is (arguably) readable by people, into the machine language code readable by the CPU. This program is called the assembler. A program written in assembly language or any higher-level language is called the source program, whereas the program assembled into machine language is called the object program. The terms source code and object code are also used to refer to source and object programs.
Assembly language is a decided improvement over programming in machine language; however, we are still stuck with having to manipulate data in very simple steps such as load, store, and add, which can be a tedious, error-prone process. Fortunately for us, programming languages at still higher levels, closer to the way we think about programming, have been developed, along with translators (called compilers) that convert them to object programs. One such language is C, which is the subject of this text and is introduced in the next Section.