
7.1.2 Character Strings as Arrays

Our next task is to store and print non-numeric text data, i.e. sequences of characters, which are called strings. A string is a list of characters stored contiguously, with a marker to indicate the end of the string. Let us consider the task:

STRING0: Read and store a string of characters and print it out.

Since the characters of a string are stored contiguously, we can easily implement a string by using an array of characters if we keep track of the number of elements stored in the array. However, common operations on strings include breaking them up into parts (called substrings), joining them together to create new strings, replacing parts of them with other strings, etc. Because these operations change the length of a string, there must be some way of detecting the size of the valid string currently stored in an array of characters.

In C, a string of characters is stored in successive elements of a character array and terminated by the NULL character. For example, the string "Hello" is stored in a character array, msg[], as follows:

char msg[SIZE];

msg[0] = 'H'; msg[1] = 'e'; msg[2] = 'l'; msg[3] = 'l'; msg[4] = 'o'; msg[5] = '\0';
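
As a minimal sketch of the storage just described, the following fills the array one element at a time and then counts the characters up to the terminator. The function name store_hello and the value given to SIZE are assumptions made for illustration only.

```c
#define SIZE 100   /* assumed value; it only needs to be large enough */

/* Store "Hello" one character at a time, then return the number of
   characters found before the terminating null character. */
int store_hello(char msg[])
{
    int i = 0;

    msg[0] = 'H'; msg[1] = 'e'; msg[2] = 'l';
    msg[3] = 'l'; msg[4] = 'o'; msg[5] = '\0';

    while (msg[i] != '\0')   /* scan from index 0 to the terminator */
        i++;
    return i;
}
```

Called with an array declared as char msg[SIZE], store_hello(msg) returns 5, since the terminator itself is not counted.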

The NULL character is written using the escape sequence '\0'. The ASCII value of NULL is 0. The symbol NULL is also defined as a macro in stdio.h; where it is defined to be 0, programs can use the symbol in character expressions if the header file is included. Many compilers, however, define NULL as a pointer constant such as ((void *)0), so '\0' is the portable way to write the character. The remaining elements in the array after the NULL may have any garbage values. When the string is retrieved, it will be retrieved starting at index 0, and succeeding characters are obtained by incrementing the index until the first NULL character is reached, signaling the end of the string. Figure 7.3 shows a string as it is stored in memory. Note, string constants, such as
"Hello"
are automatically terminated by NULL by the compiler.
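
One way to observe this, sketched here with the hypothetical names greeting and greeting_size, is to initialize an array from a string constant and ask the compiler how many elements it reserved:

```c
#include <stddef.h>

/* A string constant used as an initializer: the compiler sizes the
   array to hold the five letters plus the terminating null character. */
static char greeting[] = "Hello";

/* Number of elements the compiler reserved, terminator included. */
size_t greeting_size(void)
{
    return sizeof greeting;
}
```

Here greeting_size() returns 6, not 5, and greeting[5] holds the terminator that the compiler appended.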

Given this implementation of strings in C, the algorithm to implement our task is now easily written. We will assume that a string input is a sequence of characters terminated by a newline character. (The newline character is not part of the string.) Here is the algorithm:

initialize index to zero
while not a newline character
     read and store a character in the array at the next index
     increment the index value
terminate the string of characters in the array with a NULL char.
initialize index to zero
traverse the array until a NULL character is reached
     print the array character at index
     increment the index value
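
The algorithm above can be sketched in C roughly as follows. This is an illustration written for this passage, not necessarily the code of Figure 7.4. The function names read_string and print_string, the FILE parameters, and the extra check for end of input are assumptions that go beyond the algorithm as stated.

```c
#include <stdio.h>

#define SIZE 100   /* array size, assumed large enough for one line */

/* Read characters from 'in' up to a newline (or end of input), store
   them in msg[], and terminate them with a null character. Like the
   algorithm, this assumes the line fits in the array. */
void read_string(FILE *in, char msg[])
{
    int ch;      /* int, so that EOF can be told apart from a character */
    int i = 0;

    while ((ch = getc(in)) != '\n' && ch != EOF)
        msg[i++] = (char) ch;
    msg[i] = '\0';
}

/* Traverse msg[] until the null character, printing each character,
   and finish the output with a newline. */
void print_string(FILE *out, const char msg[])
{
    int i = 0;

    while (msg[i] != '\0') {
        putc(msg[i], out);
        i++;
    }
    putc('\n', out);
}
```

With char msg[SIZE] declared, calling read_string(stdin, msg) followed by print_string(stdout, msg) would echo one line typed at the terminal.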

The program implementation has:

The code is shown in Figure 7.4 and a sample session from the program is shown below.

Sample Session:

The first while loop reads a character into ch and checks if it is a newline, in which case it is discarded and the loop terminates. Otherwise, the character is stored in msg[i] and the array index, i, is incremented. When the loop terminates, a NULL character is appended to the string of characters. In this program, we have assumed that the size of msg[] is large enough to store the string. Since a line on a terminal is 80 characters wide and since we have defined SIZE to be 100, this seems a safe assumption.
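
If that assumption cannot be relied on, the reading loop can check the index before storing. The following is a minimal sketch under stated assumptions: the name read_string_bounded, its parameters, and the choice to discard the excess characters of a long line are all illustrative, not taken from the program in the text.

```c
#include <stdio.h>

/* Read one line from 'in' into msg[], storing at most size - 1
   characters so that room always remains for the terminator.
   Excess characters on the line are read and discarded.
   Returns the number of characters stored. */
int read_string_bounded(FILE *in, char msg[], int size)
{
    int ch;
    int i = 0;

    if (size <= 0)
        return 0;   /* no room even for the terminator */

    while ((ch = getc(in)) != '\n' && ch != EOF) {
        if (i < size - 1)
            msg[i++] = (char) ch;
    }
    msg[i] = '\0';
    return i;
}
```

Discarding the excess keeps the next call aligned with the start of the next line; a program that must not lose input would need a different policy.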

The next while loop in the program traverses the string and prints each character until a NULL character is reached. Note, we do not need to keep a count of the number of characters stored in the array in this program since the first NULL character encountered indicates the end of the string. In our program, when the first NULL is reached we terminate the string output with a newline.

The assignment expression in the above program:

msg[i] = '\0';
can also be written as:
msg[i] = NULL;
or:
msg[i] = 0;
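In the first case, the character whose ASCII value is 0 is assigned to msg[i]; whereas in the other cases, a zero value is assigned to msg[i]. Where NULL is defined as 0, the three assignment expressions are identical. The first expression makes it clear that a null character is assigned to msg[i], while the second uses a symbolic constant, which some find easier to read and understand. Note, however, that where the compiler defines NULL as a pointer constant such as ((void *)0), the second form draws a warning, so '\0' is the safer choice.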

To accommodate the terminating NULL character, the size of an array that houses a string must be at least one greater than the expected maximum size of the string. Since different strings may be stored in an array at different times, the first NULL character in the array delimits the valid string. The importance of the NULL character to signal the end of a valid string is obvious. If there were no NULL character inserted after the valid string, the loop traversal would continue to print values interpreted as characters, possibly beyond the array boundary, until it fortuitously found a zero (0) character.
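
The following sketch illustrates how the first terminator delimits the valid string when an array is reused. It relies on the standard library functions strcpy() and strlen() from string.h, which this section has not introduced; the name reuse_array and the value of SIZE are assumptions for illustration.

```c
#include <string.h>

#define SIZE 100   /* assumed array size */

/* Store "Hello", then overwrite it with the shorter string "Hi".
   Characters left over from the longer string remain in the array
   after the new terminator but are not part of the valid string.
   Returns the length of the string currently stored. */
size_t reuse_array(char msg[])
{
    strcpy(msg, "Hello");   /* msg: H e l l o \0 */
    strcpy(msg, "Hi");      /* msg: H i \0 l o \0 */
    return strlen(msg);
}
```

After the call, the length reported is 2 even though msg[3] and msg[4] still hold 'l' and 'o' from the earlier string.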

The second while loop may also be written:

while (msg[i] != '\0')
         putchar(msg[i++]);
and the while condition further simplified as:
while (msg[i])
         putchar(msg[i++]);
If msg[i] is any character with a non-zero ASCII value, the while expression evaluates to True. If msg[i] is the NULL character, its value is zero and thus False. The last form of the while condition is the more common usage. While we have used the increment operator in the putchar() argument, it may also be used separately for clarity:
while (msg[i]) {
         putchar(msg[i]);
         i++;
}
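
As a sketch of the compact form in a complete function, the version below prints through putc() so that the destination can be chosen, and returns the number of characters printed. The name print_compact and the FILE parameter are assumptions for illustration.

```c
#include <stdio.h>

/* Print msg[] to 'out' using the compact loop condition and return
   the number of characters printed. */
int print_compact(FILE *out, const char msg[])
{
    int i = 0;

    while (msg[i])              /* false only at the null character */
        putc(msg[i++], out);    /* print, then advance the index */
    return i;
}
```

For example, print_compact(stdout, "Hello") prints the five letters and returns 5.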

It is possible for a string to be empty; that is, a string may have no characters in it. An empty string is a character array with the NULL character in the zeroth index position, msg[0].
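
A minimal sketch of testing for and creating an empty string follows; the helper names is_empty and make_empty are hypothetical.

```c
/* Return 1 if msg[] holds the empty string, 0 otherwise. */
int is_empty(const char msg[])
{
    return msg[0] == '\0';
}

/* Make msg[] hold the empty string. Later elements are left
   untouched; they are simply no longer part of the string. */
void make_empty(char msg[])
{
    msg[0] = '\0';
}
```

Note that emptying a string this way changes only the element at index 0; the array itself keeps its declared size.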




tep@wiliki.eng.hawaii.edu
Wed Aug 17 08:56:22 HST 1994