
4.1 A New Data Type: char

The complete set of characters that can be recognized by the computer is called the character set of the machine. As with numbers, the representation in the computer of each character in the set is done by assigning a unique bit pattern to each character. The typical character set consists of the following types of characters:

     Alphabetic lower case: 'a',..., 'z'
     Alphabetic upper case: 'A',..., 'Z'
     Digit symbols        : '0',..., '9'
     Punctuation          : '.', ',', ';', etc.
     Space                : ' '
     Special symbols      : '@', '#', '$', etc.
     Control Characters   : newline, tab, bell or beep, etc.
For example, a digit symbol is character type data, so when we type 234 at the keyboard, we are typing a sequence of character symbols: '2', followed by '3', followed by '4'. The function scanf() takes this sequence and converts it to the internal form of the equivalent number, 234. Similarly, all writing on the screen is a sequence of characters so printf() takes the internal form of the number and converts it to a sequence of characters which are written onto the screen.

In C programs, a variable may be declared to hold a single character data item by using the keyword char as the type specifier in the declaration statement:

char ch;
A character constant is written surrounded by single quotation marks, e.g. 'a', 'A', '$', '!', etc. Only printable character constants can be written directly in single quotes; non-printable control characters require special handling. In C, the backslash character, \, is used as an escape character: it signals that what follows is to be treated differently from the ordinary, and it is followed by one character that indicates the particular control character. We have already seen one such escape sequence in our printf() statements: the newline character, '\n'. Other frequently used control character constants written with an escape sequence include '\t' for tab and '\a' for bell. Table 4.1 shows the escape sequences used in C. The newline, tab, and space characters are called white space characters, for obvious reasons.

Let us consider a simple task of reading characters typed at the keyboard and writing them to the screen. The task is to copy (or echo) the characters from the input to the output. We will continue this task until there is no more input, i.e. until the end of the input file.

TASK

COPY0: Write out each character as it is read until the end of input file.

The algorithm can be stated simply as:

read the first character
     while there are more characters to read
          write or print the previously read character;
          read the next character
The code for this program is shown in Figure 4.1.

The keyword char declares a variable, ch, of character data type. We also declare an integer variable, flag, to save the value returned by scanf(). Recall that the value returned is either the number of items read by scanf() or the value EOF defined in stdio.h. (We do not need to know the actual value of EOF to use it.)

After the title is printed, a character is read by the statement:

flag = scanf("%c", &ch);
The conversion specification for character type data is %c, so this scanf() reads a single character from the input. If it is not an end of file keystroke, the character read is stored into ch, and the value returned by scanf(), 1, is saved in flag. As long as the value of flag is not EOF, the loop is entered. The loop body first prints the value of ch, i.e. the last character read; then the assignment statement reads a new character and updates flag. The loop terminates when flag is EOF, i.e. when an end of file keystroke is detected. Remember, scanf() does not store the value EOF into the object ch; EOF is only the value it returns. DO NOT TEST THE VALUE OF ch FOR EOF, TEST flag. A sample session is shown below:

The sample session shows that as entire lines of characters are entered, they are printed. Each character typed is not immediately printed, since no input is received by the program until a newline character is typed by the user; this is the same buffering we saw for numeric data entry. When a newline is typed, the entire sequence of characters, including the newline, is placed in the keyboard buffer, and scanf() then reads input from the buffer, one character at a time, up to and including the newline. In our loop, each character read is then printed. When the buffer is exhausted, the next line is placed in the buffer and read, and so on. So scanf() is behaving just as it did for numeric data: each call reads one data item, in this case a character (%c). One notable difference between reading numeric data and character data is that when scanf() reads a character, leading white space characters are read, one character at a time, not skipped over as they are when reading numeric data.




tep@wiliki.eng.hawaii.edu
Wed Aug 17 08:29:11 HST 1994