
4.2.2 Converting Digit Characters to Numbers

Next we discuss how digit symbols can be converted to their numeric equivalents and vice versa. As we have stated, the character '0' is not the integer 0, '1' is not 1, and so on, so a program that reads digit characters must convert them to the numeric values they represent. As we have seen, the digit codes are contiguous and increasing; in ASCII, the value of '0' is 48, '1' is 49, and so forth. (The C standard guarantees that the digit codes are contiguous and increasing in every character set, so this method does not depend on ASCII.) If we subtract the base value of '0' (48 in ASCII) from a digit character, we obtain its numeric equivalent: '0' - '0' is 0, '1' - '0' is 1, and so forth. Thus, if ch is a digit character, its numeric equivalent is ch - '0'. Conversely, if n is a non-negative integer less than 10 (0, 1, 2, ..., 9), the corresponding digit character is n + '0'.

Using the sketch of an algorithm just described, we can write two functions: one that converts a digit character to its integer value, and one that converts an integer less than 10 to its character representation. These operations could be useful in a variety of programs, so we will put the functions in a file called chrutil.c. These functions are the beginning of a library of character utility functions we will build. The code is shown in Figure 4.8. (We can also place the code for uppercase() from the previous example in this file as part of the library.) We have included the file chrutil.h, where the necessary macros and prototypes are located; this header file is shown in Figure 4.9.

The function dig_to_int() is given a character, ch, and returns an integer: the value of ch - '0' if ch is a digit character. Otherwise, it prints an error message and returns the value ERROR. Since valid integer values of digits are from 0 to 9, a value of -1 is never a legitimate return value, so we can use it to signify an error. (Note that we use a macro, defined in chrutil.h, to name this ``magic number''.) In int_to_dig(), given an integer, n, the returned value is the digit character n + '0' if n is between 0 and 9; otherwise, a message is printed and the NUL character ('\0', ASCII value 0) is returned to indicate an error. We do not use ERROR in this case because int_to_dig() returns a char value, and on some implementations char is unsigned and cannot hold a negative value. As was the case for the function uppercase() above, we have not used an else part in these two functions. If the condition is satisfied, a return statement is executed, so control proceeds beyond the if part only when the condition is false. Returning an error value is good practice when writing utility functions, as it makes the functions more general and robust, i.e. able to handle both valid and invalid data.
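Figure 4.8 is not reproduced here, so the following is only a minimal sketch of what dig_to_int() and int_to_dig() might look like. It assumes chrutil.h defines ERROR as -1 and IS_DIGIT() as shown later in this section; those macros are repeated inline so the block compiles on its own, and the exact error message wording is an assumption.

```c
#include <stdio.h>

/* Assumed contents of chrutil.h, repeated so this sketch is self-contained. */
#define ERROR (-1)
#define IS_DIGIT(c) ((c) >= '0' && (c) <= '9')

/* Convert a digit character to its integer value; return ERROR otherwise. */
int dig_to_int(char ch)
{
    if (IS_DIGIT(ch))
        return ch - '0';        /* digit codes are contiguous, so this is 0..9 */
    printf("ERROR: dig_to_int: '%c' is not a digit\n", ch);
    return ERROR;
}

/* Convert an integer 0..9 to its digit character; return '\0' otherwise. */
char int_to_dig(int n)
{
    if (n >= 0 && n <= 9)
        return n + '0';         /* offset from the code for '0' */
    printf("ERROR: int_to_dig: %d is not in the range 0 to 9\n", n);
    return '\0';                /* NUL signals an error; char may be unsigned */
}
```

Neither function uses an else part: each returns as soon as its condition holds, so the error code runs only when the condition is false.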

Let us consider the task of reading and converting a sequence of digit characters to an equivalent integer. We might add such an operation to our library of character utilities and call it getint() (analogous to getchar()). We will assume that the input will be a sequence of digit characters, possibly preceded by white space, but not by a plus or minus sign. Further, we will assume that the conversion process will stop when a character other than a digit is read. Usually, the delimiter will be white space, but any non-digit character will also be assumed to delimit the integer being read.

The function getint() needs no arguments and returns an integer. It reads one character at a time and accumulates the value of the integer in a variable, n. Suppose the digits entered are '3' followed by '4'. When we read the first digit, '3', and convert it to its integer value, n becomes 3. But we do not yet know whether our integer is 3, or thirty something, or three hundred something, etc. So we read the next character; since it is a digit character, we know our number is at least thirty something. The second digit, '4', is converted to its integer value, 4. We cannot just add 4 to the previous value of n (3). Instead, we must add 4 to the previous value, 3, multiplied by 10 (the base, since we are reading a decimal number). The new value of n is n * 10 + 4, or 34. Again, we do not know whether the number being read is 34 or three hundred forty something, etc. If another digit were entered, say '5', the new value of n would be obtained by adding its contribution to the previous value of n times 10, i.e.

n * 10 + dig_to_int('5')
which is 345. Thus, if the character read, ch, is a digit character, then dig_to_int(ch) is added to the previously accumulated value of n multiplied by 10. The multiplication by 10 is required because the new digit read is the current rightmost digit with positional weight of 1; so the weight of all previous digits must be multiplied by the base, 10. For each new character, the new accumulated value is obtained by:
n = n * 10 + dig_to_int(ch);
We can write this as an algorithm as follows:
     initialize n to zero
     read the first character
     repeat while the character read is a digit
          set n to n * 10 + the integer value of the digit character
          read the next character
     return the result
The code for getint() is shown in Figure 4.10.
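The accumulation step of the algorithm can be tried in isolation on a string instead of the input stream. The function digits_to_int() below is hypothetical (it is not part of chrutil.c), and it inlines ch - '0' in place of dig_to_int() so the block stands alone. Overflow is not checked.

```c
#define IS_DIGIT(c) ((c) >= '0' && (c) <= '9')

/* Hypothetical helper: convert the leading digits of string s to an int.
   Stops at the first non-digit character; returns 0 if there are none. */
int digits_to_int(const char *s)
{
    int n = 0;                      /* accumulated value so far */

    while (IS_DIGIT(*s)) {
        n = n * 10 + (*s - '0');    /* weight previous digits by 10, add new one */
        s++;
    }
    return n;
}
```

For the string "345", n takes the values 3, 34 and 345 in turn, matching the hand trace above.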

We have used conditional compilation to test our implementation, including debug statements that print each digit character, ch, and the accumulated value of n at each step. The loop is executed as long as the character read is a digit. The macro IS_DIGIT() expands to an expression that evaluates to true (nonzero) if and only if its argument is a digit. Could we have combined reading the character and testing it into one expression in the while condition?

while( IS_DIGIT(ch = getchar()))
The answer is NO! Recall, IS_DIGIT() is a macro defined as:
#define   IS_DIGIT(c)   ((c) >= '0' && (c) <= '9')
so IS_DIGIT(ch = getchar()) would expand to:
((ch = getchar()) >= '0' && (ch = getchar()) <= '9')
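While this is legal syntax (no compiler error would be generated), getchar() may be called twice each time this expression is evaluated. The first character read is compared with '0'; if it is at least '0', getchar() is called again, and the second character read is stored in ch and compared with '9'. The first character is lost, and the two comparisons are made on different characters. The lesson here is to be careful when passing an argument with side effects, such as an assignment or a function call, to a macro.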
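The double evaluation can be observed directly. In this sketch, next_char() is a hypothetical stand-in for getchar() that returns characters from a fixed string and counts how many times it is called; is_digit_fn() is a hypothetical function version of the same test, which evaluates its argument exactly once.

```c
#include <stdio.h>

#define IS_DIGIT(c) ((c) >= '0' && (c) <= '9')

/* Hypothetical input and call counter standing in for the real input stream. */
static const char *input = "579";
static int calls = 0;

/* Hypothetical stand-in for getchar(): next character of input, then EOF. */
static int next_char(void)
{
    calls++;
    return *input != '\0' ? *input++ : EOF;
}

/* A function evaluates its argument once, so a side effect happens once. */
static int is_digit_fn(int c)
{
    return c >= '0' && c <= '9';
}
```

With this input, IS_DIGIT(ch = next_char()) calls next_char() twice, compares '5' with '0', and leaves '7' in ch; is_digit_fn(ch = next_char()) then reads only '9'. (The standard library's isdigit() in <ctype.h> is another way to perform the test.)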

Notice we have used the function, dig_to_int() in the loop. This is an example of our modular approach - we have already written a function to do the conversion, so we can just use it here, trusting that it works correctly. What if dig_to_int ever returns the ERROR condition? In this case, we know that that can never happen because if we are in the body of the loop, we know that ch is a digit character from the loop condition. We are simply not making use of the full generality of dig_to_int().

If, after adding the prototype for getint() to chrutil.h:

int getint(void);
we compile and link the file chrutil.c by itself, we get a link-time (load-time) error because there is no function main() in the file. Remember, every C program must have a main(). To test our function, we can write a short driver program which simply calls getint() and prints the result:
#include <stdio.h>
#include "chrutil.h"     /* prototype for getint() */

int main(void)
{
     printf("***Test Digit Sequence to Integer***\n\n");
     printf("Type a sequence of digits\n");
     printf("Integer = %d\n", getint()); /* print value */
     return 0;
}
A sample session is shown below. It is clear that something is wrong with the accumulated value of n: the first character, '3', is read correctly, but the value of n is 16093. The likely cause is that n does not have a correct initial value; we have forgotten to initialize n to zero. A simple fix is to change the declaration of n in getint() to:
int n = 0;

A revised sample session is shown below.

The trace shows that the program works correctly. The value of n is accumulating correctly: it is 3 after the first character, 34 after the next, 345 after the next, and 3456 after the last character. At this point, we should test the program with other inputs until we are satisfied with the results for a diverse set of inputs. If during our testing we enter a line that begins with some white space before the digits, we get the wrong result and no debug output. In this case, the first character read is white space, not a digit. So the loop is never entered, no debug statements are executed, and the initial value of n, 0, is returned. We have forgotten to handle the case where the integer is preceded by white space. Returning to our algorithm, we must skip over white space characters after the first character is read:
     initialize n to zero
     read the first character
     skip leading white space
     repeat while the character read is a digit
          set n to n * 10 + the integer value of the digit character
          read the next character
     return the result
This added step can be implemented with a simple while loop:
while (IS_WHITE_SPACE(ch)) ch = getchar();
For readability, we have used a macro, IS_WHITE_SPACE(), to test ch. We can define the macro in chrutil.h as follows:
#define   IS_WHITE_SPACE(c)   ((c) == ' ' || (c) == '\t' || (c) == '\n')
Compiling and testing the program again yields the correct result.
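Figure 4.10 and its revision are not reproduced here, so the following is a minimal sketch of the debugged normal-case version: n is initialized, leading white space is skipped, and digits are accumulated. To make it testable without typing at the keyboard, it is written as a hypothetical fgetint() that reads from any FILE *, with getint() simply calling it on stdin. It inlines ch - '0' in place of dig_to_int(), and it has no EOF or error handling yet.

```c
#include <stdio.h>

#define IS_DIGIT(c) ((c) >= '0' && (c) <= '9')
#define IS_WHITE_SPACE(c) ((c) == ' ' || (c) == '\t' || (c) == '\n')

/* Hypothetical: read an unsigned decimal integer from fp. Skips leading
   white space, then accumulates digits; the first non-digit is consumed. */
int fgetint(FILE *fp)
{
    int n = 0;          /* must start at zero */
    int ch;             /* int rather than char, so it could also hold EOF */

    ch = getc(fp);
    while (IS_WHITE_SPACE(ch))          /* skip leading white space */
        ch = getc(fp);
    while (IS_DIGIT(ch)) {              /* accumulate digits */
        n = n * 10 + (ch - '0');
        ch = getc(fp);
    }
    return n;
}

/* getint() as described in the text: read from standard input. */
int getint(void)
{
    return fgetint(stdin);
}
```

Because the delimiter is consumed, consecutive calls on the input "3456", a newline, then "   345 32r" return 3456, 345 and 32.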

The program may now be considered debugged; it meets the specification given in the task, so we can remove the definition of DEBUG and recompile the program. However, at this point we might also consider the utility and generality of our getint() function. What happens if the user does not enter digit characters? What happens at end of file? Only after the program has been tested for the ``normal'' case should we consider these ``abnormal'' cases. The first step is to see what the function, as it is currently written, does when it encounters unexpected input.

Let's look at EOF first. If the user types end of file, getchar() will return EOF, which is not white space and is not a digit. So neither loop will be executed and getint() will return the initialized value of n, namely 0. This may seem reasonable; however, a program using this function cannot tell the difference between the user typing zero and typing end of file. Perhaps we would like getint() to indicate end of file by returning EOF as getchar() does. This is easy to add to our program; before returning n we add a statement:

if(ch == EOF) return EOF;
The C standard requires EOF to be a negative value (commonly -1), so it can never be confused with a legal input of 0. We must, however, no longer accept -1 as a legal value. (In our implementation we have not allowed any negative number, so EOF is a good choice for a return value at end of file.)

Next, let us consider what happens if the user types a non-digit character. If the illegal character occurs after some digits have been processed, e.g.:

  • 32r
a manual trace reveals that the function will convert the number 32 and return. The 'r' has already been read from the buffer, so if getint() is called again, the next integer typed by the user will be read and converted. (Note that this is different from what scanf() would do under these circumstances: scanf() leaves the non-digit character unread.) This is reasonable behavior for getint(), so we need not change our code.
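The difference from scanf() can be checked with a small sketch. The hypothetical function char_after_scanf() reads one integer with fscanf() and then returns the next character in the stream; for the input "32r", that character is the 'r', because the scanf family leaves the first non-matching character unread.

```c
#include <stdio.h>

/* Hypothetical: read an int from fp with fscanf(), store it in *value,
   and return the character that follows it (EOF if no int was read). */
int char_after_scanf(FILE *fp, int *value)
{
    if (fscanf(fp, "%d", value) != 1)
        return EOF;
    return getc(fp);            /* the character fscanf() left unread */
}
```

A consequence is that a loop calling scanf("%d", ...) repeatedly would keep failing on the same 'r' unless the program reads past it; getint() avoids this by consuming the delimiter.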

If no digits have been typed before an illegal character, e.g.:

  • r 32
again, the character 'r' is neither white space nor a digit, so getint() will return 0. As before, a program calling getint() cannot tell whether the user entered zero or made an error. It would be better to return an error condition in this case, but if we return ERROR, defined in chrutil.h as -1, we may not be able to tell the difference between this error and EOF, which is typically also -1. A simple solution is to change the definition of ERROR to -2 instead of -1. This does not affect any other functions that use ERROR (such as dig_to_int()), since they only need a unique value to return as an error condition. We can simply change the #define in chrutil.h and recompile (see Figure 4.11).

Finally, we must determine how to detect this error in getint(). As described above, we must know whether or not we had begun converting an integer when the error occurred. We can do this with a variable, called a flag, which records the state of the program. We have called this flag got_digit (see Figure 4.12); it is declared and initialized to FALSE in getint(). If the digit loop body is ever executed, got_digit is set to TRUE. Before returning, if got_digit is FALSE we return ERROR; otherwise we return n.

All of these changes are shown in Figures 4.11 and 4.12. Notice that we have included the header file tfdef.h, from before, in chrutil.c to obtain the definitions of TRUE and FALSE. We have also modified the test driver to read integers from the input until end of file. (Only the modified versions of getint() and the test driver, main(), are shown in Figure 4.12. The functions dig_to_int() and int_to_dig() remain unchanged in the file.)
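Since Figure 4.12 is not reproduced here, the following is a minimal sketch of a getint() with the got_digit flag, again written as a hypothetical fgetint(FILE *) so that it can be tested on a temporary file. It assumes ERROR is -2, TRUE and FALSE are 1 and 0, and EOF is not -2 (EOF is commonly -1). One design choice differs from simply testing ch == EOF just before returning n: this sketch reports EOF only when no digit was read, so a final number that is not followed by a newline is still returned.

```c
#include <stdio.h>

/* Assumed definitions from chrutil.h (Figure 4.11) and tfdef.h. */
#define ERROR (-2)
#define TRUE 1
#define FALSE 0
#define IS_DIGIT(c) ((c) >= '0' && (c) <= '9')
#define IS_WHITE_SPACE(c) ((c) == ' ' || (c) == '\t' || (c) == '\n')

/* Hypothetical: read an unsigned decimal integer from fp. Returns the
   integer; EOF if end of file comes before any digit; ERROR if some other
   non-digit comes before any digit. The terminating character is consumed. */
int fgetint(FILE *fp)
{
    int n = 0;
    int ch;
    int got_digit = FALSE;      /* flag: has the digit loop body run? */

    ch = getc(fp);
    while (IS_WHITE_SPACE(ch))
        ch = getc(fp);
    while (IS_DIGIT(ch)) {
        got_digit = TRUE;
        n = n * 10 + (ch - '0');
        ch = getc(fp);
    }
    if (!got_digit)
        return ch == EOF ? EOF : ERROR;
    return n;
}

/* getint() reads from standard input. */
int getint(void)
{
    return fgetint(stdin);
}
```

On the input "  12 32r r 32", a newline, then "7", successive calls return 12, 32, ERROR, 32, 7 and finally EOF. A test driver in the spirit of the one described could loop while getint() does not return EOF, printing an error message whenever the result equals ERROR.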

Our getint() function is now more general and robust (i.e. it can handle errors). Of particular note here is the method we used in developing this function. We started by writing the algorithm and code to handle the normal case for input. We then considered what would happen in the abnormal cases, and made adjustments to the code to handle them only when necessary. This approach to program development is good for utilities and complex programs: get the normal and easy cases working first; then modify the algorithm and code for unusual and complex cases. Sometimes this approach requires us to rewrite entire functions to handle unusual cases, but often little or no extra code is needed for these cases.




tep@wiliki.eng.hawaii.edu
Wed Aug 17 08:29:11 HST 1994