Previous: 8.3 Direct I/O with Files
Up: 8.3 Direct I/O with Files
Next: 8.3.2 Formatted I/O

8.3.1 Character I/O

We have written programs for processing characters using the routines getchar() and putchar() which read or write single characters from or to the standard input or output. The standard library provides additional, more general, routines which read or write single characters from or to any file (including stdin or stdout). We will illustrate the use of these routines with two short examples.

Our next task is to read text input from a non-standard input file and compute the frequency of occurrence of each digit in the text:

FREQ: Read input from a specified text file and calculate the frequency of occurrence of each digit in the file.

Our task calls for us to read textual data from an input file. Before the program can read from a file, the file must be identified to the program; this process is called opening the file. Likewise, when our use of the data in a file is complete, the file should be closed. Opening a file tells the program where data is to be read from and initializes a system data structure that keeps track of how far reading has progressed in the file (along with other information needed by the operating system). Most files in C programs are treated as a stream of characters by the library routines that access them, so an open file is sometimes also referred to as a stream. Closing a file relinquishes the program's use of the file back to the operating system. When a file is opened for reading, input starts at the beginning of the file and continues until the end of file is reached. The standard files, stdin and stdout, behave the same way, but they are opened automatically at the beginning of the program. The program does not open them itself and should not normally close them.

We can now write an algorithm for our task of counting the frequency of occurrence of digits in a file (or stream). We will use an array, digit_freq[], to store the frequency of each digit. For each character ch read, if ch is a digit symbol, then ch - '0' is the numeric equivalent of that digit, and we will use digit_freq[ch - '0'] to store the frequency of the digit. That is, digit_freq[0] will store the frequency of the digit character '0', digit_freq[1] will store the frequency of '1', and so on. Here is the algorithm:

initialize array digit_freq[] to zero
open input file
while NOT EOF, read a character from input file stream
     if a character ch is a digit
          increment digit_freq[ch - '0']
print results to standard output
close input file
We begin by initializing the array, digit_freq[] to zero and each time a digit character is encountered, an appropriate frequency is incremented. The program implementation is shown in Figure 8.6 and assumes that the file to be read is named test.doc.
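
Figure 8.6 is not reproduced here, so the following is only a minimal sketch of how the algorithm above might be coded, not the book's program. The names count_digits, print_digit_freq, and NUM_DIGITS are assumptions made for illustration, and the file name is passed as an argument rather than fixed in the code.

```c
#include <stdio.h>

#define NUM_DIGITS 10   /* illustrative constant: digits '0' through '9' */

/* Count how often each digit character occurs in the named text file.
 * Returns 0 on success, or -1 if the file cannot be opened.
 * (count_digits is a hypothetical name, not taken from the text.) */
int count_digits(const char *fname, int digit_freq[NUM_DIGITS])
{
    FILE *fin;
    int ch;   /* int rather than char, so EOF can be told apart from data */
    int i;

    /* initialize array digit_freq[] to zero */
    for (i = 0; i < NUM_DIGITS; i++)
        digit_freq[i] = 0;

    /* open input file */
    fin = fopen(fname, "r");
    if (fin == NULL)
        return -1;

    /* while NOT EOF, read a character and count it if it is a digit */
    while ((ch = getc(fin)) != EOF)
        if (ch >= '0' && ch <= '9')
            digit_freq[ch - '0']++;

    /* close input file */
    fclose(fin);
    return 0;
}

/* Print the accumulated frequencies to standard output. */
void print_digit_freq(const int digit_freq[NUM_DIGITS])
{
    int i;

    for (i = 0; i < NUM_DIGITS; i++)
        printf("%d: %d\n", i, digit_freq[i]);
}
```

A main() function would declare an array of NUM_DIGITS integers, call count_digits("test.doc", freq), print a message and stop if the result is -1, and otherwise call print_digit_freq(freq).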

The input file, test.doc consists of a single line shown below:

245  87  129  45  28
Sample Session:

Let us first give a summary explanation. In the declaration section of the function main, a file pointer variable, fin, is declared to be of type FILE *. The type FILE is defined using a typedef in <stdio.h> as a special data structure containing the information about a file needed to access it. After the array, digit_freq[], is initialized to zero, the file, test.doc, is opened using the standard library function, fopen():

fin = fopen("test.doc", "r");
The function, fopen(), takes two arguments: a string which gives the name of the physical file, and a second string which specifies the mode ("r", for read, indicates an input file). If the file can be opened, fopen() returns a file pointer which can be used to access the corresponding stream. If the file cannot be opened, fopen() returns a NULL pointer, so the program tests whether the returned value of the file pointer, fin, is NULL and, if so, prints a message and terminates. If the file was opened (i.e. fin is not NULL), then fin can be thought of as a "handle" on the file which is passed to an appropriate I/O routine to access the data. In our case, a character is read from the stream using the standard library function, getc():
ch = getc(fin)
The function, getc(), reads a character from the stream accessed by the file pointer, fin. It returns the value of character read if successful, and EOF otherwise. In the program, each character read is examined to see if it is a digit; if it is, the count for that digit is incremented. Once the end of input file is reached, the file is closed with the statement:
fclose(fin);
Finally, the program prints the results accumulated in the array.

Let us now examine some details. When a file is opened, it is associated with a file buffer that serves as the interface between the physical file and the program. A program reads or writes a stream of characters from or to a file buffer. A file stream (buffer) pointer must be maintained to mark the next position in the file buffer. This information is stored in the data structure, of type FILE, pointed to by the file pointer. Once a physical file is opened, i.e. associated with a file buffer, and a file pointer is initialized, a program uses only the file pointer.

The derived data type, FILE, is defined in <stdio.h> using a typedef statement, and contains information about a file, such as the location of a file buffer, the current position in the buffer, the file mode (read, write, append), whether errors have occurred, and whether an end of file has occurred. Users need not know the details of this data structure; instead, it is used to declare pointer variables to FILE items, and these pointers are passed to the library functions. For example,

FILE *fin, *fout;
declares two file pointer variables, fin and fout. It is now possible to associate these FILE pointers with desired physical files. We use the terms stream and file pointer interchangeably with FILE pointer. Standard files are always open and standard file pointer variables are available to all programs. They are named stdin, stdout, and stderr.

The process of opening a file connects a physical file and associates a mode with the FILE pointer. The mode specifies whether a file is opened for input, for output, or for both. The file open function, fopen(), associates a physical file with a file buffer or stream and returns a FILE pointer that is used to access the file. Here is the prototype:

FILE *fopen(const char *fname, const char *mode);
The mode string "r" specifies that the file is to be opened for reading (i.e. an input file), "w" specifies writing mode (i.e. an output file, which is created if it does not exist and emptied if it does), and "a" specifies append mode (i.e. an output file in which everything written is added at the end of the existing contents). A "+" may follow the mode letter, as in "r+", "w+", or "a+", to open the file for both input and output. If the file was opened successfully, fopen() returns a pointer that will access the file stream. If it was not possible to open the file for some reason, fopen() returns a NULL pointer (a special pointer value that is guaranteed never to point to a valid object). It is the programmer's responsibility to check whether the returned pointer is NULL. The most common reason why a file cannot be opened for reading is that it does not exist, i.e. an erroneous file name has been used.
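
To make the effect of the mode strings concrete, here is a small sketch under stated assumptions: put_text and file_length are hypothetical helpers, not library routines. Each one checks the pointer returned by fopen() against NULL before using it.

```c
#include <stdio.h>

/* Write text to the named file using the given mode ("w" or "a").
 * Returns 0 on success, -1 if the file could not be opened.
 * (put_text is a hypothetical helper.) */
int put_text(const char *fname, const char *mode, const char *text)
{
    FILE *fout = fopen(fname, mode);

    if (fout == NULL) {
        fprintf(stderr, "Cannot open %s with mode \"%s\"\n", fname, mode);
        return -1;
    }
    for (; *text != '\0'; text++)
        putc(*text, fout);
    fclose(fout);
    return 0;
}

/* Count the characters in the named file by opening it with "r".
 * Returns -1 if the file could not be opened.
 * (file_length is a hypothetical helper.) */
long file_length(const char *fname)
{
    FILE *fin = fopen(fname, "r");
    long n = 0;

    if (fin == NULL)
        return -1;
    while (getc(fin) != EOF)
        n++;
    fclose(fin);
    return n;
}
```

With these helpers, writing "abc" with mode "w" and then "de" with mode "a" should leave a file of five characters, whereas a second write with mode "w" would discard the earlier contents.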

Once a file is opened, the library function, getc(), reads single characters from the file stream. The argument passed to getc() must be a file pointer, and it returns the (integer) value of a character read or EOF if an end of file is reached.
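
Because getc() returns an integer, the variable that receives its result should be declared int. If it were declared char, a data character with code 255 could be mistaken for EOF on systems where char is signed, and EOF might never be detected where char is unsigned. The sketch below illustrates the usual reading loop; count_chars is an illustrative name, not a library routine.

```c
#include <stdio.h>

/* Count the characters remaining on an open input stream.
 * (count_chars is a hypothetical helper.) */
long count_chars(FILE *fin)
{
    int ch;      /* must be int: EOF is an int value that differs from
                    every valid character code returned by getc() */
    long n = 0;

    while ((ch = getc(fin)) != EOF)
        n++;
    return n;
}
```

Since the argument is just a file pointer, the same function could be called as count_chars(stdin) to count the characters typed at the keyboard.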

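Files should be closed after their use is completed. Output is usually collected in the file buffer before it is written to the physical file, so if a program terminates abnormally while an output file is still open, the data remaining in the buffer may be lost. The library function that closes a file is fclose(), whose argument must be a FILE pointer. Closing a file writes out any buffered output and frees the file buffer.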
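
The function fclose() returns 0 if the file was closed successfully and EOF otherwise, so a careful program can check that its output really reached the file. The following is a sketch only; save_line is a hypothetical helper introduced for illustration.

```c
#include <stdio.h>

/* Write one line of text to the named file and close it.
 * Returns 0 if every step succeeded, -1 otherwise.
 * (save_line is a hypothetical helper.) */
int save_line(const char *fname, const char *line)
{
    FILE *fout = fopen(fname, "w");
    int failed = 0;

    if (fout == NULL)
        return -1;

    /* putc() returns EOF if the character could not be written */
    for (; *line != '\0'; line++)
        if (putc(*line, fout) == EOF)
            failed = 1;
    if (putc('\n', fout) == EOF)
        failed = 1;

    /* fclose() writes out anything still held in the buffer */
    if (fclose(fout) == EOF)
        failed = 1;

    return failed ? -1 : 0;
}
```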

In the above program, we specified the name of the input file in the code itself. If the program is to be used with any other input file, we would have to modify the program and recompile. Instead, a flexible program should ask the user to enter file names as needed.

Our next task is to copy one file to another. The algorithm is simple:

get input and output file names
open files for input and output
while NOT EOF, read a character ch from input stream
     write ch to output stream
close files
The library routine, putc(ch, output) writes a character, ch, to a file stream, output. The program is shown in Figure 8.7.

Sample Session:

The program first reads the input and output file names. We use scanf() to read the file names into strings, infile and outfile. These files are then opened for input and output, respectively. If either of the files cannot be opened, an error message is printed and the program is terminated by an exit() call. If both files are opened successfully, the copying is done in a loop until EOF. The loop reads a character from the input stream into ch, which is then written to the output stream using putc(). When EOF is reached, the files are closed and a message is printed.
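
Figure 8.7 is not reproduced here, so the copying step described above is sketched below as a function; this is not the book's program. The name copy_file and its return convention are assumptions, and the caller is expected to have obtained the two file names already. Unlike the program described above, the sketch reports failure to its caller rather than calling exit().

```c
#include <stdio.h>

/* Copy the file named infile to the file named outfile, one character
 * at a time.  Returns the number of characters copied, or -1 if either
 * file cannot be opened.  (copy_file is a hypothetical name.) */
long copy_file(const char *infile, const char *outfile)
{
    FILE *input, *output;
    int ch;
    long n = 0;

    input = fopen(infile, "r");
    if (input == NULL) {
        fprintf(stderr, "Cannot open %s for input\n", infile);
        return -1;
    }
    output = fopen(outfile, "w");
    if (output == NULL) {
        fprintf(stderr, "Cannot open %s for output\n", outfile);
        fclose(input);          /* do not leave the input file open */
        return -1;
    }

    /* while NOT EOF, read a character and write it to the output */
    while ((ch = getc(input)) != EOF) {
        putc(ch, output);
        n++;
    }

    fclose(input);
    fclose(output);
    return n;
}
```

Note that the two names should refer to different files: opening a file with "w" empties it, so copying a file onto itself this way would lose its contents.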

The file routines, getc() and putc() can be used with standard files as well. We just use the predefined file pointers for the standard files:

ch = getc(stdin);
putc(ch, stdout);
The above programs terminate if an attempt to open a file is unsuccessful. As an improvement to these programs, friendly programs should allow the user to rectify possible errors in entering file names.
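
One possible shape for such an improvement is sketched below, under several assumptions: open_with_retry, NAME_LEN, and the limit max_tries are hypothetical, and the names are read with fscanf(), the file-pointer counterpart of scanf(), so that the same function can take its names from stdin or from any other open stream.

```c
#include <stdio.h>

#define NAME_LEN 64   /* illustrative limit on the length of a file name */

/* Read file names from the stream names (stdin in an interactive
 * program) until one can be opened with the given mode, or until
 * max_tries names have been tried.  Returns the open file pointer,
 * or NULL if no name could be opened.
 * (open_with_retry is a hypothetical helper.) */
FILE *open_with_retry(FILE *names, const char *mode, int max_tries)
{
    char fname[NAME_LEN];
    FILE *fp;
    int i;

    for (i = 0; i < max_tries; i++) {
        printf("File name: ");
        fflush(stdout);               /* make sure the prompt is visible */

        /* %63s leaves room in fname for the terminating '\0' */
        if (fscanf(names, "%63s", fname) != 1)
            return NULL;              /* no more names to read */

        fp = fopen(fname, mode);
        if (fp != NULL)
            return fp;
        printf("Cannot open %s, please try again\n", fname);
    }
    return NULL;
}
```

An interactive program might call open_with_retry(stdin, "r", 3) and terminate only if the result is still NULL after three attempts.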


tep@wiliki.eng.hawaii.edu
Wed Aug 17 09:15:23 HST 1994