We have written programs for processing characters using the routines getchar() and putchar() which read or write single characters from or to the standard input or output. The standard library provides additional, more general, routines which read or write single characters from or to any file (including stdin or stdout). We will illustrate the use of these routines with two short examples.
Our next task is to read text input from a non-standard input file and compute the frequency of occurrence of each digit in the text:
FREQ: Read input from a specified text file and calculate the frequency of occurrence of each digit in the file.
Our task calls for us to read textual data from an input file. In order for the program to be able to read from a file, the file must be identified to the program. This process is called opening the file. Likewise, when our use of the data in a file is complete, the file should be closed. Opening a file informs the program where data is to be read from, and initializes a system data structure which keeps track of how far reading has progressed in the file (along with other information needed by the operating system). Most files in C programs are treated as a stream of characters by the library routines that access them, and so an open file is sometimes also referred to as a stream. Closing a file relinquishes the program's use of the file back to the operating system. When a file is opened for reading, input starts at the beginning of the file and continues until the end of the file is reached. The standard files, stdin and stdout, behave the same way, but they are opened automatically at the beginning of the program. They need not be opened by the program and should not normally be closed.
We can now write an algorithm for our task of counting the frequency of occurrence of digits in a file (or stream). We will use an array, digit_freq[], to store the frequency of each digit. For each character ch read, if ch is a digit symbol, then ch - '0' is the numeric equivalent of that digit, and we will use digit_freq[ch - '0'] to store the frequency of the digit. That is, digit_freq[0] will store the frequency of the digit character '0', digit_freq[1] will store the frequency of '1', and so on. Here is the algorithm:
    initialize array digit_freq[] to zero
    open input file
    while NOT EOF, read a character from input file stream
        if a character ch is a digit
            increment digit_freq[ch - '0']
    print results to standard output
    close input file
We begin by initializing the array, digit_freq[], to zero, and each time a digit character is encountered, the corresponding frequency is incremented. The program implementation is shown in Figure 8.6 and assumes that the file to be read is named test.doc.
The input file, test.doc, consists of the single line shown below:
245 87 129 45 28
Sample Session:
Let us first give a summary explanation. In the declaration section of the function main(), a file pointer variable, fin, is declared to be of type FILE *. The type FILE is defined using a typedef in <stdio.h> as a special data structure containing the information about a file needed to access it. After the array, digit_freq[], is initialized to zero, the file, test.doc, is opened using the standard library function, fopen():
fin = fopen("test.doc", "r");
The function, fopen(), takes two arguments: a string which gives the name of the physical file, and a second string which specifies the mode ("r", for read, indicates an input file). If the file can be opened, fopen() returns a file pointer which can be used to access the corresponding stream. If the file cannot be opened, fopen() returns a NULL pointer, so the program tests whether the returned value of the file pointer, fin, is NULL and, if so, terminates after printing a message.
If the file was opened (i.e. fin is not NULL), then fin can be thought of as a "handle" on the file which is passed to an appropriate I/O routine to access the data. In our case, a character is read from the stream using the standard library function, getc():
ch = getc(fin)
The function, getc(), reads a character from the stream accessed by the file pointer, fin. It returns the value of the character read if successful, and EOF otherwise. In the program, each character read is examined to see if it is a digit; if it is, the count for that digit is incremented. Once the end of the input file is reached, the file is closed with the statement:
fclose(fin);
Finally, the program prints the results accumulated in the array.
Let us now examine some details. When a file is opened, it is associated with a file buffer that serves as the interface between the physical file and the program. A program reads or writes a stream of characters from or to a file buffer. A file stream (buffer) pointer must be maintained to mark the next position in the file buffer. This information is stored in the data structure, of type FILE, pointed to by the file pointer. Once a physical file is opened, i.e. associated with a file buffer, and a file pointer is initialized, a program uses only the file pointer.
The derived data type, FILE, is defined in <stdio.h> using a typedef statement, and contains information about a file, such as the location of a file buffer, the current position in the buffer, the file mode (read, write, append), whether errors have occurred, and whether an end of file has occurred. Users need not know the details of this data structure; instead, it is used to define pointer variables to FILE type data items, which are accessed by the library functions. For example,
FILE *fin, *fout;
declares two file pointer variables, fin and fout. It is now possible to associate these FILE pointers with desired physical files. We use the terms stream and file pointer interchangeably with FILE pointer. Standard files are always open and standard file pointer variables are available to all programs. They are named stdin, stdout, and stderr.
The process of opening a file connects a physical file to a stream and associates a mode with the FILE pointer. The mode specifies whether a file is opened for input, for output, or for both. The file open function, fopen(), associates a physical file with a file buffer or stream and returns a FILE pointer that is used to access the file. Here is the prototype:
FILE *fopen(const char *fname, const char *mode);
The mode string, "r", specifies that the file is to be opened for reading (i.e. an input file), "w" specifies writing mode (i.e. an output file, which is created if it does not exist and emptied if it does), and "a" specifies append mode (i.e. an output file in which new data is added after any existing contents). If the file was opened successfully, fopen() returns a pointer that will access the file stream. If it was not possible to open the file for some reason, fopen() returns a NULL pointer (a pointer value that is guaranteed never to refer to a valid object). It is the programmer's responsibility to check whether the returned pointer is NULL. The most common reason why a file cannot be opened for reading is that it does not exist, i.e. an erroneous file name has been used.
Once a file is opened, the library function, getc(), reads single characters from the file stream. The argument passed to getc() must be a file pointer, and it returns the (integer) value of a character read or EOF if an end of file is reached.
Files should be closed after their use is completed. If a program terminates abnormally while a file is still open, output still held in the file buffer may never reach the physical file. The library function that closes a file is fclose(), whose argument must be a FILE pointer. Closing a file writes out any buffered output and frees the file buffer.
In the above program, we specified the name of the input file in the code itself. If the program is to be used with any other input file, we would have to modify the program and recompile. Instead, a flexible program should ask the user to enter file names as needed.
Our next task is to copy one file to another. The algorithm is simple:
    get input and output file names
    open files for input and output
    while NOT EOF, read a character ch from input stream
        write ch to output stream
    close files
The library routine, putc(ch, output), writes a character, ch, to a file stream, output. The program is shown in Figure 8.7.
Sample Session:
The file routines, getc() and putc() can be used with standard files as well. We just use the predefined file pointers for the standard files:
ch = getc(stdin);
putc(ch, stdout);
The above programs terminate if an attempt to open a file is unsuccessful. As an improvement to these programs, friendly programs should allow the user to rectify possible errors in entering file names.