The text and programs here are from the first edition of CGI Programming 101. This has been replaced by the 2nd edition.

Chapter 6: Reading and Writing Data Files


As you start to program more advanced CGI applications, you'll want to store data so you can use it later. Maybe you have a guestbook program and want to keep a log of the names and email addresses of visitors, or a counter program that must update a counter file, or a program that scans a flat-file database and draws info from it to generate a page. You can do this by reading and writing data files (often called file I/O).

Most web servers run with very limited permissions; this protects the server (and the system it's running on) from malicious attacks by users or web visitors. Unfortunately, it also makes it harder to write to files. When your CGI is run, it runs with the server's permissions, and the server probably doesn't have permission to create or modify files in your directory. In order to write to a data file, you usually must make it world-writable, via the chmod command:

chmod 666 filename

This sets the permissions so that all users can read from and write to the file. (If you want the file to be readable and writable by you, but write-only for all other users, use "chmod 622 filename".) See Appendix A for a chart of the various chmod permissions.

The drawback is that anyone on your system can change your data file, or even erase its entire contents, and there's not much you can do about it.

Some alternatives are CGIwrap and Apache's suEXEC; both of these force CGIs on the web server to run under the CGI owner's userid and permissions. Apache also allows the webmaster to define what user and group the server - including virtual hosts - runs under. If your site is a virtual host, ask your webmaster to set your server up under a group that only you are a member of. Then you can chmod your files in one of these ways, for example:

chmod 660 filename (read and write access for you and your group; no access for anyone else)
chmod 664 filename (the same, plus read-only access for everyone else)

This is safer than having a world-writable file. Ask your webmaster how you can secure your data files.

Permissions are less of a problem if you only want to read a file; just set the file group- and world-readable, and your CGIs can safely read from that file.

Opening Files

Reading and writing files is done by opening a filehandle, with the statement:

open(filehandle, filename);

The filename may be prefixed with a ">", which means to overwrite anything that's in the file now, or with a ">>", which means to append to the bottom of the existing file. If both > and >> are omitted, the file is opened for reading only. Here are some examples:
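A sketch of all three forms, using a hypothetical scratch file named test.out. Like the examples described here, these opens don't yet check whether they succeeded; that problem is addressed a little further on.

```perl
use strict;
use warnings;

# "test.out" is a scratch file name used only for this illustration.
# None of these opens check for errors yet.

# ">" : create the file, or empty it if it already exists, then write.
open(OUTF, ">test.out");
print OUTF "first line\n";
close(OUTF);

# ">>" : append to the end of the existing file.
open(OUTF, ">>test.out");
print OUTF "second line\n";
close(OUTF);

# No prefix: open the file for reading only.
open(INF, "test.out");
my @lines = <INF>;
close(INF);

print @lines;    # prints "first line" and then "second line"
```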

The filehandles in these cases are INF and OUTF. You can use just about any name for the filehandle, but for readability, it's always good to name it something relevant.

Also, a warning: your web server might not run your CGI from the directory where the CGI is stored, so it's possible you'll have to use the full path to the file (such as "/home/you/public_html/somedata.txt") rather than just the filename. This is generally not an issue with the Apache web server, but some other servers behave differently. Try opening files with just the filename first (provided the file is in the same directory as your CGI), and if that doesn't work, use the full path.
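If you do end up needing a full path, one option is to build it from the script's own location instead of typing it out. This is a sketch using FindBin, a module that comes with Perl, and it assumes the data file sits in the same directory as the CGI:

```perl
use strict;
use warnings;
use FindBin qw($Bin);

# $Bin is the directory this script was loaded from, so the data file is
# found no matter what working directory the server starts the CGI in.
# Assumes survey.out lives alongside the script.
my $datafile = "$Bin/survey.out";

print "Data file: $datafile\n";
```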

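One problem with the above code is that it doesn't check to ensure the file was really opened. The safe way to open a file is to check the result of open and stop if it failed, as follows:

open(filehandle, filename) or dienice("Couldn't open filename: $!");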

This uses the "dienice" subroutine we wrote in chapter 4 to display an error message and exit the CGI if the file can't be opened. You should do this for all file opens, because if you don't, the CGI will continue running even if the file isn't open, and you could end up losing data. It can be quite frustrating to realize you've had a survey running for several weeks while no data was being saved to the output file.

The $! in the above dienice message is a Perl variable that holds the system's error message for the failed open (for example, "Permission denied" or "No such file or directory"). Printing it out may help you figure out why the open failed.
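Here is that pattern as a complete sketch. The dienice below is a stand-in whose body is an assumption, not necessarily the chapter 4 version. It also assumes the Content-type header has already been printed, and the record written is made up. Current versions of Perl also accept a three-argument form, open(OUTF, ">>", "survey.out"), which keeps the mode separate from the filename.

```perl
use strict;
use warnings;

# Stand-in for the chapter 4 dienice subroutine (assumed body): print an
# error message as HTML and stop the CGI. Assumes the Content-type header
# was already printed earlier in the script.
sub dienice {
    my ($errmsg) = @_;
    print "<h2>Error</h2>\n";
    print "<p>$errmsg</p>\n";
    exit;
}

# Open for appending; if the open fails, report why ($!) and stop.
open(OUTF, ">>survey.out") or dienice("Couldn't open survey.out for writing: $!");
print OUTF "Jane Doe|jane\@example.com|4\n";
close(OUTF);
```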

Let's test it out, by modifying our survey.cgi from chapter 5 to write to a data file. Edit survey.cgi as follows:

Source code: http://www.cgi101.com/class/ch6/survey.txt

Next you'll need to create the output file and make it writable, because your CGI probably can't create new files in your directory (unless you made the entire directory writable by the server - but that's usually a bad idea, since it means anyone can delete any file in that directory, or add new files). Go into the Unix shell, change to the directory where your CGI is located, and type the following:

touch survey.out
chmod 666 survey.out

The Unix touch command, in this case, creates a new, empty file called survey.out. Then the chmod makes it writable by everyone.

Now go back to your browser and fill out the survey form again. If your CGI runs without any errors, you'll get data added to the survey.out file. The resulting file should look something like this:

This is what's called a flat-file database - a text file containing data, with each line of the file being a new record (or one set of results from a form) in the database. In this example, we've separated the fields with the pipe symbol (vertical bar "|"), though you could use any character that will not appear in the data itself.
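Reading a record back means splitting it on the separator. In this sketch, which uses a made-up record, the | has to be escaped in split's pattern, because an unescaped | is regular-expression alternation and would split the line into single characters. The -1 limit keeps a trailing empty field (such as a blank comment) that split would otherwise drop.

```perl
use strict;
use warnings;

# A hypothetical record: name|email|how they found the site|rating
my $line = "Jane Doe|jane\@example.com|search engine|4\n";

chomp($line);                           # drop the trailing newline
my @fields = split(/\|/, $line, -1);    # "|" must be escaped in the pattern

# A record whose last field was left blank. The -1 limit keeps that
# trailing empty field instead of silently dropping it.
my @blank = split(/\|/, "Joe|joe\@example.com|friend|", -1);

# Going the other way: join the fields with "|" to build a record.
my $record = join("|", @fields) . "\n";

print "Name: $fields[0], rating: $fields[3]\n";
```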

Notice a few new things in the survey.cgi code. First, it strips out all carriage returns and other strange end-of-line characters from the incoming data. Since we want each line of data in the output file to represent one record, we don't want extraneous carriage returns messing things up.

Also, you'll notice we've done several print statements to the output file, but they resulted in only a single line of data in the output file. This is because a line of output isn't really ended until you print the "\n" character. For example, if you print "Blah" to the file and then print "Blee\n", the result is the single line BlahBlee, since the \n only appears after "Blee".
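This sketch shows both ideas. It uses hypothetical %FORM contents in place of the real parsed form data, and die in place of dienice so it runs on its own:

```perl
use strict;
use warnings;

# Hypothetical form data, as the parsing code from chapter 5 might leave it.
# The comments field contains a browser-supplied line break (\r\n).
my %FORM = (
    name     => "Jane Doe",
    rating   => "4",
    comments => "Nice site.\r\nMore examples, please.",
);

# Replace each run of carriage returns and newlines with a single space,
# so every record stays on one line of the data file.
foreach my $key (keys %FORM) {
    $FORM{$key} =~ s/[\r\n]+/ /g;
}

open(OUTF, ">>survey.out") or die "Couldn't open survey.out: $!";
# Three print statements, but only the last one ends the line with "\n",
# so together they produce a single record.
print OUTF "$FORM{name}|";
print OUTF "$FORM{rating}|";
print OUTF "$FORM{comments}\n";
close(OUTF);
```

If visitors could type the separator character itself, you'd probably want to replace it too (for example with tr/|/ /) so it can't shift the fields of a record.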

File Locking

CGI processes on a Unix web server can run simultaneously, and if two scripts try to open and write to the same file at the same time, the file can be corrupted or even erased, and you may lose all of your data. To prevent this, we've used flock(OUTF,2) in survey.cgi to put an exclusive lock on the survey file while we are writing to it. (The 2 means exclusive lock.) The lock is released when the file is closed or your script finishes running, allowing the next CGI to access the file. Locks are only advisory: this works only if all of the CGIs that read and write to that file also use flock. A CGI that doesn't use flock will ignore the locks of any other process and open, write, or erase the file anyway. Since flock may force the CGI to wait for another CGI to finish writing to the file, and the file may have grown while it waited, you should also reset the file pointer, using the seek function:

seek(filehandle, offset, whence);

offset is the number of bytes to move the pointer, relative to whence, which is one of the following: 0 (the beginning of the file), 1 (the current position), or 2 (the end of the file).

So, a seek(OUTF,0,2) ensures that you start writing at the very end of the file. If you were reading the file instead of writing to it, you'd want to do a seek(OUTF,0,0) to reset the pointer to the beginning of the file.
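A minimal sketch of the locked append described above. It uses Perl's standard Fcntl module for the named constants LOCK_EX and SEEK_END, each equal to the 2 used in flock(OUTF,2) and seek(OUTF,0,2). The record written is made up.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);   # imports LOCK_EX (2) and SEEK_END (2)

open(OUTF, ">>survey.out") or die "Couldn't open survey.out: $!";

# Wait here until no other process holds a lock on the file.
flock(OUTF, LOCK_EX) or die "Couldn't lock survey.out: $!";

# Another CGI may have added to the file while we waited, so move the
# write position to the current end of the file.
seek(OUTF, 0, SEEK_END) or die "Couldn't seek in survey.out: $!";

print OUTF "Jane Doe|4|Nice site.\n";

# Closing the file flushes the output and releases the lock.
close(OUTF) or die "Couldn't close survey.out: $!";
```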

Note that flock is not supported on all systems (older versions of Windows, such as Windows 95 and 98, lack it), so if your script fails with an error at the flock line, just comment it out. Of course, without the lock, you risk losing data; you can either accept that risk, or look at Chapter 18 and find out how to write your data to a database instead of a file.
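Because flock locks are advisory, a CGI that only reads the file respects a writer's lock only if it asks for a lock too. This is a sketch of a reader taking a shared lock (LOCK_SH), which several readers can hold at once but which waits while a writer holds its exclusive lock. So that it runs on its own, it first writes a small hypothetical file named sample.out.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);   # imports LOCK_SH, LOCK_EX, LOCK_UN, LOCK_NB

# Create a small hypothetical data file so this sketch is self-contained.
open(OUTF, ">sample.out") or die "Couldn't create sample.out: $!";
print OUTF "Jane Doe|4|Nice site.\n", "Joe|5|\n";
close(OUTF);

open(INF, "sample.out") or die "Couldn't open sample.out: $!";
# Shared lock: other readers can also hold one, but a writer's exclusive
# lock makes this call wait until the writer is done.
flock(INF, LOCK_SH) or die "Couldn't lock sample.out: $!";
my @records = <INF>;
close(INF);   # closing the file releases the lock
```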

Closing Files

When you're finished writing to a file, it's best to close the file, like so:

close(filehandle);

Files are automatically closed when your script ends, as well.
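Closing the file yourself has one more benefit. Perl buffers output, so a write error such as a full disk may not be reported until the buffer is flushed when the file is closed. This sketch checks close's return value and writes a made-up record:

```perl
use strict;
use warnings;

open(OUTF, ">>survey.out") or die "Couldn't open survey.out: $!";
print OUTF "Jane Doe|4|Nice site.\n";

# Output is buffered, so a write failure may only show up when close
# flushes the buffer. Checking close's return value catches it.
close(OUTF) or die "Couldn't finish writing survey.out: $!";
```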

Reading Files

After you've run a survey or poll like our previous example, you'll want to summarize the data. All that's involved is opening your data file, reading every record, and doing whatever calculations or summarizations you want to do on it.

There are two ways you can handle reading data from a file: you can either read one line at a time, or read the entire file into an array. Here's an example:

If you were to use this code in your program, you'd end up with the first line of survey.out stored in $a, and the remainder of the file in the array @b (with each element of @b containing one line of data from the file). The actual read occurs with "<filehandle>"; how much data is read depends on what you assign it to. Assigning to a scalar such as $a reads one line, while assigning to an array such as @b reads all of the remaining lines.
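A runnable version of that idea follows. It uses the names $first and @rest rather than $a and @b, because $a and $b are special variables used by Perl's sort. It also reads sample.out, a hypothetical file the sketch creates itself, rather than survey.out.

```perl
use strict;
use warnings;

# Create a small hypothetical data file so the sketch runs on its own.
open(OUTF, ">sample.out") or die "Couldn't create sample.out: $!";
print OUTF "line one\n", "line two\n", "line three\n";
close(OUTF);

open(INF, "sample.out") or die "Couldn't open sample.out: $!";
my $first = <INF>;    # scalar context: reads one line
my @rest  = <INF>;    # list context: reads every remaining line
close(INF);
```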

The following code shows how to read the entire file into an array, then loop through each element of the array to print out each line:
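A sketch of that approach follows. So that it runs on its own, it first writes a small hypothetical file named sample.out (in the survey you'd read survey.out instead). Note that the file is closed before the loop starts.

```perl
use strict;
use warnings;

# Hypothetical sample data so the sketch is self-contained.
open(OUTF, ">sample.out") or die "Couldn't create sample.out: $!";
print OUTF "Jane Doe|4|Nice site.\n", "Joe|5|Great.\n";
close(OUTF);

open(INF, "sample.out") or die "Couldn't open sample.out: $!";
my @lines = <INF>;    # the whole file, one line per array element
close(INF);           # done with the file before any processing

foreach my $line (@lines) {
    chomp($line);     # remove the trailing newline (modifies @lines too)
    print "$line<br>\n";
}
```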

This code minimizes the amount of time the file is actually open. Throughout the rest of the book, we'll be using this method of reading files.

Back to our survey. Say we'd like to summarize the data: how many people took the survey; how most people reached the site; the average rating for the site; counts on how many people are involved in the various areas of webmastering; and a list of user comments.

Let's try it. Create a new file and name it surveysumm.cgi. This script will read the file into an array, then loop through each element of the array, incrementing several counters. At the end, it prints a web page that summarizes the data:

Source code: http://www.cgi101.com/class/ch6/surveysumm.txt
Working example: http://www.cgi101.com/class/ch6/surveysumm.cgi.
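The real surveysumm.cgi is at the link above. As a much smaller sketch of the same counting technique, the code below assumes a hypothetical record layout (name|email|how they found the site|rating|comments), which is not necessarily the field order survey.cgi writes. It works on three made-up records held in an array instead of a file.

```perl
use strict;
use warnings;

# Hypothetical records in an assumed field order:
# name|email|how they found the site|rating (1-5)|comments
my @lines = (
    "Jane Doe|jane\@example.com|search engine|4|Nice site.\n",
    "Joe|joe\@example.com|friend|5|\n",
    "Ann|ann\@example.com|search engine|3|More examples, please.\n",
);

my $count        = 0;
my $total_rating = 0;
my %howreach;      # how many people found the site each way
my @comments;

foreach my $line (@lines) {
    chomp($line);
    # -1 keeps a trailing empty comments field as "" instead of dropping it
    my ($name, $email, $how, $rating, $comment) = split(/\|/, $line, -1);
    $count++;
    $total_rating += $rating;
    $howreach{$how}++;
    push(@comments, $comment) if $comment ne "";
}

my $average = $count ? $total_rating / $count : 0;

print "Respondents: $count\n";
printf "Average rating: %.1f\n", $average;
foreach my $how (sort { $howreach{$b} <=> $howreach{$a} } keys %howreach) {
    print "$how: $howreach{$how}\n";
}
print "Comment: $_\n" foreach @comments;
```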

You'll notice that this summary CGI is actually longer than the script that handled the survey form; summarizing data from polls can often be a lengthy and complicated process. Also, our summary CGI doesn't do anything with the names or email addresses of the people taking the survey; you may want to write a second CGI to dump those to another file, which you could then use for sending follow-up mail to your survey respondents.

A survey is just one use for data files. You can use this same code to hold contests, solicit suggestions, populate a mailing list, or for other interactive applications. Flat-file databases can also be used for generating online catalogs; we'll cover that in the next chapter, along with multi-CGI interaction.

Resources

Visit http://www.cgi101.com/class/ch6/ for source code and links from this chapter.


Copyright © 2000 by Jacqueline D. Hamilton.