The text and programs here are from the first edition of CGI Programming 101, which has since been replaced by the 2nd edition.

Chapter 4: Processing Forms


Most forms you create will send their data using the POST method. POST keeps form data out of the URL, so the data doesn't end up in bookmarks, browser history, or server logs the way a GET query string does (though it is not encrypted), and you can send more data with POST. Also, your browser, web server, or proxy server may cache GET queries, but POSTed data is sent each time. However, since posted form data is often more complex than a word or two, decoding it is a little more work.

Your browser encodes form data before sending it, and the web server passes it along to your CGI in that encoded form. Alphanumeric characters are sent as themselves; spaces are converted to plus signs (+); other characters - like tabs, quotes, etc. - are converted to "%HH" - a percent sign and two hexadecimal digits representing the ASCII code of the character. This is called URL encoding. Here's a table of some commonly encoded characters:
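
As a rough stand-in for such a table, the sketch below computes the encoded form of a handful of characters using sprintf and ord. The particular characters chosen are an assumption made for illustration; the space is listed as +, following the rule described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A few characters that commonly need encoding (this selection is illustrative).
my @chars = ("\t", "\n", '!', '"', '#', '%', '&', '+', '/', '=', '?', '@');

# Spaces are the special case: they become plus signs.
my %encoded = (' ' => '+');

# Everything else becomes %HH: ord() gives the ASCII code, and
# sprintf's %02X formats it as two uppercase hexadecimal digits.
foreach my $char (@chars) {
    $encoded{$char} = sprintf("%%%02X", ord($char));
}

foreach my $char (' ', @chars) {
    my $label = $char eq "\t" ? 'tab'
              : $char eq "\n" ? 'newline'
              : $char eq ' '  ? 'space'
              :                 $char;
    printf "%-8s %s\n", $label, $encoded{$char};
}
```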

In order to do anything useful with the data, your CGI must decode these. Fortunately, this is pretty easy to do in Perl, using the substitute and translate commands. Perl has powerful pattern matching and replacement capabilities; it can match the most complex patterns in a string, using regular expressions (see Chapter 14). But it's also quite capable of the simplest replacements. The basic syntax for substitutions is:
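
In sketch form, with "pattern" and "replacement" standing in for whatever you want to find and what to put in its place (the sample string is made up for illustration):

```perl
use strict;
use warnings;

my $mystring = "the pattern goes here";

# s/pattern/replacement/ replaces the first match of "pattern" in $mystring
$mystring =~ s/pattern/replacement/;

print "$mystring\n";
```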

This command replaces the first occurrence of "pattern" with "replacement" in the scalar variable "$mystring". Notice the operator is =~ (an equal sign followed by a tilde); this is Perl's binding operator, which tells Perl to apply the pattern match or replacement to the variable on its left. Here's an example of how it works:
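
A short example consistent with the description that follows; the placeholder xnamex and the variable $greetings are the names used in the text:

```perl
use strict;
use warnings;

my $greetings = "Hello. My name is xnamex.\n";

# Swap the placeholder for a real name
$greetings =~ s/xnamex/Bob/;

print $greetings;
```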

The above code will print out "Hello. My name is Bob." Notice the substitution has replaced "xnamex" with "Bob" in the $greetings string.

A similar but slightly different command is the translate command:
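
The general form is tr/searchlist/replacementlist/. A minimal sketch, with a made-up sample string and the lists abc and xyz chosen purely for illustration:

```perl
use strict;
use warnings;

my $mystring = "cabbage";

# Each character in the first list is replaced by the character at the
# same position in the second list: a becomes x, b becomes y, c becomes z.
$mystring =~ tr/abc/xyz/;

print "$mystring\n";
```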

This command translates every character in "searchlist" to its corresponding character in "replacementlist", for the entire value of $mystring. One common use of this is to change the case of all characters in a string:
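
A minimal sketch of the case change, using the variable $lowerc from the text and a made-up sample string. The bracketed form is shown because the text refers to it; tr takes the brackets literally, so tr/A-Z/a-z/ gives the same result.

```perl
use strict;
use warnings;

my $lowerc = "CGI Programming 101";

# Map each uppercase letter to the lowercase letter at the same position.
# The brackets simply map to themselves.
$lowerc =~ tr/[A-Z]/[a-z]/;

print "$lowerc\n";
```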

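This results in $lowerc being translated to all lowercase letters. The range A-Z stands for every uppercase letter. Unlike in a regular expression, tr treats square brackets as literal characters rather than as a character class, so tr/[A-Z]/[a-z]/ and tr/A-Z/a-z/ give the same result here.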

Decoding Form Data

With the POST method, form data is sent in an input stream from the server to your CGI. To get this data, store it, and decode it, we'll use the following block of code:
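
A sketch of such a block, written to match the step-by-step description that follows. The fallback sample string (used when no CONTENT_LENGTH is set, so the code can be tried from the command line), the limit of 2 on the second split, and the empty-string default are additions for illustration and are not part of the description.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $buffer = '';
if (defined $ENV{'CONTENT_LENGTH'}) {
    # Under a web server: read exactly CONTENT_LENGTH bytes of posted data
    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
} else {
    # Run by hand: sample data, an assumption for experimenting only
    $buffer = 'fname=joe&lname=smith&note=a+b%21';
}

my %FORM;
my @pairs = split(/&/, $buffer);
foreach my $pair (@pairs) {
    # The limit of 2 keeps any stray "=" inside the value
    my ($name, $value) = split(/=/, $pair, 2);
    $value = '' unless defined $value;    # a field may arrive with no value
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    $FORM{$name} = $value;
}

foreach my $key (sort keys %FORM) {
    print "$key = $FORM{$key}\n";
}
```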

Let's look at each part of this. First, we read the input stream using this line:
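
A minimal sketch of the read step. The variable $length and the guard around the read are assumptions added so the snippet also runs harmlessly outside a web server, where CONTENT_LENGTH is not set.

```perl
use strict;
use warnings;

my $buffer = '';
my $length = $ENV{'CONTENT_LENGTH'} || 0;    # 0 when run outside a web server

# read(FILEHANDLE, SCALAR, LENGTH) stores up to LENGTH bytes in the scalar
read(STDIN, $buffer, $length) if $length > 0;

print "Read ", length($buffer), " bytes\n";
```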

The input stream is coming in over STDIN (standard input), and we're using Perl's read function to store the data into the scalar variable $buffer. You'll also notice the third argument to the read function, which specifies how many bytes to read; we want exactly CONTENT_LENGTH bytes, a value the server sets as an environment variable.

Next we split the buffer into an array of pairs:
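
A minimal sketch of the split, using the sample data from the text in place of a real posted buffer:

```perl
use strict;
use warnings;

my $buffer = 'fname=joe&lname=smith';    # stand-in for the data read from STDIN

# Break the buffer apart at each & sign
my @pairs = split(/&/, $buffer);

print "$_\n" foreach @pairs;
```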

As with the GET method, form data pairs are separated by & signs when they are transmitted, such as fname=joe&lname=smith. Now we'll use a foreach loop to further split each pair on the equal sign:
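
A minimal sketch of the loop, starting from the two sample pairs above; the variable names $pair, $name, and $value are assumptions:

```perl
use strict;
use warnings;

my @pairs = ('fname=joe', 'lname=smith');

foreach my $pair (@pairs) {
    # The left side of the = is the field name, the right side its value
    my ($name, $value) = split(/=/, $pair);
    print "name: $name, value: $value\n";
}
```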

The next line translates every "+" sign back to a space:
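
A minimal sketch of that translation, with a made-up encoded value:

```perl
use strict;
use warnings;

my $value = 'New+York+City';

# Turn every plus sign back into a space
$value =~ tr/+/ /;

print "$value\n";
```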

Next is a rather complicated regular expression that substitutes every %HH hex pair back to its equivalent ASCII character, using the pack() function. We'll learn exactly how this works in Chapter 14, but for now we'll just use it to parse the form data:
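
A sketch of one way to write that substitution, applied to a made-up value. The comments give only a brief gloss of each piece.

```perl
use strict;
use warnings;

my $value = 'Fish %26 chips%21';

# %([a-fA-F0-9][a-fA-F0-9])  matches a percent sign and captures two hex digits in $1
# hex($1)                    converts those digits to a number
# pack("C", ...)             turns that number into the character with that code
# /e evaluates the replacement as Perl code; /g repeats for every match
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

print "$value\n";
```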

Finally, we store the values into a hash called %FORM:
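
A minimal sketch of the assignment, with a sample name and value filled in by hand:

```perl
use strict;
use warnings;

my %FORM;
my ($name, $value) = ('fname', 'joe');

# The field name becomes the hash key; the decoded data becomes its value
$FORM{$name} = $value;

print "fname is $FORM{'fname'}\n";
```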

The keys of %FORM are the form input names themselves. So, for example, if you have three text fields in the form - called name, email-address, and age - you could refer to them in your script by using $FORM{'name'}, $FORM{'email-address'}, and $FORM{'age'}.

Let's try it. Start a new CGI, and name it post.cgi. Enter the following, save it, and chmod it:
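
A sketch of what post.cgi might look like, assembled from the decoding steps above; the listing at the source URL below is the reference version. The page title, the heading text, and the escaping of &, <, and > before printing are assumptions added here. The escaping keeps submitted data from being interpreted as HTML.

```perl
#!/usr/bin/perl
use strict;
use warnings;

print "Content-type: text/html\n\n";

# Read the posted data (nothing is read when run outside a web server)
my $buffer = '';
my $length = $ENV{'CONTENT_LENGTH'} || 0;
read(STDIN, $buffer, $length) if $length > 0;

# Decode it into %FORM
my %FORM;
foreach my $pair (split(/&/, $buffer)) {
    my ($name, $value) = split(/=/, $pair, 2);
    $value = '' unless defined $value;
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    $FORM{$name} = $value;
}

print "<html><head><title>Form Output</title></head><body>\n";
print "<h2>Results from FORM post</h2>\n";

# Print each field name with the data that was entered
foreach my $key (sort keys %FORM) {
    my ($name, $value) = ($key, $FORM{$key});
    foreach ($name, $value) {
        s/&/&amp;/g;
        s/</&lt;/g;
        s/>/&gt;/g;
    }
    print "$name = $value<br>\n";
}

print "</body></html>\n";
```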

Source code: http://www.cgi101.com/class/ch4/post.txt

This code can be used to handle almost any form, from a simple guestbook form to a more complex order form. Whatever variables you have in your form, this CGI will print them out, along with the data that was entered.

Let's test the script. Create an HTML form with the fields listed below:

Source code: http://www.cgi101.com/class/ch4/post.html

Enter some data into the fields, and press "send" when finished. The output will be the variable names of these text boxes, plus the actual data you typed into each field.

Tip: If you've had trouble getting the boxes to align on your form, try putting <pre> tags around the input fields. Then you can line them up with your text editor, and the result is a much neater looking form. The reason for this is that most web browsers use a fixed-width font (like Monaco or Courier) for preformatted text, so aligning forms and other data is much easier in a preformatted text block than in regular HTML. This will only work if your text editor is also using a fixed-width font! Another way to align input boxes is to put them all into a table, with the input name in the left column, and the input box in the right column.

A Form-to-Email CGI

Most people using forms want the data emailed back to them, so, let's write a form-to-mail CGI. First you'll need to figure out where the sendmail program lives on the Unix system you're on. (For cgi101.com, it's in /usr/sbin/sendmail. If you're not sure where yours is, try doing "which sendmail" or "whereis sendmail"; usually one of these two commands will yield the location of the sendmail program.)

Copy your post.cgi to a new file named mail.cgi. Now the only change will be to the foreach loop. Instead of printing to standard output (the HTML page the person sees after clicking submit), you want to print the values of the variables to a mail message. So, first, we must open a pipe to the sendmail program:
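
A minimal sketch of opening the pipe. The variable name $mailprog, the -t flag (which tells sendmail to take the recipients from the message headers), and the -x check are assumptions; the path is the one given above for cgi101.com. This sketch uses a plain die where a full script would call an error handler.

```perl
use strict;
use warnings;

my $mailprog = '/usr/sbin/sendmail';    # adjust to where sendmail lives on your system

# Only attempt the pipe if sendmail is actually installed at that path
if (-x $mailprog) {
    # The leading | makes MAIL a pipe: whatever is printed to MAIL
    # becomes standard input for sendmail
    open(MAIL, "|$mailprog -t") or die "Can't access $mailprog: $!\n";
    # ... the message headers and body would be printed to MAIL here ...
    close(MAIL);
} else {
    print "No executable sendmail found at $mailprog\n";
}
```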

The pipe causes all of the output we print to that filehandle (MAIL) to be fed directly to the sendmail program as if it were standard input to that program.

You also need to specify the recipient of the email, with either:
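
A sketch of the two forms, using the sample address from the paragraph below. The header lines are built as strings and printed to standard output here so the snippet runs on its own; in mail.cgi they would be printed to the MAIL filehandle instead. The variable names $to_line_1 and $to_line_2 are made up for illustration.

```perl
use strict;
use warnings;

# Either keep the address in a single-quoted variable...
my $recipient = 'nullbox@cgi101.com';
my $to_line_1 = "To: $recipient\n";

# ...or write it directly in the double-quoted string, escaping the @-sign
my $to_line_2 = "To: nullbox\@cgi101.com\n";

print $to_line_1, $to_line_2;
```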

Perl will complain if you use an "@" sign inside a double-quoted string or a print <<EndHTML block, because it tries to interpolate what follows the @ as an array name. You can safely put an @-sign inside a single-quoted string, like 'nullbox@cgi101.com', or you can escape the @-sign in other strings by using a backslash. For example, "nullbox\@cgi101.com".

You don't need to include the comments in the following code; they are just there to show you what's happening.
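
A sketch of how mail.cgi might be put together; the listing at the source URL below is the reference version. Several things here are assumptions: the helper decode_form (the decoding loop from post.cgi wrapped in a subroutine), the check on REQUEST_METHOD (so nothing is mailed when the script is run by hand), the -t flag, the subject line, and the wording of the pages printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $mailprog  = '/usr/sbin/sendmail';    # where sendmail lives on your system
my $recipient = 'nullbox@cgi101.com';    # change this to your own address

# Only act on a real form post
if (($ENV{'REQUEST_METHOD'} || '') eq 'POST') {
    print "Content-type: text/html\n\n";
    print "<html><head><title>Results</title></head><body>\n";

    # Read and decode the posted data, as in post.cgi
    my $buffer = '';
    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'} || 0);
    my %FORM = decode_form($buffer);

    # Open a pipe to sendmail; if that fails, report it and stop
    open(MAIL, "|$mailprog -t") or dienice("Can't access $mailprog!");

    # Mail headers first, then a blank line, then the message body
    print MAIL "To: $recipient\n";
    print MAIL "Subject: Form Data\n\n";
    foreach my $key (sort keys %FORM) {
        print MAIL "$key = $FORM{$key}\n";
    }
    close(MAIL);

    print "<h2>Thank You</h2>\n<p>Your message has been sent.</p>\n";
    print "</body></html>\n";
}

# Decode a URL-encoded buffer into a hash of field names and values
sub decode_form {
    my ($buffer) = @_;
    my %form;
    foreach my $pair (split(/&/, $buffer)) {
        my ($name, $value) = split(/=/, $pair, 2);
        $value = '' unless defined $value;
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $form{$name} = $value;
    }
    return %form;
}

# Print an error message, close the HTML page, and exit
sub dienice {
    my ($errmsg) = @_;
    print "<h2>Error</h2>\n<p>$errmsg</p>\n</body></html>\n";
    exit;
}
```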

Source code: http://www.cgi101.com/class/ch4/mail.txt

Now let's test the new script. Here's the form again, only the action this time points to mail.cgi:

Working example: http://www.cgi101.com/class/ch4/mail.html

Save it, enter some data into the form, and press "send". If the script runs successfully, you'll get email in a few moments with the results of your post. (Remember to change the $recipient in the form to your email address!)

Sending Mail to More Than One Recipient

What if you want to send the output of the form to more than one email address? Simple: just add the desired addresses, separated by commas, to the $recipient line:
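
A minimal sketch, with two made-up addresses at example.com standing in for real ones. The header is printed to standard output here; in mail.cgi it would go to the MAIL filehandle.

```perl
use strict;
use warnings;

# Several addresses in one single-quoted string, separated by commas
my $recipient = 'first@example.com, second@example.com';

print "To: $recipient\n";
```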

Subroutines

In the above script we used a new structure: a subroutine called "dienice." A subroutine is a block of code, separate from the main program, that only gets run if it's directly called. In the above example, dienice only runs if the main program can't open sendmail. Rather than aborting and giving you a server error (or worse, NO error), you want your script to give you some useful data about what went wrong; dienice does that by printing the error message and the closing HTML tags, then exiting from Perl. There are several ways to call a subroutine:
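
A sketch of the calling forms, using a throwaway subroutine named mysub that just records how many arguments it received; the @log array is an assumption added so the effect of each call is visible.

```perl
use strict;
use warnings;

my @log;

sub mysub {
    push @log, scalar(@_);    # remember how many arguments arrived
}

&mysub;          # with &, no argument list (the caller's current @_ is passed along)
&mysub(1, 2);    # with &, with arguments
mysub();         # without &, empty argument list
mysub(1, 2);     # without &, with arguments

print "argument counts: @log\n";
```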

The &-sign before the subroutine name is optional. args are values to pass into the subroutine.

Subroutines are useful for isolating blocks of code that are reused frequently in your script. The structure of a subroutine is as follows:
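
In outline: the keyword sub, a name, and a block of code in curly braces. A minimal sketch, with the name say_hello and its body made up for illustration:

```perl
use strict;
use warnings;

sub say_hello {
    # the code inside the braces runs each time the subroutine is called
    print "Hello from the subroutine\n";
    return 1;
}

say_hello();
```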

A subroutine can be placed anywhere in your CGI, though for readability it's usually best to put them at the end, after your main code. You can also include and use subroutines from different files and modules; we'll cover that more in Chapter 17.

You can pass data into your subroutines. For example:
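
A minimal sketch of such a call. The sample values and the body of mysub are assumptions. The names $a, $b, and $c follow the text; since Perl's sort function also uses $a and $b, other names are a safer choice in real code.

```perl
use strict;
use warnings;

my ($a, $b, $c) = ('apple', 'banana', 'cherry');

sub mysub {
    return scalar(@_);    # how many arguments were passed in
}

# Pass the three scalars to mysub as a list of arguments
my $count = &mysub($a, $b, $c);

print "mysub received $count arguments\n";
```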

This passes the scalar variables $a, $b, and $c to the mysub subroutine. The data being passed (called arguments) is sent as a list. The subroutine accesses the list of arguments via the special array "@_". You can then assign the elements of that array to special temporary variables, like so:
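
A minimal sketch of unpacking @_ inside the subroutine; the return value and the sample arguments are made up for illustration:

```perl
use strict;
use warnings;

sub mysub {
    # Copy the arguments out of @_ into variables that exist only inside mysub
    my ($a, $b, $c) = @_;
    return "$a, $b, $c";
}

print mysub('one', 'two', 'three'), "\n";
```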

Notice the my in front of the variable list? my declares a variable or list of variables and limits its scope to the enclosing block, which here is the subroutine. This keeps your temporary variables visible only to the subroutine itself (where they're actually needed and used), rather than to the entire script (where they're not needed).

We'll be using the dienice subroutine throughout the rest of the book, as a generic catch-all error-handler.

Resources

Visit http://www.cgi101.com/class/ch4/ for source code and links from this chapter.


Copyright © 2000 by Jacqueline D. Hamilton.