This is the full text for Chapter 3 of CGI Programming 101. For source code and links from this chapter, click here.

Chapter 3: CGI Environment Variables

Environment variables are a series of hidden values that the web server sends to every CGI program you run. Your program can parse them and use the data they send. Environment variables are stored in a hash named %ENV:

Key Value
DOCUMENT_ROOT The root directory of your server
HTTP_COOKIE The visitor's cookie, if one is set
HTTP_HOST The hostname of the page being attempted
HTTP_REFERER The URL of the page that called your program
HTTP_USER_AGENT The browser type of the visitor
HTTPS "on" if the program is being called through a secure server
PATH The system path your server is running under
QUERY_STRING The query string (see GET, below)
REMOTE_ADDR The IP address of the visitor
REMOTE_HOST The hostname of the visitor (if your server has reverse-name-lookups on; otherwise this is the IP address again)
REMOTE_PORT The port the visitor is connected to on the web server
REMOTE_USER The visitor's username (for .htaccess-protected pages)
REQUEST_METHOD GET or POST
REQUEST_URI The interpreted pathname of the requested document or CGI (relative to the document root)
SCRIPT_FILENAME The full pathname of the current CGI
SCRIPT_NAME The interpreted pathname of the current CGI (relative to the document root)
SERVER_ADMIN The email address for your server's webmaster
SERVER_NAME Your server's fully qualified domain name (e.g. www.cgi101.com)
SERVER_PORT The port number your server is listening on
SERVER_SOFTWARE The server software you're using (e.g. Apache 1.3)

Some servers set other environment variables as well; check your server documentation for more information. Notice that some environment variables give information about your server, and will never change (such as SERVER_NAME and SERVER_ADMIN), while others give information about the visitor, and will be different every time someone accesses the program.

Not all environment variables get set. REMOTE_USER is only set for pages in a directory or subdirectory that's password-protected via a .htaccess file. (See Chapter 20 to learn how to password protect a directory.) And even then, REMOTE_USER will be the username as it appears in the .htaccess file; it's not the person's email address. There is no reliable way to get a person's email address, short of asking them for it with a web form.

You can print the environment variables the same way you would any hash value:

Let's try printing some environment variables. Start a new file named env.cgi:

Program 3-1: env.cgi - Print Environment Variables Program


Source code: http://www.cgi101.com/book/ch3/env-cgi.html
Working example: http://www.cgi101.com/book/ch3/env.cgi

Save the file, chmod 755 env.cgi, then try it in your web browser. Compare the environment variables displayed with the list on the previous page. Notice which values show information about your server and CGI program, and which ones give away information about you (such as your browser type, computer operating system, and IP address).

Let's look at several ways to use some of this data.

Referring Page

When you click on a hyperlink on a web page, you're being referred to another page. The web server for the receiving page keeps track of the referring page, and you can access the URL for that page via the HTTP_REFERER environment variable. Here's an example:

Program 3-2: refer.cgi - HTTP Referer Program


Source code: http://www.cgi101.com/book/ch3/refer-cgi.html
Working example: http://www.cgi101.com/book/ch3/ (click on refer.cgi)

Remember, HTTP_REFERER only gets set when a visitor actually clicks on a link to your page. If they type the URL directly (or use a bookmarked URL), then HTTP_REFERER is blank. To properly test your program, create an HTML page with a link to refer.cgi, then click on the link:

HTTP_REFERER is not a foolproof method of determining what page is accessing your program. It can easily be forged.

Remote Host Name, and Hostname Lookups

You've probably seen web pages that greet you with a message like "Hello, visitor from (yourhost)!", where (yourhost) is the hostname or IP address you're currently logged in with. This is a pretty easy thing to do because your IP address is stored in the %ENV hash.

If your web server is configured to do hostname lookups, then you can access the visitor's actual hostname from the $ENV{REMOTE_HOST} value. Servers often don't do hostname lookups automatically, though, because it slows down the server. Since $ENV{REMOTE_ADDR} contains the visitor's IP address, you can reverse-lookup the hostname from the IP address using the Socket module in Perl. As with CGI.pm, you have to use the Socket module:

(There is no need to add qw(:standard) for the Socket module.)

The Socket module offers numerous functions for socket programming (most of which are beyond the scope of this book). We're only interested in the reverse-IP lookup for now, though. Here's how to do the reverse lookup:

There are actually two functions being called here: gethostbyaddr and inet_aton. gethostbyaddr is a built-in Perl function that returns the hostname for a particular IP address. However, it requires the IP address be passed to it in a packed 4-byte format. The Socket module's inet_aton function does this for you.

Let's try it in a CGI program. Start a new file called rhost.cgi, and enter the following code:

Program 3-3: rhost.cgi - Remote Host Program


Source code: http://www.cgi101.com/book/ch3/rhost-cgi.html
Working example: http://www.cgi101.com/book/ch3/rhost.cgi

Detecting Browser Type

The HTTP_USER_AGENT environment variable contains a string identifying the browser (or "user agent") accessing the page. Unfortunately there is no standard (yet) for user agent strings, so you will see a vast assortment of different strings. Here's a sampling of some:

As you can see, sometimes the user agent string reveals what type of browser and computer the visitor is using, and sometimes it doesn't. Some of these aren't even browsers at all, like the search engine robots (Googlebot, Inktomi and Scooter) and RSS reader (NetNewsWire). You should be careful about writing programs (and websites) that do browser detection. It's one thing to collect browser info for logging purposes; it's quite another to design your entire site exclusively for a certain browser. Visitors will be annoyed if they can't access your site because you think they have the "wrong" browser.

That said, here's an example of how to detect the browser type. This program uses Perl's index function to see if a particular substring (such as "MSIE") exists in the HTTP_USER_AGENT string. index is used like so:

It returns a numeric value indicating where in the string the substring appears, or -1 if the substring does not appear in the string. We use an if/else block in this program to see if the index is greater than -1.

Program 3-4: browser.cgi - Browser Detection Program


Source code: http://www.cgi101.com/book/ch3/browser-cgi.html
Working example: http://www.cgi101.com/book/ch3/browser.cgi

If you have several different browsers installed on your computer, try testing the program with each of them.

We'll look more at if/else blocks in Chapter 5.

A Simple Form Using GET

There are two ways to send data from a web form to a CGI program: GET and POST. These methods determine how the form data is sent to the server.

With the GET method, the input values from the form are sent as part of the URL and saved in the QUERY_STRING environment variable. With the POST method, data is sent as an input stream to the program. We'll cover POST in the next chapter, but for now, let's look at GET.

You can set the QUERY_STRING value in a number of ways. For example, here are a number of direct links to the env.cgi program:

Try opening each of these in your web browser. Notice that the value for QUERY_STRING is set to whatever appears after the question mark in the URL itself. In the above examples, it's set to "test1", "test2", and "test3" respectively.

You can also process simple forms using the GET method. Start a new HTML document called envform.html, and enter this form:

Program 3-5: envform.html - Simple HTML Form Using GET

Working example: http://www.cgi101.com/book/ch3/envform.html

Save the form and upload it to your website. Remember you may need to change the path to env.cgi depending on your server; if your CGI programs live in a "cgi-bin" directory then you should use action="cgi-bin/env.cgi".

Bring up the form in your browser, then type something into the input field and hit return. You'll notice that the value for QUERY_STRING now looks like this:

The string to the left of the equals sign is the name of the form field. The string to the right is whatever you typed into the input box. Notice that any spaces in the string you typed have been replaced with a +. Similarly, various punctuation and other special non-alphanumeric characters have been replaced with a %-code. This is called URL-encoding, and it happens with data submitted through either GET or POST methods.

You can send multiple input data values with GET:

This will be passed to the env.cgi program as follows:

The two form values are separated by an ampersand (&). You can divide the query string with Perl's split function:

split lets you break up a string into a list of strings, splitting on a specific character. In this case, we've split on the "&" character. This gives us an array named @values containing two elements: ("fname=joe", "lname=smith"). We can further split each string on the "=" character using a foreach loop:

This prints out the field names and the data entered into each field in the form. It does not do URL-decoding, however. A better way to parse QUERY_STRING variables is with CGI.pm.

Using CGI.pm to Parse the Query String

If you're sending more than one value in the query string, it's best to use CGI.pm to parse it. This requires that your query string be of the form:

For multiple values, it should look like this:

This will be the case if you are using a form, but if you're typing the URL directly then you need to be sure to use a fieldname, an equals sign, then the field value.

CGI.pm provides these values to you automatically with the param function:

This returns the value entered in the fieldname field. It also does the URL-decoding for you, so you get the exact string that was typed in the form field.

You can get a list of all the fieldnames used in the form by calling param with no arguments:

param is NOT a Variable

param is a function call. You can't do this:

If you want to print the value of param($p), you can print it by itself:

Or call param outside of the double-quoted strings:

You won't be able to use param('fieldname') inside a here-document. You may find it easier to assign the form values to individual variables:

Another way would be to assign every form value to a hash:

You can achieve the same result by using CGI.pm's Vars function:

The Vars function is not part of the "standard" set of CGI.pm functions, so it must be included specifically in the use statement.

Either way, after storing the field values in the %form hash, you can refer to the individual field names by using $form{'fieldname'}. (This will not work if you have a form with multiple fields having the same field name.)

Let's try it now. Create a new form called getform.html:

Program 3-6: getform.html - Another HTML Form Using GET


Working example: http://www.cgi101.com/book/ch3/getform.html

Save and upload it to your webserver, then bring up the form in your web browser.

Now create the CGI program called get.cgi:

Program 3-7: get.cgi Form Processing Program Using GET


Source code: http://www.cgi101.com/book/ch3/get-cgi.html

Save and chmod 755 get.cgi. Now fill out the form in your browser and press submit. If you encounter errors, refer back to Chapter 1 for debugging.

Take a look at the full URL of get.cgi after you press submit. You should see all of your form field names and the data you typed in as part of the URL. This is one reason why GET is not the best method for handling forms; it isn't secure.

GET is NOT Secure

GET is not a secure method of sending data. Don't use it for forms that send password info, credit card data or other sensitive information. Since the data is passed through as part of the URL, it'll show up in the web server's logfile (complete with all the data). Server logfiles are often readable by other users on the system. URL history is also saved in the browser and can be viewed by anyone with access to the computer. Private information should always be sent with the POST method, which we'll cover in the next chapter. (And if you're asking visitors to send sensitive information like credit card numbers, you should also be using a secure server in addition to the POST method.)

There may also be limits to how much data can be sent with GET. While the HTTP protocol doesn't specify a limit to the length of a URL, certain web browsers and/or servers may.

Despite this, the GET method is often the best choice for certain types of applications. For example, if you have a database of articles, each with a unique article ID, you would probably want a single article.cgi program to serve up the articles. With the article ID passed in by the GET method, the program would simply look at the query string to figure out which article to display:

We'll be revisiting that idea later in the book. For now, let's move on to Chapter 4 where we'll see how to process forms using the POST method.


Previous Contents Next

Back to Top