The Common Gateway Interface

This document is intended to ease you into the idea of writing your own CGI
programs with simplified examples and explanation which is less technically
oriented than the interface specification itself.

-------------------------------------------------------------------------------

How do I get information from the server?

Each time a client requests the URL corresponding to your gateway, the server
will execute your program. The output of your program will go more or less
directly to the client.

A common misconception about CGI is that you can send command-line switches to
your program, such as foobar -qa blorf. CGI uses the command line for other
purposes and thus this is not directly possible. Instead, CGI uses environment
variables to send your program its parameters. The two major environment
variables you will use for this purpose are:

   *  QUERY_STRING

     QUERY_STRING is defined as anything which follows the first ? in the URL
     used to access your gateway. This information could be added either by an
     HTML ISINDEX document, or by an HTML form (with the GET method). It could
     also be manually embedded in an HTML anchor which references your gateway.
     This string will usually be an information query, e.g. what the user wants
     to search for in the Archie databases, or perhaps the encoded results of
     your feedback GET form.

     This string is encoded in the standard URL format of changing spaces to +,
     and encoding special characters with %xx hexadecimal encoding. You will
     need to decode it in order to use it.

     If your gateway is not decoding results from a FORM, you will also get the
     query string decoded for you onto the command line. This means that each
     word of the query string will be in a different element of argv. For
     example, the query string "forms rule" would be given to your program with
     argv[1]="forms" and argv[2]="rule". If you choose to use this, you do not
     need to do any processing on the data before using it.

   *  PATH_INFO

     Much of the time, you will want to send data to your gateways which the
     client shouldn't muck with. Such information could be the name of the FORM
     which generated the results they are sending.

     CGI allows for extra information to be embedded in the URL for your
     gateway which can be used to transmit extra context-specific information
     to the scripts. This information is usually made available as "extra"
     information after the path of your gateway in the URL. This information is
     not encoded by the server in any way.

     To illustrate this, let's say I have a CGI program on my server called
     /scripts/foobar. When I access foobar from a particular document, I want
     to tell foobar that I'm currently in the English language directory, not
     the Pig Latin directory. In this case, I could access my script in an HTML
     document as:

     <A HREF="/scripts/foobar/language=english">foobar</A>

     When the server executes foobar, it will give me PATH_INFO of
     /language=english, and my program can decode this and act accordingly.
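
As a minimal sketch of reading both variables, the Python below decodes
QUERY_STRING and picks apart PATH_INFO. The function names decode_query and
parse_path_info are made up for this illustration. The name=value layout of
PATH_INFO is only the convention of the foobar example, not something CGI
defines, and treating any query string without an = sign as an ISINDEX-style
list of words is a simplifying assumption.

```python
import os
from urllib.parse import parse_qs, unquote_plus


def decode_query(query_string):
    """Decode a raw QUERY_STRING value (hypothetical helper).

    Form-style strings (containing '=') become a dict mapping each field
    name to a list of values. Anything else is treated as an ISINDEX-style
    query and returned as a list of words.
    """
    if "=" in query_string:
        # parse_qs turns '+' into spaces and decodes %xx escapes.
        return parse_qs(query_string, keep_blank_values=True)
    # '+' separates the words; each word may still carry %xx escapes.
    return [unquote_plus(word) for word in query_string.split("+") if word]


def parse_path_info(path_info):
    """Split a PATH_INFO such as '/language=english' into a dict.

    Assumes the 'name=value' convention from the foobar example; segments
    without '=' are ignored.
    """
    settings = {}
    for segment in path_info.split("/"):
        name, sep, value = segment.partition("=")
        if sep:
            settings[name] = value
    return settings


# The server sets these variables before running the program; default to
# empty strings so the sketch also runs outside a server.
query = decode_query(os.environ.get("QUERY_STRING", ""))
settings = parse_path_info(os.environ.get("PATH_INFO", ""))
```

With the examples from the text, a QUERY_STRING of forms+rule decodes to the
two words forms and rule, and a PATH_INFO of /language=english yields a
single setting named language with the value english.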

-------------------------------------------------------------------------------

How do I send my document back to the client?

I have found that the most common error in beginners' CGI programs is not
properly formatting the output so the server can understand it.

CGI programs can return a myriad of document types. They can send back an image
to the client, an HTML document, a plaintext document, or perhaps even an
audio clip of your bodily functions. They can also return references to other
documents. The client must know what kind of document you're sending it so it
can present it accordingly. In order for the client to know this, your CGI
program must tell the server what type of document it is returning.

In order to tell the server what kind of document you are sending back, whether
it be a document or a reference to one, CGI requires you to place a short
header on your output. This header is ASCII text, consisting of lines separated
by either linefeeds or carriage returns followed by linefeeds. Your program
must output at least two such lines before its data will be sent directly back
to the client.

The first line will be different depending on whether your program is returning
a full document or a reference to one:

   *  A full document with a corresponding MIME type

     In this case, you must know the MIME type of your output. Common MIME
     types are things such as text/html for HTML, and text/plain for straight
     ASCII text.

     In order to tell the server your output's content type, the first line of
     your output should read:

     Content-type: type/subtype

     type/subtype is the MIME type for your output.

   *  A reference to another document

     Let's say you want to send a file already available on your information
     server to the client, or perhaps you want them to retrieve a document from
     another server altogether.

     If you want to reference another file on your own server, you should
     output a partial URL, such as the following:

     Location: /dir1/dir2/myfile.html

     In this case, the server will act as if the client had not requested your
     script, but instead requested http://yourserver/dir1/dir2/myfile.html. It
     will take care of all access control, determining the file's type, and all
     sorts of that ugly stuff that servers do.

     However, let's say you want to reference a file on your Gopher server. In
     this case, you should know the full URL of what you want to reference and
     output something like:

     Location: gopher://httprules.foobar.org/0

     In this case, the server will pass your reply on as a redirect, and the
     client will fetch the URL automatically.

Next, you have to send the second line. With the current specification, THE
SECOND LINE SHOULD BE BLANK. This means that it should have nothing on it
except a linefeed. Once the server reads this line, it knows that you're
finished describing your output and that the actual output follows. If you
skip this line, the server will keep parsing your output as header lines,
looking for further information about your response, and you will become very
unhappy.
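
The two-line header can be sketched in Python as follows. The helper names
build_response, document and reference are invented for this illustration,
and the HTML body is a placeholder; only the Content-type and Location lines
and the blank line after them come from the description above.

```python
import sys


def build_response(first_line, body):
    """Return the header line, the mandatory blank line, then the body."""
    return first_line + "\n" + "\n" + body


def document(mime_type, body):
    """A full document: announce its MIME type, then send the data."""
    return build_response("Content-type: " + mime_type, body)


def reference(url):
    """A reference to another document: a Location line and no body."""
    return build_response("Location: " + url, "")


if __name__ == "__main__":
    # Placeholder page; a real gateway would generate its own output here.
    sys.stdout.write(document("text/html", "<H1>Hello from CGI</H1>\n"))
```

Calling reference("/dir1/dir2/myfile.html") instead would produce the
partial-URL form shown earlier, again ending in the blank line.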

Advanced usage: If you would like to output headers such as Expires or
Content-encoding, you can if your server is compatible with CGI/1.1. Just
output them along with Location or Content-type and they will be sent back to
the client.
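
A rough sketch of adding such a header, assuming a CGI/1.1 server as noted
above: extra header lines simply go between the first line and the blank
line. The helpers header_block and expires_in are hypothetical names, and
the one-hour lifetime is an arbitrary example value.

```python
import time
from email.utils import formatdate


def expires_in(seconds, now=None):
    """Format a date 'seconds' from now in the style HTTP headers use."""
    if now is None:
        now = time.time()
    return formatdate(now + seconds, usegmt=True)


def header_block(first_line, extra_headers):
    """Join the first line and any extra headers, ending with a blank line."""
    lines = [first_line]
    lines += [name + ": " + value for name, value in extra_headers]
    return "\n".join(lines) + "\n\n"


# Example: an HTML document that the client may cache for one hour.
headers = header_block("Content-type: text/html",
                       [("Expires", expires_in(3600))])
```

The document data would be written immediately after this block, exactly as
in the two-line case.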

-------------------------------------------------------------------------------
Rob McCool robm@ncsa.uiuc.edu
