Chapter 11
Gateways
Several different types of network services are available on the
Internet, ranging from e-mail to database lookups to the World
Wide Web. The ability to use one service to access other services
is sometimes convenient. For example, you might want to send e-mail
or post to USENET news from your Web browser. You might also want
to do a WAIS search and have the results sent to your Web browser.
A gateway is a link between these various services. Think
of a gateway between two different pastures: one representing
one service and the other representing another. In order to access
one service through another, you need to go through the gateway
(see Figure 11.1).
Figure 11.1 : A gateway.
Very often, your CGI programs act as gateways between the World
Wide Web and other services. After all, CGI stands for Common
Gateway Interface, and it was designed so that you could use the
World Wide Web as an interface to other services and applications.
In this chapter, you see a couple of examples of gateway applications,
beginning with a simple finger gateway. You learn how to take
advantage of existing client applications within your CGI applications,
and you learn the related security issues. You see an example
of developing a gateway from scratch rather than using an existing
application. Finally, you learn how to design a powerful e-mail
gateway.
Network applications all work in a similar fashion. You need to
know how to do two things: connect to the service and communicate
with it. The language that you use to communicate with the service
is called the protocol. You have already seen one protocol
in great detail: HTTP, the protocol of the Web, discussed
in Chapter 8, "Client/Server Issues."
Most network services already have clients that know how to properly
connect to the server and that understand the protocol. For example,
any Web browser understands HTTP. If you want to
get information from a Web server, you don't need to know the
protocol. All you need to do is tell the browser what information
you want, and the browser does all the communicating for you.
If you already have a suitable client for various services, you
can easily write a Web gateway that gets input from the browser,
calls the program using the input, and sends the output back to
the browser. A diagram of this process is in Figure 11.2.
Figure 11.2 : Using existing clients to create a Web gateway.
Because the existing client does all the communicating for you,
your CGI program only needs to do a few things:
- Get the input from the browser
- Call the program with the specified input
- Possibly parse the output from the program
- Send the output to the browser
The first and last steps are easy. You know how to get input from
and send output to the browser using CGI. The middle two steps
are slightly more challenging.
Several ways exist to run a program from within another program;
some of them are platform specific. In C, the standard function
for running other programs is system()
from stdlib.h. The parameters for system()
and the behavior of this function usually depend on the operating
system. In the following examples, assume the UNIX platform, although
the concepts can apply generally to all platforms and languages.
On UNIX, system() accepts
the program as its parameter and its command-line parameters exactly
as you would type them on the command line. For example, if you
wanted to write an application that printed the contents of your
current directory, you could use the system()
function to call the UNIX program /bin/ls.
The program myls.c in Listing 11.1 does just that.
Listing 11.1. The myls.c program.
#include <stdlib.h>
int main()
{
    system("/bin/ls"); /* assumes ls resides in the /bin directory */
    return 0;
}
Tip |
When you use system() or any other function that runs programs, remember to use the full pathname. Doing so ensures that the program you intend to run is the one that actually runs, and it reduces the security risk by not depending on the PATH environment variable.
|
When the system() function
is called on UNIX, the C program spawns a shell process (usually
/bin/sh) and tells the shell
to use the input as its command line. Although this is a simple
and portable way to run programs, some inherent risks and extra
overhead occur when using it in UNIX. When you use system(),
you spawn another shell and run the program rather than run the
program directly. Additionally, because UNIX shells interpret
special characters (metacharacters), you can inadvertently allow
the user to run any program he or she wishes. For more information
about the risks of the system()
call, see Chapter 9, "CGI Security."
To directly run programs in C on UNIX platforms is more complex
and requires using the exec()
class of functions from unistd.h. Descriptions of each different
exec() function are in Table
11.1.
Table 11.1. The exec() family.
Function | Description
execv() | The first argument is the path to the program. The second is a null-terminated array of pointers to the argument list; the first element of the array is usually the name of the program.
execl() | The first argument is the path to the program. The remaining arguments are the program arguments, ending with a null pointer; the second argument is usually the name of the program.
execvp() | Same as execv(), except the first argument stores the name of the program, and the function searches the PATH environment for that program.
execlp() | Same as execl(), except the first argument stores the name of the program, and the function searches the PATH environment for that program.
execle() | Same as execl(), except it also passes the environment for the program. The environment is specified after the null pointer that terminates the argument list.
In order to execute a program directly under UNIX, you need to
create a new process for it. You can do this using the fork()
function. After you create a new process (known as the child),
your program (the parent) must wait until the child is finished
executing. You do this using the wait()
function.
Using the exec() function,
I rewrote myls.c, shown in Listing 11.2. The program is longer
and more complex, but it is more efficient. If you do not understand
this example, you might want to either read a book on UNIX system
programming or just stick to the system()
function, realizing the implications.
Listing 11.2. The myls.c program (using exec()).
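#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main()
{
    pid_t pid;
    int status;
    if ((pid = fork()) < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) { /* child process */
        /* the argument list must end with a null pointer */
        execl("/bin/ls","ls",(char *)NULL);
        exit(1); /* reached only if execl() fails */
    }
    /* parent process */
    while (wait(&status) != pid) ;
    return 0;
}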
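The same fork, exec, and wait pattern can be wrapped in a small reusable helper. The following is only a sketch: the function name run_program() is hypothetical, and it uses execv() and waitpid() so that the caller supplies the argument array and gets the exit status back.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at `path` directly (no shell) with
   the NULL-terminated argument vector `argv` and wait for it to finish.
   Returns the program's exit status (127 if execv() failed), or -1 if
   fork() failed or the program ended abnormally. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    fflush(stdout);              /* flush pending output before forking */
    if ((pid = fork()) < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {              /* child process */
        execv(path, argv);
        _exit(127);              /* reached only if execv() failed */
    }
    /* parent process: wait for this particular child */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would build the array itself, for example { "ls", "/", NULL }, and pass "/bin/ls" as the path.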
These programs print their output, unparsed, to stdout.
Although most of the time this is satisfactory, sometimes you
might want to parse the output. How do you capture the output
of these programs?
Instead of using the system()
function, you use the popen()
function, which uses UNIX pipes (popen
stands for pipe open). UNIX users will be familiar with the concept
of the pipe. For example, if you had a program that could manipulate
the output of the ls command,
in order to feed the output to this program you could use a pipe
from the command line (|
is the pipe symbol).
ls | dosomething
This step takes the output of ls
and feeds it into the input of dosomething.
The popen() function emulates
the UNIX pipe from within a program. For example, if you wanted
to pipe the output of the ls
command to the parse_output()
function, your code might look like the following:
FILE *output;
output = popen("/bin/ls","r");
parse_output(output);
pclose(output);
popen() works like system(),
except instead of sending the output to stdout,
it sends the output to a file handle and returns the pointer to
that file handle. You can then read from that file handle, parse
the data, and print the parsed data to stdout
yourself. The second argument of popen()
determines whether you read from or write to a pipe. If you want
to write to a pipe, you would replace "r"
with "w". Because
popen() works like system(),
it is also susceptible to the same security risks as system().
You should filter any user input for metacharacters
before using it inside of popen().
Because popen() suffers from
the same problems as system(),
you might sometimes prefer to use the pipe()
function in conjunction with an exec()
function. pipe() takes an
array of two integers as its argument. If the call works, the
array contains the read and write file descriptors, which you
can then manipulate. pipe()
must be called before you fork and execute the program. Again,
this process is complex. If you don't understand this, don't worry
about it; you probably don't need to use it. An example of pipe()
appears later in this chapter, in "Parsing the Output in
Perl."
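For readers who want to see the idea in C, here is a minimal sketch of combining pipe(), fork(), and execv() to capture a program's output without spawning a shell. The function name capture_output() and its buffer-based interface are assumptions made for this illustration, not part of the chapter's libraries.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] directly (no shell), store up to
   buflen - 1 bytes of its standard output in buf, and NUL-terminate it.
   Returns the number of bytes stored, or -1 on error. */
long capture_output(char *const argv[], char *buf, size_t buflen)
{
    int fds[2];                  /* fds[0] is the read end, fds[1] the write end */
    pid_t pid;
    size_t used = 0;
    ssize_t n;
    char discard[256];

    if (buflen == 0 || pipe(fds) < 0)   /* pipe() must come before fork() */
        return -1;
    fflush(stdout);
    if ((pid = fork()) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {              /* child: make the pipe its stdout */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execv(argv[0], argv);
        _exit(127);              /* reached only if execv() failed */
    }
    close(fds[1]);               /* the parent only reads */
    while (used < buflen - 1 &&
           (n = read(fds[0], buf + used, buflen - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    /* drain whatever did not fit so the child is never left blocked */
    while (read(fds[0], discard, sizeof(discard)) > 0)
        ;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (long)used;
}
```

Because the arguments are passed as an array, user input placed in argv is never interpreted by a shell.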
In each of these examples, the output is buffered by default,
which means that the system stores the output until it reaches
a certain size before sending the entire chunk of output to the
file handle. This process usually operates faster and more efficiently
than sending one byte of output to the file handle at a time.
Sometimes, however, you run the risk of losing part of the output,
or of seeing it out of order, because some data is still left in the
buffer when another program starts writing to the same file handle.
To prevent this from happening,
you need to flush your file handles at the right moment. In
C, you do this using the fflush()
function, which writes out any data buffered for the given file handle
(it flushes once; it does not turn buffering off). For example, if
you wanted to flush stdout before running another program,
you would use the following call:
fflush(stdout);
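If you would rather switch buffering off for the whole program, standard C provides setvbuf(). The sketch below shows both approaches; the two function names are hypothetical wrappers chosen for this example.

```c
#include <stdio.h>

/* Hypothetical helper: turn off buffering on stdout for the rest of the
   program. setvbuf() should be called before any other operation on the
   stream. Returns 0 on success. */
int unbuffer_stdout(void)
{
    return setvbuf(stdout, NULL, _IONBF, 0);
}

/* Alternative: keep buffering, but flush at the points that matter, for
   example right before running another program that writes to stdout. */
void print_then_flush(const char *text)
{
    fputs(text, stdout);
    fflush(stdout);
}
```

Unbuffered output is simpler to reason about but slower; flushing only before system() or fork() keeps most of the efficiency of buffering.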
The syntax for running a program within a Perl program is less
complex than in C, but no less powerful. Perl also has a system()
function, which usually works exactly like its C equivalent. myls.pl
in Listing 11.3 demonstrates the Perl system()
function.
Listing 11.3. The myls.pl program.
#!/usr/local/bin/perl
system("/bin/ls");
As you can see, the syntax is exactly like the C syntax. Perl's
system() function, however,
will not necessarily spawn a new shell. If all the arguments passed
to system() are separate
parameters, Perl's system()
function is equivalent to the forking
and execing of programs in
C. For example, Listing 11.4 shows the Perl code for listing the
contents of the root directory and Listing 11.5 shows the C equivalent.
Listing 11.4. The lsroot.pl program.
#!/usr/local/bin/perl
system "/bin/ls","/";
Listing 11.5. The lsroot.c program.
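#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main()
{
    pid_t pid;
    int status;
    if ((pid = fork()) < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) { /* child process */
        /* the argument list must end with a null pointer */
        execl("/bin/ls","ls","/",(char *)NULL);
        exit(1); /* reached only if execl() fails */
    }
    /* parent process */
    while (wait(&status) != pid) ;
    return 0;
}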
You will find it considerably easier to obtain the efficiency
and security of forking and then executing a program in Perl than
in C. Note, however, that if you had used the following:
system("/bin/ls /");
instead of this:
system "/bin/ls","/";
then the system call would have been exactly equivalent to the
C system call; in other words, it would spawn a shell.
Note |
You can also run programs directly in Perl using fork() and exec(). The syntax is the same as the C syntax using fork() and any of the exec() functions. Perl only has one exec() function, however, that is equivalent to C's execvp().
The exec() function by itself is equivalent to system() except that it terminates the currently running Perl script. In other words, if you included all of the arguments in one argument in exec(), it would spawn a shell and run the program, exiting from the Perl script after it finished. To prevent exec() from spawning a shell, separate the arguments just as you would with system().
|
Capturing and parsing the output of programs in Perl is also simpler
than in C. The easiest way to capture the output of a program in Perl
is to run it inside back ticks (`). Perl spawns a shell and executes
the command within the back ticks, returning the output of the
command. For example, the following spawns a shell, runs /bin/ls,
and stores the output in the scalar $files:
$files = `/bin/ls`;
You can then parse $files
or simply print it to stdout.
You can also use pipes in Perl using the open()
function. If you want to pipe the output of a command (for example,
ls) to a file handle, you
would use the following:
open(OUTPUT,"ls|");
Similarly, you could pipe data into a program using the following:
open(PROGRAM,"|sort");
This syntax is equivalent to C's popen()
function and suffers from similar problems. In order to read from
a pipe without opening a shell, use
open(OUTPUT,"-|") || exec "/bin/ls";
To write to a pipe, use
open(PROGRAM,"|-") || exec "/usr/bin/sort";
Make sure each argument for the program gets passed as a separate
argument to exec().
To unbuffer a file handle in Perl, use
select(FILEHANDLE); $| = 1;
For example, to unbuffer the stdout,
you would do the following:
select(stdout); $| = 1;
Using the methods described in the preceding section, you can
create a Web gateway using existing clients. Finger serves as
a good example. Finger enables you to get certain information
about a user on a system. Given a username and a hostname (in
the form of an e-mail address), finger will contact the server
and return information about that user if it is available.
The usage for the finger program on most UNIX systems is
finger username@hostname
For example, the following returns finger information about user
eekim at the machine hcs.harvard.edu:
finger eekim@hcs.harvard.edu
You can write a Web-to-finger CGI application, as shown in Listings
11.6 (in C) and 11.7 (in Perl). The browser passes the username
and hostname to the CGI program finger.cgi, which in turn runs
the finger program. Because finger already returns the output
to stdout, the output appears
on the browser.
You want the finger program to be flexible. In other words, you
should have the capability to specify the user and host from the
URL, and you should be able to receive information from a form.
Input for finger.cgi must be in the following form:
finger.cgi?who=username@hostname
If you use finger.cgi as the action parameter of a form, you must
make sure you have a text field with the name who.
Listing 11.6. The finger.cgi.c program.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "cgi-lib.h"
#include "html-lib.h"
#include "string-lib.h"
#define FINGER "/usr/bin/finger "
void print_form()
{
html_begin("Finger Gateway");
h1("Finger Gateway");
printf("<form>\n");
printf("Who? <input name=\"who\">\n");
printf("</form>\n");
html_end();
}
int main()
{
char *command,*who;
llist entries;
html_header();
if (read_cgi_input(&entries)) {
if (cgi_val(entries,"who"))
{
who = newstr(escape_input(cgi_val(entries,"who")));
html_begin("Finger results");
printf("<pre>\n");
command = malloc(strlen(FINGER)
+ strlen(who) + 1);
strcpy(command,FINGER);
strcat(command,who);
fflush(stdout);
system(command);
printf("</pre>\n");
html_end();
}
else
print_form();
}
else
print_form();
list_clear(&entries);
}
Listing 11.7. The finger.cgi program (Perl).
#!/usr/local/bin/perl
require 'cgi-lib.pl';
select(stdout); $| = 1;
print &PrintHeader;
if (&ReadParse(*input)) {
if ($input{'who'}) {
print &HtmlTop("Finger results"),"<pre>\n";
system "/usr/bin/finger",$input{'who'};
print "</pre>\n",&HtmlBot;
}
else {
&print_form;
}
}
else {
&print_form;
}
sub print_form {
print &HtmlTop("Finger Gateway");
print "<form>\n";
print "Who? <input name=\"who\">\n";
print "</form>\n";
print &HtmlBot;
}
The C and Perl versions of finger.cgi are remarkably similar.
Both parse the input, flush or unbuffer stdout,
and run finger. The two versions, however, differ in how they
run the program. The C version uses the system()
call, which spawns a shell and runs the command. Because it spawns
a shell, it must escape all metacharacters before passing the
input to system(); hence,
the call to escape_input().
In the Perl version, the arguments are separated so it runs the
program directly. Consequently, no filtering of the input is necessary.
You can avoid filtering the input in the C version as well, if
you avoid the system() call.
Listing 11.8 lists a version of finger.cgi.c that uses execl()
instead of system(). Notice
that in this version of finger.cgi.c, you no longer need escape_input()
because no shell is spawned.
Listing 11.8. The finger.cgi.c program (without spawning a
shell).
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include "cgi-lib.h"
#include "html-lib.h"
#include "string-lib.h"
#define FINGER "/usr/bin/finger"
void print_form()
{
    html_begin("Finger Gateway");
    h1("Finger Gateway");
    printf("<form>\n");
    printf("Who? <input name=\"who\">\n");
    printf("</form>\n");
    html_end();
}
int main()
{
    char *who;
    llist entries;
    pid_t pid;
    int status;
    html_header();
    if (read_cgi_input(&entries)) {
        if (cgi_val(entries,"who")) {
            who = newstr(cgi_val(entries,"who"));
            html_begin("Finger results");
            printf("<pre>\n");
            fflush(stdout);
            if ((pid = fork()) < 0) {
                perror("fork");
                exit(1);
            }
            if (pid == 0) { /* child process */
                /* who is passed as a single argument; no shell ever sees it */
                execl(FINGER,"finger",who,(char *)NULL);
                exit(1); /* reached only if execl() fails */
            }
            /* parent process */
            while (wait(&status) != pid) ;
            printf("</pre>\n");
            html_end();
        }
        else
            print_form();
    }
    else
        print_form();
    list_clear(&entries);
    return 0;
}
For a variety of reasons, you might want to parse the output before
sending it to the browser. Perhaps, for example, you want to surround
e-mail addresses and URLs with <a href> tags. The Perl version of finger.cgi in Listing
11.9 has been modified to pipe the output to a file handle. If
you want to, you can then parse the data from the file handle
before sending it to the output.
Listing 11.9. The finger.cgi program (Perl using pipes).
#!/usr/local/bin/perl
require 'cgi-lib.pl';
select(stdout); $| = 1;
print &PrintHeader;
if (&ReadParse(*input)) {
if ($input{'who'}) {
print &HtmlTop("Finger results"),"<pre>\n";
open(FINGER,"-|") || exec "/usr/bin/finger",$input{'who'};
while (<FINGER>) {
print;
}
print "</pre>\n",&HtmlBot;
}
else {
&print_form;
}
}
else {
&print_form;
}
sub print_form {
print &HtmlTop("Finger Gateway");
print "<form>\n";
print "Who? <input name=\"who\">\n";
print "</form>\n";
print &HtmlBot;
}
It is extremely important to consider security when you write
gateway applications. Two specific security risks exist that you
need to avoid. First, as previously stated, avoid spawning a shell
if possible. If you cannot avoid spawning a shell, make sure you
escape any non-alphanumeric characters (metacharacters). You do
this by preceding the metacharacter with a backslash (\).
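The escaping rule just described can be sketched in C as follows. This is a hypothetical helper written for illustration, not the escape_input() function from the chapter's library, and it assumes the result will be placed on a /bin/sh command line.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of `input` with a
   backslash in front of every character that is not a letter or digit.
   Newlines are dropped, because a backslash followed by a newline is a
   line continuation in the shell rather than an escaped newline.
   Returns NULL if memory runs out; the caller frees the result. */
char *escape_shell_chars(const char *input)
{
    size_t len = strlen(input);
    char *out = malloc(2 * len + 1);   /* worst case: every character escaped */
    char *p = out;

    if (out == NULL)
        return NULL;
    for (; *input != '\0'; input++) {
        if (*input == '\n')
            continue;
        if (!isalnum((unsigned char)*input))
            *p++ = '\\';
        *p++ = *input;
    }
    *p = '\0';
    return out;
}
```

With this sketch, the input ed;rm -rf / becomes ed\;rm\ \-rf\ \/, which the shell treats as one harmless word rather than two commands.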
Second, note that a Web gateway can circumvent certain
access restrictions. For example, suppose your school, school.edu,
only allowed people to finger from within the school. If you set
up a finger gateway running on www.school.edu,
then anyone outside the school could finger machines within the
school. Because the finger gateway runs the finger program from
within school.edu, the gateway sends the output to anyone
who requests it, including those outside of school.edu.
If you want to maintain access restrictions, you need to build
an access layer on your CGI program as well. You can use the REMOTE_ADDR
and REMOTE_HOST environment
variables to determine from where the browser is connecting.
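One simple form such an access layer might take is a prefix check on REMOTE_ADDR. The sketch below is only an illustration: the function names are hypothetical, and the network prefix a site would pass in (for example "192.168.1.") is an assumption, not something taken from this chapter.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if `addr` begins with `prefix`, else 0. */
int addr_has_prefix(const char *addr, const char *prefix)
{
    if (addr == NULL || prefix == NULL)
        return 0;
    return strncmp(addr, prefix, strlen(prefix)) == 0;
}

/* Return 1 if the browser's address, taken from the CGI environment
   variable REMOTE_ADDR, lies inside the permitted network. */
int client_allowed(const char *prefix)
{
    return addr_has_prefix(getenv("REMOTE_ADDR"), prefix);
}
```

A gateway would call client_allowed() before running finger and print an error page when it returns 0. Ending the prefix with a dot keeps "192.168.1." from also matching "192.168.10.5".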
If you do not already have an adequate client for a certain network
service, or if you want to avoid the overhead of calling
an extra program, you can implement the appropriate protocol
directly within your CGI application. This way, your CGI gateway talks
directly to the network service (see Figure 11.3)
rather than calling another program that communicates with the service.
Figure 11.3 : A gateway that talks directly to the network service.
Although this way has an efficiency advantage, your programs are
longer and more complex, which means longer development time.
Additionally, you generally duplicate the work in the already
existing client that handles the network connections and communication
for you.
If you do decide to write a gateway client from scratch, you need
to first find the protocol specification. You can get most of the Internet network
protocols via ftp at ds.internic.net.
A nice Web front-end to various Internet protocols and RFCs exists
at <URL:http://www.cis.ohio-state.edu/hypertext/information/rfc.html>.
To write any direct gateways, you need to know some basic network
programming. This
section briefly describes network client programming on UNIX using
Berkeley sockets. The information in this section is not meant
to serve as a comprehensive tutorial to network programming; you
should refer to other sources for more information.
TCP/IP (Internet) network communication on UNIX is performed using
something called a socket (or a Berkeley socket). As far as the
programmer is concerned, the socket works the same as a file handle
(although internally, a socket is very different from a file handle).
Before you can do any network communication, you must open a socket
using the socket() function
(in both C and Perl). socket()
takes three arguments (a domain, a socket type, and a protocol) and
returns a file descriptor. The domain tells the operating system
which address family the socket uses, and therefore how to interpret
the addresses you give it. Because you are doing
Internet programming, you use the domain AF_INET
as defined in the header file, socket.h, which is located in /usr/include/sys.
The socket type is either SOCK_STREAM
or SOCK_DGRAM. You almost
definitely will use SOCK_STREAM,
which guarantees reliable, orderly delivery of information to
the server. Network services such as the World Wide Web, ftp,
gopher, and e-mail use SOCK_STREAM.
SOCK_DGRAM sends data
in datagrams, small packets of information that are not guaranteed
to be delivered or to arrive in order. Network File System (NFS)
is an example of a protocol that uses SOCK_DGRAM.
Finally, the protocol defines the transport layer protocol. Because
you are using TCP/IP with SOCK_STREAM, you want to define the transport protocol
as TCP.
Note |
AF_INET, SOCK_STREAM, and SOCK_DGRAM are defined in <sys/socket.h>. In Perl, these values are not defined unless you have converted your C headers into Perl headers using the h2ph utility. The following values will work for almost any UNIX system:
- AF_INET: 2
- SOCK_STREAM: 1 (2 if using Solaris)
- SOCK_DGRAM: 2 (1 if using Solaris)
Solaris users should note that the values for SOCK_STREAM and SOCK_DGRAM are reversed.
|
After you create a socket, your client tries to connect to a server
through that socket. It uses the connect()
function to do so (again, this process works in both Perl and
C). In order for connect()
to work properly, it needs to know the socket, the IP address
of the server, and the port to which to connect.
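In C, the socket() and connect() steps might look like the following sketch. Note two assumptions: the function name connect_tcp() is hypothetical, and the sketch looks up the server with getaddrinfo() rather than the gethostbyname() call used in this chapter's Perl code.

```c
#include <netdb.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP connection to `host` on `service` (a port
   number or a service name such as "finger"). Returns a connected socket
   descriptor, or -1 on failure. */
int connect_tcp(const char *host, const char *service)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* the Internet domain */
    hints.ai_socktype = SOCK_STREAM;  /* reliable, ordered delivery (TCP) */

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                    /* connected */
        close(fd);                    /* try the next address, if any */
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

The returned descriptor can then be used with read() and write() much like a file, and should be closed with close() when the exchange is finished.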
In order to demonstrate network programming, this chapter shows
finger.cgi programmed to do a direct network connection. This
example appears in Perl; the C equivalent works in a similar way.
Once again, check a book on network programming for more information.
In order to modify finger.cgi into a direct finger gateway, you
need to change three things. First, you need to initialize various
network variables. Second, you need to split up the value of who
from its e-mail form into a separate username and hostname. Finally,
you need to create the socket, make the network connection, and
communicate directly with the finger server. Listings 11.10 and
11.11 show the code for the first two tasks.
Listing 11.10. Initialize network variables.
$AF_INET = 2;
$SOCK_STREAM = 1; # Use 2 if using Solaris
$sockaddr = 'S n a4 x8';
$proto = (getprotobyname('tcp'))[2];
$port = (getservbyname('finger', 'tcp'))[2];
Listing 11.11. Separate the username and hostname and determine
IP address from hostname.
($username,$hostname) = split(/@/,$input{'who'});
$hostname = $ENV{'SERVER_NAME'} unless $hostname;
$ipaddr = (gethostbyname($hostname))[4];
if (!$ipaddr) {
print "Invalid hostname.\n";
}
else {
&do_finger($username,$ipaddr);
}
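For comparison, the username and hostname split could be written in C roughly as follows. The function name split_who() and its buffer-based interface are hypothetical; the default_host argument stands in for the SERVER_NAME fallback used in the Perl code.

```c
#include <string.h>

/* Hypothetical helper: split `who` ("username@hostname") into `user` and
   `host`. If no hostname is present, `default_host` is used instead.
   Returns 0 on success, -1 if a part does not fit its buffer. */
int split_who(const char *who, const char *default_host,
              char *user, size_t userlen, char *host, size_t hostlen)
{
    const char *at = strchr(who, '@');           /* first '@', if any */
    size_t ulen = at ? (size_t)(at - who) : strlen(who);
    const char *h = (at && at[1] != '\0') ? at + 1 : default_host;

    if (ulen >= userlen || strlen(h) >= hostlen)
        return -1;
    memcpy(user, who, ulen);
    user[ulen] = '\0';
    strcpy(host, h);                             /* length checked above */
    return 0;
}
```

In a CGI program, the caller would typically pass getenv("SERVER_NAME") as default_host after checking that it is not NULL.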
Communicating directly with the finger server requires understanding
how the finger server communicates. Normally, the finger server
listens on port 79. To use it, the client connects and sends
the username followed by a CRLF. After it has the username,
the server searches for information about that user, sends it
to the client over the socket, and closes the connection.
Tip |
You can communicate directly with the finger server using the telnet command. Suppose you want to finger ed@gunther.org:
% telnet gunther.org 79
Trying 0.0.0.0...
Connected to gunther.org
Escape character is '^]'.
ed
After you press Enter, the finger information is displayed.
|
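The exchange is simple enough to sketch in a few lines of C. Both function names below are hypothetical, and fd is assumed to be a socket that is already connected to the finger server.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: build the request line the finger server expects,
   the username followed by CRLF. Returns the request length, or -1 if it
   does not fit in buf. */
int format_finger_request(const char *username, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "%s\r\n", username);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}

/* Send the request over the connected socket `fd`, then copy everything
   the server returns to `out` until the server closes the connection.
   Returns 0 on success, -1 on error. */
int finger_exchange(int fd, const char *username, FILE *out)
{
    char buf[512];
    ssize_t n;
    int len = format_finger_request(username, buf, sizeof(buf));

    if (len < 0 || write(fd, buf, (size_t)len) != (ssize_t)len)
        return -1;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, (size_t)n, out);
    return n < 0 ? -1 : 0;
}
```

Because the server closes the connection when it is done, reading until read() returns 0 is enough to collect the whole reply.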
The code for connecting to and communicating with the finger server
appears in the &do_finger
function, listed in Listing 11.12.
Listing 11.12. The &do_finger
function.
sub do_finger {
    local($username,$ipaddr) = @_;
    $them = pack($sockaddr, $AF_INET, $port, $ipaddr);
    # get socket
    socket(FINGER, $AF_INET, $SOCK_STREAM, $proto) || die "socket: $!";
    # make connection
    if (!connect(FINGER,$them)) {
        die "connect: $!";
    }
    # unbuffer output
    select(FINGER); $| = 1; select(stdout);
    print FINGER "$username\r\n";
    while (<FINGER>) {
        print;
    }
}
The completed program, dfinger.cgi, appears in Listing 11.13. Although
this program works more efficiently overall than the older version
(finger.cgi), you can see that it is more complex, and that the
extra complexity might not be worth the minute gain in efficiency.
For larger client/server gateways, however, you might see a noticeable
advantage to making a direct connection versus running an existing
client from the gateway.
Listing 11.13. The dfinger.cgi program (Perl).
#!/usr/local/bin/perl
require 'cgi-lib.pl';
# initialize network variables
$AF_INET = 2;
$SOCK_STREAM = 1; # Use 2 if using Solaris
$sockaddr = 'S n a4 x8';
$proto = (getprotobyname('tcp'))[2];
$port = (getservbyname('finger', 'tcp'))[2];
# unbuffer output
select(stdout); $| = 1;
# begin main
print &PrintHeader;
if (&ReadParse(*input)) {
    if ($input{'who'}) {
        print &HtmlTop("Finger results"),"<pre>\n";
        ($username,$hostname) = split(/@/,$input{'who'});
        $hostname = $ENV{'SERVER_NAME'} unless $hostname;
        $ipaddr = (gethostbyname($hostname))[4];
        if (!$ipaddr) {
            print "Invalid hostname.\n";
        }
        else {
            &do_finger($username,$ipaddr);
        }
        print "</pre>\n",&HtmlBot;
    }
    else {
        &print_form;
    }
}
else {
    &print_form;
}
sub print_form {
print &HtmlTop("Finger Gateway");
print "<form>\n";
print "Who? <input name=\"who\">\n";
print "</form>\n";
print &HtmlBot;
}
sub do_finger {
    local($username,$ipaddr) = @_;
    $them = pack($sockaddr, $AF_INET, $port, $ipaddr);
    # get socket
    socket(FINGER, $AF_INET, $SOCK_STREAM, $proto) || die "socket: $!";
    # make connection
    if (!connect(FINGER,$them)) {
        die "connect: $!";
    }
    # unbuffer output
    select(FINGER); $| = 1; select(stdout);
    print FINGER "$username\r\n";
    while (<FINGER>) {
        print;
    }
}
This chapter ends with examples of a very common gateway found
on the World Wide Web: a Web to e-mail gateway. The idea is that
you can take the content of a form and e-mail it to the specified
location using this gateway.
Many current browsers have built-in e-mail capabilities that enable
users to e-mail anyone and anywhere from their browsers. Clicking
on a tag such as the following will cause the browser to run a
mail client that will send a message to the recipient specified
in the <a href> tag:
<a href="mailto:eekim@hcs.harvard.edu">E-mail me</a>
Why does anyone need a Web to e-mail gateway if most browsers
can act as e-mail clients?
An e-mail gateway can have considerable advantages over the built-in
mail clients and the mailto
references. For example, you could force all e-mail to have the
same format by using a fill-out form and a custom mail gateway.
This example becomes useful if you are collecting information
for future parsing, such as a poll. Having people e-mail their
answers in all sorts of different forms would make parsing extremely
difficult.
This section shows the development of a rudimentary mail gateway
in C. This gateway requires certain fields such as to
and uses an authentication file to limit the potential recipients
of e-mail from this gateway. Next, you see form.cgi, the generic
form-parsing CGI application developed in Chapter 10,
"Basic Applications," extended to support e-mail.
mail.cgi is a simple e-mail gateway with the following specifications:
- If called with no input, it displays a generic mail entry form.
- If no to field is specified, it sends e-mail by default to a predefined Web administrator.
- It uses only the to, name, email, subject, and content fields (the names read by Listing 11.14) and ignores all other fields.
- It sends an error message if the user does not fill out any fields.
- It uses an authentication file to make sure only certain people receive e-mail from this gateway.
As you can see, mail.cgi is fairly inflexible, but it serves its
purpose adequately. It will ignore any field other than those
specified. You could not include a poll on your HTML form because
that information would simply be ignored by mail.cgi. This CGI
program is essentially equivalent to the mailto
reference tag, except for the authentication file.
Why use an authentication file? Mail using this gateway is easily
forged. Because the CGI program has no way of knowing the identity
of the user, it asks the user to fill out that information. The
user could easily fill out false information. In order to prevent
people from using this gateway to send forged e-mail to anyone
on the Internet, the gateway lets you send e-mail only to the recipients
listed in a central authentication file maintained by the server
administrator. As an added protection against forged e-mail, mail.cgi
adds an X-Sender mail header that says this e-mail was sent using
this gateway.
The authentication file contains valid e-mail recipients, one
on each line. For example, your authentication file might look
like this:
eekim@hcs.harvard.edu
president@whitehouse.gov
In this case, you could only use mail.cgi to send e-mail to me
and the President.
Finally, you need to decide how to send the e-mail. A direct connection
does not seem like a good solution: the Internet e-mail protocol
can be a fairly complex thing, and making direct connections to
mail servers seems unnecessary. The sendmail
program, which serves as an excellent mail transport agent for
e-mail, is up-to-date, fairly secure, and fairly easy to use.
This example uses popen()
to pipe the data into the sendmail program, which consequently
sends the information to the specified address.
The code for mail.cgi appears in Listing 11.14. There are a few
features of note. First, even though this example uses popen(),
it doesn't bother escaping the user input because mail.cgi checks
every user-supplied e-mail address against the ones in the central
authentication file. This assumes that both the e-mail addresses
in the central authentication file and the hard-coded Web administrator's
e-mail address (defined as WEBADMIN)
are valid and contain no shell metacharacters.
Listing 11.14. The mail.cgi.c program.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "cgi-lib.h"
#include "html-lib.h"
#include "string-lib.h"
#define WEBADMIN "web@somewhere.edu"
#define AUTH "/usr/local/etc/httpd/conf/mail.conf"
void NullForm()
{
    html_begin("Null Form Submitted");
    h1("Null Form Submitted");
    printf("You have sent an empty form. Please go back and fill out\n");
    printf("the form properly, or email <i>%s</i>\n",WEBADMIN);
    printf("if you are having difficulty.\n");
    html_end();
}
void authenticate(char *dest)
{
    FILE *access;
    char s[80];
    short FOUND = 0;
    if ( (access = fopen(AUTH,"r")) != NULL) {
        while ( (!FOUND) && (fgets(s,80,access)!=NULL) ) {
            s[strcspn(s,"\n")] = '\0'; /* strip the trailing newline, if any */
            if (!strcmp(s,dest))
                FOUND = 1;
        }
        fclose(access);
        if (!FOUND) {
            /* not authenticated */
            html_begin("Unauthorized Destination");
            h1("Unauthorized Destination");
            html_end();
            exit(1);
        }
    }
    else { /* access file not found */
        html_begin("Access file not found");
        h1("Access file not found");
        html_end();
        exit(1);
    }
}
int main()
{
  llist entries;
  FILE *mail;
  char command[256] = "/usr/lib/sendmail ";
  char *dest,*name,*email,*subject,*content;

  html_header();
  if (read_cgi_input(&entries)) {
    if ( !strcmp("",cgi_val(entries,"name")) &&
         !strcmp("",cgi_val(entries,"email")) &&
         !strcmp("",cgi_val(entries,"subject")) &&
         !strcmp("",cgi_val(entries,"content")) )
      NullForm();
    else {
      dest = newstr(cgi_val(entries,"to"));
      name = newstr(cgi_val(entries,"name"));
      email = newstr(cgi_val(entries,"email"));
      subject = newstr(cgi_val(entries,"subject"));
      if (dest[0]=='\0')
        /* dest holds only an empty string, so take a fresh copy of
           WEBADMIN rather than copying past the end of that buffer */
        dest = newstr(WEBADMIN);
      else
        authenticate(dest);
      /* no need to escape_input() on dest, since we assume there aren't
         insecure entries in the authentication file. */
      strcat(command,dest);
      mail = popen(command,"w");
      if (mail == NULL) {
        html_begin("System Error!");
        h1("System Error!");
        printf("Please mail %s and inform\n",WEBADMIN);
        printf("the web maintainers that the comments script is improperly\n");
        printf("configured. We apologize for the inconvenience<p>\n");
        printf("<hr>\r\nWeb page created on the fly by ");
        printf("<i>%s</i>.\n",WEBADMIN);
        html_end();
      }
      else {
        content = newstr(cgi_val(entries,"content"));
        fprintf(mail,"From: %s (%s)\n",email,name);
        fprintf(mail,"Subject: %s\n",subject);
        fprintf(mail,"To: %s\n",dest);
        fprintf(mail,"X-Sender: %s\n\n",WEBADMIN);
        fprintf(mail,"%s\n\n",content);
        pclose(mail);
        html_begin("Comment Submitted");
        h1("Comment Submitted");
        printf("You submitted the following comment:\r\n<pre>\r\n");
        printf("From: %s (%s)\n",email,name);
        printf("Subject: %s\n\n",subject);
        printf("%s\n</pre>\n",content);
        printf("Thanks again for your comments.<p>\n");
        printf("<hr>\nWeb page created on the fly by ");
        printf("<i>%s</i>.\n",WEBADMIN);
        html_end();
      }
    }
  }
  else {
    html_begin("Comment Form");
    h1("Comment Form");
    printf("<form method=POST>\n");
    printf("<input type=hidden name=\"to\" value=\"%s\">\n",WEBADMIN);
    printf("<p>Name: <input name=\"name\"><br>\n");
    printf("E-mail: <input name=\"email\"><br>\n");
    printf("Subject: <input name=\"subject\"></p>\n");
    printf("<p>Comments:<br>\n");
    printf("<textarea name=\"content\" rows=10 cols=70></textarea></p>\n");
    printf("<input type=submit value=\"Mail form\">\n");
    printf("</form>\n");
    html_end();
  }
  list_clear(&entries);
  return 0;
}

You might notice that the example uses statically allocated strings
for some values, such as the command string. The assumption is
that you know the maximum size limit of this string because you
know where the command is located (in this case, /usr/lib/sendmail),
and you assume that any authorized e-mail address will not put
this combined string over the limit. The example essentially cheats
on this step to save coding time. If you want to extend and generalize
this program, however, you might need to change this string to
a dynamically allocated one.
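One way to make that change is sketched below. This is a minimal illustration, not code from the original program: build_command() is a hypothetical helper that measures both pieces first and allocates exactly enough room for the program path, a space, the address, and the terminating null character.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated "program dest" string,
   or NULL if memory runs out. The caller frees the result. */
char *build_command(const char *program, const char *dest)
{
    size_t len = strlen(program) + 1 + strlen(dest) + 1; /* space + NUL */
    char *command = malloc(len);

    if (command != NULL)
        snprintf(command, len, "%s %s", program, dest);
    return command;
}
```

In mail.cgi, the result of build_command("/usr/lib/sendmail", dest) would be passed to popen() and released with free() after pclose(). The length of the address then no longer matters, although the address still has to be safe to hand to the shell.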
mail.cgi doesn't serve as a tremendously useful gateway for most
people, although it offers some nice features over using the <a
href="mailto:address"> tag. A fully configurable
mail program that could parse anything, that could send customized
default forms, and that could send e-mail in a customizable format
would be ideal.
These desires sound suspiciously like the specifications for form.cgi,
the generic forms parser developed in Chapter 10.
In fact, the only difference between the form.cgi program described
earlier and the program described here is that the program described
here sends the results via e-mail rather than saving them to a
file.
Instead of rewriting a completely new program, you can use form.cgi
as a foundation and extend the application to support e-mail as
well. This action requires two major changes:
- A MAILTO configuration option in the configuration file.
- A function that e-mails the data rather than saving it.
If a MAILTO option is in
the configuration file, form.cgi e-mails the results to the address
specified by MAILTO. If neither
a MAILTO nor OUTPUT
option is specified in the configuration file, then form.cgi returns
an error. The new form.cgi with e-mail support appears in Listing
11.15.
Listing 11.15. The form.cgi program (with mail support).
#!/usr/local/bin/perl

require 'cgi-lib.pl';

$global_config = '/usr/local/etc/httpd/conf/form.conf';
$sendmail = '/usr/lib/sendmail';

# parse config file
$config = $ENV{'PATH_INFO'};
if (!$config) {
    $config = $global_config;
}
# the explicit "<" keeps a trailing "|" in PATH_INFO from running a command
open(CONFIG,"< $config") || &CgiDie("Could not open config file");
while ($line = <CONFIG>) {
    # strip the line ending (both \r and \n)
    $line =~ s/[\r\n]+$//;
    if ($line =~ /^FORM=/) {
        ($form = $line) =~ s/^FORM=//;
    }
    elsif ($line =~ /^TEMPLATE=/) {
        ($template = $line) =~ s/^TEMPLATE=//;
    }
    elsif ($line =~ /^OUTPUT=/) {
        ($output = $line) =~ s/^OUTPUT=//;
    }
    elsif ($line =~ /^RESPONSE=/) {
        ($response = $line) =~ s/^RESPONSE=//;
    }
    elsif ($line =~ /^MAILTO=/) {
        ($mailto = $line) =~ s/^MAILTO=//;
    }
}
close(CONFIG);

# process input or send form
if (&ReadParse(*input)) {
    # read template into list
    if ($template) {
        open(TMPL,$template) || &CgiDie("Can't Open Template");
        @TEMPLATE = <TMPL>;
        close(TMPL);
    }
    else {
        &CgiDie("No template specified");
    }
    if ($mailto) {
        $mail = 1;
        # "|-" forks: the parent writes to MAIL, which is the child's STDIN.
        # The child execs sendmail directly, so no shell ever sees $mailto.
        open(MAIL,"|-") || exec $sendmail,$mailto;
        print MAIL "To: $mailto\n";
        print MAIL "From: $input{'email'} ($input{'name'})\n";
        # the subject comes from the form, like the name and e-mail fields
        print MAIL "Subject: $input{'subject'}\n" if $input{'subject'};
        print MAIL "X-Sender: form.cgi\n\n";
        foreach $line (@TEMPLATE) {
            # work on a copy so that @TEMPLATE itself stays unexpanded
            $text = $line;
            if ( ($text =~ /\$/) || ($text =~ /\%/) ) {
                # form variables
                $text =~ s/^\$(\w+)/$input{$1}/;
                $text =~ s/([^\\])\$(\w+)/$1$input{$2}/g;
                # environment variables
                $text =~ s/^\%(\w+)/$ENV{$1}/;
                $text =~ s/([^\\])\%(\w+)/$1$ENV{$2}/g;
            }
            print MAIL $text;
        }
        close(MAIL);
    }
    else {
        $mail = 0;
    }
    # write to output file according to template
    if ($output) {
        open(OUTPUT,">>$output") || &CgiDie("Can't Append to $output");
        foreach $line (@TEMPLATE) {
            # work on a copy so that @TEMPLATE itself stays unexpanded
            $text = $line;
            if ( ($text =~ /\$/) || ($text =~ /\%/) ) {
                # form variables
                $text =~ s/^\$(\w+)/$input{$1}/;
                $text =~ s/([^\\])\$(\w+)/$1$input{$2}/g;
                # environment variables
                $text =~ s/^\%(\w+)/$ENV{$1}/;
                $text =~ s/([^\\])\%(\w+)/$1$ENV{$2}/g;
            }
            print OUTPUT $text;
        }
        close(OUTPUT);
    }
    elsif (!$mail) {
        &CgiDie("No output file specified");
    }
    # send either specified response or dull response
    if ($response) {
        print "Location: $response\n\n";
    }
    else {
        print &PrintHeader,&HtmlTop("Form Submitted");
        print &HtmlBot;
    }
}
elsif ($form) {
    # send default form
    print "Location: $form\n\n";
}
else {
    &CgiDie("No default form specified");
}

The changes to form.cgi are minor. All that you had to add
was an extra condition in the configuration-parsing loop
and a few lines of code that pipe the message to the sendmail
program, much as mail.cgi does.
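One difference is worth noting: Listing 11.15 hands the address to sendmail through exec as a separate argument, whereas mail.cgi builds a command line for popen(), which runs it through the shell. A C program can avoid the shell in the same way by using pipe(), fork(), and execv(). The following is a minimal sketch under POSIX assumptions. The name pipe_to_program() is hypothetical, and error handling is kept to the essentials.

```c
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs path with argv, feeds data to its standard
   input, and returns the child's exit status, or -1 on failure. */
int pipe_to_program(const char *path, char *const argv[], const char *data)
{
    int fd[2], status, result;
    pid_t pid;
    size_t left = strlen(data);
    void (*old_handler)(int);

    if (pipe(fd) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                 /* child: read end becomes stdin */
        close(fd[1]);
        dup2(fd[0], STDIN_FILENO);
        close(fd[0]);
        execv(path, argv);          /* arguments go straight to the program */
        _exit(127);                 /* reached only if execv() failed */
    }
    close(fd[0]);                   /* parent: write the message */
    /* if the child exits early, write() should fail, not kill the parent */
    old_handler = signal(SIGPIPE, SIG_IGN);
    while (left > 0) {
        ssize_t n = write(fd[1], data, left);
        if (n <= 0)
            break;
        data += n;
        left -= (size_t) n;
    }
    close(fd[1]);                   /* end of input for the child */
    signal(SIGPIPE, old_handler);
    result = waitpid(pid, &status, 0);
    if (result == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For mail.cgi, path would be "/usr/lib/sendmail" and argv would hold "sendmail", the destination address, and a terminating NULL. Because the address is never parsed by a shell, shell metacharacters in it have no special meaning.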
You can write CGI programs that act as gateways between the World
Wide Web and other network applications. You can take one of two
approaches to writing a CGI gateway: either embed an existing
client into a CGI program, or program your CGI application to
understand the appropriate protocols and to make the network connections
directly. Advantages and disadvantages exist with both methods,
although for most purposes, running the already existing client
from within your CGI application provides a more than adequate
solution. If you do decide to take this approach, you must remember
to carefully consider any possible security risks in your code,
including filtering out shell metacharacters and redefining access
restrictions.

Copyright 1998
EarthWeb Inc., All rights reserved.
Copyright 1998 Macmillan Computer Publishing. All rights reserved.