The Gopher to X.500 Gateway
Timothy A Howes
University of Michigan

Abstract

This document describes the Gopher to X.500 gateway software
developed at the University of Michigan.  Gopher is a simple
distributed document search and retrieval protocol for the Internet.
X.500 is the OSI standard for distributed directory service.  The
Gopher to X.500 gateway software gives Gopher users access to X.500
transparently through their Gopher user agents.  The Gopher to X.500
gateway services at U-M currently handle over 5000 connections per
day, making them the single largest user of X.500 services at U-M.

Introduction

The Gopher and X.500 protocols are very different.  Gopher was
developed at the University of Minnesota and originally described by
a short six-page paper [1].  Simplicity is one of Gopher's key
objectives.  It is basically a navigation and retrieval protocol,
imposing very little structure on the data ultimately retrieved or
the namespace in which it lives.  The Gopher namespace forms a
general graph structure, though in practice this is often restricted
to a hierarchy.  No facilities are provided in the protocol for
modification or replication of data.  X.500, on the other hand, is
quite complicated by any standard.  Developed by an ISO and CCITT
committee, it is described by an International Standard [3] several
hundred pages long.  X.500 is a directory service protocol meant to
provide navigation, naming, data storage and retrieval, and
authentication capabilities.  It defines a strict information
framework that requires data to be well-structured and strongly
typed.  The X.500 namespace is hierarchical, reflecting the political
and geographical boundaries of countries, states, localities,
organizations, etc.  Aliases can be used to provide a more general
graph structure.

Despite their differences, tying Gopher and X.500 together has proved
to be surprisingly easy.  This paper describes _Go500_ and _Go500gw_,
two Gopher to X.500 gateways developed at the University of
Michigan.  Since the gateways were deployed and announced at a Gopher
developers conference, combined usage has steadily increased to over
5000 connections per day, making them the biggest users of the
University of Michigan X.500 service.  In addition, a number of other
sites are now running gateways of their own.  The first two sections
give brief overviews of Gopher and X.500, respectively.  The next
sections describe the gateways themselves, followed by a few comments
about our implementation experiences, and a look at the gateway usage
in more detail.  Finally, we consider possible future work in this
area, followed by information on how to get the gateway software,
which is freely available on the Internet.

Overview of Gopher

The Gopher protocol is based on a client-server model in which Gopher
servers hold documents which may be accessed by users running Gopher
clients.  Only navigation and retrieval facilities are provided by
Gopher.  The protocol makes no provision for the modification of
data.  The model resembles a hierarchical filesystem, with both files
(documents) and directories (menus).  Menus contain lists of
documents and/or other menus.  Gopher documents can contain text,
sound, pictures, etc.  Other types of menu entries are also possible,
e.g., index search servers.  Each item listed in a Gopher menu has a
type (document, menu, etc.), a name which is usually displayed to the
user, a _selector string_ used internally by the client and server to
uniquely identify the item, and the IP address and TCP port number of
the Gopher server holding the item.  These pieces of information are
tab-separated when presented to a client.
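
As a minimal sketch of the item format just described, the following
Python parses one menu line into its parts.  The class and function
names are illustrative and not taken from any Gopher implementation.
The sketch assumes the layout used by the Gopher protocol, in which
the one-character type is prefixed directly to the display name and
the remaining fields are separated by tabs.

```python
from dataclasses import dataclass


@dataclass
class MenuItem:
    item_type: str  # one character, e.g. "0" document, "1" menu, "7" index search
    name: str       # text shown to the user
    selector: str   # opaque string sent back to the server
    host: str       # server holding the item
    port: int       # TCP port of that server


def parse_menu_line(line: str) -> MenuItem:
    """Split one tab-separated menu line into a MenuItem."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4 or not fields[0]:
        raise ValueError(f"malformed menu line: {line!r}")
    display, selector, host, port = fields[:4]
    # The first character of the first field is the item type.
    return MenuItem(display[0], display[1:], selector, host, int(port))
```

A client would call parse_menu_line on each line of a menu response
and show only the name field to the user.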

A client retrieves an initial list of menu items by connecting to a
Gopher server and sending it a null line.  The Gopher server responds
with the list of items in its ``root'' menu.  Menu items are
separated by newlines.  The entire list is terminated by a single
period on a line by itself.  To retrieve one of the items listed in
the menu, the client connects to the Gopher server at the specified
IP address and TCP port number and sends it the selector string.  The
server responds with the item requested, be it a document or another
list of menu items.
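
The exchange above is simple enough to sketch in a few lines of
Python.  This is an illustration under stated assumptions rather than
a complete client: the function names are ours, lines are assumed to
end in CR LF, and the timeout and character encoding are arbitrary
choices.

```python
import socket


def fetch(host: str, port: int, selector: str = "") -> bytes:
    """Send one selector (empty for the root menu) and read the whole reply."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(selector.encode("latin-1") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks)


def menu_lines(raw: bytes) -> list[str]:
    """Split a menu reply into item lines, stopping at the lone period."""
    lines = []
    for raw_line in raw.split(b"\n"):
        line = raw_line.rstrip(b"\r").decode("latin-1")
        if line == ".":
            break
        if line:
            lines.append(line)
    return lines
```

Calling fetch with an empty selector retrieves the root menu; passing
one of the returned selectors back to fetch retrieves that item.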

Another type of entry that can occur is called an index search
server.  The entry for such a server is similar to those for other
Gopher items, except that to access the search server a Gopher client
is expected to present the selector string and a list of words for
which to search, presumably retrieved from the user.  The index
server then returns a menu of Gopher entries that match the search
criteria.  The index search server was designed to allow full text
searching of the various documents held by a Gopher server, but as we
shall see later it can be put to other uses.
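
A search request differs from an ordinary request only in what the
client sends.  The sketch below builds the request line, assuming the
Gopher convention of separating the selector from the search words
with a tab; the function name is hypothetical.

```python
def search_request(selector: str, words: list[str]) -> bytes:
    """Build the line a client sends to an index search server."""
    query = " ".join(word for word in words if word)
    # Selector and search words are separated by a single tab.
    return f"{selector}\t{query}\r\n".encode("latin-1")
```

The reply is an ordinary menu, so the client needs no extra code to
display the results.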

By keeping their protocol very simple and building most of the
intelligence into the client software, the designers of Gopher have
come up with a system that is powerful, yet simple to use and
implement.  Cost of entry into the Gopher world is very low.  It
consumes few resources, the implementations are easy to bring up and
administer, and little effort is required to understand the entire
system.  Furthermore, it uses technology that is mature and
well-understood, making Gopher software development easier.
Virtually anyone with IP connectivity can run a Gopher server and
offer the data they hold to the rest of the world.  Namespace and
data management are user-driven in this sense.  This is both good and
bad.  It's good because it allows users to have control over their
data and the ability to design an information service tailored to
their needs.  They don't have to seek permission from anybody to
start providing a useful service.  It's bad because it does not
always lead to a sensible organization of data and can tend to
produce a somewhat tangled namespace, making it hard to find things.

Overview of X.500

X.500 is the OSI standard for directory service, specified jointly by
the International Organization for Standardization (ISO) and the
International Telegraph and Telephone Consultative Committee (CCITT).
It specifies a general distributed directory service that assumes that
read operations are far more frequent than writes and that temporary
inconsistencies among data are acceptable.  The model is
client-server based, like Gopher, but more complex.  In X.500,
clients are called Directory User Agents (DUA) and servers are called
Directory System Agents (DSA).  A DUA connects to a ``close'' DSA and
sends its query.  The DSA can either answer the query from its own
data, forward the query to another DSA on behalf of the client (this
process is called _chaining_), or return the address of the other DSA
so the DUA can ask it itself (this process is called _referral_).
Regardless of which DSA a DUA contacts, it sees the same view of the
X.500 data.  DUA to DSA communication is accomplished using the
Directory Access Protocol (DAP).  DSA to DSA communication is over
the Directory System Protocol (DSP).  Both protocols are defined in
terms of the OSI Remote Operations Service [4].
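
The difference between chaining and referral can be illustrated with
a toy model.  Nothing here speaks DAP or DSP; the DSA class, its
fields, and the hop limit are all inventions for the sake of the
example, and real knowledge references are far richer than a
dictionary lookup.

```python
class DSA:
    """Toy directory server holding some entries and knowing about others."""

    def __init__(self, name, data, knows=None, chains=True):
        self.name = name
        self.data = data            # DN -> entry held locally
        self.knows = knows or {}    # DN -> DSA that holds the entry
        self.chains = chains        # True: chain; False: return a referral

    def query(self, dn):
        if dn in self.data:
            return ("result", self.data[dn])
        other = self.knows.get(dn)
        if other is None:
            return ("error", "no such entry")
        if self.chains:
            return other.query(dn)  # chaining: ask on the client's behalf
        return ("referral", other)  # referral: tell the client where to ask


def dua_read(dsa, dn, max_hops=5):
    """Toy DUA: follow referrals until a result or an error comes back."""
    for _ in range(max_hops):
        kind, value = dsa.query(dn)
        if kind != "referral":
            return kind, value
        dsa = value
    return ("error", "too many referrals")
```

Either way the DUA ends up with the same answer, which is the sense
in which every DSA presents the same view of the data.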

The data itself is composed of _entries_ which are organized into a
hierarchy called the Directory Information Tree (DIT).  At the top of
the tree are entries for countries and international organizations.
Below each country the namespace is decided by the country itself,
with some guidance from the X.500 standard.  In the US, for example,
the North American Directory Forum, a group of public directory
service providers, is defining a namespace that follows the existing
civil naming infrastructure [5].

Each entry in the DIT is composed of a number of _attributes_.  Each
attribute has a _type_ and one or more _values_.  The form the values
can take is determined by the attribute's _syntax_.  An example
attribute used for holding a person's name might have type
_commonName_, syntax _CaseIgnoreString_ (meaning the value is a
string, the case of which is ignored for comparison purposes), and
the two values ``Timothy A Howes'' and ``Tim Howes''.  There is
little restriction on the range of syntaxes and attributes that can
be defined.  Two of the more interesting ones defined in the Internet
X.500 pilot are _jpegPhoto_ and _audio_, which are for holding
pictures and sound, respectively.
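
The commonName example can be modelled directly.  The sketch below is
a simplification: the class is hypothetical, only the
CaseIgnoreString syntax is handled, and we assume its matching rule
amounts to ignoring case and extra spaces.

```python
from dataclasses import dataclass, field


def _normalize(value: str) -> str:
    # Assumed matching rule: ignore case and collapse runs of spaces.
    return " ".join(value.split()).casefold()


@dataclass
class Attribute:
    attr_type: str
    syntax: str
    values: list[str] = field(default_factory=list)

    def matches(self, candidate: str) -> bool:
        """Compare a candidate against the values according to the syntax."""
        if self.syntax == "CaseIgnoreString":
            return any(_normalize(v) == _normalize(candidate) for v in self.values)
        return candidate in self.values  # other syntaxes: exact match only


name = Attribute("commonName", "CaseIgnoreString",
                 ["Timothy A Howes", "Tim Howes"])
```

With this definition, name.matches("tim howes") is true, while
name.matches("Tim") is not, since equality is tested against whole
values.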

Entries are named by one or more of these attribute value pairs, the
collection of which is termed the entry's Relative Distinguished Name
(RDN), and must be unique among all sibling entries.  The globally
unique Distinguished Name (DN) of an entry is formed by concatenating
the path of RDNs from the root of the DIT to the entry.  For example,
the entry for the United States is named _{country=US}_.  The entry
for the University of Michigan is named _{country=US,
organization=University of Michigan}_.
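
Forming a DN from the path of RDNs is a simple concatenation, as the
following sketch shows.  It assumes each RDN consists of a single
attribute value pair and reproduces the brace notation used in the
examples above; the function names are ours.

```python
def format_dn(rdns: list[tuple[str, str]]) -> str:
    """Join RDNs, ordered from the root of the DIT down, into a DN."""
    return "{" + ", ".join(f"{attr}={value}" for attr, value in rdns) + "}"


def parent(rdns: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """The parent entry is named by the same path minus the last RDN."""
    return rdns[:-1]


um = [("country", "US"), ("organization", "University of Michigan")]
```

Here format_dn(um) yields the name of the University of Michigan
entry, and format_dn(parent(um)) the name of the United States entry.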

X.500 defines operations that allow a DUA to search and browse the
DIT, retrieve information from particular entries, and even modify,
add and delete entries.  For our purposes, only the read-like
operations are of interest, search in particular.  Searches can be of
a single entry, an entry's children, or an entire subtree of the DIT
(even if the subtree is split across multiple DSAs).  Search filters
are based on boolean combinations of attributes which satisfy certain
conditions, such as equality, approximate equality, substring
matching, etc.  So, for example a search could be made for entries
with a surname equal to ``Howes'' or a commonName approximately equal
to ``Tim Howes''.  For a more complete treatment of X.500, see [6].
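
To make the example concrete, the sketch below writes such a filter
in the parenthesized prefix notation used for LDAP search filter
strings.  The helper names are hypothetical, escaping of special
characters in values is omitted, and we do not claim the gateways
build their filters this way.

```python
def equals(attr: str, value: str) -> str:
    return f"({attr}={value})"


def approx(attr: str, value: str) -> str:
    return f"({attr}~={value})"


def substring(attr: str, fragment: str) -> str:
    return f"({attr}=*{fragment}*)"


def any_of(*filters: str) -> str:
    """Boolean OR of the given filters."""
    return "(|" + "".join(filters) + ")"


def all_of(*filters: str) -> str:
    """Boolean AND of the given filters."""
    return "(&" + "".join(filters) + ")"


example = any_of(equals("surname", "Howes"), approx("commonName", "Tim Howes"))
```

The value of example is the string
(|(surname=Howes)(commonName~=Tim Howes)).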

The Gateways

The first pass we made at a Gopher to X.500 gateway is known as
_Go500_.  It is tailored to white pages usage and only gives Gopher
users access to a pre-selected portion of the X.500 DIT.  It allows
subtree searching of this fixed portion of the namespace, but no
browsing.  It appears as an index search server to a Gopher client,
which prompts the user to enter keywords for which to search.
_Go500_ takes the keywords, forms an X.500 search filter, and
performs a subtree search of the portion of the DIT for which it is
configured.  A list of entries matching the search criteria is
returned to the Gopher client, each entry represented as a text
document.  When the user chooses one of these documents, the
corresponding entry's attributes are retrieved, converted to text
form, and displayed to the user.  Various white pages attributes are
displayed.  This gateway worked well to solve our initial problem,
which was to provide a faculty, staff and student phone directory
through Gopher without duplicating the information we already held in
our X.500 database.  _Go500_ is not a general gateway, though, and
gives Gopher users no way to access X.500 databases at other
organizations, nor does it allow browsing of the X.500 namespace,
something Gopher users are fond of doing.

Figure 1: Root menu returned from the gateway

To solve this problem and provide more general X.500 access, we
developed _Go500gw_.  _Go500gw_ appears to the unsuspecting Gopher
client as just another Gopher server.  The initial Gopher ``menu''
that is exported to the client consists of the list of X.500 entries
at the root of the DIT.  Leaf entries in X.500 appear as text
documents in Gopher.  Nonleaf entries appear as other Gopher menu
servers (though they in fact point to the single _Go500gw_ server,
but with a different selector string).  Figure 1 shows a portion of
an example root menu returned by the gateway, as displayed by a
popular Macintosh Gopher client.  If a Gopher user selects one of the
leaf, or document objects, _Go500gw_ retrieves the contents of the
entry, converts it to text form and sends it back to the Gopher
client.  If a Gopher user selects one of the nonleaf, or menu
objects, _Go500gw_ retrieves a list of the entry's children and sends
it back to the Gopher client.  In this list, nonleaf children are
presented as Gopher menus and leaf children as documents, just as for
the initial list.  In both cases the selector string is the
text-encoded form of the entry's Distinguished Name, prefixed with a
one character flag indicating whether a ``list'' or ``read''
operation is required.  The ``list'' operation is used for menus,
``read'' for documents.  Using this simple scheme, a user can descend
to any point in the X.500 DIT and view the contents of any leaf
entry.
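
A selector scheme of this kind might look as follows.  The flag
characters chosen here are hypothetical, as are the function names;
the paper specifies only that a one character flag precedes the
text-encoded DN.

```python
LIST_FLAG = "M"  # hypothetical flag: list the entry's children (menu)
READ_FLAG = "R"  # hypothetical flag: read the entry's attributes (document)


def make_selector(dn: str, is_leaf: bool) -> str:
    """Encode the operation and the DN into a Gopher selector string."""
    return (READ_FLAG if is_leaf else LIST_FLAG) + dn


def parse_selector(selector: str) -> tuple[str, str]:
    """Decode a selector into (operation, dn)."""
    if selector == "":
        return ("list", "")  # empty selector: list the root of the DIT
    flag, dn = selector[0], selector[1:]
    if flag == LIST_FLAG:
        return ("list", dn)
    if flag == READ_FLAG:
        return ("read", dn)
    raise ValueError(f"unknown selector flag: {flag!r}")
```

Because the selector carries everything the gateway needs, the
gateway itself can remain stateless between connections.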

To avoid clutter in the namespace, entries for DSAs are excluded from
lists sent back to the client.  Also, certain attributes not likely
to be of interest to the user are not displayed (e.g., those dealing
with X.500 knowledge references and access control).  To help users
find their way, each entry is identified by its primary object class
in the directory when it is displayed, for example _person_ or
_organization_.  This information is simply included in the
user-visible name of the object.  This way, users can easily tell
organizations from states or localities, and people from mailing
lists or application processes.
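
The listing rules above can be sketched as a function that turns a
list of child entries into Gopher menu lines.  The dictionary keys,
the selector flags, the object class name used for DSAs, and the
placement of the object class in parentheses are all assumptions made
for illustration.

```python
def menu_for(children: list[dict], host: str, port: int) -> list[str]:
    """Build Gopher menu lines for the children of an X.500 entry."""
    lines = []
    for child in children:
        if child["object_class"].lower() == "dsa":
            continue  # DSA entries are left out to avoid clutter
        gopher_type = "0" if child["leaf"] else "1"  # document or menu
        flag = "R" if child["leaf"] else "M"         # hypothetical selector flags
        label = f'{child["rdn"]} ({child["object_class"]})'
        lines.append(f'{gopher_type}{label}\t{flag}{child["dn"]}\t{host}\t{port}')
    return lines
```

Every line points back at the gateway's own host and port, which is
how nonleaf entries can appear to be separate Gopher menu servers.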

At each level _Go500gw_ also exports two special entries to the
Gopher client.  One is labelled _Read <name> entry_, where _<name>_
is the X.500 Relative Distinguished Name of the entry marking the
current position in the DIT.  This allows Gopher users to retrieve
the attributes of nonleaf entries as well as leaf entries.  (Since
there is no real ``root'' entry in X.500, this item does not appear
in the top level menu.)  The second is labelled _Search
<name>_, and allows Gopher users to be more selective in their
browsing by using the X.500 search facilities.  This entry appears as
a Gopher index search server.  _Go500gw_ takes what the user types,
forms and executes an X.500 search query, and returns the list of
results to the Gopher client in the same form as described above
(menus for nonleaf entries, text documents for leaf entries).  The
exact form of the query applied to X.500 depends on the user's
current position in the DIT and on what the user types.  The gateway
assumes that searches initiated higher up in the tree are looking for
countries, states, localities or organizations.  In this case, a one
level search is performed, with a filter appropriate for finding such
objects.  Searches initiated lower down in the tree (e.g., below an
organization entry) are assumed to be for other objects, like
people.  In this case, a subtree search is called for, with a
slightly different filter, more suitable for finding people or other
objects.  Figure 2 shows how an example leaf entry is displayed.

Figure 2: An example X.500 entry as displayed by the gateway

An early version of the gateway allowed users to specify their choice
of one level or subtree search and required them to specify the X.500
search filter directly.  Experience showed this to be more confusing
than helpful to people, despite the added flexibility.  In practice,
the searching assumptions described above seem to work most of the
time, without requiring Gopher users to know anything about X.500.
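
The searching assumptions described above can be sketched as a small
decision function.  The test for ``lower down in the tree'' (the
presence of an organization RDN), the attribute names, and the filter
strings are all hypothetical; the actual heuristics in the gateway
may differ.

```python
HIGH_LEVEL_TYPES = ("country", "stateOrProvince", "locality", "organization")


def plan_search(base_dn: str, text: str) -> tuple[str, str]:
    """Pick a search scope and filter from the position in the DIT."""
    # Assumes RDNs are separated by commas and contain no embedded commas.
    rdn_types = [rdn.split("=", 1)[0].strip().lower()
                 for rdn in base_dn.split(",") if rdn.strip()]
    if "organization" in rdn_types:
        # At or below an organization: look for people anywhere beneath it.
        return "subtree", f"(|(commonName~={text})(surname~={text}))"
    # Higher up: look one level down for countries, organizations, etc.
    alternatives = "".join(f"({t}~={text})" for t in HIGH_LEVEL_TYPES)
    return "onelevel", f"(|{alternatives})"
```

The user supplies only the text; the scope and filter follow from
where the search was started.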

Implementation Experience

Implementation of the gateways was surprisingly easy.  The initial
version of each required less than a day's programming time and well
under 1000 lines of C code.  Credit for this happy surprise belongs
equally to Gopher itself, which is an extremely simple and easy to
work with protocol, and the Lightweight Directory Access Protocol
(LDAP), which is what the gateway uses to talk to X.500.  Gopher is
simple enough that a reasonably complete understanding of the
protocol can be gained in less than an hour by reading the Gopher
paper and using telnet to poke at a Gopher server or two.  Gopher is
also very flexible, despite its simplicity.  A surprising amount can
be accomplished through clever use of the selector string.

The other half of the credit goes to LDAP, which provides access to
most X.500 capabilities directly over TCP.  LDAP uses simplified
string encodings for many protocol elements, greatly relieving the
encoding and decoding burden clients must bear.  The University of
Michigan implementation of LDAP includes a client library and simple
API making it relatively easy to develop LDAP clients (like the
Gopher gateways), without all the usual baggage associated with an
OSI application.  LDAP is currently a proposed Internet Standard [2,
7].

Gateway Usage

The simple Gopher to X.500 gateway, _Go500_, has been in use at the
University of Michigan for nearly a year, providing our Gopher users
with access to the X.500-based faculty, staff and student directory.
Initial growth was substantial but has now slowed.  _Go500_ currently
handles between 1000 and 2000 connections per day.  The vast majority
of these connections originate from the University of Michigan
campus.  The more general gateway, _Go500gw_, has been in operation
for about six months and is listed under the University of Michigan
main Gopher menu as well as at the ``Mother of all Gophers'' at the
University of Minnesota.  Its usage has grown dramatically to over
3000 connections per day, making it the single largest user of our
X.500 service.  Together, the two gateways send over 5000 queries a
day to our X.500 server.  Usage was so great, in fact, that we
started another DSA solely for the purpose of handling the _Go500gw_
traffic.  A number of other sites are also now running _Go500_,
_Go500gw_, or both, making it difficult to know how much usage the
gateways get overall.

Future Work

There are a number of small improvements to be made to the gateways,
mostly in the areas of error handling, which is currently rather
minimal, and search heuristics, which could use some tuning.  The
current gateway provides Gopher users access to X.500.  The reverse
function, giving X.500 users access to Gopher, would make an
interesting project, though one somewhat more complicated.  X.500
attribute types and object classes must be defined to accommodate the
Gopher data types.  Some sensible mapping from X.500 operations to
Gopher operations must also be defined, keeping in mind that there
are not analogous operations in many cases.  Some Gopher types are
not representable at all in the X.500 world, for example the telnet
type, which identifies a server to which the Gopher client is
supposed to telnet and provide interactive access for the user.

Another approach would be to register individual Gopher servers in
X.500, giving access information that would be understandable by a
Gopher client.  Although this would not help X.500 clients access
data contained in Gopher servers, it would allow Gopher clients
accessing the gateway to make use of the X.500 naming and searching
capabilities in locating Gopher servers.  This approach would be
relatively straightforward to implement.

Conclusion

Certainly based on usage alone, the Gopher to X.500 gateways have
been very successful.  The relative ease of implementation has been
an especially nice bonus.  More interesting is what the gateways'
success suggests about Gopher, X.500, and their users.  The lesson
about Gopher users seems to be that they are curious creatures.
Browsing seems to be the most common use of and way of discovering
the gateways.  If there is a lesson for the X.500 community it is
that simplicity and ease of use and installation are big pluses when
it comes to getting a protocol out there and used by the community at
large.  The use of the gateways clearly shows that users are
interested in the data and services provided by X.500.  If they have
a simple, easy to use tool on their desktop with which to access
X.500, they will use it.  It is only by providing such tools (and
correspondingly simple configuration and administration on the server
side) that X.500 will continue to grow in the Internet into a more
widely-used, mature service.  Gopher, on the other hand, has enjoyed,
or perhaps suffered from, explosive growth over the past couple of
years.  Its future success depends in part on how well it evolves to
handle the scaling effects it is now beginning to feel.

Availability

The University of Minnesota Gopher distribution is available for
anonymous FTP from the host boombox.micro.umn.edu.  The two Gopher to
X.500 gateways described in this paper are available as part of the
University of Michigan LDAP distribution, available for anonymous FTP
from the host terminator.rs.itd.umich.edu.

Bibliography

[1] Bob Alberti, Farhad Anklesaria, Paul Lindner, Mark McCahill, and
Daniel Torrey, ``The internet Gopher protocol: a distributed document
search and retrieval protocol'', University of Minnesota
Microcomputer and Workstation Networks Center, Spring 1991.

[2] Tim Howes, Steve Hardcastle-Kille, Wengyik Yeong, and Colin
Robbins, ``The String Representation of Standard Attribute
Syntaxes'', Internet Draft, December 1992.

[3] International Standards Organization, Information Processing
Systems - Open Systems Interconnection - The Directory, International
Standard 9594.

[4] International Standards Organization, Information Processing
Systems - Text Communications - Remote Operations, Part 1: Model,
Notation and Service Definition, International Standard 9072-1.

[5] North American Directory Forum, ``An X.500 Naming Scheme for
National DIT Subtrees and its Application to c=US'', Standing
Document 5.

[6] Marshall T. Rose, ``The Little Black Book:  Mail Bonding with OSI
Directory Service'', Prentice Hall 1991.

[7] Wengyik Yeong, Tim Howes, and Steve Hardcastle-Kille,
``Lightweight Directory Access Protocol'', Internet Draft, December
1992.

About the Author

TIMOTHY A HOWES holds a B.S.E. and M.S.E. from the University of
Michigan.  Since 1989 he has worked for the U-M Information
Technology Division on a variety of Unix, TCP/IP, and OSI network
programming and protocol design projects.  He is currently in charge
of X.500 development and deployment on the U-M campus.  He is
co-chair of the IETF Integrated Directory Services working group, and
a member of the ACM and IEEE.  He can be reached as tim@umich.edu.