The Gopher to X.500 Gateway

Timothy A Howes
University of Michigan

Abstract

This document describes the Gopher to X.500 gateway software developed at the University of Michigan. Gopher is a simple distributed document search and retrieval protocol for the Internet. X.500 is the OSI standard for distributed directory service. The Gopher to X.500 gateway software gives Gopher users access to X.500 transparently through their Gopher user agents. The Gopher to X.500 gateway services at U-M currently handle over 5000 connections per day, making it the single largest user of X.500 services at U-M.

Introduction

The Gopher and X.500 protocols are very different. Gopher was developed at the University of Minnesota and originally described by a short six page paper [1]. Simplicity is one of Gopher's key objectives. It is basically a navigation and retrieval protocol, imposing very little structure on the data ultimately retrieved or the namespace in which it lives. The Gopher namespace forms a general graph structure, though in practice this is often restricted to a hierarchy. No facilities are provided in the protocol for modification or replication of data.

X.500, on the other hand, is quite complicated by any standard. Developed by an ISO and CCITT committee, it is described by an International Standard [3] several hundred pages long. X.500 is a directory service protocol meant to provide navigation, naming, data storage and retrieval, and authentication capabilities. It defines a strict information framework that requires data to be well-structured and strongly typed. The X.500 namespace is hierarchical, reflecting the political and geographical boundaries of countries, states, localities, organizations, etc. Aliases can be used to provide a more general graph structure.

Despite their differences, tying Gopher and X.500 together has proved to be surprisingly easy.
This paper describes _Go500_ and _Go500gw_, two Gopher to X.500 gateways developed at the University of Michigan. Since the gateways were deployed and announced at a Gopher developers conference, combined usage has steadily increased to over 5000 connections per day, making them the biggest users of the University of Michigan X.500 service. In addition, a number of other sites are now running gateways of their own.

The first two sections give brief overviews of Gopher and X.500, respectively. The next sections describe the gateways themselves, followed by a few comments about our implementation experiences, and a look at the gateway usage in more detail. Finally, we consider possible future work in this area, followed by information on how to get the gateway software, which is freely available on the Internet.

Overview of Gopher

The Gopher protocol is based on a client-server model in which Gopher servers hold documents which may be accessed by users running Gopher clients. Only navigation and retrieval facilities are provided by Gopher. The protocol makes no provision for the modification of data. The model resembles a hierarchical filesystem, with both files (documents) and directories (menus). Menus contain lists of documents and/or other menus. Gopher documents can contain text, sound, pictures, etc. Other types of menu entries are also possible, e.g., index search servers.

Each item listed in a Gopher menu has a type (document, menu, etc.), a name which is usually displayed to the user, a _selector string_ used internally by the client and server to uniquely identify the item, and the IP address and TCP port number of the Gopher server holding the item. These pieces of information are tab-separated when presented to a client.

A client retrieves an initial list of menu items by connecting to a Gopher server and sending it a null line. The Gopher server responds with the list of items in its ``root'' menu. Menu items are separated by newlines.
The entire list is terminated by a single period on a line by itself. To retrieve one of the items listed in the menu, the client connects to the Gopher server at the specified IP address and TCP port number and sends it the selector string. The server responds with the item requested, be it a document or another list of menu items.

Another type of entry that can occur is called an index search server. The entry for such a server is similar to those for other Gopher items, except that to access the search server a Gopher client is expected to present the selector string and a list of words for which to search, presumably retrieved from the user. The index server then returns a menu of Gopher entries that match the search criteria. The index search server was designed to allow full text searching of the various documents held by a Gopher server, but as we shall see later it can be put to other uses.

By keeping their protocol very simple and building most of the intelligence into the client software, the designers of Gopher have come up with a system that is powerful, yet simple to use and implement. Cost of entry into the Gopher world is very low. It consumes few resources, the implementations are easy to bring up and administer, and little effort is required to understand the entire system. Furthermore, it uses technology that is mature and well-understood, making Gopher software development easier.

Virtually anyone with IP connectivity can run a Gopher server and offer the data they hold to the rest of the world. Namespace and data management are user-driven in this sense. This is both good and bad. It's good because it allows users to have control over their data and the ability to design an information service tailored to their needs. They don't have to seek permission from anybody to start providing a useful service.
It's bad because it does not always lead to a sensible organization of data and can tend to produce a somewhat tangled namespace, making it hard to find things.

Overview of X.500

X.500 is the OSI standard for directory service, specified jointly by the International Standards Organization (ISO) and the Consultative Committee on International Telephony and Telegraphy (CCITT). It specifies a general distributed directory service that assumes that read operations are far more frequent than writes and that temporary inconsistencies among data are acceptable.

The model is client-server based, like Gopher, but more complex. In X.500, clients are called Directory User Agents (DUA) and servers are called Directory System Agents (DSA). A DUA connects to a ``close'' DSA and sends its query. The DSA can either answer the query from its own data, forward the query to another DSA on behalf of the client (this process is called _chaining_), or return the address of the other DSA so the DUA can ask it itself (this process is called _referral_). Regardless of which DSA a DUA contacts, it sees the same view of the X.500 data. DUA to DSA communication is accomplished using the Directory Access Protocol (DAP). DSA to DSA communication is over the Directory System Protocol (DSP). Both protocols are defined in terms of the OSI Remote Operations Service [4].

The data itself is composed of _entries_ which are organized into a hierarchy called the Directory Information Tree (DIT). At the top of the tree are entries for countries and international organizations. Below each country the namespace is decided by the country itself, with some guidance from the X.500 standard. In the US, for example, the North American Directory Forum, a group of public directory service providers, is defining a namespace that follows the existing civil naming infrastructure [5].

Each entry in the DIT is composed of a number of _attributes_. Each attribute has a _type_ and one or more _values_.
The form the values can take is determined by the attribute's _syntax_. An example attribute used for holding a person's name might have type _commonName_, syntax _CaseIgnoreString_ (meaning the value is a string, the case of which is ignored for comparison purposes), and the two values ``Timothy A Howes'' and ``Tim Howes''. There is little restriction on the range of syntaxes and attributes that can be defined. Two of the more interesting ones defined in the Internet X.500 pilot are _jpegPhoto_ and _audio_, which are for holding pictures and sound, respectively.

Entries are named by one or more of these attribute value pairs, the collection of which is termed the entry's Relative Distinguished Name (RDN), and must be unique among all sibling entries. The globally unique Distinguished Name (DN) of an entry is formed by concatenating the path of RDNs from the root of the DIT to the entry. For example, the entry for the United States is named _{country=US}_. The entry for the University of Michigan is named _{country=US, organization=University of Michigan}_.

X.500 defines operations that allow a DUA to search and browse the DIT, retrieve information from particular entries, and even modify, add and delete entries. For our purposes, only the read-like operations are of interest, search in particular. Searches can be of a single entry, an entry's children, or an entire subtree of the DIT (even if the subtree is split across multiple DSAs). Search filters are based on boolean combinations of attributes which satisfy certain conditions, such as equality, approximate equality, substring matching, etc. So, for example a search could be made for entries with a surname equal to ``Howes'' or a commonName approximately equal to ``Tim Howes''.

For a more complete treatment of X.500, see [6].

The Gateways

The first pass we made at a Gopher to X.500 gateway is known as _Go500_.
It is tailored to white pages usage and only gives Gopher users access to a pre-selected portion of the X.500 DIT. It allows subtree searching of this fixed portion of the namespace, but no browsing. It appears as an index search server to a Gopher client, which prompts the user to enter keywords for which to search. _Go500_ takes the keywords, forms an X.500 search filter, and performs a subtree search of the portion of the DIT for which it is configured. A list of entries matching the search criteria is returned to the Gopher client, each entry represented as a text document. When the user chooses one of these documents, the corresponding entry's attributes are retrieved, converted to text form, and displayed to the user. Various white pages attributes are displayed.

This gateway worked well to solve our initial problem, which was to provide a faculty, staff and student phone directory through Gopher without duplicating the information we already held in our X.500 database. _Go500_ is not a general gateway, though, and gives Gopher users no way to access X.500 databases at other organizations, nor does it allow browsing of the X.500 namespace, something Gopher users are fond of doing.

[Figure 1: Root menu returned from the gateway]

To solve this problem and provide more general X.500 access, we developed _Go500gw_. _Go500gw_ appears to the unsuspecting Gopher client as just another Gopher server. The initial Gopher ``menu'' that is exported to the client consists of the list of X.500 entries at the root of the DIT. Leaf entries in X.500 appear as text documents in Gopher. Nonleaf entries appear as other Gopher menu servers (though they in fact point to the single _Go500gw_ server, but with a different selector string). Figure 1 shows a portion of an example root menu returned by the gateway, as displayed by a popular Macintosh Gopher client.
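
The mapping from X.500 entries to Gopher menu items can be sketched roughly as follows. This is an illustration in Python, not the gateway's C source. The host name, port number, text form of the Distinguished Names, and the two selector flag characters are assumptions chosen for the example. The Gopher item types (``0'' for a text document, ``1'' for a menu), the tab-separated item layout, and the terminating period come from the Gopher protocol itself.

```python
# Sketch only: the host, port, and selector flag characters below are
# illustrative assumptions, not values taken from Go500gw.
GATEWAY_HOST = "gateway.example.edu"
GATEWAY_PORT = 7777
LIST_FLAG = "M"  # hypothetical flag: list the children of a nonleaf entry
READ_FLAG = "R"  # hypothetical flag: read the attributes of a leaf entry


def menu_line(display_name, dn, is_leaf):
    """Format one X.500 entry as a tab-separated Gopher menu item."""
    # Gopher item types: "0" is a text document, "1" is a menu.
    item_type = "0" if is_leaf else "1"
    # Assumed selector layout: one flag character followed by the DN text.
    selector = (READ_FLAG if is_leaf else LIST_FLAG) + dn
    # Every item points back at the single gateway server.
    return "\t".join(
        [item_type + display_name, selector, GATEWAY_HOST, str(GATEWAY_PORT)]
    )


def menu(entries):
    """Build a complete menu reply from (display_name, dn, is_leaf) tuples."""
    lines = [menu_line(name, dn, leaf) for name, dn, leaf in entries]
    # A Gopher menu ends with a single period on a line by itself.
    return "\r\n".join(lines + ["."]) + "\r\n"


if __name__ == "__main__":
    # Two nonleaf root entries, each presented as a menu.
    print(menu([
        ("United States", "c=US", False),
        ("Great Britain", "c=GB", False),
    ]), end="")
```

Because every item names the gateway's own host and port, a Gopher client needs no knowledge of X.500: choosing an item simply sends the selector back to the gateway, which decides what directory operation to perform.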
If a Gopher user selects one of the leaf, or document objects, _Go500gw_ retrieves the contents of the entry, converts it to text form and sends it back to the Gopher client. If a Gopher user selects one of the nonleaf, or menu objects, _Go500gw_ retrieves a list of the entry's children and sends it back to the Gopher client. In this list, nonleaf children are presented as Gopher menus and leaf children as documents, just as for the initial list. In both cases the selector string is the text-encoded form of the entry's Distinguished Name, prefixed with a one character flag indicating whether a ``list'' or ``read'' operation is required. The ``list'' operation is used for menus, ``read'' for documents. Using this simple scheme, a user can descend to any point in the X.500 DIT and view the contents of any leaf entry.

To avoid clutter in the namespace, entries for DSAs are excluded from lists sent back to the client. Also, certain attributes not likely to be of interest to the user are not displayed (e.g., those dealing with X.500 knowledge references and access control).

To help users find their way, each entry is identified by its primary object class in the directory when it is displayed, for example _person_ or _organization_. This information is simply included in the user-visible name of the object. This way, users can easily tell organizations from states or localities, and people from mailing lists or application processes.

At each level _Go500gw_ also exports two special entries to the Gopher client. One is a _Read entry_ item, whose label carries the X.500 Relative Distinguished Name of the entry marking the current position in the DIT. (Since there is no real ``root'' entry in X.500, this item does not appear in the top level menu.) This allows Gopher users to retrieve the attributes of nonleaf entries as well as leaf entries.
The second is a _Search_ item, which allows Gopher users to be more selective in their browsing by using the X.500 search facilities. This entry appears as a Gopher index search server. _Go500gw_ takes what the user types, forms and executes an X.500 search query, and returns the list of results to the Gopher client in the same form as described above (menus for nonleaf entries, text documents for leaf entries).

The exact form of the query applied to X.500 depends on the user's current position in the DIT and on what the user types. The gateway assumes that searches initiated higher up in the tree are looking for countries, states, localities or organizations. In this case, a one level search is performed, with a filter appropriate for finding such objects. Searches initiated lower down in the tree (e.g., below an organization entry) are assumed to be for other objects, like people. In this case, a subtree search is called for, with a slightly different filter, more suitable for finding people or other objects. Figure 2 shows how an example leaf entry is displayed.

[Figure 2: An example X.500 entry as displayed by the gateway]

An early version of the gateway allowed users to specify their choice of one level or subtree search and required them to specify the X.500 search filter directly. Experience showed this to be more confusing than helpful to people, despite the added flexibility. In practice, the searching assumptions described above seem to work most of the time, without requiring Gopher users to know anything about X.500.

Implementation Experience

Implementation of the gateways was surprisingly easy. The initial version of each required less than a day's programming time and well under 1000 lines of C code.
Credit for this happy surprise belongs equally to Gopher itself, which is an extremely simple and easy to work with protocol, and the Lightweight Directory Access Protocol (LDAP), which is what the gateway uses to talk to X.500.

Gopher is simple enough that a reasonably complete understanding of the protocol can be gained in less than an hour by reading the Gopher paper and using telnet to poke at a Gopher server or two. Gopher is also very flexible, despite its simplicity. A surprising amount can be accomplished through clever use of the selector string.

The other half of the credit goes to LDAP, which provides access to most X.500 capabilities directly over TCP. LDAP uses simplified string encodings for many protocol elements, greatly relieving the encoding and decoding burden clients must bear. The University of Michigan implementation of LDAP includes a client library and simple API making it relatively easy to develop LDAP clients (like the Gopher gateways), without all the usual baggage associated with an OSI application. LDAP is currently a proposed Internet Standard [2, 7].

Gateway Usage

The simple Gopher to X.500 gateway, _Go500_, has been in use at the University of Michigan for nearly a year, providing our Gopher users with access to the X.500-based faculty, staff and student directory. Initial growth was substantial but has now slowed. _Go500_ currently handles between 1000 and 2000 connections per day. The vast majority of these connections originate from the University of Michigan campus.

The more general gateway, _Go500gw_, has been in operation for about six months and is listed under the University of Michigan main Gopher menu as well as at the ``Mother of all Gophers'' at the University of Minnesota. Its usage has grown dramatically to over 3000 connections per day, making it the single largest user of our X.500 service. Together, the two gateways send over 5000 queries a day to our X.500 server.
Usage was so great, in fact, that we started another DSA solely for the purpose of handling the _Go500gw_ traffic. A number of other sites are also now running _Go500_, _Go500gw_, or both, making it difficult to know how much usage the gateways get overall.

Future Work

There are a number of small improvements to be made to the gateways, mostly in the areas of error handling, which is currently rather minimal, and search heuristics, which could use some tuning.

The current gateway provides Gopher users access to X.500. The reverse function, giving X.500 users access to Gopher, would make an interesting project, though one somewhat more complicated. X.500 attribute types and object classes must be defined to accommodate the Gopher data types. Some sensible mapping from X.500 operations to Gopher operations must also be defined, keeping in mind that in many cases there are no analogous operations. Some Gopher types are not representable at all in the X.500 world, for example the telnet type, which identifies a server to which the Gopher client is supposed to telnet and provide interactive access for the user.

Another approach would be to register individual Gopher servers in X.500, giving access information that would be understandable by a Gopher client. Although this would not help X.500 clients access data contained in Gopher servers, it would allow Gopher clients accessing the gateway to make use of the X.500 naming and searching capabilities in locating Gopher servers. This approach would be relatively straightforward to implement.

Conclusion

Certainly based on usage alone, the Gopher to X.500 gateways have been very successful. The relative ease of implementation has been an especially nice bonus. More interesting is what the gateway's success suggests about Gopher, X.500, and their users. The lesson about Gopher users seems to be that they are curious creatures. Browsing seems to be the most common use of and way of discovering the gateways.
If there is a lesson for the X.500 community it is that simplicity and ease of use and installation are big pluses when it comes to getting a protocol out there and used by the community at large. The use of the gateways clearly shows that users are interested in the data and services provided by X.500. If they have a simple, easy to use tool on their desktop with which to access X.500, they will use it. It is only by providing such tools (and correspondingly simple configuration and administration on the server side) that X.500 will continue to grow in the Internet into a more widely-used, mature service.

Gopher, on the other hand, has enjoyed, or perhaps suffered from, explosive growth over the past couple of years. Its future success depends in part on how well it evolves to handle the scaling effects it is now beginning to feel.

Availability

The University of Minnesota Gopher distribution is available for anonymous FTP from the host boombox.micro.umn.edu. The two Gopher to X.500 gateways described in this paper are available as part of the University of Michigan LDAP distribution, available for anonymous FTP from the host terminator.rs.itd.umich.edu.

Bibliography

[1] Bob Alberti, Farhad Anklesaria, Paul Lindner, Mark McCahill, and Daniel Torrey, ``The internet Gopher protocol: a distributed document search and retrieval protocol'', University of Minnesota Microcomputer and Workstation Networks Center, Spring 1991.

[2] Tim Howes, Steve Hardcastle-Kille, Wengyik Yeong, and Colin Robbins, ``The String Representation of Standard Attribute Syntaxes'', Internet Draft, December 1992.

[3] International Standards Organization, Information Processing Systems - Open Systems Interconnection - The Directory, International Standard 9594.

[4] International Standards Organization, Information Processing Systems - Text Communications - Remote Operations, Part 1: Model, Notation and Service Definition, International Standard 9072-1.

[5] North American Directory Forum, ``An X.500 Naming Scheme for National DIT Subtrees and its Application to c=US'', Standing Document 5.

[6] Marshall T. Rose, ``The Little Black Book: Mail Bonding with OSI Directory Service'', Prentice Hall 1991.

[7] Wengyik Yeong, Tim Howes, and Steve Hardcastle-Kille, ``Lightweight Directory Access Protocol'', Internet Draft, December 1992.

About the Author

TIMOTHY A HOWES holds a B.S.E and M.S.E from the University of Michigan. Since 1989 he has worked for the U-M Information Technology Division on a variety of Unix, TCP/IP, and OSI network programming and protocol design projects. He is currently in charge of X.500 development and deployment on the U-M campus. He is co-chair of the IETF Integrated Directory Services working group, and member of the ACM and IEEE. He can be reached as tim@umich.edu.