A Gopher Interface to SQL Relational Databases

Paul Lindner
Jr. Applications Programmer
190 Shepherd Labs, University of Minnesota
100 Union St. SE
Minneapolis, MN 55455
lindner@boombox.micro.umn.edu
+1 (612)-625-1300
+1 (612)-625-6817

Abstract:



1. Introduction

The Internet Gopher is a suite of software that allows for easy access
to network-based information.  Initially developed at the University
of Minnesota in early 1991, it has spread to over 1600 sites worldwide
as of July 1993.

The Gopher system is a client-server system that can be used to build
a Campus Wide Information System (CWIS).  Clients, which browse and
search inforamtion are available for most major platforms (Macintosh,
DOS, Windows, Unix, VMS, MVS, VM/CMS, OS/2).  Servers, which translate
and publish information, are also available for all of the platforms
mentioned above.

This client-server architecture uses the Internet Gopher Protocol
[Gopher91,rfc1436].  The Gopher protocol has been described as "brutally
simple."  It is based on a web/tree metaphor of files and directories.
Its basic primitives are a list directory transaction, a retrieve file
transaction and a search for directory entries transaction.

Given this design it isn't suprising to see that most implementations
of Gopher Servers map a filesystem to GopherSpace.  An example of a
file system hierarchy would be the registry of all gopher servers.
The registy is divided into a number of directories based on
geography.  At the top level is the directory of continents, then
there are subdirectories of countrys for each continent, then a state
or province subdirectory.  After all of this digging you will find the
information you want.

For most data this approach works rather well, a good portion of the
data people want to publish are in files located in directories.
However there is a whole class of data that doesn't fit into a file
system that easily: databases, especially relational databases.

To handle this type of data I've developed a gateway that translates
gopher requests to SQL databases.  This gateway allows a Gopher user
to look at the data inside of SQL tables using the gopher
browse/search metaphor.  It simplifies the allowable operations on the
database to a limited, yet useful subset of the allowable SQL queries.


2. Features of the SQL Gateway

The SQL gateway allows people using a Gopher Client to access the data
contained in an SQL database without having to know SQL.  The gateway
is the only portion of code that needs to know any SQL.  The clients
can be used as is with the gateway, no modifications for the clients
are needed.

The SQL gateway accepts gopher requests and translates them into
SQL requests that get passed via TCP to either a Sybase Database
Server or an Oracle Database server.

The SQL gateway allows the Gopher Client to:

 o   View the tables of a database as a Gopher directory
 o   View the columns of a given table as a Gopher directory
 o   View the contents of a column as a Gopher directory
 o   View records as formatted text.
 o   View/import records as tab-separated-values
 o   Add records to a table.
 o   Search the table by filling out a Gopher+ form.

The server/gateway administrator has control over the configuration.
The administrator can give column/table names more descriptive titles,
and can link columns together to make a subdirectories via implicit joins.


3. Semantics of a Gateway

A gateway is a simple thing.  It translates commands and data from one
format to another.  Gopher has long used gateways to lash together
disparate information systems such as USENET, Archie, X.500, FTP, WAIS
and others.  In fact it's been referred to as the "Duct Tape of the
Internet" by some.

The SQL gateway translates Gopher Operations into SQL statements.  We
do lose functionality when doing this however.  The large set of
possible operations in SQL would be hard to present using the very
simple Gopher protocol operations.  (It might be possible with a large
enough directory tree though..)


2.1 Semantics of the Gopher Protocol

The Gopher Protocol is very simple information retrieval tool based on
the client-server model. It uses three basic transactions/commands.

  o  Directory listing
  o  File retrieval
  o  Search and return a directory listing

These simple directives are quite powerful, over 1500 sites over the
world now use the base Gopher protocol.  In the Spring of 1992 we
proposed a suite of upward/downward compatible extensions to the
Gopher protocol called "Gopher+".  Gopher+ adds the following features
to the base gopher protocol:

  o Forms input
  o Multiple alternate document representations (VIEWS)
  o Metainformation about an object (administrator, size, etc..)

The SQL Gateway uses the Gopher+ protocol for some of it's features:
optional tab-separated documents and solicitation of searchs via a
form.


2.2 Sematics of SQL.

SQL is a full featured database access language that can do just about
anything you want to your data.  In fact it's a relational calculus.
Some of the more common operations are:

  o  "select"ing records from a table
  o  "insert"ing records into a table
  o  searching/querying multiple tables for information
  o  ordering of results
  o  grouping of results
  o  creation/deletion of tables or views

  etc.


2.3 Semantics of the SQL Gateway

The SQL gateway understands a small limited number of commands.  These
commands perform the mapping between Gopher operations and SQL
statements.

  o  Listing of tables
  o  Listing of columns in specific tables
  o  List of distinct values in a specific column
  o  Display of records from a search
  o  Insertion of an record



2.4  Mapping operations between different systems

Any gateway maps between multiple sets of available operations and
data formats.  The Gopher to SQL gateway maps gopher operations
(selecting a directory, choosing a file) into it's internal command
set ("tables" command, "get" command) and then into SQL (select
tables from dictionary, select * from tablename).  

It then translates the results received from the SQL query back down
the chain.  For instance the results of a "select table_name from
dictionary" would generate intermediate gateway operations "columns
tablename" and finally gopher data format (the gopher data-directory
format) 

The following table summarizes the equivilancy between the different
command sets.

Gateway Command		Gopher Operation	SQL Statement
---------------		----------------	----------------
tables			Directory		select table_name from dictionary
columns			Directory		select column_name from dictionary where tablename=<table>
list			Directory		select distinct table.column from table
get			Text			select * from table where query


3.   A Sample Session

Here is what a user of the gateway would see when connecting to the
gateway.  In this example we're using the PUBS2 database running on a
Sybase Database Server.




4.   Setting up a Gateway.


5.   Sample Databases

We have been using this gateway to provide access to a number of
databases at the University of Minnesota.  We detail the setup of some
of them here.

5.1 The Sample "PUBS2" database

The Sybase Database comes with a sample database of authors, titles,
publishes and stores called "pubs2".  This database demonstrates a
number of features, including 1:N and N:N relations.




5.2 The Financial Detail Reporting Database

This database is considerably more complex than the PUBS2 database.
This database contains all the financial transactions of the
University of Minnesota.  This is one _big_ database!  The monthly
transactions are in the four to five hundred thousand range!

We set up a menu system to do specific queries to this database.
This keeps users from running inefficiently formed queries.  The menu
structure was created on a Unix machine running the Unix Gopher Server
software.  This server also provides explanations about what all the
different code numbers mean.  All of the SQL gateway functionality is
still in the SQL Gateway.  The Unix server makes "links" to the SQL
Gateway, using it's functionality at specific points in the hierarchy.

This is the tree listing of the 

  About The Financial Reporting Database
  Terms used in the Financial Reporting Database
  Areas and Organizations/

  Balance Sheets/
    List of Balance Sheets by Area/
      0510
      0620
      ...
  This Months Transactions/
    




6. Technical Details

To run the server you will need some special software.

If using Sybase:

You will need a machine with the Sybase client libraries (usually
stored in /usr/sybase) and a special version of perl called sybperl
[Peppler92].

If using Oracle:

You will need a machine with the Oracle client libraries and a special
version of perl called oraperl [Stock93].

Each of these uses specific "glue" routines to implement database
specific features (connecting, data dictionary, etal).  Thus it is
fairly easy to extend the gateway to deal with other database vendors.


6.1 Command line options understood by the gateway

Major parameters are set via command line switches.  The following
table summarizes them:

Option		 Description
------		 -----------
-h		 Hostname of the gateway
-p		 Port number of the gateway
-T		 Database type (oracle, sybase, etc.)
-S               SQL server to connect to.
-D               Database to use
-U		 Username to use
-P		 Password to use


6.2 Selectors understood by the Gateway...

The gateway implements a small internal command set.  The following
summarizes this small command language.

o   tables [<search>]

    This command uses the data dictionary to make a gopher directory
    list of the different tables in the database.  This is most useful
    for allowing raw database access.  This command generates
    "columns" commands that the client will execute.

    An optional search item will restrict the tables to match the
    search term.


o   columns <tablename> [<search>]
    
    This command uses the data dictionary to make a gopher directory
    list of the columns in <tablename>.  This command generates "list"
    commands for each column in the table.  It also generates a "get"
    command that will retrieve all the records of a table as text.

    After the column names are presented, a list of search items is
    presented.  If the client is using Gopher+ it gets a list of
    forms.

    If the user can write to the table an "insert" item for adding an
    item to the database is added to the menu.

    An optional search item will restrict the tables to match the
    search term.


o   list <tablename.columnname> [<fromtables> [<query>]] [<search>]

    This command generates a listing of the unique items in the given
    table and column.

    This command can be part of a multiple series of queries by
    specifying an optional list of tables to choose from and an query.
    This gets translated into SQL that looks like this:

      select .....
      from  <fromtables>
      where <query>

    An optional search can search for specific titles in the specified
    table and column.


o   get <tablename.columnname> <fromtables> <query>

    This command actually retrieves a record from the database as a
    textual item.  The default is to print out each column name, a
    colon and the data contained in it.  Multiple records are
    separated with a line of "dashes".

    Optionally one may define a module for a specific table.  This
    module is a file containing perl code that can do sub-queries and
    fancy reformatting of the data into any format you desire.

o   insert <tablename>

    This command inserts a new record into the specified table.  The
    values for the table come from the ASK Block.

6.2 Files used by the Gateway

 o  namelist.<databasename>

    This file is basically a local table of mappings between
    column/table names and Gopher Title Names.  The format is:

     <table>:<Gopher Table Name>
     <table.column>:<Gopher Column Name>
     <.Column>:<Gopher Column Name>

    Eventually this table could be kept on the database itself,
    obliviating the need for a local table.

 o jointable.<databasename>

    This file allows you to link together two tables, allowing you to
    do implicit joins between the two tables.

    The format is as follows:

     <source-table.column>:<target-table.column>:<additional_tables>:<join query>

 o <tablename>.module

    Files in this format can override the default record display
    routines.  Thus, if you want to join data in with the text, this
    is one way to do it.

    For instance, in the PUBS2 Database the author table display
    record is overridden so we can display the address correctly, and
    also join in the au_blurbs table.



References:


[Gopher91] University of Minnesota Gopher Team,
           (gopher@boombox.micro.umn.edu) The Internet Gopher Protocol
           {URL=gopher://boombox.micro.umn.edu:70/0/0/gopher/gopher_protocol/}

[Gopher92] University of Minnesota Gopher Team,
           (gopher@boombox.micro.umn.edu) Gopher+, proposed
           enhancements to the internet Gopher protocol
           {URL=gopher://boombox.micro.umn.edu:70/0/0/gopher/gopher_protocol/}

[rfc1436] University of Minnesota Gopher Team,
          (gopher@boombox.micro.umn.edu) The Internet Gopher Protocol

[Lindner93]  Lindner, P. (lindner@boombox.micro.umn.edu), Gopher and
             Relational Databases, an Interface to Metalbase.
             Proceedings of GopherCon '93 

[Peppler92] Michael Peppler (mpeppler@itf.ch), sybperl - Sybase
            DB-library extensions to Perl
           {URL=ftp://ftp.uu.net/usenet/comp.sources.misc/volume30/sybperl}

[Stock93] Kevin Stock (kstock@encore.com), Oraperl-v2 -
          Extensions to Perl to access Oracle databases.  USENET
          posting in comp.sources.misc
          {URL=ftp://ftp.uu.net/usenet/comp.sources.misc/volume38/oraperl-v2/}

[ ] Sybase

[ ] Oracle
