ARCHIE(1L)          MISC. REFERENCE MANUAL PAGES          ARCHIE(1L)

NAME
     archie - Internet archive server listing service

SYNOPSIS
     archie

DESCRIPTION
     The archie system allows the user to query a database
     containing a list of software which is available on hosts
     connected to the Internet. For hosts connected to the
     Internet, software located through this service can be
     obtained by means of ftp(1); otherwise, for hosts with access
     to BITNET/NetNorth/EARN, it can be obtained by electronic mail
     through the Princeton bitftp(1L) service.

     The system can be accessed interactively or via electronic
     mail.

  Using the Interactive Interface
     In order to use the interactive system:

     1)   Connect to host archie.mcgill.ca (132.206.2.3 or
          132.206.51.1) with telnet(1).

     2)   Log in as user archie (no capitals, no password
          required). The system prints a banner message and status
          report.

     3)   Type ``help'' for further information.

     For full details, refer to the section entitled THE
     INTERACTIVE INTERFACE, which appears below.

  Using the Electronic Mail Interface
     In order to use the email interface, send requests to:

          archie@archie.mcgill.ca

     Send the word ``help'' in a message to obtain a list of
     available commands and features. This is a completely
     automated interface, acting without human intervention.

     For full details, refer to the section entitled THE ELECTRONIC
     MAIL INTERFACE, which appears below.

  Communicating with the Database Administrators
     This experimental database service is maintained by the
     Computer Science Department of McGill University. General
     comments and suggestions should be sent to:

          archie-l@archie.mcgill.ca

     Communications requesting additions to the set of hosts
     surveyed for the database, modifications to the Software
     Description Database, or pertaining to other administrative
     matters, should be sent to:

          archie-admin@archie.mcgill.ca

THE INTERACTIVE INTERFACE
  Commands
     Arguments to commands shown in square brackets '[]' are
     optional; all others are mandatory.

     help List the valid archie commands.

     list [pattern]
          List the sites currently stored in the database, and the
          time at which they were last updated. The optional
          regular expression argument can be used to limit the list
          to specific sites. Note that the numerical (IP) address
          associated with a site name was valid at the listed time,
          but may have changed since. Furthermore, the listed IP
          address is the primary address as listed in the DNS
          database (secondary addresses are not stored).

          Example:

               list

          lists all sites in the database, while

               list \.de$

          lists all German sites.

     mail [address1[,address2...]]
          Mail the output of the last command to the specified
          address or comma-separated list of addresses (the list
          must not contain spaces).

          Example:

               mail user1@hello.edu,user2@goodbye.com

          In the absence of an argument, the mail is sent to the
          address specified by the mailto variable.

          Example:

               mail

          Conventional Internet addressing styles are understood.
          BITNET sites should use the convention:

               user@sitename.bitnet

          UUCP addresses can be specified as:

               user@sitename.uucp

     prog pattern
          Find all occurrences of programs with names matching
          pattern. The interpretation of pattern depends upon the
          value of the search variable. The output lists the names
          of hosts with matching entries, the size of the matching
          program, its last modification date, and its path.
          The results are sorted according to the value of the
          sortby variable, and are limited in number by the maxhits
          variable.

     set variable-name
          Set the specified variable. See the section below
          concerning available variables, as well as the entries
          for unset and show.

     show [variable-name]
          Display the value of a particular variable. If no
          variable is specified, display all variables.

          Example:

               show maxhits

     site sitename
          Produce a full table of contents for a specified ftp(1)
          site in the archie database. The output format is
          similar to that of the UNIX command:

               ls -lR

          Example:

               site col.hp.com

     unset variable
          Remove any value associated with the specified variable.
          This may cause counter-intuitive behavior in some cases;
          for example, if maxhits is not defined by the user, prog
          will print the default number of matches rather than an
          unlimited number of matches.

     whatis substring
          Search the Software Description Database for the given
          substring, ignoring case. This database consists of
          names and short descriptions of many software packages,
          documents (like RFCs and educational material), and data
          files stored on the Internet.

          Example:

               whatis uucp

          in part gives as a result:

               findpath.sh      UUCP Pathfinder
               logfile-stats    UUCP LOGFILE analyzer
               mapstats         UUCP map statistics program

  Variable Types
     The behavior of archie can be modified by certain variables,
     the values of which may be changed using the set command, or
     removed entirely by the unset command. There are three
     variable types:

          boolean   (Set or unset)
          numeric   (Integer within a defined range)
          string    (String of characters, may or may not be
                    restricted)

  Boolean Variables
     pager
          Filter all output through the pager less(1L) (default:
          unset). When using the pager you may also want to set
          the term variable to your terminal type (see the term
          variable).
          Example:

               set pager

     status
          During the database search, display a status line
          containing the number of matches and percentage of the
          database searched (default: set).

  Numeric Variables
     autologout
          Set the length of idle time (in minutes) allowed before
          automatic logout (permissible range: 1-300; default: 60).

          Example:

               set autologout 45

          logs the user out after 45 minutes of idle time.

     maxhits
          Allow the prog command to generate at most the specified
          number of matches (permissible range: 0-1000; default:
          1000). Set this to a smaller value if archie is too
          slow.

          Example:

               set maxhits 100

          halts prog after 100 matches have been found.

  String Variables
     mailto
          If the mail command is issued with no arguments, mail the
          output of the last command to the address specified by
          this string variable, which may contain a single mail
          address, or a comma-separated list of addresses (lists
          must not contain whitespace).

          Example:

               set mailto user@frobozz.com

          Example:

               set mailto user1@hello.edu,user2@goodbye.com

          Conventional Internet addressing styles are understood.
          BITNET sites should use the convention:

               user@sitename.bitnet

          UUCP addresses can be specified as:

               user@sitename.uucp

     search
          Define the type of search to be performed by the prog
          command. The following values are permitted:

          exact
               Exact match (the fastest method). A match occurs if
               the file (or directory) name in the database
               corresponds exactly to the user-given string
               (including case). For example, this type of search
               could be used to locate all xlock.tar.Z files.

          regex
               Allow user-specified (search) strings to take the
               form of ed(1) regular expressions (the default
               search method).
               Note: unless specifically anchored to the beginning
               (with ^) or end (with $) of a line, ed(1) regular
               expressions (effectively) have ``.*'' prepended and
               appended to them. For example, it is not necessary
               to type

                    prog .*xnlock.*

               because

                    prog xnlock

               suffices. In this instance, the regex match is
               equivalent to a simple substring match. Those
               unfamiliar with regular expressions should refer to
               the section entitled REGULAR EXPRESSIONS, which
               appears below.

          sub  Substring (case insensitive). A match occurs if the
               file (or directory) name in the database contains
               the user-given substring, without regard to case.

               Example: The pattern

                    is

               matches any of the following:

                    islington
                    this
                    poison

          subcase
               Substring (case sensitive). As above, but taking
               case as significant.

               Example: The pattern

                    TeX

               will match:

                    LaTeX

               but neither of the following:

                    Latex
                    TExTroff

     sortby
          Set the method of sorting to be applied to output from
          prog. Typing the keyboard interrupt character (generally
          Ctrl-C on UNIX hosts) aborts a search. Results obtained
          to that point will be sorted according to the sortby
          variable and sent as output. The output phase may be
          aborted by typing the interrupt character a second time.
          The five permitted methods (and their associated reverse
          orders) are:

          none Unsorted (default; no reverse order, though rnone is
               accepted)

          filename
               Sort files/directories by name, using lexical order
               (reverse order: rfilename)

          hostname
               Sort on the archive hostname, in lexical order
               (reverse order: rhostname)

          size Sort by size, largest files/directories first
               (reverse order: rsize)

          time Sort by modification time, with the most recent
               file/directory names first (reverse order: rtime)

     term
          Specify the type of terminal in use (and optionally, its
          size in rows and columns). This information is used by
          the pager.
          The usage is:

               set term <terminal-type> [<#rows> [<#columns>]]

          The terminal type is mandatory, but the number of rows
          and columns is optional; specify either rows only, or
          both rows and columns (default: 24 rows, 80 columns).

          Examples:

               set term vt100
               set term xterm 60
               set term xterm 24 100

THE ELECTRONIC MAIL INTERFACE
     The archie email interface currently accepts a limited subset
     of the interactive interface commands, plus a few of its own.
     Variables are not supported in the email interface. The
     ``Subject:'' line in incoming mail is processed as if it were
     part of the main message body. The help command is exclusive;
     all other commands in the same message are ignored. A message
     not containing a valid request will be treated as a help
     request. The server recognizes the following commands:

     compress
          Process the reply with the compress(1) and uuencode(1)
          programs. Upon receiving the reply, the recipient should
          remove the mail header and run the rest of the file
          through uudecode(1), producing a file with a name of the
          form:

               file.Z

          Process this file with uncompress(1) to obtain the
          results of the request.

     help Send a message describing how to use the email interface.

     path path
          Override the return address that would normally be
          extracted from the header. The path describes how to
          mail a message from cs.mcgill.ca, which is fully
          connected to the Internet, to your address. Consider
          adding a path command to a request to provide an explicit
          return address if the archie server does not respond to
          the original request within several hours. BITNET users
          should use the convention:

               user@site.bitnet

          UUCP users should use the convention:

               user@site.uucp

     prog <reg exp1> [<reg exp2> ...]
          Search the database for each ed(1)-style regular
          expression, and return any matches.
          Multiple regular expressions may be placed on one line,
          in which case the results will be mailed back in one
          message. Where regular expressions appear on multiple
          lines, multiple messages will be returned, one for each
          line (not working correctly yet). Any regular expression
          containing spaces must be quoted with single or double
          quotes. Searches are case sensitive. The prog command
          is executed as if the search variable were set to regex.
          Those unfamiliar with regular expressions should refer to
          the section entitled REGULAR EXPRESSIONS, which appears
          below.

     quit Stop interpreting the request. This prevents the
          inadvertent interpretation of text in an email signature
          which might accidentally resemble a valid archie command.

     site <site name> | <site IP address>
          Return a list of the contents of the specified site. The
          fully qualified domain name or IP address may be used.

     list <reg exp1> [<reg exp2> ...]
          List all of the site names currently stored in the
          database that match <reg exp>, an ed(1)-style regular
          expression. The format of the resulting list is: site
          name, site IP address, and date last updated in the
          database.

     whatis <substring1> [<substring2> ...]
          Search the Software Description Database (SDD) for
          <substring>. The SDD is a text database containing the
          names and short descriptions of about 3500 software
          packages, documents and datasets available on the
          Internet. If you have any corrections or additions, mail
          them to:

               archie-admin@archie.mcgill.ca

          Multiple arguments may be placed on the same whatis
          command line.

REGULAR EXPRESSIONS
     Regular expressions follow the conventions of the ed(1)
     command, allowing sophisticated pattern matching.
     In the following discussion, the string containing a regular
     expression will be called the ``pattern'', and the string
     against which it is to be matched will be called the
     ``reference string''. Regular expressions imbue certain
     characters with special meaning, providing a quoting mechanism
     to remove this special meaning when required. The rules
     governing regular expressions are:

     c    A character c matches itself unless it has been assigned
          a special meaning as listed below. A special character
          loses its special meaning when preceded by the character
          '\'. This does not apply to '{', which is non-special
          until it is preceded by '\'. Thus, although '*' normally
          has special meaning, the pattern '\*' matches a literal
          '*'.

          Example: The pattern

               acdef

          matches either of the following:

               s83acdeffff
               acdefsecs

          but neither of the following:

               accdef
               aacde1f

          Example: Normally the characters '*' and '$' are special,
          but the pattern

               a\*bse\$

          acts as above. Any reference string containing:

               a*bse$

          as a substring will be flagged as a match.

     .    A period (known as a wildcard character) matches any
          character except the newline character.

          Example: The pattern

               ....

          will match any 4 characters in the reference string,
          except a newline character.

     ^    A caret (^) appearing at the beginning of a pattern
          requires that the reference string start with the
          specified pattern (an escaped caret, or a caret appearing
          elsewhere in the pattern, is treated as a non-special
          character).
          Example: The pattern

               ^efghi

          will match only those reference strings starting with
          efghi; thus, it will match either of the following:

               efghi
               efghijlk

          but not:

               abcefghi

     $    A dollar sign ($) appearing at the end of a pattern
          requires that the pattern appear at the end of a
          reference string (an escaped dollar sign, or a dollar
          sign appearing elsewhere, is treated as a regular
          character).

          Example: The pattern

               efghi$

          will match either of the following:

               efghi
               abcdefghi

          but not:

               efghijkl

     \<   Match something at the beginning of a word (the beginning
          of a line, or just before a letter, digit, or underline
          character and just after a character which is not one of
          the foregoing).

     \>   Match the following one-character regular expression at
          the end of a word, as defined above.

     [string]
          Match any single character within the brackets. The
          caret (^) has a special meaning if it is the first
          character in the series: the pattern will match any
          character other than one in the list.

          Example: The pattern

               [^abc]

          will match any character except one of:

               a b c

          To match a right bracket (]) in the list, put it first,
          as in:

               []ab01]

          A caret appearing anywhere but in the first position is
          treated as a regular character.

          The minus (-) character is special within square
          brackets. It is used to define a range of ASCII
          characters to be matched. For example, the pattern:

               [a-z]

          matches any lower case letter. The minus can be made
          non-special by placing it first or last within the square
          brackets. The characters '$', '*' and '.' are not
          special within square brackets.

          Example: The pattern

               [ab01]

          matches a single occurrence of a character from the set:

               a b 0 1

          Example: The pattern

               [^ab01]

          will match any single character other than one from the
          set:

               a b 0 1

          Example: The pattern

               [a0-9b]

          matches one of the characters:

               a b

          or a digit between 0 and 9, inclusive.
          Example: The pattern

               [^a0-9b.$]

          matches any single character which is not in the set:

               a b . $

          or a digit between 0 and 9, inclusive.

     *    Match zero or more occurrences of an immediately
          preceding regular expression.

          Example: The pattern

               a*

          matches zero or more occurrences of the character:

               a

          Example: The pattern

               [A-Z]*

          matches zero or more upper case letters.

     \{m\}
          Match exactly m occurrences of a preceding regular
          expression, where m is a non-negative integer between 0
          and 255 (inclusive).

          Example: The pattern

               ab\{3\}

          matches any substring in the reference string consisting
          of the character `a' followed by exactly three `b'
          characters.

     \{m,\}
          Match at least m occurrences of the preceding regular
          expression.

          Example: The pattern

               ab\{3,\}

          matches any substring in the reference string consisting
          of the character `a' followed by at least three `b'
          characters.

     \{m,n\}
          Match between m and n occurrences of the preceding
          regular expression (where n is a non-negative integer
          between 0 and 255, and n>m).

          Example: The pattern

               ab\{3,5\}

          matches any substring in the reference string consisting
          of the character `a' followed by at least three but at
          most five `b' characters.

  Tips for Using Regular Expressions
     1)   When matching a substring it is not necessary to use the
          wildcard character to match the part of the reference
          string preceding and following the substring.

          Example: The pattern

               abcd

          will match any reference string containing this pattern.
          It is not necessary to use

               .*abcd.*

          as the pattern.

     2)   In order to constrain a pattern to match the entire
          reference string, use the construction:

               ^pattern$

     3)   The '[]' operator provides an easy mechanism to obtain
          case insensitivity.
          For example, to match the word:

               hello

          regardless of case, use the pattern:

               [Hh][Ee][Ll][Ll][Oo]

THE ARCHIE DATABASE
     The archie database subsystem maintains a list of about 800
     Internet ftp(1) archive sites. Each night, the database
     subsystem executes an anonymous ftp(1) to a subset of these
     sites and fetches a recursive directory listing (or a file
     containing the recursive directory listing if this exists).
     Currently, each site gets updated approximately once a month.
     The directory listings are stored on archie.mcgill.ca
     (132.206.2.3), where they are available to the Internet
     community via anonymous ftp(1). They appear in the directory
     ~ftp/archie/listings in compressed form.

BUGS AND LIMITATIONS
     1)   Only UNIX sites are included in the database.

     2)   The user cannot limit searches to specific sites.

     3)   There is no graphical user interface.

     4)   There is no way to abort the help facility completely.

LONG TERM PLANS
     The archie system is regarded as developmental, and is not
     presently being released to outside sites. The current
     database requires about 70 MB of disk storage, and the updates
     and searches put a noticeable load on the Sun 4/280 on which
     it is operating. We hope to distribute archie to several
     other sites throughout the world at a later date. We welcome
     comments and suggestions; please send them to
     archie-l@archie.mcgill.ca.

SEE ALSO
     bitftp(1L), ftp(1), telnet(1)

AUTHORS
     Alan Emtage (bajan@cc.mcgill.ca) and Bill Heelan
     (wheelan@cs.mcgill.ca), McGill University.

     Manual page by R. P. C. Rodgers, UCSF School of Pharmacy, San
     Francisco, California 94143 (rodgers@maxwell.mmwb.ucsf.edu),
     Nelson H. F. Beebe (beebe@math.utah.edu), and Alan Emtage.

Sun Release 4.1          Last change: 7 Oct 1991
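
EXAMPLES
     The four values of the search variable (exact, regex, sub and
     subcase) can be sketched in Python as shown here. This is an
     illustration only, not the archie implementation: the function
     name archie_match is hypothetical, and Python's re module
     stands in for ed(1) regular expressions. The two syntaxes
     agree on the operators used in the simple examples in this
     page (^, $, ., [] and *), but Python writes intervals as {m,n}
     rather than \{m,n\} and does not treat \< and \> as word
     anchors.

```python
import re


def archie_match(name: str, pattern: str, search: str = "regex") -> bool:
    """Return True if a file or directory name matches the pattern.

    Hypothetical helper mirroring the documented search types; the
    default is "regex", as stated for the search variable.
    """
    if search == "exact":
        # The whole name must equal the pattern, including case.
        return name == pattern
    if search == "sub":
        # Substring match without regard to case.
        return pattern.lower() in name.lower()
    if search == "subcase":
        # Substring match with case significant.
        return pattern in name
    if search == "regex":
        # re.search scans the whole name, so an unanchored pattern
        # behaves as if ".*" were prepended and appended to it.
        return re.search(pattern, name) is not None
    raise ValueError(f"unknown search type: {search!r}")


if __name__ == "__main__":
    # Patterns and names taken from the examples in this page.
    cases = [
        ("this", "is", "sub"),
        ("LaTeX", "TeX", "subcase"),
        ("Latex", "TeX", "subcase"),
        ("abcefghi", "^efghi", "regex"),
        ("abcdefghi", "efghi$", "regex"),
        ("HeLLo", "[Hh][Ee][Ll][Ll][Oo]", "regex"),
    ]
    for name, pattern, search in cases:
        print(search, pattern, name, archie_match(name, pattern, search))
```

     Using re.search rather than re.fullmatch is what gives the
     unanchored behavior described in tip 1; wrapping a pattern as
     ^pattern$ restores whole-string matching, as in tip 2.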