------------------- ATA 89-9C MONOGRAPH ------------------- ------------------------------------------------------------------ CD-ROM Disc Interchangeability standards: Beyond iso 9660 with the Structured Full-Text Query Language (SFQL) Neil R. Shapiro, Elias Diamantopoulos, and Paul Cotton (A typeset version of this document can be obtained from Scilab Inc., 1921 Hexam Rd, Schenectady, NY 12309. (518) 393-1526; Fax at same number via autoswitch) ------------------------------------------------------------------ ABSTRACT Today, data on CD-ROM can only be accessed using the retrieval software that comes with the disc. This often means that end-users must work with several user interfaces. The Air Transport Association (ATA) has addressed this issue by developing a software interface standard which allows a single retrieval application to work with discs from multiple sources. A key component of this strategy is the Structured Full-Text Query Language (SFQL). This paper provides an overview of the SFQL language, and the ATA efforts to develop and test it. ------------------------------------------------------------------ LIST OF ACCOMPANYING FIGURES Figure 1. SFQL Conceptual Model. (OVRFIG1.PCX) Figure 2. Client/Server architecture for SFQL (OVRFIG2.PCX) Figure 3. Client/Server components and the need for standardization. (OVRFIG3.PCX) Figure 4. SFQL server design based on existing engine. (OVRFIG4.PCX) Figure 5. Terminology Comparison between SQL and SFQL (OVRFIG5.PCX) Figure 6. Illustration of SFQLCA with child substructures (VLR and FSR) (OVRFIG6.PCX) ------------------------------------------------------------------ INTRODUCTION ------------ The reliability of commercial aircraft is something we dare not take for granted--it requires top quality maintenance. For this reason, the commercial aerospace industry devotes significant resources to producing high quality, up-to-date maintenance documentation. The industry thus monitors new technology closely in order to improve the traditional publishing methods. It should come as no surprise then, that both the airlines and the aerospace manufacturers have been actively testing CD-ROM technology as a way to improve the efficiency of technical publishing. After three years of field trials, CD-ROM has proven to be a key technology in the aerospace industry. Foremost among the list of achievements is the improved accessibility of data provided by electronic access to documents on CD-ROM. Full-text retrieval, which provides complete ad hoc access, and hypertext technology, have begun to replace tedious manual processes for information inquiry on the maintenance hanger floor as well as in shop management operations of airlines. CD-ROM has also changed the economics of document production --making management of information more cost effective for both aerospace manufacturers and airlines. For example, periodic revisions to an engine shop manual may include as many as 2500 pages and can cost as much as $160 to ship. After receiving the revision, users may spend as many as 6 hours collating the revised pages into the manuals. With CD-ROM, a completely revised set of manuals can be sent to users at a lower cost than that of the paper revisions alone. But experience has also shown that the state-of-the-art of CD-ROMs are not advanced enough to support the aerospace industry's needs. Airlines have heterogeneous fleets of airframes and engines, and thus support documentation comes from many different suppliers. The suppliers, to meet their own production needs, select different vendors to produce the CD-ROMs. And the CD-ROMs that come from different vendors are incompatible. The incompatibilities are driven by software. Each disc comes with its own retrieval software, and only that software can be used with the disc. A CD-ROM produced with Retrieval System A cannot be accessed using Retrieval System B, or vice versa. In other words, CD-ROMs are Software Dependent. Thus, to the airline end-user, state-of-the-art CD-ROMs mean multiple user interfaces for working with multiple discs--even when the discs contain standardized information, such as manuals produced according to their industry documentation standard, ATA Specification 100. And multiple user interfaces, i.e., multiple software applications, opens the door for system incompatibilities over platforms, operating systems, windowing environments, and input/output devices such as monitors, printers, or specialized hardware. CD-ROM standards are the answer to this problem. Yet, despite landmark efforts which have standardized CD-ROMs to the file level (ISO 9660), there are no CD-ROM standards in place that will allow a single system to serve as the retrieval software for discs from different vendors. The technical reason for this incompatibility is the lack of file content standards for CD-ROM. But file content standards would not be sufficient to produce software independent CD-ROMs. There is more than data on the disc, there is metadata--data about data, in the form of indexes for full-text retrieval or columnar retrieval (relational) technology, and hypertext links, electronic cross references. Moreover, standards for metadata are not likely to appear--not only because they could not cover the breadth of requirements for CD-ROM retrieval systems, but because the proprietary nature of metadata is the basis of competition for system performance and features. (However, it should be noted that a program to provide a standard format for full-text databases is underway at the U.S. Air Force Electronic Systems Division, Hanscom Air Force Base, MA.) One solution to this is to build retrieval systems using a client/server architecture to separate the end-user interface from the underlying metadata access routines. The server software would be distributed with each disc and would understand the vendor-dependent format of the metadata. The server would accept commands via a standard interprocess request language and would return data in standardized formats to any other software program acting as its client. Thus, any vendor's end-user retrieval software (the client) could operate with any other vendor's disc--disc interchangeability. ATA STANDARDS FOR CD-ROM ------------------------ Given the importance of the CD-ROMs to publication of maintenance documentation, the Air Transport Association (ATA), the standards governor of the airlines, has taken the initiative to define standards to provide software independence of CD-ROMs (Shapiro and Bowers, 1989). The Structured Full-Text Query Language (SFQL), is a key component of this strategy. SFQL is designed to serve as the standard interprocess request language between the client and server. SFQL provides both a standard method of data request (e.g., a query), and a standard method of data return. As its name suggests, SFQL is based on the ANSI and ISO Structured Query Language (SQL), the industry standard query language for relational databases. The standard SQL would not be sufficient to serve as the client/server request language for the ATA CD-ROMs: SQL is designed primarily for relational database access; the ATA requires the features of a full-text (fully indexed) database on the CD-ROM. SFQL is thus an attempt to extend SQL to full-text databases. Starting with the existing standard, SQL, as the basis for a full-text retrieval language provides leverage in the design and use of the new standard. Foremost, it provides a potential bridge between full-text and relational databases. Although initially the ATA is looking to full-text databases only, it will be possible in the future to provide a combined full-text/relational server which could provide the client with transparent use of the two database types. In addition, the common language reduces the learning curve for those familiar with SQL, and thus promotes acceptance of the standard. In order to base the new language on SQL, however, an analogy is needed between relational and full-text databases so that common areas can be mapped to SQL. This is not as difficult as it seems: full-text systems have evolved to handle structured data, just as relational systems have evolved to deal with free text. Consider a full-text database, which is comprised of a set of documents. When we perform a search, the results are also a set of documents. Likewise, a table in a relational database contains a set of rows. When we perform a query against a table, the result is a set of rows. The key difference between a document in a full text database and a row in a relational database, then, is that the relational database row is rigidly structured: the row is divided up into fields (called columns). Documents in full-text databases are not really unstructured, of course. There are a number of logical elements in the data, which may be "tagged" using a mark-up language. Most full-text databases permit searches to be restricted in domain to one or more of these tags--much in the same way a relational database permits searches on specific columns. When the tagged fields of the document represent unique, nonoverlapping attributes, it is easy to see the analogy between the full-text and the relational models. The analogy, illustrated in Figure 1, is used to map SQL to full-text databases. This analogy is, of course, based on a class of full-text data where the structure can be easily compared to the table-based relational model, since tag-fields are nonhierarchical. While, in fact, most full-text systems support only nonhierarchical tags, many documents are better described by a hierarchical model (e.g., ATA Maintenance/Engine Shop Manuals). We shall see later how hierarchical relationships can be represented in an SFQL database. ARCHITECTURE ------------ SFQL is a key part of the ATA client/server architecture for CD-ROM database interchangeability, as shown in Figure 2. SFQL is the language used by the end-user software (the client) to query and request data from one or more retrieval engines (servers). SFQL also specifies a standard data return structure, the SFQLCA, which is a wrapper to return variable length data in multiple formats, so that a query may return both text and graphics. With this approach, the same end-user software can be used with discs from different vendors. The user simply inserts a new disc, and behind the scenes, the end-user software connects with the retrieval engine supplied on the new disc. Whichever disc is loaded, when the user enters a search request (for example, by filling in a form), the request is translated into a series of SFQL statements, which are sent to the retrieval engine. The retrieval engine, in turn, packages all responses in the standard SFQLCA return structure. Note that SFQL itself is not sufficient to provide disc interchangeability. A number of other standards must come into play to orchestrate the interaction between client and server (see Figure 3). For example, an end-to-end communications protocol is required to ensure error free transfer of SFQL requests and data return. Since a number of applicable standards exist, depending on platform, the ATA plans to adopt the prevalent standard for each specific platform. Data format standards are also required: SFQL only defines the envelope in which the server returns data--the contents must still be interpretable by the client. Thus, the ATA standards cover Layers 1 to 3, which will provide disc interchangeability. But only Layer 2, SFQL Language and Return Data Structures, represents a significant new development that might provide benefits to the entire CD-ROM industry. PROOF OF CONCEPT ---------------- To test whether the ATA standards could provide software independence for CD-ROMs, a proof of concept experiment was undertaken in December of 1989. GE (a manufacturer of jet engines) and Aerospatiale (an airframe manufacturer) each set out to develop a complete full-text retrieval system in compliance with the draft standard. The plan was that each company would independently develop prototype systems consisting of a client and a server which communicate using SFQL. After the prototypes were developed, an interchange test would be performed. If software independence was achieved, Aerospatiale's client should work with GE's server as well as the Aerospatiale server; Likewise, GE's client should work with both companies' servers. Rather than building complete full-text systems from scratch, GE and Aerospatiale chose to build servers using commercial full-text systems. The idea was to encapsulate the existing system into a server using an interprocess interface based on SFQL. The server would translate incoming SFQL requests to the proprietary system's interface, and conversely, would format outgoing data into SFQL defined return structures. Figure 4 illustrates this design. GE built its server using KnowledgeSet's (Monterey, CA) KRS retrieval engine while Aerospatiale used Fulcrum Technologies' (Ottawa, Canada) Ful/Text engine. Both servers were developed for Microsoft Windows (MS-Windows) using Microsoft C and the MS-Windows Software Development Kit, as well as libraries from Fulcrum and KnowledgeSet. The servers proved to be good test cases for the experiment, since they differed significantly in their design assumptions. GE and Aerospatiale each developed their own client applications using C and the window-based graphical user interface provided by MS-Windows. GE developed two different types of client: (1) an Interactive SFQL (ISFQL) front-end which allowed dynamic testing of SFQL queries simply by highlighting one or more SFQL statements in the editor window; and (2) a general purpose query-by-forms application. The latter application was designed to demonstrate a form-based application that was not tied to a particular database design: it did not have field names bound to the form. Field names were supplied at run-time to the client dialog by the server, and selected by the user via point and click techniques. Aerospatiale also developed a query-by-forms application, although with a markedly different user interface. It was designed under a more rigid set of assumptions in order to provide the type of straightforward, easy-to-use interface that would be placed in an airline maintenance bay. In addition to the existing SFQL, the prototype development required writing an end-to-end communication protocol, not included as part of SFQL, to provide the underlying interprocess communications. The specification that resulted, based on Microsoft's DDE, eventually became the ATA proposed standard for the PC's DOS/Windows environment (ATA 89-9C.WINCOMM-R2-1990). The prototypes were successfully demonstrated at the February 1990 AIA/ATA meeting in Washington, D.C. Each system was first shown performing queries and retrieval from its "native" server. This demonstrated that the systems were similar to the CD-ROM applications familiar to the ATA: there was nothing new there--everyone had used proprietary retrieval systems. The magic came with each application's derivative of a "New Disc" menu item. What made this option different, of course, was that either GE's or Aerospatiale's disc could be inserted and used with the application, without exiting, rebooting, reconfiguring, or restarting. The menu item, transparent to the user, disconnected from the current server, allowed the user to change discs, and then established a new connection with the server on the new disc. To the user, this operation demonstrated disc interchangeability. PROTOTYPE RESULTS ----------------- The prototype effort was the first known demonstration of database interchangeability among full-text engines. The effort validated the client/server model for disc interchangeability, and the selection of SQL as the basis for the client/server language. The prototypes also uncovered some problems with the original SFQL specification. The most major problem was the initial assumption that clients using locator information returned by SFQL should, be allowed to access the disc directly for data access. The design was supposed to optimize performance of applications by not requiring large data transfers to be made through the client/server interface. However, the design was in direct conflict with many existing systems which store the data internally in proprietary formats according to their own optimization goals (e.g., space or performance). More importantly, the approach violated one of the initial design premises: standardize only the interface between layers and give the implementors as much flexibility as possible within either layer (server or client). SFQL2 DEVELOPMENT ----------------- The ATA has addressed this and other language issues by forming a joint ATA/Vendor SFQL working group. Vendors were invited to participate in a review of the proposed standard in January, 1989, and in October, 1990, at ATA headquarters in Washington, DC. At the October meeting, the SFQL working group of volunteers was formed to draft a second version of the language (89-9C.SFQL2-R1-1990). Vendors represented on the committee included Context Corporation, Electronic Data Systems (EDS), Fulcrum Technologies, IBM, KnowledgeSet, Maxwell Data Management, and TMS. The ATA was represented by American Airlines, British Airways, and four aerospace industry representatives (airline suppliers): Aerospatiale, Boeing, Douglas, and GE. The group completed the draft specification in meetings spanning over 12 days, using a JAD-based methodology Joint Application Design is a method invented by IBM in which users and developers work as a team to develop a set of system specifications. Although not traditionally applied to this type of effort, JAD turned out to be an effective means for integrating the knowledge of this diverse group. Foremost among the changes, the SFQL model is now entirely black box. Both search and data access now work through the SFQL-based interprocess communications. This makes it easier for vendors to comply with the standard, leaves room for innovations which improve performance or functionality, and can provide greater upward compatibility as the underlying search engine technology evolves. The SFQL2 specification, along with several other related draft ATA retrieval standards, are now undergoing further validation testing. The validation testing will assert that SFQL, combined with a number of other key standards (e.g., the logical partitioning of documents using SGML based mark-up), provide an effective, software independent solution for technical documentation delivery on CD-ROM. In the next sections, we will provide a look at the SFQL language itself, including a discussion of SQL extensions to support hierarchical tags. THE SFQL LANGUAGE ----------------- The draft SFQL standard (ATA 89-9C.SFQL2-R1-1990) was written by a group of volunteers representing the Aerospace Industry Association, the Air Transport Association, and interested vendors. The specification was based on ANSI SQL (ANSI X3.135-1986). Since the primary focus of the ATA standards is CD-ROM based retrieval, not all of the features of the ANSI specification were included. Specifically, features relating to dynamic database updates, dynamic view definition, transaction processing, and database definition were deferred as low priority items in a data publishing paradigm. Other features such as SQL subselects and subqueries were deferred because the cost/benefit ratio of complexity to usefulness was too high. Finally, some features such as joins were excluded because we did not yet fully understand all of the implications of the operation in the full-text environment. Thus, the nucleus of SFQL is a subset of the 1986 ANSI SQL specification. The SFQL Working Group then extended this base language to include features for full-text query and data manipulation which are not part of SQL. All new language constructs were added under the restriction that they were used in a manner consistent with SQL. Further, as extensions were designed, the working group referred to newer drafts of the ANSI and ISO SQL standards, to avoid reinventing features which had been specified by ANSI or ISO but were not yet approved. When a new feature was needed, the potential for conflict with new SQL features was evaluated and factored into the design. DEFINITIONS ----------- The SFQL specification introduces some new nomenclature to help users map document-based concepts into the table-based metaphor used in SQL. The mapping between the two sets of terminology is given in Figure 5. An SFQL database consists of one or more collections, analogous to SQL tables. Each collection is a set of document-records (SQL rows), where a document-record may contain all or part of the information we normally associate with a paper document. Document-records are divided into fields; since these fields, analogous to the columns of an SQL table, are typically derived from a marked ("tagged") area of a document, they are formally referred to as tag-fields. Tag-fields may contain the entire source document, a specific portion of it such as the title or abstract, or may contain a single item (such as a part number). Tag-fields are created by marking up documents with a standard markup language, such as SGML, or via vendor-specific methods LANGUAGE SYNTAX --------------- SFQL is very simple. It consists of three general statement types: SELECT, CURSOR, and SET. The SELECT statement is used to formulate all queries. CURSOR is a family of four statements (DECLARE, OPEN, FETCH, and CLOSE) which request return of data one record (or portion of a record) at a time. SET is used to configure options in the SFQL server. These constructs are familiar to anyone who has used an SQL-based relational database, since they are used in the same way as in SQL. (SET is found in most SQL databases; however, it is not part of ANSI X3.135-1986.) THE SELECT STATEMENT -------------------- The heart of the SFQL data manipulation language is the SELECT statement, used to express queries. It consists of three basic clauses: SELECT return_list FROM collection [UNION collection] WHERE predicate ; The SELECT clause is used to specify a list of tag-fields to be returned from matching document-records in the collection(s) indicated in the FROM clause. The WHERE clause is used to specify a predicate which indicates the criteria for including document-records in the result set. The result of the query will be a new, temporary collection, called the search-results, containing all the document-records that matched the criteria in predicate, but where each document-record consists only of the data in the tag-fields specified in the return_list. The structure and purpose of the SFQL SELECT statement presented above is basically the same as that of SQL. One important difference, however, is in the FROM clause. The SQL FROM clause allows a comma-separated list of tables (collections). This is the basis of the SQL join operation, which as mentioned earlier, is not currently included in SFQL. However, multiple collections may be searched by means of the UNION operator in the FROM clause, which was adopted from the draft specification of SQL3. Unlike a join, the UNION operation performs a simple set-based union of the collections prior to the search. PREDICATE EXTENSIONS -------------------- The most significant extensions to the SQL SELECT involve the addition of predicates to permit full-text searches such as domain-restricted search and proximity search. Domain-restricted search is used to find a word or phrase in a particular tag-field of each document-record. This improves the precision of a search, compared to a word or phrase search conducted across an entire document. For example, one could search for all document-records containing the name "Smith" in the authors tag-field. The SFQL query for this would look like: SELECT authors FROM Publications WHERE authors CONTAINS 'Smith' ; Proximity search is used to locate a phrase or sentence where the exact wording is not known. It enables the user to search for a word within a specified number of units of another word, where units consist of characters, words, sentences, or paragraphs. For example; SELECT DOCUMENT FROM Publications WHERE DOCUMENT CONTAINS 'turbine' WITHIN 3 WORDS OF 'power generation' ; This query would return the entire text of any document-record in the collection Publications where the word "turbine" is three words or fewer before or after the phrase "power generation". The keyword DOCUMENT is a special reserved name which represents the entire document-record as a single tag-field. Of course, SFQL also includes the standard SQL relational predicates such as "=", ">", and "<", which can be combined with proximity or domain-restricted search using traditional boolean operations (AND, OR, NOT). For example, to find the titles of all articles published in Byte after 1989 concerning "SGML": SELECT titles FROM Articles WHERE date > DATE(1989-12-31) AND source = 'Byte' AND DOCUMENT CONTAINS 'SGML' ; Note that in this query, the source tag-field in the document-record would have to contain the text "Byte", exactly as specified, since relational operators are used in SFQL exactly as they are in SQL. This means that the comparisons are made without the benefit of the text processing used in proximity and domain searching. CURSOR OPERATIONS ----------------- An SFQL SELECT statement is typically embedded in a cursor to control the flow of return data. A SELECT which is not embedded in a cursor can only return a single document-record. This is sometimes useful, however, to perform return results of functions such as HITS() or SYSTEMINFO(). A cursor is simply a pointer used to direct access to rows in the collection that results from a search. The cursor is defined via a DECLARE statement, which associates it with a SELECT statement (a query). The basic syntax is: DECLARE cursor_name CURSOR FOR select_statement; where cursor_name can be subsequently used to access the results of the select_statement. Note that this operation only stores the definition of the cursor; the actual search is not performed until the cursor is opened as: OPEN cursor_name; Once the search is conducted, the data in the search-results need to be retrieved. The FETCH statement is used for this. The basic syntax is: FETCH cursor_name ; The fetch brings back the data in the tag-fields specified in the SELECT statement for the current document-record. The cursor starts at the top of the search-results, pointing to the first document-record, and then moves down document-record by document-record with each successive fetch. Thus, the first time a fetch is issued the first document-record is retrieved; the second fetch retrieves the second document-record, etc. When all the document-records have been retrieved, a FETCH statement returns an exception condition. The SFQL cursor operations just described are identical to that of SQL. A few language extensions were made, however, to accommodate differences in the quantity of data expected in the two types of databases. In SQL-based databases, a row is typically short in length (a maximum record length is usually enforced by the database system). In contrast, for full-text systems, document-records are typically very large. Thus, the user needs to control which fields within a document-record to return, and must be able to return portions of the data within the fields. In SQL, control over this is provided entirely within the SELECT clause. In other words, fetches based on the cursor: DECLARE cursor1 CURSOR FOR SELECT author, title FROM abstracts; return only the author and title of each row (which could contain other fields such as journal, date, and abstract). But, in SQL, after opening the cursor, the application cannot change the list of return fields without redefining the cursor and reexecuting the query. Thus, in SQL, a declared cursor would typically list all fields of interest in the SELECT, and the application would discard any fields that were not needed on a particular fetch. In SFQL databases, where document-records are expected to be very long, this restriction would place an unnecessary burden on client/server communications. Therefore, in SFQL, the cursor FETCH operation, used to retrieve data document-record by document-record, was extended to allow the return_list to be changed on a document-record by document-record basis. The mechanism for this enhancement was to add a FIELDS clause to the FETCH statement. The FIELDS clause is specified in the FETCH statement as: FETCH cursor_name FIELDS return_list ; The FIELDS clause allows the FETCH to return a subset of the items in the SELECT clause of the cursor declaration. Because the FIELDS clause may be included with each FETCH statement, the list of items returned may be changed for every document-record in the search-results. For example, FETCH cursor1 FIELDS author; for the above query would only return the tag-field author for the current document-record. To illustrate how this feature fits into the development of an application, consider the following scenario. When the user issues a query, the system conducts the search and returns summary data identifying which documents matched the search criteria. For example, this summary information might consist of the document title and author. The user then selects the documents of interest directly from this summary information. As the user makes further choices from the summary, the client requests more information about the matching document using the FIELDS clause of the FETCH, without the need to reissue the query and without affecting the original search-results, which comprise the summary list. THE "HORIZONTAL" CURSOR ----------------------- Using the FIELDS clause to return partial document records by tag-fields helps in limiting the amount of information returned as a result of a query. However, in many full-text databases a tag-field itself might be very large--for example, in a collection of famous novels, the tag-field DOCUMENT could return the entire text of War And Peace. Further, unlike most relational database (SQL) queries, the criteria which qualified this record for the search-results is not the entire tag-field, but a relatively small portion of it. To return data from within a tag-field relative to the areas that matched the predicate, the SFQL function HIT_TEXT() is used in the FIELDS clause of the fetch. The general form of HIT_TEXT is: HIT_TEXT (tag_field_spec,fetch_orientation, maxlength, hit_justification, tag_list) where fetch_orientation indicates the direction of movement of the horizontal cursor, either PREVIOUS, NEXT, ABSOLUTE n, or RELATIVE n--the same terms used in the standard FETCH "vertical" cursor. The maxlength argument is used to specify the amount of text to return. To control the position of the matched text in the return buffer, hit_justification is used to specify the return of text either before, after, or centered about the matched text. In any case, the actual matched text will be marked in the document using special, configurable character codes called match-codes. The last argument, tag_list, is a blank-separated list of tags to remove (only the tag itself is removed, not the tagged text) before returning the text. The keyword ALL can be used in the tag_list to filter all tags out of the return text. For example, to return 1000 characters of text centered around the first hit in the current document-record, with all tags removed, the following FETCH would be sent to the server: FETCH cursor1 FIELDS HIT_TEXT(DOCUMENT, NEXT, 1000, CENTERED, 'ALL') ; To retrieve the same amount of text around the next hit, this FETCH statement would be reissued. After examining the text surrounding a hit and deciding the document is relevant, users often want to scroll through the adjacent information. The DATA function provides this capability in SFQL. The format of the DATA function is: DATA (tag_field_spec, fetch_orientation, maxlength, tag_list) which are interpreted as they are for HIT_TEXT above. For example, to return the next block of 1000 characters of text following our HIT_TEXT example above: FETCH cursor1 FIELDS DATA(DOCUMENT, NEXT,1000,'ALL') ; Likewise, to return the last 1000 characters in the DOCUMENT: FETCH cursor1 FIELDS DATA(DOCUMENT, LAST,1000,'ALL') ; The DATA function can also work independently of the HIT_TEXT function. This function can be used to request blocks of data from a tag-field on user demand. This is useful for tag-fields containing lengthy text or large amounts of graphics data. Thus: FETCH cursor1 FIELDS DATA(illustration, FIRST, 1000, 'ALL') ; returns the first 500 characters of the tag-field abstract, and: FETCH cursor1 FIELDS DATA(illustration, ABSOLUTE 3, 500, 'ALL') ; returns characters the characters in positions 1001 - 1500 of the tag-field. THE SET STATEMENT ----------------- The SET statement is included in SFQL to give the user control over server operations. In full-text systems, a query can result in a very large number of document-records, which can tie up system resources needlessly both in terms of memory and processing time. For example, to eliminate the possibility of an excessive number of document-records returned by a query, the following statement can be issued: SET MAX_SEARCH_RECORDS(100); which limits the number of document records to 100. Similarly, the functions SUSPEND and RESUME are used by the client to interrupt (or resume) an operation at the server that is taking too much time. To suspend a particular search, the client could issue the SFQL statement: SET SUSPEND(cursorname) ; where cursorname represents the name of the cursor-based search that is ongoing. There are a number of other SET functions in SFQL, which control important aspects of the server other than resource allocation. For example, the MATCH_METHOD function determines whether or not case is significant during queries, or whether fuzzy matching (e.g., verb stemming) is in effect. SFQL RETURN DATA STRUCTURES --------------------------- The SFQL grammar provides a means to express queries and request data from the server. A standard data return structure is also required, so that the client can interpret the data returned by the server. SFQL specifies return of data and status information in a structure called the SFQL Communications Area (SFQLCA). The SFQLCA has two parts, a header that contains status information, and a message area (SFQLMSG) which is used to return data or extended error messages. The SFQLCA header is modeled after several popular SQL implementations. (Although the concept of a communications areas is part of the SQL standard, the precise definition of an SQLCA is not.) Among the more important items of the SFQLCA is the SFQLCODE, which provides an integer return code indicating the status of the operation. (SQLCODE is the only part of SQL's SQLCA which is standard.) The server responds to every SFQL statement sent by the client with an SFQLCA. For example, the return SFQLCA may simply contain a "No Errors" status code following a "CLOSE cursor" statement, or it may contain an error message ranging from statement parse errors to resource limitation errors. When data is returned in the SFQLMSG area, for example, in response to a FETCH, it is formatted in a special multi-field variable length return structure called a Variable Length Record (VLR). The VLR permits the return of data from tag-fields of various types and sizes, from integers to large text or binary objects. Each data object is placed in a special container structure called a Field Subrecord (FSR). The FSR header identifies the source of the data object, and its type. A variable number of FSRs may be placed in a VLR. Figure 6 illustrates this in the context of the entire data structure hierarchy--from SFQLCA to FSR. As the figure shows, there is one FSR returned for each of the items in the return_list of the FIELDS clause (or in the return_list of the SELECT if there is no FIELDS clause in the FETCH.) For example the statement: FETCH cursor1 FIELDS title, author, abstract ; would return an SFQLCA message with status information and a VLR containing three FSRs to hold the data associated with the title, author, and abstract return items. CONCLUSIONS ----------- The architecture described here is an evolutionary step for CD-ROM technology. It permits users to grow beyond a single major CD-ROM database application towards a complete CD-ROM information system. The SFQL language also represents an evolutionary step for both full-text systems and relational databases. SFQL provides full-text systems the features of SQL, a powerful and well-established query language standard for working with structured data. SFQL provides relational database systems with the necessary language extensions to allow databases to exploit full-text capabilities. But most importantly, SFQL is the first step in forming permanent bonds between these important technologies. Although SFQL is not a complete superset of SQL, the groundwork has been laid to bridge the two technologies. The underlying model--relational, full-text, or both--should be transparent to the client application during retrieval. Then we will have truly achieved disc interchangeability. BIBLIOGRAPHY ------------ ANSI X3.135-1986. American National Standard for Information Systems--Database Language--SQL. American National Standards Institute, New York, NY. ATA, 1990. ATA Specification 100, Appendix 1: Digital Data Standards, Published by the Air Transport Association, 1709 New York Avenue Northwest, Washington D.C. 20006. ATA 89-9C.COMMWIN-R2-1990. CD-ROM Interchangeability Standard--Client/Server Communication Protocol: Microsoft Windows. Air Transport Association, Washington, DC. ATA 89-9C.SFQL2-R1-1990. CD-ROM Interchangeability Standard--SFQL. Air Transport Association, Washington, DC. Shapiro, N.R. and Bowers, F.J. (1989). Standardizing the delivery of technical documentation on CD-ROM's. CD Data Report, pp. 21-28. ABOUT THE AUTHORS ----------------- Neil R. Shapiro, Ph.D., is an independent consultant specializing in cooperative applications, user interface technology, and optical publishing. As the original author of SFQL, he has represented GE Aircraft Engines in the ATA standards effort for the last two years. He is a member of the IEEE Computer Society and the ACM. Dr. Shapiro may be reached at Scilab Inc., 1921 Hexam Rd., Schenectady, NY, 12309. TEL 518/393-1526, FAX: 518/393-1526 (automatic routing on carrier detect, or press Extension 11). Elias Diamantopoulos, M.S., has been an engineer at GE's Ordnance Division since 1986, where he develops expert system based applications with hypermedia-based support. He is currently working to define a paperless maintenance system for the Navy. Mr. Diamantopoulos implemented GE's SFQL server prototype as part of a Master's Thesis for the University of Massachusetts. He is a member of IEEE. Paul Cotton is Technical Product Manager at Fulcrum Technologies, Inc, a supplier of full-text search and retrieval software. Mr. Cotton was one of the key contributors in the SFQL Working Group. He was a co-founder of Officesmiths, Inc. and principal designer of The Officesmith, an integrated text database and office automation package. Mr. Cotton has M.Math and B.Math degrees in Computer Science from the University of Waterloo. He can be reached at 560 Rochester Street, Ottawa Ontario Canada K1S 5K2 TEL: 613/238-1761 or EMAIL: uunet!fultech!pbc. ----------------------------------------------------------------- Copyright 1991, Scilab Inc.. Permission to copy/download/upload, without fee, all or part of the material is granted provided that (1) This copyright notice is included in its entirety; (2) the copies are not made or distributed for direct commercial advantage; and (3) Scilab Inc. (1921 Hexam Rd, Schenectady, NY 12309-6510) is notified of the distribution details (e.g., the event at which it was distributed or the recipient names).