Newsgroups: mod.sources Subject: TRC - expert system building tool (part 3 of 8) Approved: jpn@panda.UUCP Mod.sources: Volume 3, Issue 111 Submitted by: ihnp4!dicomed!ndsuvax!nckary (Daniel D. Kary) This is NOT a shell archive. Simply delete everything up to and including the cut mark and save the result as reference.1.doc. Dan Kary ihnp4!dicomed!ndsuvax!nckary -------------- cut here --------------- The TRC Reference Manual Daniel D. Kary North Dakota State University Computer Science Department 300 Minard Hall Fargo, ND 58102 _A_B_S_T_R_A_C_T The syntax of TRC is formally defined. The output of TRC is elucidated. TABLE OF CONTENTS PART ONE - INPUT 1. INTRODUCTION 2. OVERVIEW 3. LEXICAL ELEMENTS 4. DEFINITIONS 5. SHORT TERM MEMORY 6. LONG TERM MEMORY 7. OPTIMIZER PART TWO - OUTPUT 8. OVERVIEW 9. COMMON PROCEDURES 10. DATA OBJECTS 11. MANIPULATING THE DATA 12. TRANSLATING RULES 13. OPTIONS APPENDICES A. TRC GRAMMAR B. ERROR MESSAGES C. STYLE NOTES D. SAMPLE PROGRAM _1. _I_N_T_R_O_D_U_C_T_I_O_N TRC is a programming language that is useful for build- ing expert systems. It is presumed that the reader is fami- liar with expert systems in general and has used at least one expert system building tool. Some terms that are widely - 2 - used in describing expert systems have specific meanings when used to describe TRC and will be defined now. The set of situation-action rules that embody the knowledge an expert uses to solve a problem are referred to as Long Term Memory (LTM). The information that may vary with each instance of the problem is referred to as Short Term Memory (STM). The code which determines if the situa- tion part of a rule is true will be called a pattern matcher or a matcher. The code which determines which rule to activate will be called an inference engine and includes both the matcher and the LTM. The input to the TRC compiler is called a specification. The input to the TRC compiler is a rule based language. The output is a set of C language files. The procedures in the C language files output by the TRC compiler collectively implement the inference engine. An inference engine is to an expert system as a parser is to a compiler: it is of cen- tral importance but it does not comprise a complete imple- mentation. TRC does not provide code for interaction with the user, but does permit the programmer to easily add this code. This document is divided into two parts and a set of appendices. The first part presents a formal definition of the input language with examples of each language feature. The second part describes the output of the TRC compiler and includes some important insight on integrating TRC generated code with other C language code. The appendices include the complete TRC grammar, a listing and explanation of all the error messages that TRC might produce and a sample specifi- cation. PART ONE - INPUT _2. _O_V_E_R_V_I_E_W Every specification file consists of five sections, the header, definitions, short term memory (data base), long term memory (rules) and the trailer. These sections are separated by double percent characters. The form of a full specification is illustrated below. The header, STM and trailer are optional so the minimum specification would con- tain only the definitions and LTM. All of the "%%" marks must be present in each specification file. header %% definitions %% - 3 - STM %% LTM %% trailer The purpose of the header and trailer sections is to permit the inclusion of C language code in the specifica- tion. The header and trailer are each composed of a single lexical element called a c-code which is defined in section 3. Separate sections are devoted to each of the remaining parts of a TRC specification. _3. _L_E_X_I_C_A_L _E_L_E_M_E_N_T_S A program consists of a single file. A file is a sequence of lexical elements composed of characters. Char- acters may be one of these classes; (1) upper-case-letters, (2) lower-case-letters, (3) digits, (4) special characters (5) separators (6) embedded characters and (7) other charac- ters. (1) upper-case-letters A B C D E F G H I J K L M N O P Q R S T U V W X Y Z (2) lower-case-letters a b c d e f g h i j k l m n o p q r s t u v w x y z (3) digits 0 1 2 3 4 5 6 7 8 9 (4) special characters " ( ) : ; = < > ! ^ _ $ . { } / * % (5) separators space tab newline (6) embedded characters \n embedded-newline \t embedded-tab \b embedded-backspace \r embedded-carriage-return \f embedded-form-feed \\ embedded-back-slash \" embedded-quote (7) other characters @ # & + [ ] ` ~ ' | , ? The following names are used when referring to special characters: Symbol Name Symbol Name - 4 - " quote : colon ; semicolon = equal ! exclamation ^ hat _ underscore $ dollar < less-than > greater-than ( left-paren ) right-paren { left-bracket } right-bracket * asterisk % percent - minus A lexical element is a either a delimiter, an identif- ier, an integer literal, a floating point literal, a string literal, a comment or a c_code. In some cases a separator is required between lexical elements, specifically when adjacent lexical elements could be interpreted as a single lexical element. A separator is any of space, tab or new- line. One or more separators are permitted between any two lexical element, before the first lexical element and after the last lexical element. _3._1. _D_E_L_I_M_I_T_E_R_S A delimiter may be one of the following special charac- ters: : ; " . ^ - ( ) { } < > A delimiter may be one of the following compound delim- iters. Each compound delimiter is composed of two adjacent special characters. => %% != == >= <= The following names are used when referring to compound delimiters: Delimiter Name => arrow %% delim != not-equal == equality >= greater-than-or-equal >= less-than-or-equal _3._2. _I_D_E_N_T_I_F_I_E_R_S Identifiers are used as tokens and as reserved words. Separators are not allowed in an identifier. The underscore character is the only special character that may be part of an identifier. identifier ::= letter { underscore | letter | digit} - 5 - letter ::= upper-case-letter | lower-case-letter Examples: PENNY Get_Stuff x1 ThisOne WrZ_123 etc_ The identifiers that are reserved words are: ADD MARK PROFILE BACKTRACK NORECURS RECURS DUMP NOT SAVE EMPTY OPTIMIZE STRING FLOAT POINTER TRACE INT PREFIX ZERO _3._3. _N_U_M_E_R_I_C _L_I_T_E_R_A_L_S There are two classes of numeric literals, integer literals and floating point literals. The presence of a decimal point distinguishes floating point literals from integer literals. floating-point-literal ::= [ minus ] digits dot digits integer-literal ::= [ minus ] digits digits ::= digit { digit } Example integer literals: 1 -33 187 Example floating point literals: 0.5 -3.14159 6.0 _3._4. _S_T_R_I_N_G _L_I_T_E_R_A_L_S A string literal is formed by a sequence of characters (possibly zero) enclosed between quote characters. Any of the six classes of characters may be embedded in a string. string-literal ::= quote { [ character ] } quote Examples: "" "\"recursion\"" "these characters can be in a string $, ! => etc_\n" _3._5. _C_O_M_M_E_N_T_S Comments may be included anywhere in the input file that separators or delimiters may occur. Comments follow the style of comments in the C language. Comments may not be nested within comments. Any of the six classes of - 6 - characters may be embedded in a comment. comment ::= slash asterisk { [ character ] } asterisk slash Examples: /* a simple comment */ /* a multi-line comment */ /******************* * A Fancy Comment * *******************/ _3._6. _C__C_O_D_E A c_code is a fragment of C language code that is embedded in the input file. A c_code is recognized by the scanner as a single lexical item. The C language itself is not parsed by TRC. A c-code may not contain a procedure or function. c_code ::= left-bracket { [character] | [c_code] } right-bracket Example: { if(condition){ action(argument); another_action(); } } _4. _D_E_F_I_N_I_T_I_O_N_S Each entity that can be referred to by a TRC rule must be defined in the definition section. Each entity that is defined is called an _o_b_j_e_c_t. Objects may have numeric or string values associated with them. These associated values are called _e_l_e_m_e_n_t_s of the object. There are two forms of definitions. There is a simple form for objects which have no elements and an extended form for objects that have asso- ciated elements. definition ::= identifier definition ::= identifier left-paren item-list right-paren item-list ::= { [ item ] } item ::= identifier colon type type ::= INT | FLOAT | STRING | POINTER For each object there will be an associated object count in the output code, which represents the number of - 7 - objects of that type that exist at any point in time. For each object with at least one element, there will be an associated structure and list of objects of that type in the output code. The set of object counts and object lists defined in the definition section represent the STM that the system of rules may refer to. Each element that is defined for a given object must have a data type specified. Strong type checking is enforced throughout TRC. Comparisons and assignments of elements must involve elements or literals of the same type. There is no coercion of types. The INT, FLOAT, STRING and POINTER types are described in section 10. The POINTER type is a pointer to a structure of the type of the object that contains it. Examples: A b_b (This : INT) CAST ( THAT : INT The_Other : FLOAT) _5. _S_H_O_R_T _T_E_R_M _M_E_M_O_R_Y The short term memory (stm) section of the input is where the initial state of the working memory is specified. The intention of this section is to permit the working memory to be initialized to some state that may be required for each invocation of the expert system. It is also intended to serve as a way of entering test data while the expert system is being developed, before data entry pro- cedures are developed. stm ::= { [ entry ] } entry ::= [ integer-literal ] identifier entry ::= [ integer-literal ] identifier left-paren { [ init-item ] } right-paren init-item ::= identifier arrow value value ::= integer-literal value ::= floating-point-literal value ::= string-literal The short term memory section is a list of objects that are to be entered into the working memory. If an object has one or more elements, those elements can be initialized to user specified values. Numeric values that are not speci- fied are initialized to zero and string values that are not specified are initialized to an empty string. The integer- literal that may precede the name (identifier) of the object specifies how many objects of that type are to be added to working memory with the given element values, e.g.: /* definition part */ - 8 - A (A1 : STRING A2 : FLOAT) B (B1 : STRING B2 : FLOAT) %% /* short term memory */ A ( A1 => "string") /* It is not necessary to initialize all the elements in an object. */ 2 A ( A2 => 3.34 A1 => "thing") /* Nor is it necessary to initialize elements in the order they were declared. */ 3 B /* It is not necessary to initialize the elements at all. */ %% _6. _L_O_N_G _T_E_R_M _M_E_M_O_R_Y Long term memory is the section where the situation/action rules are enumerated. This section may begin with a listing of options that are to be turned on. All options in this section can also be specified by command line flags. Since the syntax for the long term memory sec- tion is more complex than for the other sections, it will be presented in several parts. _6._1. _O_P_T_I_O_N_S The long term memory is composed of two sections, the options and the rules. ltm ::= { [option] } { rule } option ::= ZERO | PROFILE | BACKTRACK | DUMP | RECURS | NORECURS | SAVE | TRACE | PREFIX identifier _6._1._1. _Z_E_R_O The ZERO option directs the compiler to generate a pro- cedure that will free all the dynamic structures allocated by TRC generated code. This feature is useful when develop- ing inference engines that will be entered more than once. It is often necessary to remove the 'leftovers' from a pre- vious execution before beginning a new execution. - 9 - _6._1._2. _P_R_O_F_I_L_E The PROFILE option directs the compiler to generate code to profile the execution of the inference engine and a procedure to print a summary of that profile. The profiling code counts the number of times that each rule searches some part of STM and how many times each rule is fired. _6._1._3. _B_A_C_K_T_R_A_C_K The BACKTRACK option directs the compiler to generate an inference engine that will backtrack when no rule is true. Backtracking is accomplished by undoing the actions of the last rule that fired and continuing to test rules as if the undone rule had never fired. _6._1._4. _D_U_M_P The DUMP option directs the compiler to generate pro- cedures that will print the contents of STM on the standard output. The intention of this option is to simplify the process of developing and debugging rules. By having the DUMP procedures generated automatically, the knowledge engineer is freed of the mundane task of writing procedures to display the current state of the STM. The DUMP pro- cedures are not intended to serve as the output of an expert system. Appropriate output routines will have to be developed by the knowledge engineer after the rules have been written. _6._1._5. _R_E_C_U_R_S TRC will generate code that uses one of two strategies for searching STM. These strategies (detailed in section 6.3.3) are called LINEAR and RECURSIVE. The LINEAR strategy is the default. The RECURS directive in the option part directs the compiler to use the RECURSIVE strategy as the default. It is possible to override the default on a per rule basis. Overriding the default is discussed in section 6.3.3. _6._1._6. _N_O_R_E_C_U_R_S The NORECURS option directs the compiler to use the LINEAR search strategy in all rules, unless otherwise directed. Since this is the default condition, it is not necessary to use this option. _6._1._7. _S_A_V_E The SAVE option directs the compiler to generate pro- cedures to save all objects which are dynamically allocated by TRC code on a file. The compiler will also generate pro- cedures which can restore the dynamically allocated - 10 - structures from the previously written files. The intention of this option is to simplify the development of expert sys- tems with checkpointing and restarting capabilities. The procedures generated by this option and the use of those procedures is described in section 13.6 and Appendix C. _6._1._8. _T_R_A_C_E The TRACE option directs the compiler to trace the exe- cution of the inference engine by maintaining a list of the rules that have been fired in the order they were fired. This list can be used to produce an explaination of the actions taken by the expert system. _6._1._9. _P_R_E_F_I_X The PREFIX option directs the compiler to use the iden- tifier that follows the reserved word 'PREFIX' as a prefix for all data objects and procedures generated by TRC. The intention of this option is to facilitate building expert systems that have more than one inference engine. Supplying different prefixes for each inference engine insures that there will be no name conflicts between separate inference engines, e.g.: PREFIX X_ _6._2. _R_U_L_E_S The second section if LTM is the list of rules. Each rule has a label, which can be supplied or automatically generated by TRC. The label is used whenever it is neces- sary or convenient to refer to the rule by name. The label is followed by the situation part (described in the next section). The situation part is a list of statements fol- lowed by the arrow delimiter. The action part (described in the section following the description of the situation part) follows the arrow delimiter and is itself a list of state- ments. The action part is followed by a semicolon which terminates the rule. rule ::= label situation arrow action semicolon label ::= identifier colon | colon _6._3. _S_I_T_U_A_T_I_O_N The situation part specifies how STM is to be searched and what must be present in STM for the situation part to be true. situation ::= { [ s-option ] } { [ match ] } s-option ::= EMPTY identifier identifier - 11 - s-option ::= RECURS | NORECURS match ::= [ integer-literal ] identifier match ::= NOT identifier match ::= [ integer-literal ] left-paren name match-list right-paren match ::= c-code name ::= hat identifier identifier match-list ::= { match-item } match-item ::= identifier dot identifier relop literal match-item ::= identifier dot identifier relop identifier dot identifier relop ::= equality | not-equal | less-than relop ::= greater-than | greater-than-or-equal relop ::= less-than-or-equal _6._3._1. _M_A_T_C_H_I_N_G It is necessary to understand how matching is specified before the s-option part can be explained. A match, which will also be referred to as a test, is a statement of what the inference engine is to search for in STM. Assume the following objects were defined in the definition section: %% A (A1 : INT A3 : INT A2 : STRING) B (B1 : INT B2 : STRING) %% The simplest match specifies only the object that must be present. A search for one object of type A and one object of type B can be specified as follows: A B A search for two objects of type A and two objects of type B can be specified in many ways, including these four equivalent ways: A A B B 2A 2 B A B A B A A 2B The objects can be listed in any order and may be pre- ceded by an integer literal. The integer literal specifys how many objects of the named type are to be search for. In one of the examples there is a space between the count and the object name and in other examples there is no space - 12 - between the count and the object. Spaces are required only when there would be a conflict without a space. Since the string "2A" (for example) begins with a digit, it is presumed to be a numeric literal. Since "A" is not a digit, the numeric literal ends at that point. Since the numeric literal contained no decimal point, it is an integer- literal. The string is therefore lexically equivalent to "2 A". The reserved word NOT is used to explicitly test for an empty list. The following match statement will be true if there are no objects of type A in STM: NOT A Any rule which contains a search for an object and a test for that same list being empty can never be true. TRC generates an error message in this situation because even though it is syntactically correct, it is in fact meaning- less, e.g.: A NOT A Very often it is necessary to search STM based not only on the type of object, but also based on the values of the elements of the object. This is specified by placing a list of the element values after the element name. (A.A1 == 2) (B.B1 != 3 B.B2 <= "THIS") (A.A2 == "HERE" A.A1 > 6) These three statements can be translated as follows: First search the A list for an object whose element A1 is equal to two, then search the B list for an object whose element B1 is not equal to three and whose element B2 is less than or equal to "THIS", finally search the A list for an object whose element A2 is equal to "HERE" and whose ele- ment A1 is greater than six. This situation part would be true if all three objects were found in STM, otherwise it would be false. In the first match only the value of A1 is specified. Only the elements that are specified are tested, the values of any other elements that the object may contain are not considered. Association of parameters is by name, so it is not necessary to list elements in the order they were declared. The third match statement in the example above lists the value of element A2 first, even though ele- ment A1 was declared first. The final case that must be considered is the case where it is necessary to search STM for an object whose ele- ments are to be tested against the result of some previous - 13 - search. To do this it is first necessary to name the object that is being searched for so that it may later be referred to, e.g.: (^A FIRST A.A1 == 2) (B.B1 == FIRST.A3) (A.A1 != A.A3) The first statement begins with a hat character. This indicates that this object is to be named. The hat charac- ter is followed by the object type and it's name. The name is followed by a list of the elements to search for, in this case a search for element A1 equal to two is specified. This statement can be translated as follows: Search the A list for an object whose element A1 is equal to two and name that object "FIRST". A name that is applied to an object is called a free variable. The scope of a free variable is the current rule. Free variable names can be reused in subse- quent rules. The second statement specifys that the B list is to be searched for an element B1 whose value is equal to the value of the element A3 found in the previous statement. The free variable name makes it possible to refer to previously found elements. The third statement, while looking innocent enough, is radically different from all previous examples. In all the previous examples the exact value that was being searched for was known before the search began. That value was expressed as either a literal, or the value of some element that was found in a previous test. In the third statement, the A list is being searched for an object whose elements A1 and A3 are not equal to one another. The values these ele- ments are to have are not specified, only their relationship to one another. This can be further complicated: (^A Second A.A1 == 3) (A.A1 < Second.A3 A.A3 < A.A1) In the second match statement of this example A1 is being compared to the value of A3 in the object named "Second" and it is being compared to the value of the ele- ment A3 in the object that contains it. An element may appear on the left hand side of the relational operator only once in a given match statement. It is now possible to con- sider the effects of the options. - 14 - _6._3._2. _O_P_T_I_O_N_S The situation part begins with a (possibly empty) list of options. The reserved words RECURS or NORECURS may appear in the option part of the situation. The appearance of one of these words causes the named strategy to be used rather than the current default strategy. It is not an error to explicitly specify the default strategy, but it is unnecessary. The option part of the situation may also include EMPTY statements. An EMPTY statement is a static object declaration. The intention of the EMPTY statement is to provide a means of passing data from STM to embedded c- code and from embedded c-code to STM. Examples in section 12 will illustrate these actions. _6._3._3. _S_E_A_R_C_H _S_T_R_A_T_E_G_I_E_S A small example provides an easy way to illustrate the two search strategys. This example is a complete TRC specification, though not useful for anything other than as an example. %% PENNY (MINT : STRING DATE : INT) %% PENNY (MINT => "DENVER" DATE => 1964) PENNY (DATE => 1966) %% R1: (^PENNY First) (PENNY.DATE <= First.DATE) => MARK PENNY ; %% STM will be initialized to contain two objects of type PENNY, the first minted in Denver in 1964, the second minted in an unspecified location in 1966. Since the reserved word RECURS does not appear in either option section, the default LINEAR search strategy will be used. In the LINEAR strategy, STM is searched in a linear fashion for each object specified in the situation part. Objects are searched for in the order they are listed. In this example, the object named "First" will be associated with the first object in the list. Since the values of the elements are not specified, any object of type PENNY will match. This object is then temporarily marked as being "in use" and can not be used to match any subsequent tests. The list will then be searched for an object of type PENNY whose DATE element is less than or equal to the DATE element of the "First" object. The only other object in the list has a - 15 - DATE element of 1966 which is not less than or equal to 1964, so the rule fails. In the LINEAR strategy, when any test in the situation part fails, the entire rule fails immediately, no further tests are made. It should be obvious that this rule would have been true if "First" had been associated with the second object in the PENNY list. This is precisely the purpose of the RECURSIVE search strategy. In the RECURSIVE search stra- tegy, when a test fails, the previous test is redone. To redo a test, the object that was marked as "in use" is unmarked, and the list is searched from that point for an object that will match the test. The RECURSIVE search fails when a single test fails and it is no longer possible to undo the previous test (this occurs when there is no previ- ous test). The RECURSIVE search strategy is a powerful pat- tern matching tool, but it can be expensive in terms of exe- cution time. _6._3._4. _E_M_B_E_D_D_E_D _C_O_D_E Arbitrary C language code may be embedded in the situa- tion part anywhere a match may occur. Recall that embedded code (c-code) is recognized as a single lexical element by the scanner, the C language itself is not parsed by TRC. Errors in embedded code will not be detected by TRC. The intention of permitting embedded code in the situation part is to make it possible to include tests that may not fit the context of a match against STM. In order to integrate an embedded code test with the existing match statements, it is necessary to have a way to refer to objects in embedded code. In order for embedded code to have the same functionality as a match statement, it is necessary to have a way to cause a rule to fail in the embedded code. Each of these facilities are provided. _6._3._5. _E_M_P_T_Y _O_B_J_E_C_T_S The purpose of the EMPTY statement is to create a named object that can be referred to by match statements and embedded code, without having to exist in STM. One of the capabilities that results is the ability to have STM searched on the basis of the result of some external pro- cedure, e.g.: R1: EMPTY PENNY SPARE /* this creates an object of type PENNY that is named SPARE. This object exists separately from STM and it's elements are not initialized. */ { /* this embedded C code precedes - 16 - any search of STM */ if(($SPARE.DATE = external-procedure()) <= 1920){ $FAIL. } } (PENNY.DATE == SPARE.DATE) => MARK PENNY ; Several things are happening in this example. First an object of type PENNY is created and given the name SPARE. This object exists separately from STM and will exist only during the current rule. It is useful only as something that can be referred to in other statements. A section of embedded code precedes the only match statement in this rule. When the code produced by TRC is compiled and run, that embedded code will be executed before STM is searched, by virtue of the fact that it precedes the match statement. The embedded code contains an "if" statement which con- tains an embedded assignment and function call as part of it's condition. The left-hand-side of the embedded assign- ment, "$SPARE.DATE" is not syntactically correct C language code. The dollar character is a flag to TRC that indicates a reference to a named object. The identifier that follows the dollar character will be translated by TRC during the output phase. This translation is described in section 12. The statement "$FAIL." is translated by TRC into whatever statements are required to make this rule fail. The defini- tion of failure depends on the search strategy. If the LINEAR strategy is being used, "$FAIL." will cause the rule to stop searching STM and continue with the next rule. If the RECURSIVE strategy is being used, "$FAIL." will cause the rule to undo and then redo the previous match statement. An object name preceded by the dollar character may occur in the embedded code anywhere a variable name may occur, since that is what it will actually be translated to. Embedded code may also refer to objects that exist in STM using the same dollar character translation technique: R1: RECURS (^PENNY NEW PENNY.MINT == "DENVER) { if(some-function($NEW.DATE)) $FAIL. else $NEW.DATE = 0; } (PENNY.DATE == 1921) - 17 - => MARK PENNY ; The "else" part of the embedded code illustrates an assignment to an element of the object named NEW. The object that is being called NEW exists in STM and this assignment to it's DATE element is permanent. Since this rule is recursive, it is possible that this embedded code will set the DATE element of every object in the PENNY list to zero. These modifications are made before it is even known that the situation part is true. Modifying STM in the situation part of a rule would be a major departure from traditional expert system implementation techniques. Furth- ermore, the BACKTRACKing option is unaware of changes made in STM by embedded code. The BACKTRACKing option is unable to correctly undo this rule. _6._4. _A_C_T_I_O_N The ACTION part specifies what is to be done if the situation part is true. The actions that can be taken pri- marily involve adding objects to STM or deleting objects from STM. Recall that the non-terminal 'entry' was defined in section 4. action ::= statements c-code statements ::= { [statement] } statement ::= MARK mark-list statement ::= ADD add-list statement ::= OPTIMIZE identifier mark-list ::= { [ mark-item ] } mark-item ::= [ integer-literal ] identifier add-list ::= { [ entry ] } _6._4._1. _M_A_R_K The MARK statement is used to delete objects from STM. Only objects that were found in the situation part may be deleted. The reason for this constraint is that only the objects found in the situation part are definitely known to exist in STM. STM is searched only in the situation part, there is no searching in the action part. Objects may be deleted by name or in the order they were found, e.g. (using the definitions from section 6.3.1): R1: (A.A1 != A.A3) (^A FIRST A.A1 == 2) (B.B1 == FIRST.A3) => MARK A ; - 18 - This MARK statement will delete the object in the A list that met the test (A.A1 != A.A3). In some instances it may be desirable to delete an object that was not the first object that was found, e.g.: R1: (A.A1 != A.A3) (^A FIRST A.A1 == 2) (B.B1 == FIRST.A3) => MARK FIRST ; The A list object named 'FIRST' is the second object of type A to be found. It is specified as the object to delete by using it's free variable name. A MARK statement can specify a count of how many objects of a given type are to be deleted. A MARK statement may list any number of objects to delete, and each object to be deleted can have a separate MARK statement if desired. In no case can more objects be deleted than were found in the situation part. Each of the following examples is equivalent: R1: (A.A1 != A.A3) (^A FIRST A.A1 == 2) (B.B1 == FIRST.A3) => MARK B FIRST A ; R1: (A.A1 != A.A3) (^A FIRST A.A1 == 2) (B.B1 == FIRST.A3) => MARK 2A MARK B ; R1: (A.A1 != A.A3) (^A FIRST A.A1 == 2) (B.B1 == FIRST.A3) => MARK FIRST MARK A MARK B ; - 19 - _6._4._2. _A_D_D The ADD statement is used to add new objects to STM. As in the MARK statement, an ADD statement can specify one or several objects to add to STM. The value of each element of each object can be specified as in the STM section of the specification. Each object is inserted at the head of the appropriate list. The insertions are actually made in the opposite order that they are listed, the net effect is that the objects appear at the head of the list in the order they are specified. ADD and MARK statements may be intermixed in any order, e.g.: R1: (A.A1 != A.A3) (^A FIRST A.A1 == 2) (B.B1 == FIRST.A3) => MARK FIRST ADD A (A.A1 => 6 A.A3 => FIRST.A3) ADD B (B.B2 => FIRST.A2 B.B1 => 9) MARK B ; All the ADD statements will be executed before any MARK statements are executed regardless of the order of the statements in the action part. The statements are ordered by the compiler to insure that an ADD statement does not refer to an object that has already been MARKed. In the example above, the first ADD statement refers to the object named 'FIRST'. The object named 'FIRST' is MARKed in the previous statement. If the code were executed in the speci- fied order, the element 'FIRST.A3' would not exist when the ADD statement was executed. _6._4._3. _O_P_T_I_M_I_Z_E The OPTIMIZE statement is named for it's primary func- tion, hand optimization of LTM. There is also a built in optimizer that can be invoked. Optimization is discussed in detail in section 7. The OPTIMIZE statement can be thought of as an unconditional GOTO statement. In normal execution, after a rule fires the rules are tested from the beginning of LTM for the next rule that will fire. The OPTIMIZE statement can specify a point other than the start of LTM to begin testing rules. In addition to optimization, it can be used to impose a customized control structure on the set of rules. One example of the use of the OPTIMIZE statement is to implement a search for the absence of some object(s) in STM, - 20 - which is not otherwise supported by the TRC language. To search for the absence of some object(s), use two rules. The first rule searches for the presence of the object(s) in question, if the rule is true then the object(s) are not absent. If the rule fails, the object(s) are absent, e.g.: R1: /* effectively search for the absence of an object A with element A1 == 2 */ (A.A1 == 2) => /* If this rule is true, branch around the next rule */ OPTIMIZE R3 ; R2: /* If R1 fails, then there is no object A with element A1 == 2. An empty situation part such as this always evaluates to true */ => /* whatever you wish to do in response to the absence of A1 == 2 */ ; R3: /* continue here if R1 is true */ . . . _6._4._4. _C-_C_O_D_E A c-code may follow the MARK, ADD and OPTIMIZE state- ments. This is user code that is to be executed when a rule fires. C-code may not appear between MARK, ADD or OPTIMIZE statements. If it is necessary to refer to an object that is being MARKed in c-code, it should be done in the situa- tion part. A c-code may precede the arrow symbol that separates the situation and action parts. C-code in this position is equivalent to c-code in the situation part preceding the MARK, etc. statements, e.g.: R1: (^A FIRST) { /* this c-code follows all situation tests. It is effectively in the action part since it will execute only if the situation is - 21 - true */ some_procedure($FIRST.A1); } => MARK FIRST ; _7. _O_P_T_I_M_I_Z_E_R The optimizer does not produce code that is optimum in any sense. What it does is to perform a single, very useful code modification that can have a very positive impact on execution time. Consider the execution of an inference engine. Each rule is tested until one who's situation part is true is found. This rule's action part is then executed. When rules are being tested the problem space is being searched. When an action part is executed a step is taken in the solu- tion of the problem. Searching the problem space is clearly part of the solution, but the action part is where the the results occur. The goal, which is not attained, is to reduce the search time to zero. To attain this goal it would be neces- sary to know each time a rule fires which rule will fire next. This is generally not known. In particular when the inference engine begins execution, the contents of STM are not known, any rule can be the first rule to fire. Once a rule has fired and each time any rule fires a great deal of implicit knowledge about the contents of STM is obtained. It is known that no rule previous to the current rule is true and no rule previous to the current rule can be true after the execution of the current rule unless the current rule modifies STM in such a way as to make some previous rule true. This simple fact is the entire basis of the optimizer, which attempts to reduce the number of rules that are tested by deducing which rules can not possibly fire. Three tests must be performed to determine a candidate next rule, which is the first rule in LTM that can possibly fire after the current rule fires. The three tests are called the NOT test, the ADD test and the MARK test. The first case to be considered is the case of a rule which contains a NOT statement in the situation part. A NOT test is an explicit test for an empty list. When a rule that fires contains an ADD statement it will not be possible for any previous rule with a NOT statement referring to that list to be the next rule to fire. Likewise, if a rule that fires contains a MARK statement and no ADD statement refer- ring to that same list, it is possible that the list will - 22 - become empty making it possible for the rule with the NOT statement that previously failed to become true. If it is determined that it is possible for a rule to fire after the NOT test, that rule becomes the candidate rule and no further testing is done. Consider the case of a rule with no NOT statements that recursively searches STM for a situation. If this rule fails, it will continue to fail until something is added to STM to make it true. If all rules searched STM recursively it would be known when a rule fires that of the rules that precede the current rule, only those rules that search for something added to STM by the current rule can possibly fire in the next pass. If the current rule adds something to STM, control could continue with the first rule that searches for that something rather than the first rule in LTM. If no rule prior to the current rule searches for those things added to STM by the current rule or if the current rule adds nothing to STM then no prior rule can execute. Control could con- tinue with the current rule rather than at the beginning of LTM. By causing control to continue with a rule later than the first rule the amount of searching is reduced. The case of a rule that performs only a linear search on STM must also be considered. The previous conclusion about items being added to STM is still true; a rule that adds something to STM can cause a linear search rule to become true. With linear search it is also possible that a rule will become true if something is removed from STM. If a linear rule searches for several similar items which are present but not correctly ordered it is possible for this linear search to fail where a recursive search would not have failed. If there were excess items, removing one or more may cause a different linear assignment which could make a linear rule true. This is the MARK test. Examples of this situation are non-trivial, but where correctness is an issue these cases can not be overlooked. The TRC optimizer selects a continuation point for each rule based on what the rule adds to or deletes from STM rather than testing each rule from the beginning of LTM. The continuation point is the first rule that could fire based on the NOT and ADD tests for all rules, and the MARK test for linear rules. The TRC optimizer is somewhat naive in that it considers only items added or deleted with the ADD and MARK statements. The optimizer is unaware of any changes that may have been made to STM by user code. The caveat is if STM is modified in user code the optimizer may produce incorrect code. The optimizer, which can be invoked with a command line option (-O), tests each rule individu- ally and ignores those rules that were hand optimized in the specification.