
		   Symbol Table Organization
		         June 13, 1985


The symbol table has been reorganized to accommodate the more complex
data types and provide a better separation between unqualified and
qualified names.  Files `srsym.c' and `sr.h' have been changed
to have the following specifications.



DATA STRUCTURES


st -- Main Symbol Table

	Contains all user-defined, unqualified identifiers that are
	visible at the current compilation point.
	Block structured:  starts with built-in identifiers, then
	the block for the component being compiled, then nested blocks.
	Identifiers must be unique within each block.

	st_top, st_end, st_cb
		ends of st, and pointer to top of current block


at's -- Auxiliary Symbol Tables

	Contain auxiliary information needed to fully define entries in st.
	This includes imported components, record fields, operation
	specifications, headers for anonymous types, and headers for
	objects that are pointed to by pointers.
	Identifiers must be unique within each auxiliary table.

	Auxiliary tables are always reached from the main table, or
	from other auxiliary tables.


nt -- Name Table (formerly the string table)

	Contains all identifiers and literal strings in the interfaces
	file and in components being compiled.
	Organized as a hash table.


symbol -- Structure for entries in st and at's

	name -- pointer to name table entry containing the name
		of the identifier or literal

	kind -- kind of entry; enumeration constant of form K_*

	type -- data type of entry; enumeration constant of form T_*

	tdef -- pointer to symbol entry containing information defining
		the type; used whenever the type is not a basic one
		(i.e., bool, int, or char).

	restrict -- restrictions for variables, type declarations,
		    operations, and parameters;
		    enumeration constants of form R_*.
		    
	value -- pointer to node containing initial value, if any.

	ranges -- pointer to range list, which points to nodes giving
		  lower and upper range bounds.

	segment -- logical segment where storage is to be allocated;
		   enumeration constant of form S_*; filled in at start
		   of code generation, except for op declarations.

	offset -- integer offset in logical segment; filled in at
		  start of code generation.

	size -- size of object.
		number of array elements during parsing;
		number of bytes of storage during code generation.
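
The field list above might translate into a C declaration along these
lines.  This is only a sketch: the s_ prefix follows the usage later in
this note, but the field types, the enumeration members shown, and the
helper is_basic_type() are assumptions, not the contents of `sr.h'.

```c
#include <stddef.h>

/* Hypothetical subsets; only the K_*, T_*, and S_* naming comes from this note. */
enum kind    { K_BLOCK, K_IMPORT, K_TYPE, K_CONST, K_VAR, K_FREE };
enum type    { T_BOOL, T_CHAR, T_INT, T_ENUM, T_PTR };
enum segment { S_PROC, S_INPUT };

struct node;     /* expression tree node, defined elsewhere */
struct range;    /* range list element, defined elsewhere */

struct symbol {
    char          *s_name;      /* name table entry for the identifier */
    enum kind      s_kind;      /* K_* */
    enum type      s_type;      /* T_* */
    struct symbol *s_tdef;      /* defining entry when the type is not basic */
    int            s_restrict;  /* R_* (assumed to be stored as an int) */
    struct node   *s_value;     /* initial value, if any */
    struct range  *s_ranges;    /* range list, if any */
    enum segment   s_segment;   /* S_*, filled in at code generation */
    int            s_offset;    /* offset within the logical segment */
    int            s_size;      /* elements while parsing, bytes later */
};

/* A basic type (bool, int, or char) needs no s_tdef. */
int is_basic_type(enum type t)
{
    return t == T_BOOL || t == T_INT || t == T_CHAR;
}
```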



local_block -- Chain of blocks in component being compiled

	lb_top, lb_end -- top and end of local blocks
	nest_level -- current nesting level


free_symbol, free_range -- reclaimed symbols and ranges;
			   updated at end of each component.



ROUTINES (externally used ones only)


init_sym() -- initialize the name table;
	      seed st with pre-defined functions.

tidy_sym() -- update interfaces file, clean up tables, and
	      release storage at end of each component.

pop_block() -- pop current block so symbols are no longer in main table;
	       called at end of a nested block.
	       entries are still reachable via local_block.

struct symbol *st_install(name,kind) --
		add new entry for name to main symbol table;
		initialize kind field.
		returns NULL if name is not unique in current block.

struct symbol *st_lookup(name) --
		looks name up in main symbol table; searches backward.
		returns NULL if name not found.
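
The block-structured behaviour of st_install, st_lookup, and pop_block
can be sketched with a simple array used as a stack.  Several things
here are assumptions made for illustration: the array representation,
the capacities, the saved_cb stack, the push_block() helper, and the use
of strcmp() instead of comparing name table pointers.  The sketch also
ignores local_block, so popped entries are simply discarded.

```c
#include <stddef.h>
#include <string.h>

#define ST_MAX   256    /* hypothetical capacity */
#define NEST_MAX  32    /* hypothetical nesting limit */

enum kind { K_BLOCK, K_VAR, K_CONST };       /* subset, for illustration */

struct symbol {
    const char *s_name;
    enum kind   s_kind;
};

static struct symbol  st[ST_MAX];
static struct symbol *st_top = st;           /* next free entry */
static struct symbol *st_end = st + ST_MAX;  /* one past the last entry */
static struct symbol *st_cb  = st;           /* first entry of current block */
static struct symbol *saved_cb[NEST_MAX];    /* st_cb of enclosing blocks */
static int nest_level = 0;

/* Hypothetical helper: open a nested block. Returns -1 if nested too deep. */
int push_block(void)
{
    if (nest_level == NEST_MAX)
        return -1;
    saved_cb[nest_level++] = st_cb;
    st_cb = st_top;
    return 0;
}

/* Remove the current block's entries from the main table. */
void pop_block(void)
{
    if (nest_level == 0)
        return;
    st_top = st_cb;
    st_cb = saved_cb[--nest_level];
}

/* Add name to the current block; NULL if it is already there.
   (This sketch also returns NULL when the table is full.) */
struct symbol *st_install(const char *name, enum kind kind)
{
    struct symbol *p;

    for (p = st_cb; p < st_top; p++)
        if (strcmp(p->s_name, name) == 0)
            return NULL;
    if (st_top == st_end)
        return NULL;
    st_top->s_name = name;
    st_top->s_kind = kind;
    return st_top++;
}

/* Search backward so the innermost declaration is found first. */
struct symbol *st_lookup(const char *name)
{
    struct symbol *p = st_top;

    while (p > st) {
        p--;
        if (strcmp(p->s_name, name) == 0)
            return p;
    }
    return NULL;
}
```

Searching backward from st_top is what lets a declaration in a nested
block hide one of the same name in an enclosing block.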

struct symbol *at_install(table,name,kind) --
		add new entry for name to auxiliary table, which is
		created if table == NULL.
		initialize kind field.
		returns NULL if name is not unique in the table.

struct symbol *at_lookup(table,name) --
		looks name up in an auxiliary table; searches forward.
		returns NULL if name not found.
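
One way to picture an auxiliary table is as a singly linked list whose
first entry serves as the table handle.  The sketch below assumes that
representation; the s_next link field, the new_symbol() helper, and the
use of calloc() and strcmp() are illustrative choices, not taken from
`srsym.c'.

```c
#include <stdlib.h>
#include <string.h>

enum kind { K_VAR, K_PARAM, K_RESULT };      /* subset, for illustration */

struct symbol {
    const char    *s_name;
    enum kind      s_kind;
    struct symbol *s_next;                   /* hypothetical link field */
};

/* Hypothetical helper: allocate a zeroed entry and set name and kind. */
static struct symbol *new_symbol(const char *name, enum kind kind)
{
    struct symbol *s = calloc(1, sizeof *s);

    if (s != NULL) {
        s->s_name = name;
        s->s_kind = kind;
    }
    return s;
}

/* Append name to table. With table == NULL the new entry starts a new
   table and doubles as its head. Returns NULL on a duplicate name
   (and, in this sketch, when allocation fails). */
struct symbol *at_install(struct symbol *table, const char *name, enum kind kind)
{
    struct symbol *p, *last = NULL, *s;

    for (p = table; p != NULL; p = p->s_next) {
        if (strcmp(p->s_name, name) == 0)
            return NULL;
        last = p;
    }
    s = new_symbol(name, kind);
    if (s != NULL && last != NULL)
        last->s_next = s;
    return s;
}

/* Search forward from the head of the table. */
struct symbol *at_lookup(struct symbol *table, const char *name)
{
    struct symbol *p;

    for (p = table; p != NULL; p = p->s_next)
        if (strcmp(p->s_name, name) == 0)
            return p;
    return NULL;
}
```

Under this assumption a caller keeps the result of the first at_install
call (the one made with table == NULL) and stores it in the s_tdef field
of the owning entry.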

add_range(sym,nd) --
		append nd to ranges field of symbol sym.
		nd is a pointer to an expression tree node.

char *nt_lookup(name,install_flag) --
		looks up string name in the name table.
		if name is not found, installs it when install_flag is 1;
		otherwise returns NULL.
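
A chained hash table is one plausible shape for the name table.  In the
sketch below the bucket count, the hash function, the nt_entry
structure, and the assumption that a successful lookup returns a pointer
to the stored copy of the string are all illustrative.

```c
#include <stdlib.h>
#include <string.h>

#define NT_BUCKETS 211                       /* hypothetical table size */

struct nt_entry {                            /* hypothetical entry layout */
    struct nt_entry *next;
    char            *text;
};

static struct nt_entry *nt[NT_BUCKETS];

/* Simple multiplicative string hash, reduced to a bucket index. */
static unsigned nt_hash(const char *s)
{
    unsigned h = 0;

    while (*s != '\0')
        h = h * 31u + (unsigned char)*s++;
    return h % NT_BUCKETS;
}

/* Return the stored copy of name. If name is absent, install it when
   install_flag is nonzero; otherwise return NULL. */
char *nt_lookup(const char *name, int install_flag)
{
    unsigned h = nt_hash(name);
    struct nt_entry *e;

    for (e = nt[h]; e != NULL; e = e->next)
        if (strcmp(e->text, name) == 0)
            return e->text;
    if (!install_flag)
        return NULL;

    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->text = malloc(strlen(name) + 1);
    if (e->text == NULL) {
        free(e);
        return NULL;
    }
    strcpy(e->text, name);
    e->next = nt[h];
    nt[h] = e;
    return e->text;
}
```

Because every spelling is stored once, two symbol entries with the same
name would share one pointer, so names could be compared by address.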

print_nt() -- print name table for debugging.

print_st() -- print symbol tables for debugging.



SYMBOL TABLE ENTRIES (by s_kind field; undescribed fields are not used)


K_BLOCK -- used for globals, specs, bodies, procs, and command bodies

	s_type == T_GLOBAL or T_SPEC or T_BODY
		s_name points to the component name
	s_type == T_PROC
		s_tdef points to the op declaration
	s_type == T_CMD


K_IMPORT -- used for imports

	s_name points to the name of the imported component
	s_tdef points to an auxiliary table containing the imported symbols


K_TYPE -- used for type declarations

	s_name points to the type identifier
	s_type == T_ENUM
		s_tdef points to the first enumeration literal;
		the enumeration literals are stored in consecutive
		entries in the main symbol table
	s_type == T_REC
		s_tdef points to the head of an auxiliary table containing
		the field definitions
	s_type == T_CAP
		s_tdef points to the prior st entry containing the resource,
		channel or operation name, or points to the head of an
		auxiliary table containing the definitions of the parameters
		and result
	s_type == T_PTR
		s_tdef points to an entry for the type being pointed to;
		this is a prior st entry for user-defined types, or
		a one-element auxiliary table for built-in and
		anonymous types (in the latter case that entry points
		to an auxiliary table giving the definition of the type)
	s_restrict gives the type restriction (R_PUBLIC or R_PRIVATE)


K_CONST -- used for constant declarations

	s_name points to the name of the constant
	s_value points to an expression tree giving the value
	s_type gives the type of the constant; it is one of
		T_BOOL, T_CHAR, T_INT, T_BINARY, T_STRING,
		T_ENUM, T_RECORD, T_CAP, or T_PTR
	s_tdef has the same interpretation as for K_TYPE for
		user-defined and anonymous types


K_VAR -- used for variable declarations and record fields

	s_name points to the name of the variable
	s_type and s_tdef give the type of the variable, as for constants
	s_value points to a node giving the initial value, if any
	s_ranges points to a list of ranges if the variable is subscripted
	s_size gives the number of array elements if the variable is
		subscripted; this value is set to SIZE_UNK (-1) if
		the size cannot be determined at compile-time.


K_CHAN -- used for channel declarations

	s_name points to the name of the channel
	s_type is T_VOID or T_FUNC depending on whether the channel
		returns a result (i.e., is a function)
	s_tdef points to an auxiliary table containing the parameter
		and result definitions
	s_restrict is one of R_CALL, R_SEND, or R_CALLSEND


K_OP -- used for operation declarations

	s_name points to the name of the operation
	s_type, s_tdef, and s_restrict have the same interpretation
		as for K_CHAN entries
	s_ranges and s_size have the same interpretation as for K_VAR entries
	s_segment is set to S_PROC if the operation is later seen to
		be implemented by a proc, or to S_INPUT if the operation
		is later seen to be implemented by input statements.


K_PARAM -- used for operation parameters

	s_name points to the name of the parameter
	s_type, s_tdef, s_ranges, and s_size have the same interpretation
		as for K_VAR entries, except that s_size is set to SIZE_ARB
		if the size is not fixed (e.g., one range bound is `*')
	s_restrict is one of R_VAL, R_RES, or R_VALRES


K_RESULT -- used for operation results

	s_name points to the name of the result
	s_type, s_tdef, s_ranges, and s_size have the same interpretation
		as for K_PARAM entries


K_LITERAL -- used for enumeration literals

	s_name points to the name of the literal
	s_type is T_ENUM
	s_tdef points to the first literal of this enumeration type

	we have not yet decided when the literal's value is determined,
	or where it is stored


K_ANON -- used for anonymous types and pointers to built-in types

	s_type == T_INT, T_BOOL, or T_CHAR
		no other fields are used

	otherwise the fields are set the same as for K_TYPE entries.

	K_ANON entries never appear in the main symbol table;
	they are always one-element auxiliary tables.


K_PREDEF -- used for generic predefined functions

	s_name points to the name of the function


K_FREE -- used for entries that have been freed



-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
		some notes added later in 1985:
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------


K_BLOCK

	s_type == T_INIT
	s_type == T_FINAL
		these two are like T_PROC but s_tdef is not used.
	s_type == T_INPUT
		s_tdef points to the op declaration.
	s_type == T_CMD
		used only for do, if, and fa.

	s_type == T_BODY has disappeared.
	s_type == T_SPEC
		(add) size is number of resource parameters.
		(maybe, if we get around to this.)


K_TYPE
	s_type == T_ENUM
		(add) s_size is the number of literals in this type.

--------------------------
K_VAR
	s_type == T_ENUM
		s_tdef points to name of enum type (not first literal)

--------------------------
K_PREDEF

	s_type == T_PREDEF is used for the predefined block

	s_type == T_FUNC	
	s_type == T_VOID
		other fields just like for K_OP

	note: a predefined type is a K_TYPE.
		code generator needs special flag for K_PREDEF.
		(also, can't assign predef's to caps.)

--------------------------
K_IMPORT
	s_type == T_SPEC
	s_type == T_GLOBAL
		(not really necessary, but convenient.)

--------------------------

a range contains:

	bounds1 and bounds2 -- pointers to TK_RANGE nodes
		for first and second bounds.
		if second bound unused, it is NULL.

	the left and right pointers in a TK_RANGE node
	point at the nodes for the lower and upper bounds, respectively.
	if there is no upper bound, right is NULL.
	a star is represented as a TK_ARB node.
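
The range layout described above can be sketched as follows, together
with a check for a `*' bound of the kind that would lead to SIZE_ARB
for a parameter.  The n_ field names, the TK_INT token, and both helper
functions are hypothetical; only bounds1, bounds2, TK_RANGE, and TK_ARB
come from these notes.

```c
#include <stddef.h>

enum token { TK_RANGE, TK_ARB, TK_INT };     /* TK_INT is hypothetical */

struct node {
    enum token   n_token;
    struct node *n_left;                     /* lower bound of a TK_RANGE */
    struct node *n_right;                    /* upper bound, or NULL */
    int          n_value;                    /* hypothetical literal value */
};

struct range {
    struct node *bounds1;                    /* TK_RANGE for first bounds */
    struct node *bounds2;                    /* TK_RANGE or NULL if unused */
};

/* Return 1 if either bound of one TK_RANGE node is a star. */
static int tk_range_has_arb(const struct node *rng)
{
    if (rng == NULL)
        return 0;
    if (rng->n_left != NULL && rng->n_left->n_token == TK_ARB)
        return 1;
    if (rng->n_right != NULL && rng->n_right->n_token == TK_ARB)
        return 1;
    return 0;
}

/* Return 1 if any bound in the range is a star. */
int range_has_arb(const struct range *r)
{
    return tk_range_has_arb(r->bounds1) || tk_range_has_arb(r->bounds2);
}
```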

add_range(sym,bounds1,bounds2)

How about getting rid of add_range and just putting both in st?

--------------------------------------------------------------------

for a record (or enum) variable, the s_tdef field points at the
type name entry in the st, not at the aux table as the
documentation indicates.

e.g., type R = rec(a:int)
      var x:R

x's tdef points at R, not the aux table
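
The extra level of indirection in the example above matters when
resolving a field reference: the walk goes from the variable to the
type entry and only then to the auxiliary table.  A minimal sketch,
assuming the auxiliary table is a linked list through a hypothetical
s_next field and using a hypothetical find_field() helper:

```c
#include <stddef.h>
#include <string.h>

enum kind { K_TYPE, K_VAR };                 /* subset, for illustration */
enum type { T_INT, T_REC };

struct symbol {
    const char    *s_name;
    enum kind      s_kind;
    enum type      s_type;
    struct symbol *s_tdef;
    struct symbol *s_next;                   /* hypothetical link field */
};

/* Look up a field of a record variable.  var->s_tdef is the type name
   entry; that entry's s_tdef heads the table of field definitions. */
struct symbol *find_field(const struct symbol *var, const char *field)
{
    struct symbol *p;

    if (var->s_type != T_REC || var->s_tdef == NULL)
        return NULL;
    for (p = var->s_tdef->s_tdef; p != NULL; p = p->s_next)
        if (strcmp(p->s_name, field) == 0)
            return p;
    return NULL;
}
```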
