.sh 1 "Implementation Overview"
.pp
The current SR implementation is built on top of UNIX
and consists of three major components:
the compiler, linker, and run-time support.
Below we describe
how SR programs are built and executed using these components
and then how the major language features are implemented.
We conclude by giving the status of the current implementation
and some measurements of its size and performance. 
The same basic philosophy that guided the design of the language
has guided its implementation:  common uses of language
features should have a simple, efficient implementation.
.pp
The compiler has a traditional internal structure:
a lexical analyzer, recursive-descent parser, and code generator.
The lexical analyzer and parser employ common techniques.
The code generator emits C source code, which is
passed through the C compiler to produce machine code (MC).\**
.(f
\**Originally the Amsterdam Compiler Kit (ACK) [Tane83]
was used for code generation.
We switched to generating C so people who want to
use our compiler would not have to purchase ACK.
.)f
The SR compiler supports separate compilation
of entire resources, resource specs, resource bodies, and globals.
.pp
The linker provides the means by which the user constructs
a program from previously compiled resources.
The input to the linker is a list of resources
and a list of physical machines on which the program is to execute.
The linker parses and verifies the legality of its input
(e.g., it checks to make sure that the resources
have been compiled in an acceptable order)
and then uses the standard UNIX linker to create a load module.
The input to the linker also designates
one of the physical machines as the program's \*(lqmain\*(rq physical machine
and one of the resources as the program's \*(lqmain\*(rq resource.
A virtual machine is created on the main physical machine
when the program begins.
One instance of the main resource
is then created, and begins execution, within that virtual machine.
Each virtual machine (VM) executes as a single UNIX process
in which concurrency is simulated by the run-time support.
VM's exchange messages using UNIX sockets.
.pp
The run-time support (RTS)
provides the environment in which the MC executes.
The RTS provides primitives for resource creation and destruction,
operation invocation and servicing,
and memory allocation;
it also supports the implementation-specific language
mechanisms described in Sec. 2.4.
Internally, the RTS contains a #nugget:
a small collection of indivisible process management
and semaphore primitives.
The RTS hides the details of the network from the MC;
i.e., the number of machines and their topology
are transparent to the MC.
When the RTS receives a request for a service
provided on another machine\(eme.g.,
create a resource or invoke an operation\(emit simply
forwards the request to the destination machine.
Upon arrival at that machine, the local RTS
processes the request just as though it had
been generated locally.
Results from such requests are transmitted back in a similar fashion.
.sh 2 "Resource Creation and Destruction"
.pp
On each VM,
the RTS maintains a table of active resource instances.
A resource capability consists of (1) a VM identity,
a pointer into the resource instance table, and a sequence number
and (2) an operation capability for each of the operations declared
in the resource's spec
(operation capabilities are described below in Sec. 4.2).
The sequence number for a resource is assigned when the instance is created;
it is stored in the resource instance table.
The RTS uses sequence numbers to determine
whether a resource capability refers
to a resource instance that still exists
or to one that has since been destroyed.
.pp
The MC for the %create statement builds a creation block
that contains the identity of the resource to be created,
the VM on which it is to be created,
and the values of any parameters.
This block is passed to the RTS,
which transmits it to the designated VM.
When the creation block arrives at the designated VM,
the (local) RTS allocates a table entry for the instance
and fills in the first part of the resource capability accordingly.
The RTS then creates a process to execute the resource's initialization code.
.pp
The MC for every resource includes initialization code
even if there is no user-specified initialization code.
The key functions of such code are to allocate memory for resource variables
(the size of which may depend on the parameters in the resource heading),
to initialize resource variables
that have initialization expressions as part of their declaration,
and to create operations declared in the resource spec
or outer level of the body.
To accomplish operation creation,
the MC interacts with the RTS.
For each operation that is being created, the RTS
allocates and initializes an entry in its operation table
(see Sec. 4.2);
if the operation is in the resource's specification,
the RTS also fills in the appropriate field
in the resource capability that will be returned from %create.
The initialization process executes this implicit initialization code,
then any user-specified initialization code,
and finally additional implicit initialization code
to create any background processes in the resource.
.pp
To destroy a resource instance,
the MC passes the RTS a capability for the instance.
If the resource contains finalization code,
the RTS creates a process to execute that code.
When that process terminates, or if there was no finalization code,
the RTS uses the resource instance table to
locate processes, operations, and memory that belong to
the resource instance.
The RTS then kills the processes,
frees the entries in the resource and operation tables,
and frees the resource's memory.
The sequence number in each freed entry
is incremented so that future references
to a resource that has been destroyed or to one of its operations
can be detected as being invalid.
.pp
When an SR program begins execution,
first the nugget and then the RTS initialize themselves.
Then an instance of the main resource is created
in much the same way that any other resource instance is created.
.sh 2 "Operations"
.pp
The RTS also maintains an operation table on each VM.
This table contains an entry for each
operation that is serviced on that VM and is currently active.
The entry indicates whether the operation is serviced by a %proc
or by input statements.
For an operation serviced by a %proc,
the entry contains the address of the code for the %proc.
For an operation serviced by input statements,
the entry points to its list of pending invocations.
An operation capability consists of a VM identity,
an index into the operation table,
and a sequence number.
The sequence number serves a purpose analogous
to the sequence number in a resource capability:
it enables the RTS
to determine whether an invocation refers to an operation
that still exists.
(An operation exists until its defining resource is destroyed
or its defining block terminates.)
.sh 3 "Invocation Statements"
.pp
To invoke an operation,
the MC first builds an invocation block,
which consists of header information and actual parameter values.
The MC fills in the header with the kind of invocation
(%call, %send, concurrent %call, or concurrent %send)
and the capability for the operation being invoked.
Then,
the MC passes the invocation block to the RTS.
If necessary, the RTS transmits the invocation block
to the VM on which the operation is located
(recall that capabilities contain VM identities).
The RTS then uses the index in the operation capability
to locate the entry in the operation table,
and thus determine how the operation is serviced.
For an operation serviced by a %proc,
the RTS creates a process and passes it the invocation block.\**
.(f
\**In some cases, the RTS can avoid creating a process;
see Sec. 4.2.3 for details.
.)f
For an operation serviced by input statements,
the RTS places the invocation block onto the list of
invocations for the operation;
then it determines if any process is waiting for the invocation,
and, if so, awakens such a process.
In either case, for a call invocation the RTS blocks the
calling process; when the operation has been serviced that
process is awakened and retrieves any results from the invocation block.
.pp
The implementation of %co statements builds on the
implementation of %call and %send statements.
First, the MC informs the RTS when it begins executing a %co statement.
The RTS then allocates a structure in which it maintains
the number of outstanding call invocations
(i.e., those that have been started but have not yet completed)
and a list of call invocations that have completed
but have not been returned to the MC.
Second, the MC performs all the invocations without blocking.
For each call invocation the MC places an #arm #number\(emthe
index of the concurrent command within the %co statement\(emin
the invocation block.
Third,
since send invocations complete immediately,
the MC executes the post-processing block (if any)
corresponding to each send invocation.
The MC then repeatedly calls an RTS primitive
to wait until call invocations complete.
For each completed call invocation,
the MC
executes the post-processing block (if any) corresponding to the invocation;
specifically, it uses the arm number in the invocation block
as an index into a jump table of post-processing blocks.
When all invocations have completed,
or when one of the post-processing blocks executes %exit,
the MC informs the RTS that the %co statement has terminated.
The RTS then discards any remaining completed call invocations
and arranges to discard any call invocations for this %co statement
that might complete in the future.
The infrequent situation in which a post-processing block
itself contains a %co statement is
handled by a slight generalization of the above implementation.
.sh 3 "The Input Statement"
.pp
The input statement is the most complicated statement in the language
and has the most complicated implementation.
In its most general form,
a single input statement can service one of several operations
and can use synchronization and scheduling expressions
to select the invocation it wants.
Moreover, an operation can be serviced
by input statements in more than one process,
which thus compete to service invocations.
However, as we shall see, the implementation of
simple, commonly occurring cases is quite efficient.
.pp
#Classes are fundamental to the implementation of input statements.
They are used to identify and control conflicts between processes
that are trying to service the same invocations.
Classes have a static aspect and a dynamic aspect.
A static class of operations is an equivalence class of the transitive closure
of the relation ``serviced by the same input statement''.
At compile time, the compiler groups
operations into static classes based on their appearance in input statements.
At run-time,
actual membership in the (dynamic) classes depends
on which operations in the static class are extant.
For example,
an operation declared local to a process
joins its dynamic class when the process is created
and leaves its dynamic class when the process completes execution.
The RTS represents each dynamic class by a #class #structure,
which contains a list of pending invocations of operations in the class,
a flag indicating whether or not some process has access to the class,
and
a list of processes that are waiting to access the class.
Each operation table entry points to its operation's class structure.
.pp
At most one process at a time is allowed to access the list
of pending invocations of operations in a given class structure.
That is, for a given class, at most one process at a time
can be selecting an invocation to service or appending
a new invocation.
Processes are given access
to both pending and new invocations in a class structure
in first-come/first-served order.
Thus, a process waiting to access the invocations
will eventually obtain access as long as all functions
in synchronization and scheduling
expressions in input statements eventually terminate.
.pp
The RTS and nugget together provide seven primitives that the MC uses
for input statements.
These primitives are tailored to support common
cases of input statements
and have straightforward and efficient implementations.
They are:
.ip
.ti -5n
#access(#class)  \(em  Acquire
exclusive access to #class,
which is established as the current class structure
for the executing process.
That process is blocked if another process already
has access to #class.
The RTS will release access when this process
blocks in trying to get an invocation
or when this process executes #remove (see below).
.ip
.ti -5n
#get_invocation()  \(em  Return a pointer
to the invocation block the executing process should examine next.
This invocation is on the invocation list
in the current class structure of the executing process;
successive calls of this primitive return successive invocations.
If there is no such invocation,
the RTS releases access
to the executing process's current class structure
and blocks that process.
.ip
.ti -5n
#get_named_inv(#op_cap)  \(em  Get the next invocation
of operation #op_cap (an operation capability)
in the executing process's current class; a pointer to
the invocation block is returned.
.ip
.ti -5n
#get_named_inv_nb(#op_cap)  \(em  Get an invocation of #op_cap.
This primitive is identical to #get_named_inv
except that it does not block the executing process if no
invocation is found;
instead it returns a null pointer in that case.
It is used when the input statement contains a scheduling expression.
.ip
.ti -5n
#remove(#invocation)  \(em  Remove the invocation block
pointed at by #invocation from the invocation list
of the executing process's current class.
The RTS also releases access
to the executing process's current class structure.
.ip
.ti -5n
#input_done(#invocation)  \(em  Inform
the RTS that the MC has
finished executing the command body in an input statement
and is therefore finished with the invocation block pointed at by #invocation.
If that invocation was called,
the RTS passes the invocation block back to the invoking process
and awakens that process.
.ip
.ti -5n
#receive(#class)  \(em  Get and then remove the next
invocation in #class.
This primitive is a combination of
#access(#class), #invocation:=#get_invocation(), and #remove(#invocation);
hence, it returns a pointer to an invocation block.
It is used for simple input statements and for %receive statements.
.lp
The ways in which these primitives are used by the MC
are illustrated below by four examples.
More complicated input statements
are implemented using appropriate combinations of the primitives.
.pp
Consider the simple input statement:
.(b
%in #q(#x) ->  ...  %ni
.)b
This statement delays the executing process until there
is some invocation of #q, then services the oldest such invocation.
(Note that %receive statements expand
into this form of input statement.)
For this statement, if #q is in a class by itself,
the MC executes #invocation:=#receive(#q's class).
If #q is not in a class by itself,
the MC executes #access(#q's class),
#invocation:=#get_named_inv(#q),
and #remove(#invocation).
In either case,
the MC then executes the command body associated with #q,
with parameter #x bound to the value for #x in the invocation block,
and finally executes #input_done(#invocation).
.pp
Second, consider:
.(b
%in #q(#x) ->  ...  [] #r(#y,#z) ->  ...  %ni
.)b
This statement services the first pending invocation of either #q or #r.
Note that #q and #r are in the same class
because they appear in the same input statement.
Here, the MC first uses #access(#q's class)
and then #invocation:=#get_invocation() to look at each
pending invocation in the class
to determine if it is an invocation of #q or #r
(there might be other operations in the class).
If the MC finds an invocation of #q or #r,
it calls #remove(#invocation),
then executes the corresponding command body with the parameter values
from the selected invocation block,
and finally executes #input_done(#invocation).
If the MC finds no pending invocation of #q or #r,
the executing process blocks in #get_invocation
until an invocation in the class arrives.
When such an invocation arrives,
the RTS awakens the process,
which then repeats the above steps.
.pp
As the third example, consider an input statement with a
synchronization expression:
.(b
%in #q(#x) %and #x > 3 ->  ...  %ni
.)b
This statement services the first pending invocation
of #q for which parameter #x is greater than three.
The MC first uses #access(#q's class) to obtain exclusive access to
#q's class.
The MC then uses
#invocation:=#get_invocation() or #invocation:=#get_named_inv(#q)
to obtain invocations of #q one at a time; the first primitive
is used if #q is in a class by itself, otherwise the second is used.
For each such invocation,
the MC evaluates the synchronization expression
using the value of the parameter in the invocation block.
If the synchronization expression is true,
the MC notifies the RTS of its success by calling
#remove(#invocation),
executes the command body associated with #q,
and calls #input_done(#invocation).
If the synchronization expression is false,
the MC repeats the above steps to obtain the next invocation.
.pp
Finally, consider an input
statement with a scheduling expression:
.(b
%in #q(#x) %by #x ->  ...  %ni
.)b
This statement services the (oldest) pending invocation
of #q that has the smallest value of parameter #x.
In this case, the MC uses the same steps as in the previous
example to obtain the first invocation of #q.
It then evaluates the scheduling expression
using the value of the parameter in the invocation block;
this value and a pointer, #psave, to the invocation block are saved.
The MC then obtains the remaining invocations
by repeatedly calling #invocation:=#get_named_inv_nb(#q).
For each of these invocations,
the MC evaluates the scheduling expression
and compares it with the saved value,
updating the saved value and pointer if the new value is smaller.
When there are no more invocations
(i.e., when #get_named_inv_nb returns a null pointer),
#psave points to the invocation with the smallest scheduling expression.
The MC acquires that invocation by calling #remove(#psave),
then executes the command body associated with #q,
and finally calls #input_done(#psave).
.pp
Note that synchronization and scheduling expressions
are evaluated by the MC, not the RTS.
We do this for two reasons.
First, these expressions can reference objects such as
local variables
for which the RTS would need to establish addressing if it were
to execute the code that evaluates the expression.
Second, these expressions can contain invocations;
it would greatly complicate the RTS to handle such invocations
in a way that does not cause the RTS to block itself.
A consequence of this approach to evaluating synchronization
and scheduling expressions is that the overhead of
evaluating such expressions
is paid for only by processes that use them.
.sh 3 "Optimizations"
.pp
Two kinds of optimizations are applied
to certain uses of operations.
First, for a call invocation of a %proc
that is in the same resource as the caller
and that does not contain a %reply statement,
the compiler generates conventional procedure-call code instead of
going through the RTS, which would create a process.\**
.(f
\**A %proc that executes %reply executes concurrently
with its caller after replying; hence the %proc must execute
as a process in this case.
.)f
The compiler generates code that
builds an invocation block on the calling process's stack
and passes the block's address to the called %proc.
Thus, the code in the %proc is independent
of whether it is executed by the calling process or as a separate process.
A similar optimization is performed
for a call invocation of a %proc
that is located on the same VM as the caller
and that does not contain a %reply statement.
In this case, however,
the RTS must be entered since
the compiler cannot determine whether an operation
in another resource is located on the same VM
as its caller (recall that program linking follows
and is independent of compilation).\**
.(f
\**
In fact, the compiler might not even know whether an operation
is implemented as a %proc
because it might not yet have compiled the body of the resource
containing the invoked operation.
.)f
Also,
the invoking process must create an invocation block
since it might be in a resource
that is destroyed before the invoked %proc completes.
.pp
The second optimization is that certain operations
are implemented directly by the nugget's semaphores rather
than by the general mechanisms described above.
The main criteria
that an operation must satisfy
to be classified as a semaphore operation
are that the operation:
(1) is invoked only using %send,
(2) has no parameters or return value,
(3) is serviced by input (or %receive) statements
in which it is the only operation and in which there
are no synchronization or scheduling expressions,
and (4) is declared at the resource level.
Note that these criteria are relatively simple to check.
Furthermore,
they capture typical uses of operations
that provide intra-resource synchronization
such as controlling access to shared variables.
.sh 3 "Failure Handling Mechanisms"
.pp
The address of a process's %proc handler is recorded by the RTS when
the process is created.
If the %proc does not contain an explicit handler,
the RTS instead records the address of
a special %abort handler.
The address of a process's current invocation handler is maintained by
the MC in a location known to the RTS.
If an invocation statement does not contain an explicit handler,
the MC instead sets the invocation handler to
what is recorded as the process's %proc handler.
Thus,
from the RTS's view,
there is always one handler associated with
each process's %proc and invocation failures.
When the RTS detects an exception
or is informed by the MC that an exception has been raised
(i.e., an %abort statement was executed),
it transfers control to the appropriate handler's code.
The special %abort handler will cause
the RTS to abort the current process and to pass the failure
up the call chain if the %proc was called;
the above actions are then applied recursively.
Aborting a process also causes a failure to be passed to
the callers of any invocations currently being serviced
by %in statements
or of any invocations pending for operations declared within the %proc.
.pp
When a %when statement is executed,
the MC creates an invocation block for the specified operation and arguments.
It then passes to the RTS the block's address
and the identity of the object to monitor,
i.e., the argument to #failed.
The RTS
records this information and initiates monitoring of the object.
Monitoring of physical and virtual machines is
accomplished by means of ``heartbeat'' messages that each machine
periodically broadcasts to the others.
When a process or resource instance is to be monitored,
the local RTS sends a message to the remote VM
instructing that VM's RTS to monitor the object;
the remote RTS will send a ``failed'' message to the local RTS
should the object fail or terminate.
(The local and remote RTS could of course be the same.)
The local RTS also periodically checks the states of the virtual and physical
machines on which the process or resource instance resides to ensure
they have not failed or terminated.
Should the RTS detect that any object it is monitoring has failed,
it performs its usual actions to invoke the user-specified operation.
.sh 2 "Status, Plans, and Statistics"
.pp
An initial implementation
of a large subset of SR became operational
under 4.3BSD Vax UNIX in November 1985.
Since then, the language has been further refined
and the implementation has been modified,
extended, and ported to Sun workstations and an Encore Multimax.
Currently the full language has been implemented
except for failure handlers and a few minor features.
The current implementation also includes
facilities to invoke C functions as operations,
thereby gaining access to underlying UNIX system calls.
The UNIX versions of the implementation
will be completed by late 1987,
at which time they will be made available to interested groups.
Work is underway on a version of the implementation
that will allow SR programs to run stand-alone.
.pp
Our implementation was first used
in graduate classes in concurrent programming
in which students
wrote moderate-sized (500-5000 lines)
distributed programs
including card games, automatic teller machines,
simple airline reservation systems,
and prototypes of
a command interpreter and a file system for a distributed operating system.
Subsequently,
SR has been used to program
experiments relating program structure and
process interaction patterns [Atki87a],
a highly parallel interpreter for Prolog,
and Saguaro's file system [Purd87].
.pp
Almost all of the SR compiler, linker, and RTS
is written in machine-independent C.
The only exception is that the
RTS nugget contains some assembly language code for
process management (e.g., stack setup and context switches).
A few #awk and #sed tools are
used to simplify maintenance of the compiler
and #lex is used to generate the lexical analyzers
for the compiler and linker.
The compiler consists of approximately
17,000 lines of C source code.
The linker is about 1100 lines of C.
The RTS and nugget together contain about 4200 lines of C,
including code for the I/O and network interfaces to UNIX;
in addition, the nugget contains about 100 lines of assembly code.
.pp
The SR compiler processes 3200
lines per CPU-minute on a Vax 8650.
To give some comparison,
the C compiler processes about 22,000 lines per CPU-minute.
Thus, the SR compiler is about seven times slower.
This is not surprising since SR is a higher-level
language than C and the SR compiler generates C code that has
to be processed by the C pre-processor, C compiler,
and Vax assembler.
When machine-code generation is turned off, the SR compiler
processes 17,600 lines per CPU-minute.
Stated differently, 18% of compilation time is
spent in the SR compiler itself; the remaining 82% is spent
processing the generated C code.
.pp
At run-time, the Vax RTS (including the nugget) requires about
24K for text and 5K for static data.
Entries for resource instances, operations, and processes
are dynamically allocated.
Included in each load module
are an additional 15K of text
and 6K of data for the UNIX I/O and network library routines.
.pp
The time to process an invocation depends on whether it
is generated by a %call or a %send
and whether it is serviced by a %proc or an %in.
We obtained timing data for six simple SR programs:
one for each of the four combinations of invocation and service,
one that optimizes calling a %proc,
and one that uses semaphores.
(All operations in the test programs are parameter-less.)
For comparison purposes, we also obtained timing data for a simple C program.
Each test program generates and services 100,000 invocations.
The times, in microseconds, to process single invocations are listed below.
These times were obtained using the #time command
on a Vax 8600.
They are averages of ten executions of each test program
and include loop overhead, so actual invocation processing
time is somewhat less than shown.
.(b
.ta +.7i +3.5iR +1.1iR
#Program	#Description	\fI\(*msec/inv\fP	#Relative #to #C
.sp .25
#C	C equivalent of #A	5	1
#A	%call to %proc; procedure call	8	2
#A1	%call to %proc; new process	511	102
#A2	%send to %proc; new process	470	94
#B	%call to %receive; 2 processes	390	78
#B1	%send to %receive; 1 process	160	32
#B2	%send to %receive; semaphore	13	3
.)b
The ``#Relative #to #C'' column shows the ratio
of the time per invocation for the given program
to that for the C program #C;
e.g., #A is about 2 times slower than #C.
.pp
Each SR test program consists of a single resource.
However, except for #A, the above times would be the
same if the invoker and server were in different
resources provided they were in the same VM.
A synopsis of the different test programs follows.
.ip "#C:" 5
C test program that simply invokes an empty procedure 100,000 times.
.ip "#A:" 5
#A is the SR equivalent of #C:
it calls an empty %proc in the same resource,
which the compiler optimizes as described in Sec. 4.2.3.
The overhead relative to #C results from the need to
support the general case in which the %proc might also
be invoked from outside the resource.
.ip "#A1:" 5
#A1 generates %call invocations of an operation serviced
by a %proc that executes a %reply.
Thus, a new process is created to service each invocation
and a context switch is required.
.ip "#A2:" 5
#A2 generates %send invocations to an operation serviced by a %proc.
It is like #A1 but there is no context switch overhead.
.ip "#B:" 5
#B contains two processes.
One generates %call invocations,
the other consumes them using %receive.
The processes must alternate execution for each invocation.
Context switching is the dominant cost in this program.
.ip "#B1:" 5
#B1 contains a single process that sends messages to itself.
This shows the cost of using %send and simple %in statements
with no context switches.
.ip "#B2:" 5
#B2 is like #B1 except the operation is implemented as a semaphore.
.lp
The overhead in the four most costly SR tests results from
the allocation of invocation blocks,
procedure calls to enter the RTS and within the RTS,
maintenance of RTS structures, and in some cases
process creation and context switches.
The major cost for the semaphore test program
is the procedure calls required to get to the nugget's P and V primitives.
.pp
The above measurements
give some idea of the absolute and relative costs of
different combinations of invocation and service.
(A more thorough discussion of SR's performance
appears in [Atki87b].)
The comparisons of the SR programs with the C program
are somewhat unfair, however, because SR is quite a different language.
For example,
SR has mechanisms, such as %send and %in, that have no counterparts in C;
SR also has dynamic resources and processes,
which require a more complicated RTS.
On the other hand,
it is desirable that SR programs such as #A
that use only C-like mechanisms
should not be too much slower than their C counterparts,
and this is the case.
Work is currently underway to improve both the MC
and the RTS to further speed up the implementation.
