.ds LH Race Coverage
.ds RH Introduction
.bp 
.ce
.ps 16 
Introduction 
.ps 12
.XS
Introduction
.XE
.sp
.LP
Race coverage concerns itself with \fIthreads\fR.  You have threads
when two or more simultaneous executions of some program text can share data.
Examples:
.IP (1)
POSIX threads.  
.IP (2) 
Multiple processors executing the UNIX kernel at the same time.
.LP
Since the threads share data, they must protect it from simultaneous
access.  The usual approach is locking.  A lock is placed before each
access to a datum or group of data.  Only one thread at a time may
pass through the lock.  When the successful thread is finished with
the datum, it releases the lock, allowing one of the waiting threads
(if any) to proceed.
(Other strategies for protecting data are possible; race
coverage applies to them as well, since it measures whether there was
an opportunity for simultaneous access.)
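.LP
As a minimal sketch of this lock, access, unlock discipline, assuming
POSIX threads, the fragment below has several threads increment one
shared counter.  The names \fIshared_count\fR, \fIcount_lock\fR, and
\fIrun_locked_counter\fR are illustrative only; they come from no
particular system.
.DS L
```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared datum and the lock that protects it. */
static long shared_count = 0;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

enum { NTHREADS = 4, NITER = 100000 };

/* Each thread locks before touching the datum and unlocks afterwards. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITER; i++) {
        pthread_mutex_lock(&count_lock);    /* only one thread passes */
        shared_count++;                     /* protected access */
        pthread_mutex_unlock(&count_lock);  /* let a waiting thread in */
    }
    return NULL;
}

/* Runs NTHREADS workers; returns the final count, or -1 on error. */
long run_locked_counter(void)
{
    pthread_t tid[NTHREADS];
    int started = 0;

    shared_count = 0;
    for (int i = 0; i < NTHREADS; i++) {
        if (pthread_create(&tid[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return started == NTHREADS ? shared_count : -1;
}
```
.DE
.LP
Without the lock, increments from different threads could overlap and
be lost, so the final count could fall short of the expected total.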
.LP
Incorrect data protection can be of various types:
.IP (1)
Failure to lock.  Here, simultaneous access can lead to corruption,
which is often quickly fatal.
.IP (2)
Failure to release a lock.  Since the data remains locked, threads
can wait forever.  The system often quickly comes to a halt.
.IP (3)
Deadlock.  Thread A has lock 1 and is waiting for thread B to release
lock 2.  Thread B has lock 2 and is waiting for A to release lock 1.
Neither will release its lock until the other does.
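.LP
The deadlock in case (3) can be made visible without hanging the
program.  The sketch below, assuming POSIX threads with barrier
support, has each of two threads take one lock and then merely
\fItry\fR the other.  The names \fIlock1\fR, \fIlock2\fR,
\fItake_then_want\fR, and \fIwould_deadlock\fR are hypothetical.
.DS L
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical locks 1 and 2 from the deadlock description. */
static pthread_mutex_t lock1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock2 = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t both_hold;

struct party {
    pthread_mutex_t *mine;    /* lock this thread takes first */
    pthread_mutex_t *wanted;  /* lock held by the other thread */
    int result;               /* what trylock on wanted returned */
};

static void *take_then_want(void *arg)
{
    struct party *p = arg;

    pthread_mutex_lock(p->mine);
    pthread_barrier_wait(&both_hold);   /* both threads now hold one lock */
    /* A blocking pthread_mutex_lock here would never return. */
    p->result = pthread_mutex_trylock(p->wanted);
    pthread_barrier_wait(&both_hold);   /* both attempts are finished */
    if (p->result == 0)
        pthread_mutex_unlock(p->wanted);
    pthread_mutex_unlock(p->mine);
    return NULL;
}

/* Returns 1 if both threads found the other lock busy, 0 if not,
   and -1 if the demonstration could not be set up. */
int would_deadlock(void)
{
    struct party a = { &lock1, &lock2, 0 };
    struct party b = { &lock2, &lock1, 0 };
    pthread_t tb;

    if (pthread_barrier_init(&both_hold, NULL, 2) != 0)
        return -1;
    if (pthread_create(&tb, NULL, take_then_want, &b) != 0) {
        pthread_barrier_destroy(&both_hold);
        return -1;
    }
    take_then_want(&a);                 /* the caller plays thread A */
    pthread_join(tb, NULL);
    pthread_barrier_destroy(&both_hold);
    return a.result == EBUSY && b.result == EBUSY;
}
```
.DE
.LP
Each thread finds the lock it wants already held.  Had either used a
blocking lock call instead of the try, both would have waited forever.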
.LP
You can be more confident that a particular routine locks correctly if
there have been several cases where several threads were in it
simultaneously.  (For shorthand, we say that the routine is then
\fIracing\fR.)  Race coverage measures how often a routine races.
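.LP
One way to picture the measurement is a per-routine probe that counts
the threads currently inside the routine and notes each entry made
while another thread is already there.  The sketch below assumes POSIX
threads; \fIrace_probe\fR, \fIrace_enter\fR, and \fIrace_leave\fR are
hypothetical names, and this is not a description of how GCT
instruments code.
.DS L
```c
#include <pthread.h>

/* Hypothetical per-routine record; not GCT's actual data layout. */
struct race_probe {
    pthread_mutex_t guard;  /* protects the two counters below */
    int inside;             /* threads currently in the routine */
    long raced;             /* entries made while a thread was inside */
};

#define RACE_PROBE_INIT { PTHREAD_MUTEX_INITIALIZER, 0, 0 }

/* Call on entry to the instrumented routine. */
void race_enter(struct race_probe *p)
{
    pthread_mutex_lock(&p->guard);
    if (p->inside > 0)
        p->raced++;         /* the routine is already occupied: a race */
    p->inside++;
    pthread_mutex_unlock(&p->guard);
}

/* Call on every exit from the instrumented routine. */
void race_leave(struct race_probe *p)
{
    pthread_mutex_lock(&p->guard);
    p->inside--;
    pthread_mutex_unlock(&p->guard);
}
```
.DE
.LP
A routine would call \fIrace_enter\fR first thing and \fIrace_leave\fR
on every return path.  The sketch does not tell threads apart, so a
recursive call by one thread would also be counted, and the lock around
the counters is chosen for clarity rather than speed.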
.LP
If incorrect locking is a risk for your system, and you have routines
that have never been raced, direct your testing effort at them.
A good approach is to find groups of functionally related unraced routines
(file manipulation routines, for example) and write stress tests for
those groups.  You might, for instance, write a test where many
processes write simultaneously to many files.  Such a test causes more
routine entries, hence more simultaneous routine entries, and hence,
perhaps, reveals more locking problems.
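.LP
A stress test of the kind just described might be sketched as follows,
assuming a POSIX system.  The process, file, and write counts are
arbitrary, and \fIwrite_many_files\fR and \fIrun_file_stress\fR are
hypothetical names; a real test would be tuned to the system at hand.
.DS L
```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { NPROCS = 8, NFILES = 4, NWRITES = 256 };

/* Child body: write a block repeatedly to several temporary files.
   Returns 0 on success and 1 on any failure. */
static int write_many_files(void)
{
    char block[512];

    memset(block, 'x', sizeof block);
    for (int f = 0; f < NFILES; f++) {
        FILE *fp = tmpfile();            /* removed automatically on close */
        if (fp == NULL)
            return 1;
        for (int w = 0; w < NWRITES; w++) {
            if (fwrite(block, 1, sizeof block, fp) != sizeof block) {
                fclose(fp);
                return 1;
            }
        }
        if (fclose(fp) != 0)
            return 1;
    }
    return 0;
}

/* Forks NPROCS writers and returns how many finished without error. */
int run_file_stress(void)
{
    pid_t pids[NPROCS];
    int started = 0, ok = 0;

    for (int i = 0; i < NPROCS; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(write_many_files());   /* child never returns */
        if (pid < 0)
            break;
        pids[started++] = pid;
    }
    for (int i = 0; i < started; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```
.DE
.LP
All the writers are started before any is waited for, so their file
system calls have a chance to overlap.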
.LP
Like all coverage, race coverage is an indirect measure of quality.
Very high race coverage would not force the detection of deadlocks
between the file system code and, say, networking code.  You could
achieve high race coverage by first stressing the file system, then
stressing networking, but you'd never find failures that result from
interactions between the two.
In practice, though, race coverage often does a surprisingly good job of
finding interaction bugs like deadlocks.  You can increase the chance
of finding them by running stress tests against a background load that
moderately exercises all of the system.
.LP
It might be reasonable to extend race coverage so that it checks
whether a thread has been in routine F while another thread was in
routine G.  Similarly, you might want to check whether a particular
loop's body has been raced, rather than whether the loop's containing
routine has.  These would entail more overhead, and they have not been
tried.  To date, race coverage has been good enough at pointing to
weaknesses in system stress tests.
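.LP
To make the first of these untried extensions concrete, here is one
shape it might take, assuming POSIX threads and a small fixed set of
numbered routines.  Everything here (\fINROUTINES\fR, \fIpair_enter\fR,
\fIpair_leave\fR, \fIpair_seen\fR) is hypothetical.
.DS L
```c
#include <pthread.h>

enum { NROUTINES = 3 };   /* hypothetical number of instrumented routines */

static pthread_mutex_t pair_guard = PTHREAD_MUTEX_INITIALIZER;
static int inside[NROUTINES];            /* threads now in each routine */
static char seen[NROUTINES][NROUTINES];  /* seen[f][g]: f and g overlapped */

/* Call on entry to routine f: note every routine occupied right now. */
void pair_enter(int f)
{
    pthread_mutex_lock(&pair_guard);
    for (int g = 0; g < NROUTINES; g++) {
        if (inside[g] > 0) {
            seen[f][g] = 1;
            seen[g][f] = 1;
        }
    }
    inside[f]++;
    pthread_mutex_unlock(&pair_guard);
}

/* Call on every exit from routine f. */
void pair_leave(int f)
{
    pthread_mutex_lock(&pair_guard);
    inside[f]--;
    pthread_mutex_unlock(&pair_guard);
}

/* Returns 1 if some thread was in f while another was in g. */
int pair_seen(int f, int g)
{
    pthread_mutex_lock(&pair_guard);
    int r = seen[f][g];
    pthread_mutex_unlock(&pair_guard);
    return r;
}
```
.DE
.LP
Every entry scans all instrumented routines, which illustrates where
the extra overhead would come from.  The diagonal entry for a routine
paired with itself corresponds to ordinary race coverage.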
.LP
The overhead of race coverage is low.  For example, when measuring the
race coverage of an operating system kernel, it is perfectly reasonable
to instrument "development systems" -- the kernels running on the
machines the developers use for editing and compiling.  Comparing the
measurements from those systems with measurements from specially
stressed systems can be illuminating.
.sp 2
.LP
\fBAcknowledgement:\fR  The form of race coverage implemented by GCT
was suggested by Gary Whisenhunt.  He is not responsible for the name,
which is not a very good one.
