INTERNALS for lcomp
(or how to write a driver for a new instruction set)

The driver for lcomp (actually, bb) is a lex program which matches the
instruction field of a compiler's assembly output.  Basic block counting
code would be very hard to write for general machine code on most
architectures, because many things one needs to find are hard to
identify (for example, can one tell the difference between a normal
label and the start of a function?).  Therefore, the driver for most
machines has to take advantage of the idiosyncratic nature of compiler
output.

The included drivers are both based on the output of the Portable C
Compiler (PCC).  68020.l was written for the Sun 3;  it has been tested
with SunOS 3.3 through 3.5.  vax.l is for Digital Vaxen;  it has been
tested on BSD 4.3 and Ultrix 2.0.  I have not tried using either with
the output of the GNU C compiler (and assume they will break on it), but
I am sure it would not take long to port from a PCC version to a GNU
version.  It would be very difficult to port this code to a compiler
that generates object code directly, with no provision for creating
assembler source.  Sorry.

My approach to writing a new driver is to start with one of the
existing drivers included in this package and modify it appropriately.
A good way to become familiar with one of these drivers is to examine
some compiler output for a vax or a sun and compare it with the output
of bb for that machine.  Reasonable knowledge of the target machine and
its function call/return protocol is useful, if not essential, for
porting the driver.

The driver reads one file (a .s file generated by the compiler) and
generates two: a .s file with block counting code and a .sL file which
maps basic blocks to line numbers.  The generated code, when executed,
appends to a file called prof.out, which contains a counter for each
basic block.  (The -r option to lcomp and lprint selects a file name
other than prof.out.  An absolute path name is especially useful when
profiling a program that is run from a public bin directory: users
don't get a prof.out file every time they run the program, and the
developer of the program still collects useful profiling data.)

The prof.out file is a series of entries of the form
	<.sL file> <n>
	<count for block 0>
	<count for block 1>
	.
	.
	.
	<count for block n-1>
One such entry is written for each file compiled with lcomp.
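
A minimal C sketch of both sides of this format follows.  It assumes a
hypothetical per-file count table; struct bbtable, bb_write() and
bb_read_entry() are illustrative names, not the structures or routines
actually used in bbexit.c or lprint.  bb_write() emits one entry per
table in the layout above, and bb_read_entry() reads one entry back.

```c
#include <stdio.h>

#define BB_NAMEMAX 256   /* hypothetical limit on a .sL file name */

/* Hypothetical count table for one compilation unit; the real layout is
   whatever the driver's epilogue() emits and bbexit.c expects. */
struct bbtable {
    const char *mapfile;     /* name of the .sL file */
    long nblocks;            /* n, the number of basic blocks */
    long *counts;            /* counts[0] .. counts[n-1] */
    struct bbtable *next;    /* next compilation unit, or NULL */
};

/* Write one prof.out entry per table in the chain.
   Returns 0 on success, -1 on a write error. */
int bb_write(FILE *out, const struct bbtable *t)
{
    for (; t != NULL; t = t->next) {
        if (fprintf(out, "%s %ld\n", t->mapfile, t->nblocks) < 0)
            return -1;
        for (long i = 0; i < t->nblocks; i++)
            if (fprintf(out, "%ld\n", t->counts[i]) < 0)
                return -1;
    }
    return 0;
}

/* Read the next entry into mapfile (BB_NAMEMAX bytes; the %255s width
   matches it) and counts.  Returns n, or -1 at end of file or on malformed
   input.  Counts past maxcounts are consumed but not stored. */
long bb_read_entry(FILE *in, char *mapfile, long *counts, long maxcounts)
{
    long n, c;

    if (fscanf(in, "%255s %ld", mapfile, &n) != 2 || n < 0)
        return -1;
    for (long i = 0; i < n; i++) {
        if (fscanf(in, "%ld", &c) != 1)
            return -1;
        if (i < maxcounts)
            counts[i] = c;
    }
    return n;
}
```

In a profiled program the output stream would be opened with
fopen(name, "a"), so that each run appends to prof.out (or to the file
named with -r) instead of overwriting earlier runs.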

A .sL file has two parts.  The first part consists of n lines (n is
given in prof.out), one per basic block.  Each line has four fields
separated by whitespace: the source file name (these should all be the
same within one .sL file unless one makes overly creative use of the C
preprocessor), the first line of the block, the last line of the block,
and the number of instructions in the block.  This section is followed
by a line of the form "<m> functions", followed by m lines, one for
each function in the compilation unit (source file).  Each function
line contains the name of the function and the number of its first
basic block.
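
The sketch below parses the three kinds of .sL lines in C.  The struct
and function names are hypothetical, and the name-length limit is an
assumption.  sl_owner() additionally assumes, which the format does not
state, that a function's blocks run from its first block up to the next
function's first block.

```c
#include <stdio.h>
#include <string.h>

#define SL_NAMEMAX 256   /* hypothetical limit on names */

/* One line from the first part of a .sL file. */
struct slblock {
    char source[SL_NAMEMAX];  /* source file name */
    int first, last;          /* first and last source lines of the block */
    int ninst;                /* number of instructions in the block */
};

/* One line from the function map. */
struct slfunc {
    char name[SL_NAMEMAX];    /* function name */
    int block;                /* number of the function's first basic block */
};

/* "<source> <first> <last> <ninst>"; returns 1 on success, 0 otherwise. */
int sl_parse_block(const char *line, struct slblock *b)
{
    return sscanf(line, "%255s %d %d %d",
                  b->source, &b->first, &b->last, &b->ninst) == 4;
}

/* "<m> functions"; returns m, or -1 if the line is not that separator. */
int sl_parse_header(const char *line)
{
    int m;
    char word[16];

    if (sscanf(line, "%d %15s", &m, word) == 2 && strcmp(word, "functions") == 0)
        return m;
    return -1;
}

/* "<name> <first block>"; returns 1 on success, 0 otherwise. */
int sl_parse_function(const char *line, struct slfunc *f)
{
    return sscanf(line, "%255s %d", f->name, &f->block) == 2;
}

/* Index of the function owning block b, or -1: the function with the
   largest first block that is still <= b (contiguity assumed, see above). */
int sl_owner(const struct slfunc *fs, int m, int b)
{
    int best = -1;

    for (int i = 0; i < m; i++)
        if (fs[i].block <= b && (best < 0 || fs[i].block > fs[best].block))
            best = i;
    return best;
}
```

Since n is known from prof.out, a reader can simply take the first n
lines as block records; sl_parse_header() is then only a sanity check
on the line that follows them.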

The .l file must match instructions, generate code to do the
instruction counting (and to link the file's count table with the
routine that prints the counts), and write the .sL file.  Start with
the vax or 68020 code and modify it appropriately.  The following
functions are predefined (in bb.c) to handle common cases:

	passline()	-- pass a line through unchanged
	inst()		-- a normal instruction that does not use
			   the condition codes and does alter them
	safe()		-- an instruction that uses the condition codes
			   or does not change them
	branch()	-- a jump or branch (conditional or unconditional);
			   an unconditional subroutine call does not go here
	stabd()		-- handle unix .stabd or .stabn lines
	stabs()		-- handle unix .stabs lines
	function(s, n)	-- declare that function named s starts at block n
	functionmap()	-- write the function map for the end of the .sL file
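
In a driver, the lex rules effectively sort each opcode into one of
these cases.  The C sketch below shows the same decision as a table
lookup over a deliberately tiny, hypothetical subset of VAX mnemonics;
enum kind, the table and classify() are illustrative and are not taken
from vax.l.  Unknown instructions fall back to safe(), the conservative
choice: treating an instruction as safe() at worst costs a more
expensive counter update, while wrongly treating it as inst() could
clobber condition codes it reads.

```c
#include <string.h>

/* Which bb.c helper a matched line would be handed to. */
enum kind { PASSLINE, INST, SAFE, BRANCH, STABD, STABS };

struct rule { const char *op; enum kind kind; };

/* Hypothetical, deliberately tiny VAX table; a real driver's lex rules
   must cover the whole instruction set. */
static const struct rule vax_rules[] = {
    { "addl2",  INST },    /* sets N, Z, V, C; does not read them */
    { "subl2",  INST },    /* sets N, Z, V, C; does not read them */
    { "adwc",   SAFE },    /* add with carry: reads C */
    { "sbwc",   SAFE },    /* subtract with carry: reads C */
    { "jbr",    BRANCH },  /* unconditional branch */
    { "jeql",   BRANCH },  /* conditional branches read the codes */
    { "jneq",   BRANCH },
    { ".stabd", STABD },
    { ".stabn", STABD },
    { ".stabs", STABS },
};

enum kind classify(const char *op)
{
    for (size_t i = 0; i < sizeof vax_rules / sizeof vax_rules[0]; i++)
        if (strcmp(op, vax_rules[i].op) == 0)
            return vax_rules[i].kind;
    /* Other directives pass through unchanged; unknown instructions are
       treated as safe() (see above). */
    return op[0] == '.' ? PASSLINE : SAFE;
}
```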

stabd() and stabs() read the .stab directives put out by the compiler for
debugger information.  They should work with both dbx and sdb style compiler
output.
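
As an illustration of the kind of information stabd() extracts, the
sketch below pulls a source line number out of a .stabd or .stabn line.
It assumes the usual a.out stab layout (type,other,desc[,value]), in
which an N_SLINE (0x44) entry carries the line number in desc;
stab_line() is a hypothetical helper, not the code in bb.c.  strtol()
with base 0 accepts both octal (0104) and decimal (68) spellings of the
type.

```c
#include <stdlib.h>
#include <string.h>

#define N_SLINE 0x44   /* source-line stab type, as in <stab.h> */

/* Return the line number carried by a .stabd/.stabn line of type N_SLINE,
   or -1 for any other line.  Operands are "type,other,desc[,value]". */
long stab_line(const char *line)
{
    const char *p;
    char *end;
    long type, desc;

    line += strspn(line, " \t");
    if (strncmp(line, ".stabd", 6) != 0 && strncmp(line, ".stabn", 6) != 0)
        return -1;
    p = line + 6;
    type = strtol(p, &end, 0);          /* base 0: 0104, 68 and 0x44 all work */
    if (end == p || *end != ',')
        return -1;
    p = end + 1;
    (void)strtol(p, &end, 0);           /* "other" field, ignored */
    if (*end != ',')
        return -1;
    p = end + 1;
    desc = strtol(p, &end, 0);          /* desc holds the line number */
    if (end == p)
        return -1;
    return type == N_SLINE ? desc : -1;
}
```

For example, stab_line("\t.stabn\t0104,0,7,LL3") yields 7; that input is
an illustrative line, not one copied from real compiler output.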

The following functions must be provided by the driver:

	labelstartsblock(label)	-- return a non-zero value if the named label
				   could be the target of a branch.  if there
				   is a systematic way of telling that a label
				   only exists for a debugger, this function
				   should return false (zero) in that case.
				   this feature is used in the sun version
				   (68020.l).
	increment()	-- increment a basic block counter.  output
			   at the beginning of a basic block.
	safeincrement()	-- same as increment(), but does not change condition
			   codes.  this is normally possible on most machines,
			   but can require tricky code.
	epilogue()	-- takes one argument, the name of a map file,
			   which must be linked with count information into
			   the counting table.  see bbexit.c.  the structures
			   are normally defined here.
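
Why both increment() and safeincrement() exist can be seen in a sketch
of the block-boundary bookkeeping.  This is a guess at the logic, not
the code in bb.c: a block starts at the first instruction, after any
label for which labelstartsblock() is true, and after every branch.
Assuming increment() may clobber the condition codes, it is harmless
directly before an inst() instruction, which overwrites them without
reading them; anywhere else safeincrement() is needed.  The "LL" prefix
used below to recognize debugger-only labels, and the names enum emit,
struct bbstate, on_label() and on_instruction(), are invented for the
example.

```c
#include <string.h>

/* What the driver would emit in front of the current instruction. */
enum emit { EMIT_NONE, EMIT_INCREMENT, EMIT_SAFEINCREMENT };

/* Instruction classes, mirroring inst(), safe() and branch(). */
enum kind { INST, SAFE, BRANCH };

struct bbstate {
    int pending;    /* nonzero: the next instruction starts a new block */
    int nblocks;    /* blocks numbered so far */
};

/* Hypothetical rule: labels beginning with "LL" exist only for the
   debugger and therefore cannot be branch targets. */
int labelstartsblock(const char *label)
{
    return strncmp(label, "LL", 2) != 0;
}

void on_label(struct bbstate *s, const char *label)
{
    if (labelstartsblock(label))
        s->pending = 1;
}

/* Decide what to emit before an instruction of class k and update state. */
enum emit on_instruction(struct bbstate *s, enum kind k)
{
    enum emit e = EMIT_NONE;

    if (s->pending) {
        /* increment() may clobber the condition codes; that is harmless only
           when the instruction overwrites them without reading them. */
        e = (k == INST) ? EMIT_INCREMENT : EMIT_SAFEINCREMENT;
        s->pending = 0;
        s->nblocks++;
    }
    if (k == BRANCH)
        s->pending = 1;   /* the fall-through instruction starts a new block */
    return e;
}
```

A state initialized as { 1, 0 } treats the first instruction seen as the
start of block 0.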


There's more to it than what I have described here, but the best way to
figure it out is to try to write a driver for some machine.  I'll be
glad to answer questions on how to write a driver.  I will also be
willing to keep an archive of drivers that other people have written
for other machines, if there is interest.

paul haahr
princeton!haahr		haahr@princeton.edu
