genser - generate serialising code

This is a pair of programs which generate code to serialise and
deserialise data structures, given a description of them.  I wrote them
for the protocol used by userfs, but they are quite general.  One
program generates code for encoding, decoding and finding the size of
the encoded representation, and the other generates prototypes for them,
and emits the types in C.

Feel free to use this code for your own projects, but remember this
is GPL code.

The input file is essentially C type definitions, with a few exceptions.
By default, code is generated for any type named with typedef, and any
anonymous type used in a typedef.

Arrays are defined as follows:

	typedef int foo[];

This defines a type "foo", which is an unbounded array of ints (32-bit
words).  This generates a structure of the form:

	struct {
		int *elems;
		unsigned long nelem;
	};

which is a pointer to the base of the array, and the number of elements.
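
As a rough illustration of how such a structure might be used from C,
the sketch below walks the elements of a foo.  The layout of the
structure comes from the description above; giving it the typedef name
"foo" and the helper foo_sum() are assumptions made for this example,
not something genser is known to emit.

```c
/* Shape of the structure generated for "typedef int foo[];".
 * Naming the typedef "foo" is an assumption made for this sketch. */
typedef struct {
	int *elems;		/* pointer to the base of the array */
	unsigned long nelem;	/* number of elements */
} foo;

/* Hypothetical helper, not generated by genser: sum the elements. */
static long foo_sum(const foo *f)
{
	long total = 0;
	unsigned long i;

	for (i = 0; i < f->nelem; i++)
		total += f->elems[i];
	return total;
}
```

A caller would point elems at its own storage and set nelem to match,
for example "int data[] = { 1, 2, 3 }; foo f = { data, 3 };".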

The array syntax, and the complete exclusion of functions from the type
system, are the main differences from pure C syntax.  A number of
parsing hacks have been put in place so that C syntax which has no
semantic content for genser can still be parsed.

Structures may be named with the "struct foo {...};" syntax, but they
are ignored until they are used in a named type.

Often you want to include a system include file for a couple of types,
but it defines dozens.  Typedefs can be marked as "generate on demand"
(when used in other types) by enclosing them in a notypedef block:

notypedef {
#include <sys/types.h>
}

The input file is run through cpp ("/lib/cpp -Ulinux -C").

It is possible to quote parts of the input file directly into the output
file, by enclosing them between "%%" lines:

%%
/* Copy into output file */
%#include <sys/types.h>
%%

Lines starting with '%' in the quoted block have the '%' stripped off
before being copied.  This stops the cpp pass, which runs when the input
file is parsed, from expanding the include.

When decoding arrays of variable size and pointers to objects, the
decode routine calls a function or macro void *ALLOC(size_t size) to
allocate memory.  It expects this function will always return a valid
pointer to free memory.  By default, it is defined as malloc(), but it
can be redefined in the quoted section to something appropriate to local
conditions.  The memory allocated in the decode function must be
manually freed when you've finished.
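
A minimal sketch of one way to redefine ALLOC so that it never returns a
null pointer, together with a cleanup routine for a decoded array.  The
names checked_alloc() and foo_free() are hypothetical, and it is an
assumption that the default ALLOC can be overridden with a plain
#undef/#define.  In the input file these lines would sit inside a "%%"
block, with a leading '%' on each preprocessor line so that the cpp pass
leaves them alone; that placement is inferred from the description above
rather than tested.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical replacement allocator: aborts rather than returning
 * NULL, since the decode routines expect a valid pointer. */
static void *checked_alloc(size_t size)
{
	void *p = malloc(size ? size : 1);	/* malloc(0) may return NULL */

	if (p == NULL) {
		fprintf(stderr, "decode: out of memory (%lu bytes)\n",
			(unsigned long)size);
		abort();
	}
	return p;
}

/* Assumption: the default definition can be replaced like this. */
#undef ALLOC
#define ALLOC(size) checked_alloc(size)

/* Shape of a decoded "typedef int foo[];" (typedef name assumed). */
typedef struct {
	int *elems;
	unsigned long nelem;
} foo;

/* Hypothetical cleanup helper: release what the decoder allocated. */
static void foo_free(foo *f)
{
	free(f->elems);
	f->elems = NULL;
	f->nelem = 0;
}
```

Calling free() here is only right while ALLOC hands out malloc()
memory; a different allocator needs its own matching release routine.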

There are a few bugs and things to do:

The code generated will always be correct, but it sometimes plays with
types such that the compiler will whine.  Input of the form:

typedef struct { int foo, bar; } baz;
typedef baz plap;

will cause this, since there will be two structures with the same shape
but different names used interchangeably.  This can be resolved by
treating typedefs with more respect in the lexer/parser.

The file "coder.h" contains the definitions of functions to perform
operations on the base types.  All structures are decomposed into calls
to these functions.  Because this file is always necessary, the
generated files always include it.  However, there is no way to control
how it is included.

Often this is used in an RPC-type environment (like userfs), so on each
side there are many functions that never get called.  It would be better
if gencode could generate one .c file per function, so that they can be
compiled into an archive library.

There should be a way of generating C++ code with destructors that will
do the appropriate freeing of memory.

Other bugs and comments to
	Jeremy Fitzhardinge <jeremy@sw.oz.au>

