                     Maintaining The Small Data Model
                           with C and Assembler

                     ================================

                           by Michael J. Monaco


First of all, what is a data model?  As discussed in this article, a data
model represents a method for addressing data storage elements within the
scope of a single program at the machine code level.  Or, to put it another
way, it is the method by which the executable machine code will address the
data stored in the defined data section of the program.  Why is this important?
Depending on the data model chosen, the size or speed of a program will be
affected.  It is important to note that not all data storage used in a
program is subject to this model.  Generally, the data model is applied to
globally defined data storage.  It is not applied to dynamically allocated
memory, stack storage, or the CPU registers, since these locations will
automatically be addressed by the most efficient method possible.

There are two types of data models to choose from, Large and Small.  In the
large data model, the location of a defined data element will be encoded in
the operand field of an instruction as a full 32 bit address. This address is
immediately usable without modification. The advantage is no practical limit
to the size of a program's data space.  The disadvantage is a larger program
and a slower program on the 68000 microprocessor. Larger because a machine
instruction with operands using the large data model can be from 6 to 10
bytes long. That's 2 bytes for the operation and 4 to 8 bytes for addressing.
(Fig 1a) Slower because of the two memory accesses required to fetch the full
32 bit address over a 16 bit wide data path. It also takes longer to load a
program with large data since the addresses are relocatable (not known till
execution time) and must be adjusted.

The small data model reverses the benefits/disadvantages of the large data
model. In the small data model, the location of a data element is encoded as
the contents of an address register plus a 16 bit offset given in the
operand. The address register is chosen by the assembler and it is loaded
with a calculated value at the beginning of program execution. Let's call it
the base register. The 16 bit offset has an effective range of 64K bytes and
is used such that the address generated is plus or minus 32K from the
location loaded in the address register. Only one base register is provided,
and it cannot be changed during execution. This leads to a limited
data address space of 64K for predefined data. On the other hand the code
size of the program is small and it will execute faster.  A complete
instruction will be only 4 to 6 bytes long and will require only one memory
access to fetch the offset.  (Fig 1b) Since the offsets are based on a
register to be loaded at run time, there are no addresses that require
adjusting and the program will load faster.
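
The byte counts quoted for the two models follow from simple arithmetic: a 2
byte operation word plus one extension per memory operand, 4 bytes for a full
32 bit address or 2 bytes for a 16 bit offset.  The C sketch below only
illustrates that arithmetic.  The names instr_bytes, SMALL_MODEL and
LARGE_MODEL are made up for this example and are not part of any compiler or
assembler.

```c
#include <assert.h>

/* Hypothetical names, for illustration only. */
enum data_model { SMALL_MODEL, LARGE_MODEL };

/* Size in bytes of a 68000 instruction with one or two memory operands,
 * all addressed with the same data model. */
int instr_bytes(enum data_model model, int memory_operands)
{
    /* 2 bytes of extension for a 16 bit offset from the base register,
     * 4 bytes for a full 32 bit absolute address. */
    int extension = (model == LARGE_MODEL) ? 4 : 2;

    assert(memory_operands >= 1 && memory_operands <= 2);
    return 2 + memory_operands * extension;   /* 2 = operation word */
}
```

Calling instr_bytes(LARGE_MODEL, 2) gives 10 and instr_bytes(SMALL_MODEL, 2)
gives 6, the upper figures for each model.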

So what's the big deal? Just use the Large model for large programs and the
Small model for small programs.  Normally that's the way it's done, but
interrupt handlers require special consideration.  They are usually small in
size but by the nature of the way they are executed, using the Small data
model creates a problem.  The Large data model can be used at the expense of
size and speed, but that would be too easy.  In the accompanying article we
saw how interrupt handlers can be installed and how they can signal the
process into action.  Installing the handler really only consisted of adding
an initialized interrupt structure, with a pointer to the handler, into the
appropriate interrupt list.  When the interrupt gets executed, all the system
does is load the arguments to be passed in the appropriate registers and
call the subroutine at the address specified in the interrupt structure.
The state of the registers at the time the interrupt is executed will not be
the same as when the handler was installed.  Therefore, the handler cannot depend
on using the base address register for referencing data.  Any attempt to do
so will usually end up with a visit from the GURU.
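
The failure can be pictured by modelling the address calculation itself.  The
following is a minimal sketch in C, assuming nothing about the real Amiga
memory map: the effective address is the base register plus a sign extended
16 bit offset.  The function name small_model_ea and every numeric address
mentioned with it are invented for the illustration.

```c
#include <stdint.h>

/* Effective address for "offset(a4)": the 16 bit offset is sign extended
 * to 32 bits and added to the contents of the base register. */
uint32_t small_model_ea(uint32_t base_reg, int16_t offset)
{
    return base_reg + (uint32_t)(int32_t)offset;
}
```

With an invented base of 0x00208000 and an offset of -0x7FFC the result is
0x00200004.  If the handler is entered with some unrelated value in a4, say
0x00C01234, the very same instruction addresses 0x00BF9238 instead, which is
memory the program never owned.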

Now the big question.  How can we use the Small data model and reference
globally defined data from within the interrupt handler?  Use another
addressing mode that allows the use of a register plus displacement such that
the value in the register will be guaranteed.  In particular, the addressing
mode called "Program Counter with Displacement".  The program counter is
guaranteed to contain the address of the next instruction to be executed, and
a 16 bit offset can be specified to reference data based on this fact.
Because of the way data and code are processed by the assembler, the data
referenced with PC addressing must be within the code segment of the program.
This means that the data that was globally defined must be moved to within
the code segment.  (Based on the Manx 3.6a assembler)

Let's look at some code from Flip v2.0 to illustrate the method.

    ULONG window_signal; /* global storage of an allocated window signal */

The above declaration will allocate 4 bytes of storage that will be visible
to all parts of the program. The C part of the program will store an
allocated signal here for use by the input handler.  Normally an assembler
code segment could reference this data item also, but its method of access
would be assembled with the same data model as the C code.

    window_signal = 1L << window_signal_bit;

Here we are storing some data.  The reference to window_signal at this point
will use the small data model. It will look something like this after compile
and assembly:

    move.l    d0,window_signal_offset_from(a4)

where a4 is the base address register.  Now that the data is stored, it is
still not accessible by the input handler.  To attempt to access
window_signal from the handler with the small data model would cause the
system to crash. This is because the register a4 is not valid at the time the
handler is called.  So what we have to do is move it someplace that can be
safely accessed.

    ;   /* it is good practice to start and end inline assembler code */
        /* with dummy C statements */
    #asm
      lea       _ws_,a0                 ; get the address of where we are
                                        ; moving window_signal to.
      move.l    _window_signal,(a0)     ; and move it.
    #endasm
    ;

With the Manx assembler we can't just say "move window_signal to _ws_"
because _ws_ is in the code segment of the program. Instead we get the
address of _ws_ and move it by indirect reference.  Notice that this move
statement does reference window_signal and the assembler will change it to
use the small data model.  It is ok at this point because we are in the
installation stage of the program and not in the handler code section.
Register a4 will be valid. Now where is _ws_?  It's in the code segment near
the handler routine between the functions.

    #asm
              cseg                ; storage must be accessible but not
              ds.l    0           ; restricted by the small data model.
      _ws_    dc.l    0           ; copy of window signal from startup

    #endasm

The above declaration will first align on a longword boundary and then
allocate 4 bytes of space.  Why don't we just move the window signal to _ws_
with the C code?  It's not possible without some special manipulation of the
assembler source between the compile and assembler stages.  This is because
the 68000 instruction set does not support PC addressing as the destination
address.  OK, so now we have the data stored in the code section of the
program and now the input handler can access it using PC addressing.

    #asm
       move.l   _ws_(pc),d0       ; yes - user wants to flip windows
    #endasm

Yep, that's it.  Not quite what you expected?  Well, the important thing to
notice with this addressing mode is (1) the value the processor takes from
the program counter is a fixed point inside the instruction itself (on the
68000, the address of the word that holds the offset) and (2) the offset _ws_
is the number of bytes from that point to the storage area "_ws_".  It is
independent of the state of any address registers and therefore safe to
use in an interrupt handler.
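
How the 16 bit field is arrived at can be sketched in C.  To be precise about
the reference point, the 68000 measures the displacement from the address of
the extension word that holds it, two bytes past the start of the
instruction.  The function name pc_displacement and the addresses used with
it are assumptions made for the illustration; this is not the Manx
assembler's own code.

```c
#include <stdint.h>

/* Compute the 16 bit displacement for "label(pc)".
 *   instr_addr : address of the instruction's operation word
 *   label_addr : address of the data, in the same code segment
 * Returns 1 and stores the displacement if it fits in 16 bits,
 * otherwise returns 0 and leaves *disp untouched. */
int pc_displacement(uint32_t instr_addr, uint32_t label_addr, int16_t *disp)
{
    /* The PC value used is the address of the extension word. */
    int64_t delta = (int64_t)label_addr - ((int64_t)instr_addr + 2);

    if (delta < INT16_MIN || delta > INT16_MAX)
        return 0;
    *disp = (int16_t)delta;
    return 1;
}
```

For an instruction at segment offset 0x0100 and data at offset 0x0040 the
displacement is -194.  Moving the whole segment, for example to 0x00250000,
shifts both addresses by the same amount and leaves the displacement
unchanged, which is why nothing has to be adjusted at load time.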

You might be wondering why _ws_ had to be in the code segment.  Notice that
addressing with a base register (as with the small data model) and addressing
with the PC both use an offset.  The instructions themselves are basically of
the same type, "Address Register Indirect with Displacement"; the distinction
is when the offset is calculated.  With PC addressing the offset is
calculated at assembly time.  The assembler sees _ws_ and knows the distance
between it and the instruction that references it. With a base register and
the small data model, data can be located in any module and the offset cannot
be calculated until link time, when all the data is pooled into one data
segment.


Happy Programming.