Maintaining The Small Data Model with C and Assembler
=====================================================

by Michael J. Monaco

First of all, what is a data model? As discussed in this article, a data model is a method for addressing data storage elements within the scope of a single program at the machine code level. Or, to put it another way, it is the method by which the executable machine code addresses the data stored in the program's defined data section.

Why is this important? The data model chosen affects both the size and the speed of a program.

It is important to note that not all data storage used in a program is subject to this model. Generally, the data model applies to globally defined data storage. It does not apply to dynamically allocated memory, stack storage, or the CPU registers, since these locations are automatically addressed in the most efficient way possible.

There are two data models to choose from, Large and Small.

In the large data model, the location of a defined data element is encoded in the operand field of an instruction as a full 32 bit address. This address is immediately usable without modification. The advantage is that there is no practical limit to the size of a program's data space. The disadvantage is a larger and slower program on the 68000 microprocessor. It is larger because a machine instruction with operands using the large data model can be from 6 to 10 bytes long: 2 bytes for the operation and 4 to 8 bytes for addressing (Fig 1a). It is slower because two memory accesses are required to fetch a full 32 bit address over a 16 bit wide data path. A program with large data also takes longer to load, since its addresses are relocatable (not known until the program is loaded) and must be adjusted.

The small data model reverses the benefits and disadvantages of the large data model.
In the small data model, the location of a data element is encoded as the contents of an address register plus a 16 bit offset given in the operand. The address register is chosen by the assembler, and it is loaded with a calculated value at the beginning of program execution. Let's call it the base register. The 16 bit offset has an effective range of 64K bytes. It is sign-extended, so the address generated can lie up to 32K below or just under 32K above the location loaded in the address register. Only one base register is provided, and it cannot be changed during execution. This limits predefined data to a 64K address space. On the other hand, the program's code is smaller and it executes faster. A complete instruction is only 4 to 6 bytes long and needs only one memory access to fetch each offset (Fig 1b). Since the offsets are based on a register loaded at run time, no addresses need adjusting, and the program loads faster.

So what's the big deal? Just use the Large model for large programs and the Small model for small programs. Normally that's the way it's done, but interrupt handlers require special consideration. They are usually small, but because of the way they are executed, using the Small data model creates a problem. The Large data model could be used at the expense of size and speed, but that would be too easy.

In the accompanying article we saw how interrupt handlers can be installed and how they can signal the process into action. Installing the handler really only consisted of adding an initialized interrupt structure, with a pointer to the handler, to the appropriate interrupt list. When the interrupt occurs, all the system does is load the arguments into the appropriate registers and call the subroutine at the address given in the interrupt structure. The state of the registers when the interrupt is executed will not be the same as when the handler was installed.
Therefore, the handler cannot depend on the base address register for referencing data. Any attempt to do so will usually end up with a visit from the GURU.

Now the big question: how can we use the Small data model and still reference globally defined data from within the interrupt handler? We use another addressing mode that combines a register with a displacement, one where the value in the register is guaranteed. In particular, we use the addressing mode called "Program Counter with Displacement". The program counter always holds a value fixed by the position of the instruction being executed (on the 68000, the address of the instruction's displacement word). A 16 bit offset from it therefore always reaches the same data. Because of the way the assembler processes data and code, data referenced with PC addressing must be within the code segment of the program. This means that the globally defined data must be moved into the code segment. (This is based on the Manx 3.6a assembler.)

Let's look at some code from Flip v2.0 to illustrate the method.

    ULONG window_signal;    /* global storage of an allocated window signal */

The above declaration allocates 4 bytes of storage that are visible to all parts of the program. The C part of the program stores an allocated signal here for use by the input handler. Normally an assembler code segment could reference this data item too, but its method of access would be assembled with the same data model as the C code.

    window_signal = 1L << window_signal_bit;

Here we are storing some data. The reference to window_signal at this point uses the small data model. After compilation and assembly it will look something like this:

    move.l  d0,window_signal_offset_from(a4)

where a4 is the base address register. The data is now stored, but it is still not accessible to the input handler. Accessing window_signal from the handler with the small data model would crash the system.
This is because register a4 is not valid at the time the handler is called. So what we have to do is move the data somewhere it can be safely accessed.

    ;   /* it is good practice to start and end inline assembler code */
        /* with dummy C statements */
    #asm
            lea     _ws_,a0                 ; get the address of where we are
                                            ; moving window_signal to,
            move.l  _window_signal,(a0)     ; and move it.
    #endasm
    ;

With the Manx assembler we can't just say "move window_signal to _ws_", because _ws_ is in the code segment of the program. Instead we get the address of _ws_ and move the data indirectly through it. Notice that this move instruction does reference window_signal, and the assembler will encode that reference using the small data model. That is fine here, because we are in the installation stage of the program, not in the handler code, so register a4 is valid.

Now where is _ws_? It's in the code segment near the handler routine, between the functions.

    #asm
            cseg                    ; storage must be accessible but not
            ds.l    0               ; restricted by the small data model.
    _ws_    dc.l    0               ; copy of window signal from startup
    #endasm

The above declaration first aligns on a longword boundary and then allocates 4 bytes of space.

Why don't we just move the window signal to _ws_ with the C code? It's not possible without some special manipulation of the assembler source between the compile and assemble stages. This is because the 68000 instruction set does not allow PC addressing for a destination operand.

OK, so now the data is stored in the code section of the program, and the input handler can access it using PC addressing.

    #asm
            move.l  _ws_(pc),d0             ; yes - user wants to flip windows
    #endasm

Yep, that's it. Not quite what you expected? Well, there are two important things to notice about this addressing mode. (1) The program counter value used in the calculation is fixed by where the instruction sits in the code. On the 68000 it is the address of the displacement word, just past the opcode. (2) The offset the assembler generates for _ws_ is the number of bytes from that point to the storage area "_ws_".
It is independent of the state of any address register and is therefore safe to use in an interrupt handler.

You might be wondering why _ws_ had to be in the code segment. Notice that addressing with a base register (as in the small data model) and addressing with the PC both use an offset. The two instructions are basically of the same type, "Address Register Indirect with Displacement". The distinction is when the offset is calculated. With PC addressing, the offset is calculated at assembly time: the assembler sees _ws_ and knows the distance between it and the instruction. With a base register and the small data model, data can be located in any module. The offset then cannot be calculated until link time, when all the data is pooled into one data segment.

Happy Programming.