Introduction to Z80 Emulator / Cross Developer V1.03
Written by Phil Brown, Copyright 1992

This program formed the basis of my final year individual project for my Masters degree in Software Engineering at Imperial College, UK. The work contained within it is entirely my own, so I take full responsibility for it and any credit, if it is due. The program was not written for financial reward and I ask for none.

This program may be copied, transferred, posted and distributed freely. If you use this program, then it would be nice if you could let me know and tell me what you are using it for, as this may affect future developments, and I will derive pleasure from this. Please don't send me money. I don't expect this program to make me a millionaire, and if there is someone out there who finds it useful then that is just reward for all the neat stuff with which others have provided me. At some later date (very much later), this program may evolve into a shareware product, so if you feel the urge to shower me with gifts then hold on until I feel justified in accepting remuneration for the product, or spend it on your girlfriend/partner. I know which I would choose.

Please note that this is a product under development. It is supplied with no guarantees. Use it at your own discretion. I will, however, gladly give support to this product.

What you will need:
- A Commodore Amiga with about 50K CHIP and 200K FAST memory available.
- Preferably some experience or understanding of the Zilog Z80 CPU. I don't intend to describe the intricacies or workings of the Z80 here, so in the following documentation I'll use standard Z80 mnemonics without further explanation. Those who don't understand these will probably find this program fairly useless anyway.

The philosophy behind this program:
This program was written to investigate the levels at which a CPU can be emulated in software. More on this later.
As such it comprises an emulation of all the internal registers of the Z80, a 64K memory map, a complete cross-assembler and disassembler, and a simple interpretive emulation routine which should simulate all the effects of instructions on a Z80. Furthermore, an "abstract machine" is provided which looks like a console screen and keyboard to the Z80, responding via IN and OUT ports, but which provides a "bridge" to the peripherals and devices of the host computer (the Amiga). It was not written to emulate any one particular system, although the highly modular nature of the code means that new abstract machines, methods of software emulation and even CPUs may be added very easily. The code I have written provides a core system around which I can easily add other things for my research. However, as it stands it provides a fairly complete cross-development system for Z80-based targets.

For those interested parties, the program is written entirely in Modula-2. Untrendy it may be, but my Benchmark compiler gets the job done in half the time of a C compiler. The program was developed initially on a 1.3 A500 with a 25MHz MegaMidgetRacer 68030 accelerator and latterly on a 2.0 A1500. I see no reason why it should not work on all systems, as I don't do anything even faintly naughty.

Previous users note: the program now reads the size of the Workbench screen and uses the same parameters. Unfortunately the one exception to this is SuperHiRes screens (ie a SuperHiRes Workbench results in an ordinary HiRes emulator screen).

What you currently get:
- A representation of the internal registers of the Zilog Z80 CPU. These include all the nasty things like the Half Carry flag, IFF1 and IFF2 and the Parity/Overflow flag.
- A simulated 64K address space.
- A single pass assembler.
- A symbolic disassembler.
- Some debugging tools.
- An emulated console.
- An interpretive emulator.
- An optimised emulation cycle for 25% more throughput at the expense of memory usage.
- A translator from a program image in memory to standard 'C' source, which can then be separately compiled and linked to a run-time support library for blistering throughput. This may not be present on all versions; I am currently extending and optimising this process.

Please note that this emulator was not designed for speed and gives a throughput of about 4000 instr/sec on a 25MHz '030 system. Incidentally, the emulator is designed to be reasonably portable and all the Amiga-specific stuff is contained in one module, so porting to another system should be fairly trivial.

Central to the design of the system is the Abstract Instruction Format. When I started designing the program it became obvious that significant benefits in design and modularity could be realised if I introduced a new representation of a Z80 instruction. Let me explain this a bit.

Traditionally there are two representations for an instruction: the mnemonic string, eg LD A,(IX+23), and the byte code, eg 221,126,23 (hex DD,7E,17). Now the first stage of disassembly and of emulation is the same: to work out what the heck these little strings of 1's and 0's really mean. The only difference is that when disassembling you display the mnemonic and when emulating you simulate the effects. Normally the code to display or emulate the instruction is threaded in amongst the decoding of the byte codes. This results in two very similar routines, which is unsatisfactory from a design elegance point of view, and also makes the code less legible and more difficult to debug and maintain.

Similar problems exist when considering assembly and interactive emulation. Say we want the user to be able to type in an instruction at a prompt and see the effects; normally it would be necessary to assemble this string into some place in memory and emulate it from there.

What I decided to do was to introduce a structure which stores all the information held within an instruction in an easily accessible form.
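As a rough illustration of such a structure, here is a minimal C sketch (the program itself is written in Modula-2, and its real structure is not documented here). The type and function names (AbstractInstr, decode, to_mnemonic, emulate) are hypothetical, and only the LD A,(IX+d) and LD A,(IY+d) forms are handled: one decoder fills in the abstract form, which is then either turned back into a mnemonic or emulated.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical abstract-instruction layout, for illustration only. */
typedef enum { OP_LOAD } Operation;
typedef enum { MODE_REGISTER, MODE_INDEXED } AddrMode;
typedef enum { REG_A, REG_IX, REG_IY } Register;

typedef struct {
    AddrMode mode;
    Register reg;
    int8_t displacement;        /* only used by MODE_INDEXED */
} Operand;

typedef struct {
    Operation op;
    Operand dst, src;
    int length;                 /* bytes of byte code consumed */
} AbstractInstr;

typedef struct { uint8_t a; uint16_t ix, iy; uint8_t mem[65536]; } Cpu;

/* Decoder: byte code -> abstract form. Only DD 7E d (LD A,(IX+d)) and
   FD 7E d (LD A,(IY+d)) are recognised; returns 0 for anything else. */
static int decode(const uint8_t *code, AbstractInstr *out) {
    if ((code[0] != 0xDD && code[0] != 0xFD) || code[1] != 0x7E)
        return 0;
    out->op = OP_LOAD;
    out->dst = (Operand){ MODE_REGISTER, REG_A, 0 };
    out->src = (Operand){ MODE_INDEXED, code[0] == 0xDD ? REG_IX : REG_IY,
                          (int8_t)code[2] };
    out->length = 3;
    return 1;
}

/* Display path: abstract form -> mnemonic string. */
static void to_mnemonic(const AbstractInstr *in, char *buf, size_t size) {
    snprintf(buf, size, "LD A,(%s%+d)",
             in->src.reg == REG_IX ? "IX" : "IY", in->src.displacement);
}

/* Emulation path: abstract form -> effect on the CPU state. */
static void emulate(const AbstractInstr *in, Cpu *cpu) {
    uint16_t base = in->src.reg == REG_IX ? cpu->ix : cpu->iy;
    cpu->a = cpu->mem[(uint16_t)(base + in->src.displacement)];
}
```

Both paths share the single decode step, which is the point of the design; a parser for typed mnemonics would fill in the same structure from the other side.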
The abstract instruction is used to move between the different representations. For example, the first phase of disassembly and emulation is now the same: decode the bytes into an abstract instruction. This abstract structure is then either displayed (via a simpler decoding routine) or emulated. The abstract instruction thus lies between the two traditional representations:

              Instruction Mnemonic
                  LD A,(IX+23)
                   |       ^
                   |      /|\
            Parser |       | Decoder
                  \|/      |
                   V       |             \
              Abstract Instruction ------->  Emulator
              LOAD,                      /
              Direct Register A,
              Indirect Register pair IX displaced 23
                   |       ^
                   |      /|\
         Assembler |       | Disassembler
                  \|/      |
                   V       |
                   Byte Code
                  221,126,23

This structure gives the code great modularity and is also quite elegant. Disassembly to mnemonic form now goes from byte code to abstract format, and is then decoded from that to the mnemonic string. This costs some performance, but not too much.

Now the most useful bit: how to use the program. First the good news: most functions can be accessed from a typed command interface or from a menu. Now the bad news: menus are not completely implemented. Any menu option which is interactive (such as defining a filename to load or save) will give a strange error message (strange to you, not to me!), normally telling you that you have not defined all required parameters. Why is this? Well, my Modula-2 compiler had great difficulty interfacing to the req.library that I wanted to use, and in the end I had to give up, as even the assembly language interface to the compiler is bugged. Sorry about this. If a menu command doesn't work then use the command line. Menu items currently affected are: Load, Save, Disassemble, Dump, Initialise and Copy memory. All others should work OK.

When the program is fired up (from the CLI, Workbench, Runbacked or just about any other way you fancy) you will be presented with a screen with the same characteristics as your Workbench, split into two sections.
The first section, occupying the top seven lines of the screen, shows the current register contents. The second section is your interactive command section, and is where most responses to your commands will appear. The program recognises and uses your system fonts and menu fonts, resizing windows appropriately. If your font is too large you may want to use the FONT command to change the default.

If the output to the screen is whizzing by too fast, the simplest way to pause it is to hold down the right mouse button (or Right Amiga and Right Alt), which brings up the menu bar and halts output until the button or keys are released. At one time I included a scrollback buffer and proportional gadget, but for various aesthetic and technical reasons this had to be dropped.

The program knows about a "current number base", which is the number base in which values will be displayed. This can be Binary, Octal, Decimal or Hexadecimal. On startup the system is in Decimal mode (sorry all you hex fans, it's what I grew up with on the Z80. It's strange; for some CPUs I think in hex, for others in decimal...maybe I'm just weird). You can of course change this by selecting an item from the Settings menu, by typing a command, or by using a startup script.

Before I start banging on about the individual functions, there are a couple of things I should mention. The command parser is pretty flexible. It is driven by a database which details, for each command, the parameters it expects, their type and whether they are optional or required. A command may be abbreviated to as much of it as is needed to identify it, eg D is adequate for DISASSEMBLE, but at least DU is required for DUMP. Each parameter has a keyword associated with it (unless it is a flag) and this may be used to identify the following parameter. If the parameter keyword is missing then positional placement dictates which parameter is being defined. Hmm. I have not defined this too well methinks.
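Before the worked example, the abbreviation rule can be sketched in C. This assumes (it is not stated above) that commands are tried in a fixed table order and that the first command whose name begins with the typed text wins; with DISASSEMBLE listed before DUMP this reproduces the behaviour described, D selecting DISASSEMBLE and DU being needed for DUMP. The table contents and match_command are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical command table; order matters because the first prefix match wins. */
static const char *const commands[] = {
    "ASSEMBLE", "BASE", "CLEAR", "COPY", "DISASSEMBLE",
    "DEBUG", "DUMP", "QUIT", "RUN"
};

/* Returns the index of the first command that begins with `typed`
   (compared case-insensitively), or -1 if nothing matches. */
static int match_command(const char *typed) {
    size_t n = sizeof commands / sizeof commands[0];
    if (typed[0] == '\0')
        return -1;
    for (size_t i = 0; i < n; i++) {
        const char *c = commands[i];
        size_t j = 0;
        /* Advance while the typed text agrees with the command name. */
        while (typed[j] != '\0' && toupper((unsigned char)typed[j]) == c[j])
            j++;
        if (typed[j] == '\0')       /* all of `typed` was a prefix */
            return (int)i;
    }
    return -1;
}
```

Here match_command("d") returns the index of DISASSEMBLE and match_command("du") that of DUMP, while text longer than any command name, such as "DUMPX", matches nothing.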
The template for DISASSEMBLE is:

DISASSEMBLE START Address [END Address] [FILE FileName] [NOLABELS]

What this means is that the first parameter is required and is the start address. The other three are optional, and the last is a flag which acts as a switch. Of course any or all of the command may be in upper or lower case. If the user types:

D 0 100 Prog.ASM nolabels

then the absence of keywords means that position dictates which is which, so disassembly starts from 0, ends at 100, output is to the file Prog.ASM and symbolic disassembly is switched off. If, however, the user types:

DISASSEMBLE FILE Prog.ASM NOLABELS END 100 START 0

then the parser will locate the parameters according to their keywords and the result will be the same as before. The template may also indicate a switch list, such as [ ON | OFF ], which specifies that one of the switches may be defined.

Furthermore, when a numeric parameter is required (such as an Address), the expression handler used in the assembler is invoked. This means that expressions containing symbols may be used, and standard precedence is observed. Therefore, assuming the labels are defined in the symbol table, the user may type:

DISASSEMBLE MY_LABEL*10-20 END_CODE

and the expressions will be evaluated and passed to the routine as expected. If the user wants to define a filename which contains a space, double quotes may be placed around the name.

Also, any numbers specified as parameters will be assumed to be in the current default number base. If the user wishes to override this then the following prefixes may be used:
% - Binary
& - Octal
# - Decimal
$ - Hexadecimal
eg %01111111, &177, #127 and $7F all represent the same decimal value of 127.

I'll now describe in detail what those commands mean and what they do.

ASSEMBLE [FILE FileName] [START Address]
Initiates assembly of the specified file to the specified address.
If the file is not specified, or if the special token 'INLINE' is used, then input is taken interactively from the keyboard rather than from a file. If the start address is not specified then the current value of the PC is used. Of course, most programs will have an ORG directive as their first line anyway. When assembling interactively, a colon ":" prompt will be used rather than the standard arrow ">". The user should use the standard directive END to terminate assembly. Each run of the assembler clears the symbol table of all non-global labels (eg assembler- and disassembler-generated labels).

A word about the assembler. It follows the Zilog assembler specs pretty closely, but with the following exceptions:
- Zilog use appended letters to distinguish their numeric bases, whereas I use prepended characters as previously noted.
- The DEFW and DEFB directives should allow a series of comma-separated values. This doesn't fit into the parser nicely, and until I have time to change it DEFW and DEFB accept a single parameter only; eg DEFW 100,13563 should be changed to DEFW 100 and DEFW 13563 on separate lines. Sorry about that.
- Labels may only appear in the first column, although the trailing colon is optional. Who places labels in between operands anyway?
- Macros are currently not supported.
- The DEFL directive is not supported (use the EQU directive instead), and the rather pointless DEFT directive is not supported (use DEFS instead).
- The only operators allowed in expressions are -, +, / and *.
Most of these limitations will be lifted when I have more time.

The assembler is single pass. When it comes across a forward reference, space is left in the assembled code and the reference is entered into a table. When assembly is complete this table is used to fill in the bytes with the proper values. Any references remaining unresolved indicate an error. The missing symbols may then be defined by the user and the references resolved using interactive commands (see later).
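Two of the assembler details above can be sketched in C: reading a number with the %, &, # or $ prefix (falling back to the current default base), and the single-pass forward-reference table that is patched once assembly finishes. The names (parse_number, Symbol, Fixup, find_symbol, resolve_fixups) are hypothetical, and the sketch assumes absolute one- or two-byte operands stored low byte first, as the Z80 does; relative-jump operands would need a displacement calculation instead.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

static int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)toupper((unsigned char)c);
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return 99;                          /* not a digit in any supported base */
}

/* % binary, & octal, # decimal, $ hex; otherwise default_base.
   Returns 0 on success, -1 on an empty or invalid number. */
static int parse_number(const char *s, int default_base, long *out) {
    int base = default_base;
    switch (*s) {
    case '%': base = 2;  s++; break;
    case '&': base = 8;  s++; break;
    case '#': base = 10; s++; break;
    case '$': base = 16; s++; break;
    default:  break;
    }
    if (*s == '\0') return -1;          /* prefix with no digits */
    long value = 0;
    for (; *s != '\0'; s++) {
        int d = digit_value(*s);
        if (d >= base) return -1;       /* digit not valid in this base */
        value = value * base + d;
    }
    *out = value;
    return 0;
}

typedef struct { const char *name; uint16_t value; } Symbol;

typedef struct {
    const char *symbol;   /* label that was not yet defined */
    uint16_t addr;        /* where the placeholder bytes were left */
    int size;             /* 1 or 2 bytes */
} Fixup;

static const Symbol *find_symbol(const Symbol *tab, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0) return &tab[i];
    return NULL;
}

/* Patch every fixup whose symbol is now known. Unresolved entries are kept,
   compacted at the front of the array; returns how many remain. */
static int resolve_fixups(uint8_t *mem, Fixup *fx, int n,
                          const Symbol *tab, int nsym) {
    int remaining = 0;
    for (int i = 0; i < n; i++) {
        const Symbol *s = find_symbol(tab, nsym, fx[i].symbol);
        if (s == NULL) { fx[remaining++] = fx[i]; continue; }
        mem[fx[i].addr] = (uint8_t)(s->value & 0xFF);      /* low byte first */
        if (fx[i].size == 2)
            mem[(uint16_t)(fx[i].addr + 1)] = (uint8_t)(s->value >> 8);
    }
    return remaining;
}
```

A non-zero result from resolve_fixups corresponds to the unresolved references that the text describes as an error until the user supplies the missing symbols.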
The assembler is not emulated but runs as compiled 68000 code, so the address space of the Z80 is not adulterated by having lots of utilities scattered around.

A note on relative jumps. There appear to be two conventions for relative jumps: that defined by Zilog and that used by other people. Zilog state that a relative jump is offset from the first byte of the relative jump instruction. Others say that because the operand is modified by -2 to allow for the twice incremented PC after fetching the instruction and operand, the jump is offset from the first byte of the immediately following instruction. In other words, in Zilog terms JR 0 is an infinite loop, but others say JR 0 does nothing. I have adopted the Zilog usage as it is likely to be more widespread.

DISASSEMBLE START Address [END Address] [FILE FileName] [NOLABELS]
Initiates disassembly of byte code to the equivalent mnemonics. The address from which to start disassembling is required. If the END address is not defined then the whole 64K map will be disassembled. The user may press Escape or select the Abort menu item to stop disassembly at any point. If the FILE parameter is defined then output will be directed to this file, otherwise output will go to the screen.

By default the disassembly is symbolic, so any symbols present in the symbol table will be used to make the disassembly more meaningful. This can be disabled by using the NOLABELS switch. If the disassembler is run in symbolic mode, then any jump instruction will have its destination address calculated and a label for that address will be generated. If the destination is before the current disassembly point then obviously the label was not shown when that address was listed, so the numeric address is displayed instead. If the destination address is forward, then the generated label is used. Subsequent runs of the disassembler will use the previously generated labels to produce more symbolic output.
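Under the Zilog convention adopted here, the arithmetic for relative jumps can be sketched in C: the assembler stores the operand as e - 2, and the disassembler recovers the destination it labels as the address of the JR plus 2 plus the signed byte. The helper names jr_encode and jr_target are hypothetical, and the int8_t conversion assumes a two's-complement compiler, which is the norm.

```c
#include <stdint.h>

/* Encode a JR/DJNZ operand under the Zilog convention: `e` is measured from
   the first byte of the JR itself, and the stored byte is e - 2 because the
   PC has already moved past the two-byte instruction.
   Returns 0 on success, -1 if e is out of range (-126..+129). */
static int jr_encode(int e, uint8_t *byte) {
    int d = e - 2;
    if (d < -128 || d > 127)
        return -1;
    *byte = (uint8_t)(d & 0xFF);
    return 0;
}

/* Destination the disassembler would label: JR address + 2 + signed byte,
   wrapping around the 64K address space. */
static uint16_t jr_target(uint16_t jr_addr, uint8_t byte) {
    return (uint16_t)(jr_addr + 2 + (int8_t)byte);
}
```

So JR 0 encodes as the byte FE, whose destination is the JR instruction itself, which is the infinite loop described above; under the other convention the stored byte would simply be the operand unchanged.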
Disassembler labels are cleared each time the assembler is run; however, the user may retain important labels by declaring them as GLOBAL.

Because it is very difficult to distinguish data from code in a Z80 system, disassembling data will normally produce fairly sensible-looking results (sensible, that is, until the code is examined). However, occasionally an illegal byte code sequence will be produced (this will only happen for instructions extended by the hex CB, DD, ED and FD bytes). When an illegal sequence is detected, a DEFB directive for the first offending byte in the sequence is output, and disassembly continues from the next byte.

However, in certain cases, when asked to disassemble an illegal extended instruction for which there is a nonextended equivalent, the disassembler will not produce an error but will output the nonextended (legal) instruction. For example, the bytes DD,60 have no legal (documented) instruction, but the disassembler will produce the mnemonic LD H,B, which has byte code 60. When checking my assembler and disassembler I reassembled the disassembly of a 16K ROM and then disassembled it again. The disassembled output was identical but the assembled code was three bytes shorter than the original! This had me scratching my head for a while. Detecting these cases would need an untidy piece of code in the disassembler, and I can't really justify spending time on it at the moment when it only crops up when you disassemble data (or possibly undocumented instructions; see later).

CLEAR [ASSEMBLER | DISASSEMBLER | SYMBOLS | REFERENCES | PARSE]
The entries in the symbol table have a source associated with them. This command allows the user to clear selected parts of the symbol table. If the parameter defined is SYMBOLS then the entire table is cleared. If the user defines the REFERENCES parameter then the table of unresolved references is cleared. If no parameter is supplied then both the symbol table and the unresolved reference table are cleared.
Defining either ASSEMBLER or DISASSEMBLER causes the appropriate symbols to be cleared. Defining PARSE clears the table of preparsed abstract instructions (see later).

BASE [BINARY | OCTAL | DECIMAL | HEXADECIMAL]
This command defines the default number base for the system. If a parameter is defined then this becomes the new number base; one immediately noticeable effect is that the register window is updated to show the register values in the new base. If no parameter is supplied then a message is displayed telling you what the current base is. Alternatively, the Settings menu contains an item to control the system number base.

EVALUATE
This provides a primitive calculator service. The user enters a number or expression as the parameter and the result is displayed in all four number bases.

LOAD
This command loads a byte image of the specified file to the specified address in the emulated address space. If the file is longer than memory then loading simply wraps around and overwrites the earlier portions of the file.

SAVE
Well, surprise surprise, this command saves a chunk of the emulated address space as a byte image. The last parameter specifies either the end address or the number of bytes to write. If the keyword is not present then LENGTH is assumed.

COPY
This copies a section of the emulated address space from the source to the destination addresses.

SHOW [ASSEMBLER | DISASSEMBLER | SYMBOLS | REFERENCES | PARSE]
This command displays the symbol table or unresolved references. If the user specifies ASSEMBLER or DISASSEMBLER as the parameter then labels which were generated by the appropriate module are displayed; if SYMBOLS is defined then the current GLOBAL/EXTERNAL symbols are displayed; if REFERENCES is defined then the unresolved reference table is displayed. If none is defined then the symbol table is displayed in its entirety. The number of entries in the symbol table is limited only by available memory.
Defining the PARSE flag displays the parse tree of abstract instructions, organised by the hashing function. At the end, the number of entries and the maximum length of the hash table entry lists are displayed. A maximum of 1 indicates a 100% hit rate onto hashed abstract instructions.

INTERRUPTS [VALUE Value] [ON | OFF]
This allows the user to control the system interrupts. If the VALUE parameter is supplied then it defines the number of Amiga clock interrupts to receive before emulating a Z80 interrupt. If the ON/OFF parameter is defined then it controls overall system interrupts; if it is OFF then the Z80 will never receive a clock interrupt. Note that this has nothing to do with the status of interrupts on the Z80 (the IFF1 flag set through EI and DI). This is useful if you want to ignore interrupts completely, or if you are debugging a piece of code which involves interrupts but want to forget about them temporarily. If no parameter is defined then the current settings are displayed. Alternatively, the interrupt status can be changed through one of the Settings menu items.

Currently the only interrupts supported are clock interrupts, as other devices have not yet been implemented. The interrupt system should be identical to the Z80 system, allowing for all three interrupt modes, maskable and non-maskable interrupts, and priority interrupts. I don't take advantage of this flexibility and completeness yet, but it will come into its own later.

EMULATE [8080 | Z80 | Zaphod]
This command defines which system to emulate. The Intel 8080 is not yet supported (apart from the downwards compatibility of the Z80). It is intended to add assembly and disassembly using the different opcodes of this processor at some later stage. Emulating a Z80 gives the user access to the CPU and memory, but no I/O facilities.
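The clock-interrupt behaviour controlled by INTERRUPTS can be sketched in C as a tick divider. The names are hypothetical, and one detail is assumed rather than stated above: while system interrupts are OFF, Amiga ticks are simply not counted. The request is kept separate from the Z80's own IFF1 flag, matching the note that the two settings are unrelated.

```c
#include <stdbool.h>

/* Hypothetical model of the INTERRUPTS settings. */
typedef struct {
    bool enabled;        /* INTERRUPTS ON/OFF: system-wide, not the Z80's IFF1 */
    unsigned divisor;    /* INTERRUPTS VALUE: Amiga ticks per Z80 clock interrupt */
    unsigned count;      /* ticks seen since the last Z80 interrupt */
} ClockInterrupts;

/* Called once per Amiga clock tick. Returns true when a Z80 clock
   interrupt should be requested. */
static bool clock_tick(ClockInterrupts *ci) {
    if (!ci->enabled || ci->divisor == 0)
        return false;
    if (++ci->count < ci->divisor)
        return false;
    ci->count = 0;
    return true;
}

/* A requested maskable interrupt is only taken if the Z80 itself has
   interrupts enabled (IFF1 set by EI, cleared by DI). */
static bool accept_maskable(bool requested, bool iff1) {
    return requested && iff1;
}
```

With VALUE 3, every third call to clock_tick requests an interrupt; switching the system OFF stops requests entirely, whatever the Z80 program has done with EI and DI.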
Interrupts in the Z80 emulation respond by placing a value on the data bus (ie typically Interrupt Mode 0), which is normally an RST instruction defining the restart address for interrupt handling.

RUN [START Address] [TRACE] [SINGLESTEP] [PARSE]
Initiates emulation at the specified address, or at the current PC if none is defined. Note that none of the commands affect the register contents, so if the user breaks out of emulation and changes some contents of memory, simply typing RUN will resume from where the emulation left off. If the TRACE parameter is defined then the register display is updated after every instruction (dramatically reducing performance in the process). The SINGLESTEP parameter specifies that only a single instruction is to be emulated. This is useful in combination with the hot key for examining programs at a pretty low level. The user may press Escape at any stage to abort the emulation. The PARSE flag indicates that the program should build up a table of instructions in a pre-parsed, more easily recognisable format. Using this flag results in a throughput increase of about 25%, but at the cost of potentially high memory usage.

DEBUG [ON | OFF]
This command is a switch which causes some other commands to produce more output. If debugging is on, the assembler displays each line as it is processed together with the bytes to which it was assembled, the disassembler displays similar information (eg bytewise as well as mnemonic output), and the emulator displays the mnemonic of the instruction it is currently emulating in the Ex: box in the register display. Singlestepping through code with debugging enabled can be quite enlightening. If no parameter is defined then this command reports the current status, which may also be changed via one of the Settings menus.

DUMP [END Address] [BINARY | OCTAL | DECIMAL | HEXADECIMAL]
Back to something a bit more simple.
A simple memory dump from the start address to the end address (if defined), or until the user presses Escape.

INITIALISE [VALUE Value]
Sets every byte between the lower and upper bounds to the specified value. If the VALUE is not defined then 0 is assumed.

RESOLVE [REFERENCES]
This explicitly tells the system to attempt to resolve any references that remain unresolved after assembly. Normally the user would define a couple of labels, or would assemble some other file with GLOBAL/EXTERNAL references, to give values to these symbols. The parameter may be given as syntactic sugar but is not required. The number of references that may be stored is limited only by available memory.

ROM [START Address] [END Address] [ON | OFF]
Declares the section of memory as Read Only. Any writes inside this range will have no effect. The ROM may be enabled or disabled by the optional third parameter. If the START or END parameters are undefined then the current values are used. For simplicity, only one memory range can be defined as ROM.

SCRIPT
Executes the script with the specified name. The script may contain any commands or instructions that can be used in the system, so you can have a script which does a whole bunch of things to customise the environment to your personal preferences. On startup, the system looks for scripts with the name Startup-Z80 in the current directory and in S:. Scripts may even call other scripts if you want, assemble files or do anything else you can normally do. There is currently a built-in limit of 8 nested scripts.

COLOURS
This command allows the user to define the colours to be used by the emulator. Initially the program will use the Workbench colours. The easiest way to define the colours is using hex notation, eg:

COL $000 $FFF $F00 $00F

FONT [FONTNAME FontName] [SIZE FontSize]
This command defines the font to be used by the program. Chances are that your version only supports the topaz 80 font.
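The single ROM range described under ROM amounts to a check on every write to the emulated map. In this minimal C sketch the names are hypothetical, the bounds are assumed to be inclusive, and the INITIALISE-style fill is assumed to respect the ROM range, which the text above does not actually specify.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical memory model: a flat 64K map with one optional ROM range. */
typedef struct {
    uint8_t bytes[65536];
    uint16_t rom_start, rom_end;   /* inclusive bounds (assumed) */
    bool rom_on;                   /* ROM ... ON / OFF */
} Memory;

/* Writes that land inside an enabled ROM range are silently ignored. */
static void mem_write(Memory *m, uint16_t addr, uint8_t value) {
    if (m->rom_on && addr >= m->rom_start && addr <= m->rom_end)
        return;
    m->bytes[addr] = value;
}

/* INITIALISE-style fill from `from` to `to` inclusive, routed through
   mem_write. A 32-bit counter avoids an endless loop when to == 0xFFFF. */
static void mem_fill(Memory *m, uint16_t from, uint16_t to, uint8_t value) {
    for (uint32_t a = from; a <= to; a++)
        mem_write(m, (uint16_t)a, value);
}
```

Routing every store through one function keeps the ROM rule in a single place; an emulated LD (HL),A would call mem_write in exactly the same way.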
ABOUT
Not particularly useful; it just declares the author and version number.

HELP [MNEMONIC Mnemonic]
If used without a parameter then the database of command structures is analysed and the result displayed on the screen. The assembler needs to store a database of valid instructions for use when syntax checking, and this may be interrogated to display valid operands for an instruction mnemonic. If you are unsure of the operand format for an instruction, or need to know which instructions are supported (eg undocumented instructions), then this database will tell you, as it is the same one used by the syntax checker and assembler. Typing HELP LD will give you all supported formats for the LD instruction. In the list an 'r' denotes a single register, an 'n' a single byte operand, 'nn' a word and 'd' a displacement byte.

QUIT
Exits the program.

In addition, the user may type any valid instruction at the prompt, and it will be emulated immediately. The instructions will have all the correct effects, but be aware that doing something like JP 1000 will do nothing more than set the PC to 1000 (ie the routine at 1000 will not continue to be processed). Of course, following this up by singlestepping will achieve this.

More about that Zaphod thing:
Having a Z80 is all very well, but it's usually pretty nice to have a machine to do something with. The modular approach to the system means that new machine emulations can be added very quickly and easily, once their specs are known. If you have a pet system which you'd like to see incorporated into the program then you need to do two things:
1. Provide me with the specifications of the system, including such things as which INput/OUTput ports it responds to (which port, which value), interrupt behaviour, memory mapping and anything else relevant, and
2. wait a while (not too long I hope, but I can't guarantee anything).
I will try to incorporate requirements as best I can.
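One way such a pluggable machine could be described is as a small table of hooks that the core emulator calls for IN and OUT. This C sketch is purely hypothetical (it is not the program's actual Modula-2 module interface), and the toy machine is invented for illustration: port 5 is a latch that reads back the last value written, and every other port reads as FF.

```c
#include <stdint.h>

typedef struct Machine Machine;

/* Hypothetical hook table for an "abstract machine". */
struct Machine {
    const char *name;
    uint8_t (*port_in)(Machine *self, uint8_t port);
    void    (*port_out)(Machine *self, uint8_t port, uint8_t value);
    void    *state;                       /* machine-specific data */
};

/* Toy machine state: a single latch on port 5. */
typedef struct { uint8_t latch; } LatchState;

static uint8_t latch_in(Machine *self, uint8_t port) {
    LatchState *s = self->state;
    return port == 5 ? s->latch : 0xFF;   /* unused ports float high */
}

static void latch_out(Machine *self, uint8_t port, uint8_t value) {
    LatchState *s = self->state;
    if (port == 5)
        s->latch = value;                 /* writes to other ports are ignored */
}

/* The CPU core only ever reaches the machine through these two calls. */
static uint8_t emulate_in(Machine *m, uint8_t port) {
    return m->port_in(m, port);
}

static void emulate_out(Machine *m, uint8_t port, uint8_t value) {
    m->port_out(m, port, value);
}
```

Interrupt behaviour and memory mapping, the other items in the specification list above, would become further hooks in the same structure, so adding a machine means filling in a new table rather than touching the CPU core.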
The Zaphod-1 emulation is pretty simple as it currently stands, mainly because I have done very little work on it. When I devote more time to it, it will gradually migrate towards responding to the complete VT52 escape sequences, and will allow a greater range of device interaction, such as disk I/O.

If you request the Zaphod-1 mode, then a new medium-res, two-colour, 24x80 screen will be opened in the background. When emulation is running, this console screen will be brought to the foreground, and when emulation is complete it will retire quietly into the background once more. This screen has invisible gadgets in the top right corner.

The user may interact with the console using standard IN and OUT instructions. An IN from port 0 will respond with the ASCII code of the last key pressed, or 0 if none is available. An OUT to port 1 will display the appropriate character on the console screen. The following escape sequences are currently supported:
ESC 0 - clears the console screen
ESC 1 x y - moves the cursor position to x,y
ESC 2 - clears to end of line from the current position

A quick code stub illustrates how to talk to Zaphod:

        ld   c,1        ; Port 1 for output to console screen
        ld   a,27       ; ESC 0 clears the screen
        out  (c),a
        ld   a,0
        out  (c),a
        ld   c,0        ; Port 0 for input from the keyboard
loop:   in   a,(c)      ; IN with the C Indirect sets the flags
        jr   z,loop     ; make sure we have some input
        out  (1),a      ; output the result
        jr   loop       ; carry on

This provides a simple teletype; whatever is typed at the keyboard will appear on the screen. At some later stage I may write an OS around Zaphod; more likely he will develop into an emulation of some other system.

Undocumented Instructions:
The basis for the emulator has been the Z80 Assembly Language Programming Manual written by Zilog. I have therefore not implemented undocumented instructions as, not surprisingly, I have no details on them.
There is no reason why they shouldn't be implemented; however, I will need the complete details first, ie which flags are affected and any side effects. I am aware of the following undocumented instructions, but would appreciate any further details on them or on other instructions which are out there:

1 - The byte index register LD instructions. Any LD instruction which refers to the H or L registers may be prefixed by the standard index extension bytes DD and FD, in which case references to H are replaced by the high byte of the appropriate index pair, and those to L by the low byte. Other instructions which follow this pattern are: ADC, ADD, AND, CP, DEC, INC, OR, SBC, SUB, XOR. If anyone can give me suitably acceptable terminology for these instructions (eg LD HX,B for loading B into the high byte of the index register) I may incorporate them.

2 - The badly implemented shifts and rotates. Analysis of the opcodes following the standard extension byte CB reveals the following pattern:
00-07 RLC r
08-0F RRC r
10-17 RL r
18-1F RR r
20-27 SLA r
28-2F SRA r
30-37 ?????
38-3F SRL r
It would be logical to assume that the codes 30-37 should be SLL r instructions. I seem to remember from a very long time ago that Zilog didn't get this quite right and got one of the shifted bits wrong, or the Carry or something, and so didn't document the (quite predictable) effects of the instructions as they didn't fall into the pattern. Again, if I get some more details on these they may find their way into the emulator.

In particular this program owes itself to:
- Benchmark Modula 2 (Thanks Leon!), without which I would have made very many type incompatibilities.
- CygnusEd Professional, without which the development would have taken a lot longer.
- Valeria and Pappagalla for keeping me amused from a distance.
- Bonnington for keeping me company late at night (this is my cat).
- Grolsch lager for lubricating my thought processes.
- Apple and cream pastry slices for maintaining my blood sugar levels.
- and of course the Commodore Amiga for being the best damn personal computer available, without which I would have had to develop at University, something not particularly high up on my list of enjoyable evening pastimes!

Release history:
V1.00ß - 6th March 1992
- Initial release.
V1.03 - 15th July 1992
- First non-beta release.
- Support for OS2.0 fonts
- Uses Workbench screen size and colours by default
- Works on NTSC systems!
- Preparsing of instructions
- Translation of memory image to 'C'

Having completed University, I no longer have access to the Internet etc. I can, however, be reached at:

11 Oxford Court
Ashley Road
Epsom
Surrey KT18 5BQ
England

I'll respond to any queries I receive, and source is probably available upon request. You'll need a Modula-2 compiler (I use Benchmark) to make it all work.

Thanks for using the program; I hope it's not too frustrating.

Phil Brown.