About Using a Translation Cache for Emulation of the Z80

Richard Carlsson

The problem with using a 'cache' (or 'threaded code') of pointers to the corresponding instruction routines is self-modifying code. A write to memory sets the corresponding word in the cache to zero, so the next time that instruction is executed it will be re-decoded (by the decoding routine at offset zero). Immediate data is not read from the cache, so there is no trouble there. So far, things are easy.

The real problems begin with the prefixed instructions, which consist of more than one byte. If a write to memory changes an instruction byte after the first, the actual instruction changes, but its pointer (the one corresponding to the first byte) stays the same. So we conclude: every prefixed instruction must first test whether any of its other instruction bytes has changed, and re-decode the instruction if so. When decoding, we mark the pointers after the first as "unchanged, but needing decoding before they themselves can be used" (just use a nonzero pointer to a decoding routine). That should not be too difficult.

But wait... Prefixes are often used to modify the meaning of a more ordinary instruction. For example:

    21 00 00      means  Ld HL,0
    DD 21 00 00   means  Ld IX,0

So if you jump to the address after the DD prefix in "Ld IX,0", you get the instruction "Ld HL,0", and the decoding procedure puts a nonzero pointer in the cache entry corresponding to the opcode 21. Jumping to the DD-prefix address again will do "Ld IX,0", as it should. But if I first change the opcode 21 to 22 and then jump to the 22, the instruction is decoded as "Ld (0),HL", and the entry corresponding to the 22 is nonzero again. Now if I jump to the DD prefix, I want to get the instruction "Ld (0),IX", but the opcode pointer is nonzero, so the change is not detected. Therefore "Ld IX,0" is executed as before.
So normal instruction pointers can be permitted only at the first byte of an instruction. How do I detect an invalid pointer? I can only test the pointers for sign and zero without being too slow to be practical. Clr.w is the fastest way to mark a write to memory, so I'd like to keep the zero pointers for that purpose. Then I must split the nonzero pointer values in two: instruction pointers (positive) and 'unchanged' markers (negative). The test for prefixed instructions becomes: if any of the other instruction bytes has changed (has a nonnegative pointer, either zero after a write or positive after being decoded as the start of an instruction), the instruction must be re-decoded. Sadly, this method halves the address space for valid instruction pointers, since only the positive values are left for them.

The alternative is to use a zero pointer as the 'unchanged' marker and a pointer to a decoding routine as the 'address written to' marker. The test for prefixed instructions would then be: if any of the other instruction bytes has a nonzero pointer, re-decode. It sounds simpler than the above, but it will be a bit slower, since every write to memory must then be marked with "move.w #changed,dest" instead of a clr.w. I will not use it unless I run out of address space.

The decoding routines must also cope with the wrap-around at +7fff. Reading immediate data is not a problem, since the wrap-around is automatic there, and if no prefix or opcode byte is over the limit there is no problem either. If the PPC itself runs over the limit, the padding 'out of bounds' pointers cause a direct wrap-around. But if there are prefix or opcode bytes on both sides of the limit, we do have a problem: when the decoding is finished and the instruction routine is jumped to, the PPC must actually be below the start of the cache, so we need a buffer there as well.

    BEFORE                     AFTER
                   buf
                   ...
                   buf         <-PPC
                   buf
          -8000    00
          -7fff    FE
                   ...
    PPC->  7ffe    DD
           7fff    CB
                   pad
                   pad
                   ...
                   pad

    DD CB 00 FE = Set 7,(IX+0)

The decoding routines cannot look ahead to tell 'go to execution of an instruction' apart from 'go to another step in decoding', so what pointer value should be written into the cache at PPC before we move the PPC?
In the case above, the pointer at PPC currently points to the (running) DDCB decoding routine, which has detected the 'out of bounds' pointer. (The prefix decoding routines must test for 'out of bounds' pointers before writing the 'unchanged' markers.) If we leave the pointer as it is, any change to the CB prefix will not be detected the next time the instruction is executed. The whole instruction must instead be completely re-decoded every time it is executed, so we should make the pointer a 'contents changed' marker, guaranteeing a re-decode. In the case of a DDCB (or FDCB) prefix, the DD (FD) decoding routine marked the pointer corresponding to CB as 'unchanged' before it made the jump, so that pointer must also be re-marked as 'changed'. After that, we move PPC to its corresponding place in the lower buffer and decode the complete instruction once more from there, which makes the contents of the buffer valid for the prefix tests. The calculation of the real-PC ensures that the instruction bytes are taken from the correct address.

When we do the 'modified opcode?' tests, we can only address offsets from PPC without being unreasonably slow. We don't get automatic wrap-around there, so we could end up testing an address beyond the upper limit of the cache. (This should not happen, since the decoding routines ought to detect the 'out of bounds' pointers before the instruction executes.) If it does happen, the instruction will simply be re-decoded. Everything ultimately depends on the prefix decoding routines handling the wrap-around. Obviously, an instruction with prefix or opcode bytes on both sides of the +7fff limit does not execute very quickly, since it is decoded twice every time it executes. Hopefully this will not happen often, and when it does, not in any crucial inner loop.