About Using a Translation Cache for Emulation of the Z80

			Richard Carlsson


The problem with using a 'cache' (or 'threaded code') with pointers to
the corresponding instruction routines is that of self-modifying code.
  A write to memory will set the corresponding word in the cache to zero,
so the next time that instruction is executed it will be re-decoded (by
the decoding routine at offset zero). Immediate data is not read from the
cache, so no trouble there.
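
The basic scheme described above can be modelled roughly as follows. This is
only a sketch in Python, not the emulator's actual code: the Machine class,
the HANDLERS table and the handler names are hypothetical, and strings stand
in for routine pointers.

```python
# Hypothetical model: one cache slot per Z80 address. A slot holding 0
# means "decode before use"; anything else stands in for a routine pointer.
HANDLERS = {0x00: "nop", 0x21: "ld_hl_nn", 0x22: "ld_mem_hl"}  # tiny subset


class Machine:
    def __init__(self):
        self.mem = bytearray(0x10000)
        self.cache = [0] * 0x10000
        self.decodes = 0  # how often the decoder ran, for illustration

    def write(self, addr, value):
        addr &= 0xFFFF
        self.mem[addr] = value & 0xFF
        self.cache[addr] = 0  # plays the role of clearing the cache word

    def decode(self, pc):
        self.decodes += 1
        handler = HANDLERS.get(self.mem[pc], "unknown")
        self.cache[pc] = handler
        return handler

    def fetch(self, pc):
        pc &= 0xFFFF
        handler = self.cache[pc]
        if handler == 0:  # slot was cleared by a write, or never decoded
            handler = self.decode(pc)
        return handler
```

Fetching the same address twice decodes only once; a write in between forces
a second decode.
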
  So far, things are easy. The real problems begin with the prefixed
instructions. Assume that the instruction consists of more than one byte.
If a write to memory changes an instruction byte after the first, the
actual instruction will change, but its pointer (the one corresponding to
the first byte) is still the same. So, we conclude: in every prefixed
instruction we must first test if one of its other instruction bytes has
changed, and re-decode it if that is the case, marking the pointers after
the first as "unchanged but needing decoding before they themselves can
be used" (just use a nonzero pointer to a decoding routine). That should
not be too difficult.
  But, wait... Prefixes are often used to modify the meaning of a more
ordinary instruction. For example:

       21 00 00        means   Ld HL,0
       DD 21 00 00     means   Ld IX,0

So, if you jump to the address after the DD-prefix in "Ld IX,0", you
get the instruction "Ld HL,0". The decoding procedure will put a nonzero
pointer in the cache address corresponding to the opcode 21. Jumping to
the DD-prefix address again will do "Ld IX,0" as it should.
  But if I first change the opcode 21 to 22, and then jump to the 22, the
instruction is decoded as "Ld (0),HL" and there is a nonzero pointer
corresponding to the 22. Now if I jump to the DD-prefix I want to get the
instruction "Ld (0),IX", but the opcode pointer is nonzero and is not
detected as having changed. Therefore "Ld IX,0" is executed as before.
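
The failure just described can be reproduced with a small model. Again this
is a hedged sketch: the slot representation (tuples and a NEEDS_DECODE
string), the tables and the function names are all made up for illustration,
and only the two opcodes from the example are covered.

```python
# Naive scheme: 0 = written to, any nonzero slot counts as "not changed".
UNPREFIXED = {0x21: "LD HL,nn", 0x22: "LD (nn),HL"}
PREFIXED_DD = {0x21: "LD IX,nn", 0x22: "LD (nn),IX"}
NEEDS_DECODE = "decode"  # nonzero marker placed on the bytes after a prefix

mem = [0xDD, 0x21, 0x00, 0x00]
cache = [0, 0, 0, 0]


def write(addr, value):
    mem[addr] = value
    cache[addr] = 0


def decode(pc):
    if mem[pc] == 0xDD:
        slot = ("prefixed", PREFIXED_DD[mem[pc + 1]])
        cache[pc + 1] = NEEDS_DECODE
    else:
        slot = ("plain", UNPREFIXED[mem[pc]])
    cache[pc] = slot
    return slot


def run_at(pc):
    """Return the name of the instruction this cache would execute at pc."""
    slot = cache[pc]
    if slot == 0 or slot == NEEDS_DECODE:
        slot = decode(pc)
    elif slot[0] == "prefixed" and cache[pc + 1] == 0:
        slot = decode(pc)  # opcode byte was written since the last decode
    return slot[1]


def demo():
    mem[:] = [0xDD, 0x21, 0x00, 0x00]
    cache[:] = [0, 0, 0, 0]
    first = run_at(0)    # decodes DD 21
    write(1, 0x22)       # change the opcode byte
    second = run_at(1)   # jump past the prefix: slot 1 becomes nonzero
    third = run_at(0)    # the zero test no longer sees the change
    return [first, second, third]
```

Here demo() ends with the stale "LD IX,nn" for the third call, where
"LD (nn),IX" would be correct.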

So normal instruction pointers can be permitted only at the first byte
of an instruction. How do I detect an invalid pointer? I can only test the
pointers for Sign and Zero without being too slow to be practical.
  Clr.w is the fastest way to mark a write to memory, so I'd like to keep
the zero pointers for that purpose. Then I must split the pointer values
in two: instruction pointers (positive), and 'unchanged' markers (negative).
The test for prefixed instructions becomes: If any of the other instruction
bytes has changed (has a nonnegative pointer), the instruction must be
re-decoded. Sadly, this method halves the address space for valid pointers.
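
A sketch of the signed scheme, under the same caveats as before: small
integers stand in for the 16-bit pointer values, and NAMES, PLAIN, WITH_DD,
PREFIXED and UNCHANGED are hypothetical names. Zero means 'written to', a
positive value is an instruction pointer, and a negative value is an
'unchanged' marker (which would point to a decoding routine if jumped to).

```python
NAMES = {1: "LD HL,nn", 2: "LD (nn),HL", 3: "LD IX,nn", 4: "LD (nn),IX"}
PLAIN = {0x21: 1, 0x22: 2}      # opcode -> positive instruction "pointer"
WITH_DD = {0x21: 3, 0x22: 4}    # same opcodes behind a DD prefix
PREFIXED = {3, 4}               # pointers whose routines test later bytes
UNCHANGED = -1                  # negative: 'unchanged' marker

mem = [0xDD, 0x21, 0x00, 0x00]
cache = [0, 0, 0, 0]
decodes = 0


def write(addr, value):
    mem[addr] = value
    cache[addr] = 0             # zero: 'written to'


def decode(pc):
    global decodes
    decodes += 1
    if mem[pc] == 0xDD:
        cache[pc] = WITH_DD[mem[pc + 1]]
        cache[pc + 1] = UNCHANGED
    else:
        cache[pc] = PLAIN[mem[pc]]
    return cache[pc]


def run_at(pc):
    slot = cache[pc]
    if slot <= 0:
        slot = decode(pc)       # written to, or only a marker: decode here
    elif slot in PREFIXED and cache[pc + 1] >= 0:
        slot = decode(pc)       # opcode slot is no longer a marker
    return NAMES[slot]


def demo():
    global decodes
    mem[:] = [0xDD, 0x21, 0x00, 0x00]
    cache[:] = [0, 0, 0, 0]
    decodes = 0
    seen = [run_at(0)]
    write(1, 0x22)
    seen.append(run_at(1))      # leaves a positive pointer in slot 1
    seen.append(run_at(0))      # positive slot 1 forces a re-decode
    seen.append(run_at(0))      # marker restored: no further decode
    return seen, decodes
```

With the sign test, the sequence that went wrong before now re-decodes the
prefixed instruction once and then runs from the cache again.
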
  The alternative is to use a zero pointer as 'unchanged' marker and a
pointer to a decoding routine as 'address written to' marker. The test
for prefixed instructions would be: If any of the other instruction bytes
is a nonzero pointer then re-decode. It sounds simpler than the above, but
will be a bit slower, since it uses "move.w #changed,dest". I will not use
it unless I run out of address space.


The decoding routines must also cope with the wrap-around at +7fff. Reading
immediate data is not a problem, since the wrap-around is automatic there,
and if no prefix or opcode byte is over the limit there is no problem
either. If the PPC runs over the limit, the padding 'out of bounds' pointers
will cause a direct wrap-around. But if there are prefix or opcode bytes on
both sides of the limit, we do have a problem:
  When the decoding is finished and the instruction routine is jumped to,
the PPC must actually be below the start of the cache - we need a buffer
there as well.

       BEFORE		       AFTER

	       buf
	       ...
	       buf	     <-PPC
	       buf
	       -8000   00
	       -7fff   FE
	       ...		       DD CB 00 FE = Set 7,(IX+0)
       PPC->   7ffe    DD
	       7fff    CB
	       pad
	       pad
	       ...
	       pad

  The decoding routines cannot look ahead to separate 'goto execution of
an instruction' and 'goto another step in decoding', so what pointer value
should be written into the cache at PPC before we move the PPC? In the
case above, the current value is pointing to the (currently running)
DDCB-decode routine, which has detected the 'out of bounds' pointer. (The
prefix decoding routines must test for 'out of bounds' pointers before
writing the 'unchanged' markers.)
If we leave the pointer as it is, any changes to the CB-prefix will not be
detected next time the instruction is executed. The whole instruction must
be completely re-decoded every time it is executed. Therefore we should
make the pointer a 'contents changed' marker, guaranteeing a re-decode.
In the case of a DDCB (or FDCB) prefix, the DD (FD) decoding routine has
marked the pointer corresponding to CB as 'unchanged' before it made the
jump, so that pointer must also be re-marked as 'changed'.
  After that, we move PPC to its corresponding place in the lower buffer
and re-decode the complete instruction again, thus making sure that the
contents of the buffer are valid for the prefix tests. The calculation of
the real-PC ensures that the instruction bytes will be taken from the
correct address.
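
The address arithmetic for this move might look like the sketch below. It
assumes that cache offsets are signed 16-bit values (-8000 to +7fff hex, as
in the diagram), that the lower buffer continues directly below -8000, and
that the real offset is recovered by 16-bit wrap-around. The function names
are hypothetical, and the actual real-PC calculation in the emulator may
differ.

```python
LOW, HIGH = -0x8000, 0x7FFF   # signed offsets of the cache proper


def straddles(ppc, n_opcode_bytes):
    """True if the prefix/opcode bytes starting at ppc cross the +7fff limit."""
    return ppc <= HIGH < ppc + n_opcode_bytes - 1


def relocate(ppc):
    """Move PPC to the corresponding slot in the buffer below the cache."""
    return ppc - 0x10000


def real_offset(ppc):
    """Wrap any PPC (buffer or cache) back into the signed 16-bit range."""
    return ((ppc + 0x8000) & 0xFFFF) - 0x8000


def instruction_offsets(ppc, n_opcode_bytes):
    """Offsets the instruction bytes are really fetched from."""
    if straddles(ppc, n_opcode_bytes):
        ppc = relocate(ppc)
    return [real_offset(ppc + i) for i in range(n_opcode_bytes)]
```

For the DD CB 00 FE example, the PPC at 7ffe moves to two slots below -8000,
and the four bytes are still fetched from 7ffe, 7fff, -8000 and -7fff.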

When we do the 'modified opcode?' tests, we can only address offsets from
PPC without being unreasonably slow. We don't get automatic wrap-around,
and could end up testing an address off the upper limit of the cache.
(But should not, since the decoding routines ought to detect the 'out of
bounds' before instruction execution.) If that happens, the instruction
will simply be re-decoded. Everything ultimately depends on the prefix
decoding routines to handle the wrap-around.

Obviously, an instruction with prefix or opcode bytes on both sides of the
+7fff limit does not execute very quickly, being re-decoded twice each
time. Hopefully, this will not happen often, and then not in any crucial
inner loop.