




















                             A Programmer's Summary of

                               80386 CPU Enhancements



                                  Daniel A. Norton

                                CHERRY HILL SOFTWARE

                                  September, 1989











            Copyright 1989, Daniel A. Norton.
            All Rights Reserved.


            Permission  is   hereby  granted   for  any   individual  or
            corporation to  copy this  publication provided  that it  is
            copied in whole and not in part, and provided that no charge
            is placed  on its  duplication  beyond  cost  of  labor  and
            materials.   This whole  document includes the 16 pages from
            this cover page to page 16.
















80386 CPU Enhancements                          Copyright 1989, Daniel A. Norton
                                                             All Rights Reserved






            Intel is a trademark of Intel Corporation.

            Microsoft is a registered trademark of Microsoft Corporation.









            First edition, September 10, 1989



                                    INTRODUCTION

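            This summary describes the differences between the Intel
            80286 and 80386 processors that matter to the programmer:
            register extensions, 32-bit operands and addressing, paging,
            debug registers, and new or extended instructions.  It is
            primarily intended for programmers who are already familiar
            with the 80286 instruction set.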

            A more  detailed  description  of  the  80386  processor  is
            presented in  the 80386 Programmer's Reference Manual, order
            number 230985-001,  published by  Intel Literature, 800/548-
            4725.

Introduction                                                        Page 3 of 16


                                REGISTER EXTENSIONS

            The 80386  CPU extends  the 80286  register set by expanding
            all of  the 16-bit  registers to 32 bits, with the exception
            of the  segment registers.  Two additional segment registers
            have been added, FS and GS.

            As an example of the extension of the registers, EAX is a
            32-bit register whose lower 16 bits are the familiar AX
            register.  Although the upper and lower bytes of AX can be
            accessed independently through AH and AL, there is no
            support for accessing the upper 16 bits of EAX independently
            from the lower 16 bits.  The other 32-bit registers are
            named after their 16-bit counterparts, with the letter "E"
            prefixed: EBX, ECX, EDX, ESI, EDI, EBP, ESP.  (The
            instruction pointer and flags registers likewise become EIP
            and EFLAGS.)  Figure 1, "32-bit EAX Register Layout,"
            illustrates the relationship between EAX, AX, AH, and AL.


             31                        16 15         8 7           0
             +---------------------------+------------+-------------+
             |                           |     AH     |     AL      |
             +---------------------------+------------+-------------+
             <------------------------ EAX ------------------------->
                                         <---------- AX ------------>


                                      Figure 1
                             32-bit EAX Register Layout
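            The aliasing shown in Figure 1 can be modeled in C by
            treating EAX as a 32-bit value and deriving AX, AH, and AL
            from it by masking and shifting.  This is only a sketch of
            the visible behavior, not of the hardware, and the helper
            names are hypothetical.  It also shows that writing AX
            leaves the upper 16 bits of EAX unchanged.

```c
#include <stdint.h>

/* Hypothetical helpers: views of the low parts of a 32-bit EAX value. */
uint16_t ax_of(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }
uint8_t  ah_of(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }
uint8_t  al_of(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* Writing AX replaces bits 15..0; bits 31..16 of EAX are kept. */
uint32_t write_ax(uint32_t eax, uint16_t ax) {
    return (eax & 0xFFFF0000u) | ax;
}
```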


            Unlike the other segment registers, the new segment
            registers, FS and GS, are never selected by default by any
            memory access instruction.  To reference memory through FS
            or GS, a segment override prefix must always be specified.




                           32-BIT OPERANDS AND ADDRESSES

            Intel added the 32-bit memory and register references
            without increasing the number of instruction op-codes.
            Instead, the CPU has an 80286-compatible 16-bit operation
            mode and a new 32-bit operation mode.  The default is 16-bit
            mode, but this can be overridden in one of two ways: 1) by
            prefixing the instruction with an operand-size or
            address-size override prefix, or 2) in protect-mode, by
            specifying a 32-bit code segment.

            The operand size may be 16 or 32 bits.  An instruction that
            normally refers to AX will refer to EAX if the 32-bit
            operand size is specified.  Similarly, the address size is
            normally 16 bits, but a 32-bit address is assumed if a
            32-bit address size is specified.

            When running in protect-mode, a bit in the code segment
            descriptor (the D bit) determines whether the default
            operand and address size is 16 or 32 bits.  When operating
            in real-mode, addresses and operands always default to 16
            bits.

            The current default sizes can be overridden by an override
            prefix on the instruction.  The override only affects the
            instruction to which it is prefixed; the CPU reverts to the
            default on the following instruction (unless it is
            overridden again).  The hexadecimal value for the
            operand-size override is 66h.  The address-size override
            value is 67h.  The two prefixes may be used together to
            override both the address size and the operand size of the
            instruction that follows them.  In a 32-bit code segment,
            the same prefixes select 16-bit operands or addresses.

            For example, the instruction "MOV AX,BX" in real-mode or in
            a 16-bit protect-mode segment is coded in hexadecimal as
            "8B C3."  In a 32-bit protect-mode segment, the same code,
            "8B C3," represents "MOV EAX,EBX".  To generate
            "MOV EAX,EBX" in real-mode or in a 16-bit protect-mode code
            segment, the instruction is preceded by the operand-size
            prefix and is coded as "66 8B C3."  (In a 32-bit segment,
            "66 8B C3" is "MOV AX,BX".)
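            The size selection described above behaves like an
            exclusive-or of the code segment's default and the presence
            of the prefix.  A minimal C sketch, with a hypothetical
            function name; the same rule applies to the 67h
            address-size prefix.

```c
#include <stdbool.h>

/* Effective operand size, in bits, of one instruction.
   seg_is_32: true in a 32-bit protect-mode code segment,
              false in real-mode or a 16-bit segment.
   has_66:    true if the instruction carries the 66h prefix.
   The prefix selects whichever size is NOT the default. */
int operand_size(bool seg_is_32, bool has_66) {
    bool use_32 = (seg_is_32 != has_66);
    return use_32 ? 32 : 16;
}
```

            With this rule, "66 8B C3" decodes as MOV EAX,EBX in
            real-mode (operand_size(false, true) is 32) and as MOV AX,BX
            in a 32-bit segment (operand_size(true, true) is 16).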



                                EFFECTIVE ADDRESSES

            With the 80286, a memory operand could be referenced by its
            direct address, with an optional base using BX or BP and an
            optional index using SI or DI.  When referencing a 32-bit
            address, the 80386 removes the restriction of specialized
            registers and allows any 32-bit general register to be used
            as a base and any 32-bit general register except ESP to be
            used as an index.  Furthermore, the value of the index
            register can be multiplied by 2, 4, or 8 before it is added
            into the effective address.

            By allowing the index to be scaled in this way, arrays whose
            entries are each 2, 4, or 8 bytes can be accessed more
            quickly.  With the 80286, such references must be preceded
            by an instruction that multiplies the index to obtain the
            correct byte offset.
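            The address the 80386 forms for an operand such as
            [EBX+4*ESI+disp] can be written as one expression, which is
            exactly the byte offset of element ESI in an array of 4-byte
            entries starting at EBX.  A minimal C sketch; the function
            name is hypothetical.

```c
#include <stdint.h>

/* base + index*scale + disp, as formed by a 32-bit effective address.
   scale must be 1, 2, 4, or 8.  Unsigned arithmetic wraps modulo 2^32,
   matching 32-bit address arithmetic. */
uint32_t scaled_offset(uint32_t base, uint32_t index,
                       uint32_t scale, uint32_t disp) {
    return base + index * scale + disp;
}
```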

            Although 32-bit addressing can be used to reference 16-bit
            and real-mode segments, the programmer must ensure that the
            upper 16 bits of the effective address are zero; otherwise a
            General Protection fault will occur (even in real mode).
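            The rule above amounts to a simple test on the computed
            32-bit effective address.  This C sketch assumes a segment
            limit of 0FFFFh, as in real mode; the function name is
            hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assuming a 64K segment (limit 0FFFFh): the effective address is
   usable only if its upper 16 bits are zero; otherwise the reference
   faults. */
bool fits_16bit_segment(uint32_t ea) {
    return (ea & 0xFFFF0000u) == 0;
}
```

            For instance, [ECX+4*EDX] with ECX = 0FFF0h and EDX = 8 gives
            10010h, which would fault.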

            This example of the LEA instruction uses every component of
            a 32-bit effective address (base, scaled index, and
            displacement):

                      LEA       EAX,[ECX+4*EDX+5]

            This example also illustrates the new power of the LEA
            instruction on the 80386 to calculate certain first-degree
            polynomial expressions, placing the result in a register
            that need not be part of the expression.
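            What the LEA example leaves in EAX can be written as plain
            arithmetic: no memory is read, no flags change, and ECX and
            EDX are not modified.  A minimal C sketch with a
            hypothetical function name:

```c
#include <stdint.h>

/* Value of EAX after LEA EAX,[ECX+4*EDX+5] with a 32-bit address size.
   Unsigned arithmetic wraps modulo 2^32, as the address calculation does. */
uint32_t lea_example(uint32_t ecx, uint32_t edx) {
    return ecx + 4u * edx + 5u;
}
```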



                                       PAGING

            In addition to the protect-mode segmentation features
            available with the 80286, the 80386 CPU adds paging
            features.  Paging can only be enabled in conjunction with
            segmentation.  In other words, it can only be enabled when
            the processor is in protected mode.

            When paging is disabled, the address calculated by the
            segmentation logic is a physical memory address, and
            references memory directly.  When paging is enabled,
            however, the address calculated by the segmentation logic
            is referred to as a linear address (sometimes loosely called
            a "logical address," although Intel's documentation reserves
            that term for the selector:offset pair that is the input to
            segmentation).  The linear address is the input to the
            paging logic, which converts it into a physical address.

            One advantage of this extra address translation is to allow
            more than one task to occupy the same "logical space."  For
            example, programs that run in virtual 8086 mode expect to
            see addresses from 00000h to FFFFFh.  Without paging, only
            one such task could occupy this range.  With paging enabled,
            however, each task sees the same logical space, but the
            paging logic converts these addresses to different physical
            addresses.
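            The translation can be sketched in C.  On the 80386 a linear
            address is split into a 10-bit page-directory index, a
            10-bit page-table index, and a 12-bit offset within a
            4096-byte page; each entry holds a frame address in bits
            31..12 and a present bit in bit 0.  The AddressSpace
            structure and the translate function are hypothetical
            simplifications: the page tables are reached through a C
            array instead of through physical memory, and all other
            attribute bits are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRESENT 0x1u

typedef struct {
    const uint32_t *dir;       /* 1024 page-directory entries        */
    const uint32_t **tables;   /* tables[i]: page table named by dir[i] */
} AddressSpace;

/* Returns false where the CPU would raise a page fault. */
bool translate(const AddressSpace *as, uint32_t linear, uint32_t *physical) {
    uint32_t dir_i = linear >> 22;             /* bits 31..22 */
    uint32_t tab_i = (linear >> 12) & 0x3FFu;  /* bits 21..12 */
    uint32_t off   = linear & 0xFFFu;          /* bits 11..0  */
    if (!(as->dir[dir_i] & PRESENT))
        return false;
    uint32_t pte = as->tables[dir_i][tab_i];
    if (!(pte & PRESENT))
        return false;
    *physical = (pte & 0xFFFFF000u) | off;
    return true;
}
```

            Two tasks given different page tables can then use the same
            linear address, say B8123h, and reach different physical
            memory.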

            Another advantage of this address translation is to allow
            what is called demand-paging.  A task's logical space can be
            very large, even larger than the available physical memory.
            Only some of the logical space will be present in physical
            memory.  The rest could be stored on disk.  The paging logic
            can be programmed so that pages that are present are mapped
            to their corresponding physical memory addresses.  Pages
            that are not present are marked as such in the page tables.
            When present pages are referenced by the task, reads and
            writes proceed to memory in the normal way.  If a task
            references a page that is not present, however, the memory
            reference causes a page fault (exception 14) so that the
            operating system can load the page from disk and restart the
            instruction.
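            The demand-paging cycle can be simulated in C.  This is a
            sketch under stated assumptions: the 16-page task size, the
            page_fault_handler and map_linear names, and the choice of
            physical frames are all hypothetical.  On the 80386 the
            handler learns the faulting linear address from CR2; here it
            is simply passed in.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NPAGES    16u          /* hypothetical: a 64K task, 16 pages */

static uint32_t page_table[NPAGES];  /* frame address | present bit */
static unsigned disk_loads = 0;      /* simulated page-ins          */

/* Hypothetical OS handler: "reads" the page from disk and maps it.
   A real handler would choose a free frame and might evict a page. */
void page_fault_handler(uint32_t page) {
    uint32_t frame = 0x00100000u + page * PAGE_SIZE;  /* assumed free */
    page_table[page] = frame | 1u;                    /* now present  */
    disk_loads++;
}

/* Translate linear (< NPAGES * PAGE_SIZE).  A not-present page faults,
   is loaded, and the access is retried, as the CPU restarts the
   faulting instruction. */
uint32_t map_linear(uint32_t linear) {
    uint32_t page = linear / PAGE_SIZE;
    if (!(page_table[page] & 1u))
        page_fault_handler(page);
    return (page_table[page] & 0xFFFFF000u) | (linear % PAGE_SIZE);
}
```

            Only pages that are actually touched are loaded, so
            referencing one byte of a 64k object brings in one 4096-byte
            page rather than the whole object.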

            This demand-paging is similar to segment swapping, but is
            different in that the size of all pages is fixed (4096
            bytes).  This fixed size not only simplifies the logic
            required, but allows for not-present pages within a larger
            object.  For example, if an application requires a large
            object, say 64k bytes, the segment swapping method would
            require that all 64k of the object be present in memory when
            any part of the object is loaded.  With paging, only the
            page that is referenced needs to be present.  Of course, one
            could create segments with lengths limited to 4k bytes, but
            managing a single object over several segments can be
            extremely clumsy.



                                  DEBUG REGISTERS

            The 80386  CPU expands  upon the  debugging capabilities  of
            "INT 3" and  the trap  flag, by  allowing traps  on specific
            types of  memory references.   This  allows the  CPU to  run
            normally until one of the specified breakpoint conditions is
            met.

            The debug registers allow up to a total of four breakpoints
            to be active at any one time.  A breakpoint is specified by
            loading the debug registers with four parameters per
            breakpoint:

                 1) Reference Type:
                      - Instruction Execution
                      - Data Write
                      - Data Read or Write
                 2) Reference Address:
                      - 32-bit linear address
                 3) Reference Length:
                      - 1, 2 or 4 bytes
                 4) Enable:
                      - Local (current task) and/or global

            Note that in protect-mode with paging enabled, breakpoint
            addresses are compared as linear addresses, before paging
            translation.  With paging disabled (as in real mode), linear
            addresses are physical addresses, so the breakpoint address
            is a 32-bit physical address.

            The reference  length for  instruction execution breakpoints
            is always  one (1), regardless of the number of bytes in the
            instruction.  The reference address refers to the first byte
            of the instruction, including prefixes, if any.

            If the  reference length for data breakpoints is 2 or 4, the
            reference address  must lie  on a 2-byte or 4-byte boundary,
            respectively.  Any data reference within the specified range
            causes the trap.
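            On the 80386, DR0 through DR3 hold the four breakpoint
            addresses, and DR7 holds each breakpoint's type, length, and
            enable bits: local enable Ln at bit 2n, a 2-bit R/W field at
            bits 16+4n, and a 2-bit LEN field at bits 18+4n.  A minimal
            C sketch of building DR7 and checking alignment; the
            function and enum names are hypothetical, and only local
            enables are set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field encodings for one breakpoint in DR7. */
enum rw_bits  { RW_EXEC = 0, RW_WRITE = 1, RW_READWRITE = 3 };
enum len_bits { LEN_1 = 0, LEN_2 = 1, LEN_4 = 3 };  /* exec needs LEN_1 */

/* 2- and 4-byte data breakpoints must be aligned to their length. */
bool aligned_ok(uint32_t addr, unsigned len_bytes) {
    return (addr % len_bytes) == 0;
}

/* Sets R/W, LEN and the local-enable bit for breakpoint n (0..3).
   The breakpoint's linear address would be loaded into DRn separately. */
uint32_t dr7_enable(uint32_t dr7, unsigned n,
                    enum rw_bits rw, enum len_bits len) {
    unsigned shift = 16 + 4 * n;
    dr7 &= ~(0xFu << shift);                       /* clear old fields */
    dr7 |= ((uint32_t)rw | ((uint32_t)len << 2)) << shift;
    dr7 |= 1u << (2 * n);                          /* Ln: local enable */
    return dr7;
}
```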



                                INSTRUCTION SUMMARY




              BSF                                      Bit Scan Forward



            The BSF instruction searches the specified 16- or 32-bit
            source operand for a "1" bit, starting at the lowest-order
            bit, and places the bit index in the specified destination
            register.  The lowest-order bit has an index of zero (0);
            the highest-order bit has an index of 15 (for 16-bit
            operands) or 31 (for 32-bit operands).  In other words, this
            instruction searches from low to high and counts the number
            of zero bits before the first "1" bit.

            If all of the bits in the source are zero, the zero flag is
            set and the contents of the destination register are
            undefined; otherwise, the zero flag is cleared and the
            destination register contains the number of zero bits
            encountered.

            EXAMPLES:
                 BSF  BX,usWord  ; Find the first "1" bit in "usWord"
                 BSF  EAX,EBX    ; Find the first "1" bit in EBX
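            A C model of BSF on a 32-bit operand, written as a plain
            loop so the scan order is visible.  The function name and
            the choice to return the zero flag are assumptions of this
            sketch.

```c
#include <stdint.h>

/* Returns the ZF that BSF would set.  For src == 0, ZF = 1 and the
   destination is undefined, so *index is left untouched here.
   Otherwise ZF = 0 and *index receives the lowest "1" bit's index. */
int bsf32(uint32_t src, unsigned *index) {
    if (src == 0)
        return 1;
    unsigned i = 0;
    while (!(src & 1u)) {   /* scan upward from bit 0 */
        src >>= 1;
        i++;
    }
    *index = i;
    return 0;
}
```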




              BSR                                      Bit Scan Reverse



            The BSR instruction searches the specified 16- or 32-bit
            source operand for a "1" bit, starting at the highest-order
            bit, and places the bit index in the specified destination
            register.  The lowest-order bit has an index of zero (0);
            the highest-order bit has an index of 15 (for 16-bit
            operands) or 31 (for 32-bit operands).  In other words, this
            instruction searches from high to low and returns the
            position of the highest "1" bit.  Note that this index is
            not the number of zero bits skipped: for a 32-bit operand,
            it is 31 minus that count.

            If all of the bits in the source are zero, the zero flag is
            set and the contents of the destination register are
            undefined; otherwise, the zero flag is cleared and the
            destination register contains the index of the highest "1"
            bit.

            EXAMPLES:
                 BSR  BX,usWord  ; Find the highest "1" bit in "usWord"
                 BSR  EAX,EBX    ; Find the highest "1" bit in EBX
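            A matching C model of BSR scans downward from bit 31.  As
            with the BSF sketch, the function name and the returned
            zero flag are assumptions; for a nonzero operand the result
            is the integer base-2 logarithm of the value.

```c
#include <stdint.h>

/* Returns the ZF that BSR would set.  For src == 0, ZF = 1 and *index
   is left untouched (the destination is undefined on the CPU).
   Otherwise ZF = 0 and *index receives the highest "1" bit's index. */
int bsr32(uint32_t src, unsigned *index) {
    if (src == 0)
        return 1;
    unsigned i = 31;
    while (!(src & (1u << i)))   /* scan downward from bit 31 */
        i--;
    *index = i;
    return 0;
}
```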






              BT                                               Test Bit



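            The BT instruction copies the specified bit into the carry
            flag.  The bit is specified with two operands: a bit base
            (register or memory) and a bit offset (register or
            immediate).  If the base is a register, or the offset is an
            immediate value, the offset is taken modulo the operand size
            (16 or 32).  If the base is in memory and the offset is in a
            register, the offset is not truncated: it selects a bit
            anywhere in the bit string that begins at the memory
            operand.

            EXAMPLES:
                 BT   AX,5           ; Test the 0020h bit
                 BT   usWord,BX      ; Test bit BX, counting from usWord
                 BT   EAX,17         ; Test the 00020000h bit
                 BT   ulDWord,EAX    ; Test bit EAX, counting from ulDWord
                 BT   AX,17          ; Test the 0002h bit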
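            The two addressing cases for BT can be modeled in C.  This
            sketch uses hypothetical function names and, for the memory
            form, treats the operand as a byte array; because the 80386
            stores data little-endian, bit n of the string is bit (n AND
            7) of byte (n SHR 3).  Only non-negative offsets are
            modeled, although the CPU also accepts negative register
            offsets.

```c
#include <stdint.h>

/* BT with a 16-bit register base (or an immediate offset): the offset
   is taken modulo 16; the selected bit is what lands in CF. */
int bt16(uint16_t value, unsigned offset) {
    return (value >> (offset % 16)) & 1;
}

/* BT with a memory base and a register offset: the memory operand is
   the start of a bit string, so the offset can reach past it. */
int bt_mem(const uint8_t *base, uint32_t offset) {
    return (base[offset >> 3] >> (offset & 7)) & 1;
}
```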




              BTC                                        Complement Bit



            The BTC instruction complements the specified bit and copies
            its original value into the carry flag.  The bit is
            specified with two operands, as for BT: the offset is taken
            modulo the operand size, except that a register offset with
            a memory operand may select any bit of the bit string that
            begins at that operand.

            EXAMPLES:
                 BTC  AX,5           ; Complement the 0020h bit
                 BTC  usWord,BX      ; Complement bit BX, counting from usWord
                 BTC  EAX,17         ; Complement the 00020000h bit
                 BTC  ulDWord,EAX    ; Complement bit EAX, counting from ulDWord
                 BTC  AX,17          ; Complement the 0002h bit
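            A minimal C model of BTC on a 32-bit register, showing both
            effects: the old bit is returned (as the CPU leaves it in
            CF) and the bit is flipped.  The function name is
            hypothetical.

```c
#include <stdint.h>

/* BTC reg32,offset: offset taken modulo 32.  Returns the old bit
   (the new CF) and complements that bit in *reg. */
int btc32(uint32_t *reg, unsigned offset) {
    uint32_t mask = 1u << (offset % 32);
    int old = (*reg & mask) != 0;
    *reg ^= mask;
    return old;
}
```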





              BTS                                               Set Bit



            The BTS instruction sets the specified bit to 1 and copies
            its original value into the carry flag.  The bit is
            specified with two operands, as for BT: the offset is taken
            modulo the operand size, except that a register offset with
            a memory operand may select any bit of the bit string that
            begins at that operand.

            EXAMPLES:
                 BTS  AX,5           ; Set the 0020h bit
                 BTS  usWord,BX      ; Set bit BX, counting from usWord
                 BTS  EAX,17         ; Set the 00020000h bit
                 BTS  ulDWord,EAX    ; Set bit EAX, counting from ulDWord
                 BTS  AX,17          ; Set the 0002h bit
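            Because BTS returns the old bit in CF, it works as a
            test-and-set on a bitmap, for example to claim a free slot.
            A C sketch of the memory bit-string form, with a
            hypothetical function name and non-negative offsets only;
            unlike a LOCK-prefixed BTS, this C version is not atomic.

```c
#include <stdint.h>

/* BTS on a memory bit string with a register offset: returns the old
   bit (the CPU's CF), then sets it.  1 means "already taken". */
int bts_mem(uint8_t *base, uint32_t offset) {
    uint8_t mask = (uint8_t)(1u << (offset & 7));
    int old = (base[offset >> 3] & mask) != 0;
    base[offset >> 3] |= mask;
    return old;
}
```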




             BTR                                              Reset Bit



            The BTR instruction resets the specified bit to 0 and copies
            its original value into the carry flag.  The bit is
            specified with two operands, as for BT: the offset is taken
            modulo the operand size, except that a register offset with
            a memory operand may select any bit of the bit string that
            begins at that operand.

            EXAMPLES:
                 BTR  AX,5           ; Reset the 0020h bit
                 BTR  usWord,BX      ; Reset bit BX, counting from usWord
                 BTR  EAX,17         ; Reset the 00020000h bit
                 BTR  ulDWord,EAX    ; Reset bit EAX, counting from ulDWord
                 BTR  AX,17          ; Reset the 0002h bit
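            A C model of BTR on a 16-bit register, matching the AX
            examples: BTR AX,17 clears the 0002h bit because 17 modulo
            16 is 1.  The function name is hypothetical.

```c
#include <stdint.h>

/* BTR reg16,offset: offset taken modulo 16.  Returns the old bit
   (the new CF) and clears that bit in *reg. */
int btr16(uint16_t *reg, unsigned offset) {
    uint16_t mask = (uint16_t)(1u << (offset % 16));
    int old = (*reg & mask) != 0;
    *reg = (uint16_t)(*reg & ~mask);
    return old;
}
```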





              CDQ                            Sign Extend EAX to EDX:EAX



            The CDQ instruction extends the sign bit of EAX into all 32
            bits of EDX.  CDQ is the 32-/64-bit form of CWD.
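            A minimal C model of CDQ, with a hypothetical function name:
            EDX becomes all ones or all zeros depending on bit 31 of
            EAX, so that EDX:EAX holds EAX sign-extended to 64 bits.

```c
#include <stdint.h>

/* New value of EDX after CDQ. */
uint32_t cdq_edx(uint32_t eax) {
    return (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}
```

            A common use is just before IDIV, which divides the 64-bit
            value in EDX:EAX by a 32-bit operand.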





             CWDE                                 Sign Extend AX to EAX



            The CWDE  instruction extends  the sign  bit of  AX into the
            upper 16 bits of EAX.  CWDE is the 32-bit form of CBW.
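            A minimal C model of CWDE, with a hypothetical function
            name: the old upper half of EAX is discarded and replaced by
            copies of bit 15 of AX.

```c
#include <stdint.h>

/* New value of EAX after CWDE. */
uint32_t cwde(uint32_t eax) {
    uint32_t ax = eax & 0xFFFFu;
    return (ax & 0x8000u) ? (ax | 0xFFFF0000u) : ax;
}
```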




              Jcc                                      Jump Conditional



            The Jcc instructions have been extended on the 80386 to
            allow a NEAR target address anywhere in the current code
            segment, coded as a 16- or 32-bit relative displacement
            (previously the target was restricted to an 8-bit relative
            displacement).  All conditional jump instructions except
            JCXZ have this capability.

            Programming Tip: A "quirk" in the Microsoft assembler (MASM)
            defaults all forward label references to NEAR if the
            instruction allows it.  With the 80286, the Jcc instructions
            did not allow NEAR offsets, and SHORT offsets were
            generated.  With the 80386, NEAR offsets are allowed and are
            the default, even for Jcc instructions.  To override this
            default, specify the SHORT modifier on forward target
            references unless you particularly require a NEAR reference.
            Otherwise, each Jcc will use 4 bytes in a 16-bit code
            segment (as opposed to 2) or 6 bytes in a 32-bit code
            segment (as opposed to 2).
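            The sizes in the tip follow from the encodings: a SHORT Jcc
            is opcode 7x plus an 8-bit displacement, and a NEAR Jcc is
            0F 8x plus a 16- or 32-bit displacement.  A minimal C sketch
            with hypothetical function names:

```c
#include <stdint.h>

/* Encoded length in bytes of a conditional jump on the 80386. */
int jcc_length(int near, int seg_is_32) {
    if (!near)
        return 2;                 /* 7x rel8                 */
    return seg_is_32 ? 6 : 4;     /* 0F 8x rel32 : 0F 8x rel16 */
}

/* A SHORT jump reaches -128..+127 bytes from the end of the
   2-byte instruction. */
int fits_short(int32_t disp) {
    return disp >= -128 && disp <= 127;
}
```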





              LFS, LGS, LSS                           Load Full Pointer



            The LFS, LGS and LSS instructions are similar to the LDS and
            LES instructions,  except that  they alter the FS, GS and SS
            registers.

            EXAMPLE:
                 LSS  SP,pStack           ; Load a new stack
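            In the example, pStack is a 4-byte far pointer: the offset
            word at the lower address goes to SP and the selector word
            that follows goes to SS, both in one instruction.  (With a
            32-bit operand size the pointer is 6 bytes: a 32-bit offset,
            then the selector.)  A C sketch of the 16-bit layout, with a
            hypothetical function name:

```c
#include <stdint.h>

/* Splits a 16:16 far pointer as stored in memory (little-endian):
   bytes 0-1 are the offset, bytes 2-3 are the selector. */
void load_far16(const uint8_t mem[4], uint16_t *offset, uint16_t *selector) {
    *offset   = (uint16_t)(mem[0] | (mem[1] << 8));
    *selector = (uint16_t)(mem[2] | (mem[3] << 8));
}
```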




              MOVSX                           Sign Extend into Register



            The MOVSX instruction copies the byte or word from the
            source operand (a register or memory location) into the
            destination register, extending the sign of the byte or
            word into the rest of the register.

            If the destination is a 16-bit register, the source is an
            8-bit value that is sign extended.  If the destination is a
            32-bit register, the source may be an 8- or 16-bit value.

            Programming Tip:  CWDE may be used in place of MOVSX EAX,AX
            and CBW may be used in place of MOVSX AX,AL; both have
            shorter encodings.
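            The three MOVSX forms can be modeled in C by copying the
            source's top bit into the new upper bits.  A sketch with
            hypothetical function names:

```c
#include <stdint.h>

uint16_t movsx16_8(uint8_t src) {    /* e.g. MOVSX AX,BYTE PTR x  */
    return (src & 0x80u) ? (uint16_t)(0xFF00u | src) : src;
}
uint32_t movsx32_8(uint8_t src) {    /* e.g. MOVSX EAX,BYTE PTR x */
    return (src & 0x80u) ? (0xFFFFFF00u | src) : src;
}
uint32_t movsx32_16(uint16_t src) {  /* e.g. MOVSX EAX,WORD PTR x */
    return (src & 0x8000u) ? (0xFFFF0000u | src) : src;
}
```

            movsx16_8 applied to AL gives the same AX that CBW produces,
            which is the equivalence the tip relies on.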





              MOVZX                           Zero Extend into Register



            The MOVZX instruction copies the byte or word from the
            source operand (a register or memory location) into the
            destination register, and zero-extends it into the remaining
            bits of the destination register.

            If the destination is a 16-bit register, the source is an
            8-bit value; the upper 8 bits of the destination register
            are zeroed.  If the destination is a 32-bit register, the
            source may be an 8- or 16-bit value; the upper 24 or 16 bits
            of the destination register, respectively, are zeroed.

            Programming Tip: Instead of programming:

                 MOV       AL,BYTE PTR x
                 XOR       AH,AH

            use:

                 MOVZX     AX,BYTE PTR x
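            The two sequences in the tip leave the same value in AX,
            which this C sketch checks by modeling each one from an
            arbitrary starting AX.  The function names are
            hypothetical.  One difference the model ignores: XOR AH,AH
            also changes the flags, while MOVZX leaves them alone.

```c
#include <stdint.h>

/* Zero extension is simply C's unsigned widening. */
uint32_t movzx32_8(uint8_t src) { return src; }   /* upper 24 bits zero */

/* MOV AL,x keeps AH; XOR AH,AH then clears it. */
uint16_t mov_al_xor_ah(uint16_t old_ax, uint8_t x) {
    uint16_t ax = (uint16_t)((old_ax & 0xFF00u) | x);
    return (uint16_t)(ax & 0x00FFu);
}

/* MOVZX AX,x replaces all 16 bits of AX. */
uint16_t movzx_ax_byte(uint16_t old_ax, uint8_t x) {
    (void)old_ax;
    return x;
}
```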




              SETcc                               Set Byte on Condition



            The  SETcc  instructions  store  a  byte  at  the  specified
            destination according  to the  specified condition.   If the
            condition is TRUE, a 1 is stored; if the condition is FALSE,
            a 0  is stored.   The condition codes for SETcc are the same
            as those for the conditional jump instructions.
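            To show where the SETcc conditions come from, this C sketch
            models the flags that CMP a,b leaves behind and three of the
            conditions: SETE (ZF = 1), SETB (CF = 1, unsigned below),
            and SETL (SF not equal to OF, signed less).  The Flags
            structure and all function names are hypothetical.

```c
#include <stdint.h>

typedef struct { int cf, zf, sf, of; } Flags;

/* Flags after CMP a,b, i.e. after computing a - b on 32 bits. */
Flags cmp32(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Flags f;
    f.cf = a < b;                      /* unsigned borrow             */
    f.zf = (r == 0);
    f.sf = (int)((r >> 31) & 1u);
    /* signed overflow: operands differ in sign and the result's
       sign differs from a's */
    f.of = (int)((((a ^ b) & (a ^ r)) >> 31) & 1u);
    return f;
}

uint8_t sete(Flags f) { return (uint8_t)f.zf; }
uint8_t setb(Flags f) { return (uint8_t)f.cf; }
uint8_t setl(Flags f) { return (uint8_t)(f.sf != f.of); }
```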





              SHLD                          Shift Left Double Precision



            The SHLD instruction shifts the specified 16- or 32-bit
            target operand to the left by the specified number of bits.
            Bits are shifted in on the right from the high-order end of
            the specified source register, which remains unaltered.  The
            count, an immediate value or CL, is taken modulo 32.

            EXAMPLES:
                 SHLD usWord,AX,5   ; Shift "usWord" left 5 bits from AX
                 SHLD EBX,ECX,CL    ; Shift EBX left "CL" bits from ECX
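            A C model of 32-bit SHLD, plus a typical use: shifting a
            64-bit value held in EDX:EAX left with SHLD EDX,EAX,count
            followed by SHL EAX,count.  Function names are hypothetical,
            and shl64 assumes a count from 1 to 31.

```c
#include <stdint.h>

/* SHLD dst,src,count (32-bit).  A count of 0 leaves dst unchanged. */
uint32_t shld32(uint32_t dst, uint32_t src, unsigned count) {
    count %= 32;
    if (count == 0)
        return dst;
    return (dst << count) | (src >> (32 - count));
}

/* EDX:EAX <<= count, for count 1..31. */
void shl64(uint32_t *edx, uint32_t *eax, unsigned count) {
    *edx = shld32(*edx, *eax, count);   /* SHLD EDX,EAX,count */
    *eax <<= count;                     /* SHL  EAX,count     */
}
```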






             SHRD                          Shift Right Double Precision



            The SHRD instruction shifts the specified 16- or 32-bit
            target operand to the right by the specified number of bits.
            Bits are shifted in on the left from the low-order end of
            the specified source register, which remains unaltered.  The
            count, an immediate value or CL, is taken modulo 32.

            EXAMPLES:
                 SHRD usWord,AX,5   ; Shift "usWord" right 5 bits from AX
                 SHRD EBX,ECX,CL    ; Shift EBX right "CL" bits from ECX
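            The mirror-image C model of 32-bit SHRD, with the matching
            64-bit logical right shift of EDX:EAX (SHRD EAX,EDX,count
            followed by SHR EDX,count).  Function names are
            hypothetical, and shr64 assumes a count from 1 to 31.

```c
#include <stdint.h>

/* SHRD dst,src,count (32-bit).  A count of 0 leaves dst unchanged. */
uint32_t shrd32(uint32_t dst, uint32_t src, unsigned count) {
    count %= 32;
    if (count == 0)
        return dst;
    return (dst >> count) | (src << (32 - count));
}

/* EDX:EAX >>= count (logical), for count 1..31. */
void shr64(uint32_t *edx, uint32_t *eax, unsigned count) {
    *eax = shrd32(*eax, *edx, count);   /* SHRD EAX,EDX,count */
    *edx >>= count;                     /* SHR  EDX,count     */
}
```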
