

  Notes on x86 memory models
  --------------------------

  Contents:   x86 memory model mini-primer
              - A few x86 basics
              - DGROUP
              - Further reading
              Extract from TI #738
              - Borland C++ memory models
              Extracts from TASM User's Guide
              - The .MODEL directive
              - Predefined symbols
              - Segment descriptions




  x86 memory model mini-primer
  ----------------------------

  Intel 80x86 CPUs have a segmented architecture. From a vast
  number of segmentation options, C and C++ compiler writers
  have chosen a few 'memory models' and standardized these as
  part of a memory management scheme, to have programs access
  code and data in a uniform fashion.


  --- A few x86 basics
  An offset is an index into a segment. Code or data is 'far'
  when accessed using both a segment and an offset value,
  'near' when accessed using just an offset value. In 16-bit
  programs (all models except FLAT), a far pointer consists of
  two 16-bit values, a near pointer of a single 16-bit value.
  In 32-bit programs (model FLAT), a near pointer is a 32-bit
  offset value; no far 32-bit models exist.

  The stack is far if it resides in its own segment, near if
  it's included in DGROUP (DS = SS).

  Consequently, near code can be used if CS doesn't change and
  near data can be used if DS doesn't change, resulting in both
  a speed and a size advantage for 'near programs'. If the stack
  is near, then stack variables (passed and local) and DGROUP
  variables are directly reachable through both DS- and SS-based
  registers.


  --- DGROUP
  Program data that is allocated neither on the stack nor on the
  heap goes into a data segment. To simplify: initialized near
  data goes into segment _DATA, and uninitialized near data
  into segment _BSS (near code belongs in _TEXT). At link time,
  the near data segments are grouped into DGROUP. To allow data
  access through near pointers, compilers generate code that
  expects DS to point to DGROUP.
  Among other duties, a program's startup code loads DGROUP's
  segment value into DS and, if the stack is near, includes the
  stack in DGROUP by reassigning SS and the stack pointer.

  Note that Borland's far data models use a far stack, and that
  Borland's huge model has no DGROUP.


  --- Further reading
  Topics on "memory models" and "memory management" in your
  compiler manual.

  Borland Technical Information Sheet #738 (TI738.ASC,
  "Memory Corruption", Nov 7, 1991), downloadable as part of
   <ftp://ftp.simtel.net/pub/simtelnet/msdos/turbo-c/bchelp10.zip>
   (or at ftp.borland.com, see below),
  a collection of TI sheets discussing topics related to
  Borland's C and C++ compilers.

  Borland Technical Information Sheet #1320 (TI1320.ASC,
  "Understanding MAP files generated by the Linker", Sep 21,
  1995), downloadable at
   <ftp://ftp.borland.com/pub/techinfo/techdocs/language/cpp/bcpp/ti/>
  discusses the naming, contents, and ordering of segments
  created by C and C++ startup code.

  Part 2 (section 3) of the FAQ regularly posted in newsgroup
   <news:comp.os.msdos.programmer>,
  downloadable as
   <ftp://rtfm.mit.edu/pub/usenet/comp.os.msdos.programmer/dos-faq>,
  discusses various issues on compiling and linking (among others,
  'How do I fix "automatic data segment exceeds 64K" or "stack plus
  data exceed 64K"?').

  (URLs valid May 1997)


  ---8<----------------------------------------------------------------

  (Extract from: Borland Technical Information Sheet #738,
   TI738.ASC, p. 3-4)

  PRODUCT  :  Borland C++
  VERSION  :  All
       OS  :  DOS
     DATE  :  November 7, 1991

    TITLE  :  Memory Corruption


  MEMORY MODELS:

  The important differences between the memory models are the size
  of the data and code pointers, number of data and code segments
  and the number and type of heaps available. For our purposes we
  will refer to the tiny, small and medium memory models as near
  memory models and the compact, large and huge as far memory
  models. We use this notation because the near memory models have
  both a near and far heap while the far memory models have only a
  far heap. The near memory models do not have a separate stack
  segment like the far memory models do. This is because the data
  segment, the near heap and the stack are all part of DGROUP,
  meaning that the total size of these things must be less than or
  equal to 64K bytes. The tiny model is an exception to this in
  that it also includes the code segment and PSP (256 bytes) in
  DGROUP.

                               TINY  SML   MED   CMP  LRG  HUGE

       Near Heap             : Yes   Yes   Yes   No   No   No
       Far Heap              : Yes   Yes   Yes   Yes  Yes  Yes
       Code pointers         : near  near  far   near far  far
       Data pointers         : near  near  near  far  far  far
       Separate stack segment: No    No    No    Yes  Yes  Yes
       Multiple code segments: No    No    Yes   No   Yes  Yes
       Multiple data segments: No    No    No    No   No   Yes


  ---8<----------------------------------------------------------------

  (Non-verbatim extracts from:
  Turbo Assembler v5.0 User's Guide, p. 82-87, p. 227-229
  Copyright 1996 by Borland International, Inc.)


  The .MODEL directive
  --------------------

  Syntax: .MODEL [model_modifier] memory_model [code_segment_name]
                 [,[language_modifier] language ] [,model_modifier]


  Sets the memory model for simplified segmentation directives.

  Memory models set the size limits of the code and data areas for
  your program. They determine whether the assembler considers data or
  code references as NEAR or FAR addresses.

  You must use the .MODEL directive before using the simplified
  segment directives such as .CODE or .DATA. In stand-alone assembly
  programs, the .STARTUP directive must be used at the program's entry
  point to enable the register assumptions described below.

  Note that TASM, for MASM compatibility, defaults to assuming a near
  stack (as used by Microsoft C models) whereas Borland C far data
  models use a far stack.


  Memory               Register
  model    Code Data   assumptions        Description
  ----------------------------------------------------------------------
  TINY     near near   CS=DGROUP          All code and data combined
                       DS=SS=DGROUP       into a single group called
                                          DGROUP. Used for .COM format
                                          executables. Some languages
                                          don't support this model.
  SMALL    near near   CS=_TEXT           Code is in a single segment.
                       DS=SS=DGROUP       All data is combined into a
                                          group called DGROUP. This is
                                          the most common model for
                                          stand-alone assembly programs.
  MEDIUM   far  near   CS=<name>_TEXT     Code uses multiple segments,
                       DS=SS=DGROUP       one per module. Data is in a
                                          group called DGROUP.
  COMPACT  near far    CS=_TEXT           Code is in a single segment.
                       DS=SS=DGROUP       All near data is in a group
                                          called DGROUP. Far pointers
                                          are used to reference data.
  LARGE    far  far    CS=<name>_TEXT     Code uses multiple segments,
                       DS=SS=DGROUP       one per module.
                                          All near data is in a group
                                          called DGROUP. Far pointers
                                          are used to reference data.
  HUGE     far  far    CS=<name>_TEXT     Same as LARGE model, as far as
                       DS=SS=DGROUP       Turbo Assembler is concerned.
  TCHUGE   far  far    CS=<name>_TEXT     Same as LARGE model, but with
                       DS=nothing         different register assumptions.
                       SS=nothing
  TPASCAL  near far    CS=CODE            This is a model to support
                       DS=DATA            early versions of Turbo Pascal.
                       SS=nothing         It's not required for later
                                          versions.
  FLAT     near near  CS=_TEXT            This is the same as the SMALL
                      DS=SS=FLAT (DGROUP) model, but tailored for use
                                          under Win32 or OS/2.


  model_modifier
  --------------
  NEARSTACK        Indicates that the stack segment should be included
                   in DGROUP (if DGROUP is present), and SS should point
                   to DGROUP (default).
  FARSTACK         Specifies that the stack segment should never be
                   included in DGROUP, and SS should point to nothing.
  USE16            Specifies (if 80386+ code) that 16-bit segments
                   should be used for all segments in the selected model.
  USE32            Specifies (if 80386+ code) that 32-bit segments
                   should be used for all segments in the selected model
                   (default for 80386+).
  DOS, OS_DOS      Specifies that DOS is the platform for the
                   application (default).
  NT, OS_NT        Specifies that Windows NT is the platform for the
                   application.
  OS2, OS_OS2      Specifies that OS/2 is the platform for the
                   application.


  More than one model_modifier can be used. The model_modifier can be
  specified in two places, for MASM 5.2 compatibility.


  code_segment_name
  -----------------
  can be used in the far code models (MEDIUM, LARGE, HUGE, TCHUGE)
  to override the default name of the code segment. Normally, this
  is the module name with _TEXT appended to it.


  language and language_modifier
  ------------------------------
  together specify the default procedure calling conventions, and the
  default style of the prolog and epilog code present in each
  procedure. They also control how to publish symbols externally for
  the linker to use. Turbo Assembler will automatically generate the
  procedure entry and exit code (for procedures declared with PROC and
  ENDP) that is proper for procedures using any of the following
  interfacing conventions: PASCAL, C, CPP (C++), SYSCALL, STDCALL,
  BASIC, FORTRAN, PROLOG, and (the default) NOLANGUAGE (assembly
  language).

  Use language_modifier to specify additional prolog and epilog code
  when you write procedures for 16-bit Windows, or for the Borland
  Overlay loader. These options are: NORMAL, WINDOWS, ODDNEAR, and
  ODDFAR. The default is NORMAL.

  Also note that you can override the default language and language
  modifier when you define or call a procedure. You can additionally
  override the default language when you publish a symbol.




  Predefined symbols
  ------------------

  Using the .MODEL directive defines the following
     segments           _TEXT or <name>_TEXT,
                        _DATA   (<name>_DATA in model TCHUGE)
     group              DGROUP  (not defined in model TCHUGE)
     equates            @code, @data, @stack  (segment names
                        of CS, DS, SS assumptions), @Model,
                        @32Bit, @Interface, @CodeSize, @DataSize
     pointer types      CODEPTR, DATAPTR

  Far segments          (re-)define
        .CODE <name>    @code
    .FARDATA [<name>]   @fardata
   .FARDATA? [<name>]   @fardata?

  Using the built-in .STARTUP macro defines the near label @Startup.

  The equates @curseg and @WordSize are (re-)defined at any segment
  opening (they don't require a .MODEL directive). @WordSize is also
  (re-)defined by a processor directive.

  


  Simplified segmentation segment description
  -------------------------------------------

  The following tables show the default segment attributes for
  memory models defined with the .MODEL directive:


  Models TINY, SMALL, MEDIUM, COMPACT, LARGE, HUGE

  Directive   Segment name   Align   Combine   Class       Group
  -----------------------------------------------------------------
  .CODE       _TEXT    (1)   Word    PUBLIC    'CODE'      (2)
  .FARDATA    FAR_DATA (3)   Para    private   'FAR_DATA'
  .FARDATA?   FAR_BSS  (3)   Para    private   'FAR_BSS'
  .DATA       _DATA          Word    PUBLIC    'DATA'      DGROUP
  .CONST      CONST          Word    PUBLIC    'CONST'     DGROUP
  .DATA?      _BSS           Word    PUBLIC    'BSS'       DGROUP
  .STACK (4)  STACK          Para    Stack     'STACK'     DGROUP

  Notes: (1) Segment name = <name>_TEXT for MEDIUM, LARGE, HUGE.
         (2) Code segment included in DGROUP for TINY model.
         (3) Models COMPACT, LARGE, HUGE only.
         (4) STACK not assumed to be in DGROUP if FARSTACK
             specified in the .MODEL statement.

  Note:  Microsoft's far data models include the near heap and the
         stack in DGROUP. Borland's compact and large models don't
         have a near heap and don't include the stack in DGROUP.
         Borland's huge model doesn't have a DGROUP.



  Model TCHUGE (Borland C++)

  Directive   Segment name   Align   Combine   Class       Group
  -----------------------------------------------------------------
  .CODE       <name>_TEXT    Word    PUBLIC    'CODE'
  .FARDATA    FAR_DATA       Para    private   'FAR_DATA'
  .FARDATA?   FAR_BSS        Para    private   'FAR_BSS'
  .DATA       <name>_DATA    Para    PUBLIC    'DATA'
  .STACK (1)  STACK          Para    Stack     'STACK'

  Note: (1) STACK is automatically FAR.



  Model FLAT (all segments are 32-bit)

  Directive   Segment name   Align   Combine   Class       Group
  -----------------------------------------------------------------
  .CODE       _TEXT          DWord   PUBLIC    'CODE'
  .DATA       _DATA          DWord   PUBLIC    'DATA'      FLAT (2)
  .CONST      CONST          DWord   PUBLIC    'CONST'     FLAT (2)
  .DATA?      _BSS           DWord   PUBLIC    'BSS'       FLAT (2)
  .STACK (1)  STACK          Para    Stack     'STACK'     FLAT (2)

  Notes: (1) Explicit stack is ignored for PE format executables.
         (2) Listing files and .obj dumps show that TASM defines
             a FLAT group, but puts data and stack in DGROUP.

  ---8<----------------------------------------------------------------
