
texture-mapping
------------------------------
  
  first of all, texture-mapping is considered low-level. this
  means it is implemented for maximum performance, at the cost of
  accuracy. no interpolation takes place yet, but it is very fast.
  it's mainly intended for realtime applications.
  
  for maximum performance, your source array (the texture buffer)
  should be 2^n pixels wide, with n an integer greater than 1.
  if this condition is not met, performance drops significantly on
  processors prior to the 68060, since an integer multiplication
  for each pixel cannot be avoided in this case.
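
  to illustrate why a 2^n width helps, here is a minimal sketch in
  plain C. it is not taken from render.library; the function names
  are hypothetical. it only shows that a row offset can be computed
  with a shift when the width is a power of two, while any other
  width needs one integer multiplication per pixel.

```c
#include <stddef.h>

/* hypothetical helper: returns n if width == 2^n, or -1 otherwise */
static int log2_if_pow2(unsigned int width)
{
    int n = 0;

    if (width == 0 || (width & (width - 1)) != 0)
        return -1;              /* not a power of two */
    while ((width >> n) != 1)
        n++;
    return n;
}

/* texel offset in a buffer that is 2^n pixels wide: shift, no multiply */
static size_t texel_offset_shift(unsigned int x, unsigned int y, int n)
{
    return ((size_t)y << n) + x;
}

/* general case: one integer multiplication per texel */
static size_t texel_offset_mul(unsigned int x, unsigned int y,
                               unsigned int width)
{
    return (size_t)y * width + x;
}
```

  a caller would check log2_if_pow2() once per texture and then
  pick the shift or the multiply variant for the inner loop.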
  
  inside render.library, texture-mapping is implemented via
  scaling-engines. the term 'scaling-engine' might be confusing at
  first sight. it denotes a generalized concept: a low-level
  conversion unit that converts a stream of data from an input
  buffer to an output buffer. scaling-engines may be passed to the
  following functions: ScaleA(), RenderA(), ConvertChunkyA(). more
  functions are yet to come. a plain conversion is performed via
  ScaleA().
  
  there are three kinds of data required for texture-mapping: a
  source buffer, a destination buffer, and an array of destination
  coordinates. the destination coordinates form a trapezoid inside
  the destination buffer.
  
  source buffer (S)
  destination buffer (D)
  destination trapezoid (T)


                                          a'
                                        /\  
                                       /  \
   a               b                  /    \
     _____________            _______/______\__________
    |             |          |      /        \         |
    |             |          |     /          \        |
    |      S      |          |    /            \   D   |
    |             |          |   /              \      |
    |_____________|          | d'\               \     |
  d                 c        |    \        T      \    |
                             |     \               \   |
           |                 |      \               \  |
           |                 |       \               \ |
           |__________\      |        \               \|
                      /      |         \               \
                             |__________\______________|\
                                         \               \
                                       c' \_______________\ b'


  the pixels inside the source array (S) are mapped to the
  destination trapezoid (T). border clipping is fully implemented:
  your destination trapezoid may lie partly or even entirely
  outside the destination array. pixels are written only into the
  area where D and T overlap.
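
  the mapping and clipping described above can be modelled with a
  short sketch in plain C. this is an illustration only and not the
  algorithm render.library uses: it forward-maps samples without
  interpolation, and the Point type, the function names, and the
  corner order (a', b', c', d' matching a, b, c, d) are assumptions
  made for this example. the sample counts nu and nv must be chosen
  large enough (for instance twice the longest edge) to avoid holes.

```c
#include <stddef.h>

typedef struct { double x, y; } Point;   /* hypothetical helper type */

/* floor() for doubles without needing the math library */
static int floor_int(double v)
{
    int i = (int)v;
    return (v < (double)i) ? i - 1 : i;
}

/*
 * map src (sw x sh chunky bytes) onto the quadrilateral q[0..3]
 * (a', b', c', d') inside dst (dw x dh chunky bytes).
 * no interpolation; pixels outside dst are skipped (clipping).
 */
static void map_quad(const unsigned char *src, int sw, int sh,
                     unsigned char *dst, int dw, int dh,
                     const Point q[4], int nu, int nv)
{
    int i, j;

    for (j = 0; j < nv; j++) {
        double v = (j + 0.5) / nv;
        /* points on the left edge a'->d' and the right edge b'->c' */
        double lx = q[0].x + v * (q[3].x - q[0].x);
        double ly = q[0].y + v * (q[3].y - q[0].y);
        double rx = q[1].x + v * (q[2].x - q[1].x);
        double ry = q[1].y + v * (q[2].y - q[1].y);
        int sy = (int)(v * sh);              /* source row */

        for (i = 0; i < nu; i++) {
            double u = (i + 0.5) / nu;
            int x = floor_int(lx + u * (rx - lx));
            int y = floor_int(ly + u * (ry - ly));
            int sx = (int)(u * sw);          /* source column */

            /* write only where D and T overlap */
            if (x >= 0 && x < dw && y >= 0 && y < dh)
                dst[(size_t)y * dw + x] = src[(size_t)sy * sw + sx];
        }
    }
}
```

  with a trapezoid that lies partly outside the destination, only
  the overlapping pixels are touched; the rest of the destination
  buffer stays unchanged.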
  
  since texture-mapping is considered low-level, render.library
  does not offer any 3d-routines for calculating the trapezoid's
  coordinates. your job is to do the brain-work; render.library
  only provides the horsepower for the brute-force data transfers.
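
  as an example of that brain-work, the following sketch projects
  four 3d corner points onto the destination buffer with a simple
  perspective divide. everything here is hypothetical: the Vec3 and
  Coord2 types, the function name, and the corner order are
  assumptions, and the exact array layout expected by
  RND_DestCoordinates has to be taken from the render.library
  autodocs.

```c
/* hypothetical types, not part of render.library */
typedef struct { long x, y, z; } Vec3;
typedef struct { short x, y; } Coord2;

/*
 * project four corner points (z > 0, viewer looking along +z) to
 * 2d coordinates. focal is the distance of the projection plane,
 * (cx, cy) is the centre of the destination buffer.
 */
static void project_corners(const Vec3 in[4], Coord2 out[4],
                            long focal, long cx, long cy)
{
    int i;

    for (i = 0; i < 4; i++) {
        out[i].x = (short)(cx + in[i].x * focal / in[i].z);
        out[i].y = (short)(cy + in[i].y * focal / in[i].z);
    }
}
```

  projected corners may well fall outside the destination buffer;
  as described above, clipping takes care of that.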



implementation
------------------------------

  set up a scaling-engine for texture-mapping:

    engine = CreateScaleEngine(
                     sourcewidth, sourceheight,
                     destwidth, destheight,
                     RND_DestCoordinates, &coords,
                     RND_PixelFormat, PIXFMT_CHUNKY_CLUT,
                     TAG_DONE );

  currently, render.library supports two types of data for
  processing with scaling-engines: chunky bytes and truecolor
  longwords. note: texture-mapping with truecolor data is not
  significantly slower than with chunky pixels.

  
  do texture-mapping:

    Scale(engine, sourcebuffer, destbuffer, TAG_DONE);


  remove the scaling-engine:
   
    DeleteScaleEngine(engine);



specifying offsets
------------------------------

  ScaleA() accepts additional tags that specify the total widths
  of the source and destination buffers.

    RND_SourceWidth
        total width of the source buffer [pixels].
        Default: the scaling-engine's source width.

    RND_DestWidth
        total width of the destination buffer [pixels].
        Default: the scaling-engine's destination width.

  let's apply these additional tags to the diagram:

                                          a'
                                        /\  
                                       /  \
   a               b                  /    \
     _____________            _______/______\__________ ________
    |        |....|          |      /        \         |........|
    |        |....|          |     /          \        |........|
    |    S   |....|          |    /            \   D   |........|
    |        |....|          |   /              \      |........|
    |________|____|          | d'\               \     |........|
  d                 c        |    \        T      \    |........|
     <------>                |     \               \   |........|
        sourcewidth          |      \               \  |........|
                             |       \               \ |........|
     <----------->           |        \               \|........|
        RND_SourceWidth      |         \               \........|
                             |__________\______________|\_______|
                                         \               \
                                       c' \_______________\ b'

                              <----------------------->
                                  destwidth
                              
                              <-------------------------------->
                                  RND_DestWidth

   sourcewidth, destwidth:
       passed to CreateScaleEngine()
   RND_SourceWidth, RND_DestWidth:
       passed to Scale(), Render(), ConvertChunky(), ...
  
  there are two details to note when offsets are specified.
  
  1. the 2^n code optimization applies to the total width of the
  source buffer, not to the sourcewidth of the scaling-engine. if
  you specify a sourcewidth of 256 pixels for CreateScaleEngine(),
  but RND_SourceWidth is 300, you won't profit from the optimized
  code.
  
  2. in the destination buffer, the right border is still clipped
  at the column defined by destwidth. no pixels are written to the
  modulo area.
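
  the relation between the two kinds of widths can be sketched in
  plain C. this is not library code, just a plain row copy with
  hypothetical names: src_total and dst_total play the role of
  RND_SourceWidth and RND_DestWidth (the distance in pixels from
  one row to the next), while width plays the role of the engine's
  own width.

```c
#include <stddef.h>

/*
 * copy a width x height block of chunky bytes between two buffers
 * whose rows are src_total and dst_total pixels apart. columns
 * beyond width (the modulo area) are never read or written.
 */
static void copy_rows(const unsigned char *src, size_t src_total,
                      unsigned char *dst, size_t dst_total,
                      size_t width, size_t height)
{
    size_t x, y;

    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++)
            dst[y * dst_total + x] = src[y * src_total + x];
    }
}
```

  note that the row offset is computed from the total width. this
  is why the 2^n optimization depends on the total width of the
  source buffer.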
