Subject:  AgaEXTENDER.

Author:
 Fabio Bizzetti, via Fra' Giarratana 62/c, 93100 Caltanissetta, Italy
 fax/voice: +39 934 27220 / email: bizzetti@mbox.vol.it

(c) copyright 1996 by Fabio Bizzetti. All rights reserved.


The aim of this project is to drastically improve the performance of AGA
Amigas (and possibly OCS/ECS ones), covering both future and existing Amigas,
with the least possible commercial and technological effort.

The Amiga is losing what remains of its small market day after day because of
its limited hardware. Although faster CPUs can be fitted, the video/audio
hardware cannot be improved cheaply enough to make an upgrade "popular".
Nowadays the competition is about MultiMedia, and the Amiga needs a revolution.
Creating a new machine would still not resolve the problem, because there are
millions of installed machines that cannot and must not become obsolete.
Neither Graffiti nor AGX helps much: they only emulate a VGA ModeX style
screen, which requires manipulation both on VGA and on a Graffiti/AGX Amiga,
and on the Amiga the bandwidth is so poor that all the effort ends up useless.

We are facing a hard problem: the CPU->AGA bandwidth is very poor when it
comes to complex or "chunky graphics" based applications, but we can't release
an AGA+ for many reasons:
# It would cost too much at the moment and would take too long to develop, so
  the improvement would probably not be proportional to the effort of making
  it.
# All the existing A1200/A4000/CD32 machines would be cut off; in any case I
  don't believe that many users would mass-upgrade by changing Lisa or the
  whole chipset.
# We have to keep compatibility with older Amigas. This is indispensable and
  is part of the Amiga "philosophy". Amiga users consider the fact that most
  Amiga software also runs on older Amigas (more than happens in the PC
  world) to be vitally important, more so than absolute performance.

But we *need* to improve the situation drastically; it's more serious than it
seems. My fear is that the Amiga will lose all of its already small commercial
market and end up supported only by PD/ShareWare. That would make it a hobbyist
computer, which I like a lot, but we also need high quality software (meaning
hard work behind it), and that means commercial software.

Games and especially MultiMedia/Productivity software are decisively important
to avoid the death of the Amiga and, beyond that, to make it better than the
others again.

Even fitting the fastest PowerPC card will not fix some serious shortcomings
of the audio/video architecture of the A1200, which doesn't deserve to become
obsolete when and if a new chipset is released (perhaps one not installable
in existing A1200s).

A solution exists, and it's optimal both technically and commercially, so in
my opinion Amiga Technologies should consider it carefully.

A custom chip can be made nowadays, and if it's really worth it, it should be
made. Commodore made Akiko and other interface chips, but none of them is
comparable to this one in terms of real audio/video performance gains.
Although the AgaEXTENDER is more complex than Denise/Lisa, the technology of
1996 should surely allow it to be made easily.
Many custom chips produced today (on other platforms) are much more complex
than this one, which is presented here in an advanced version that could be
reduced as needed if resources don't allow a full implementation.

I consider myself an expert on the Amiga architecture, a hardware enthusiast
and a skilled and original coder. This is the project I designed:

The AgaEXTENDER is a device that plugs into the RGB port of existing Amigas
and can be integrated into the motherboard of future Amigas. It is based on a
line-buffer, which is much cheaper than a frame-buffer.

The whole AgaEXTENDER work-cycle is based on one horizontal line, starting
from a Horizontal Synch and timed via both the PixelClock output and the
28MHz AGA clock (doubled internally to 56MHz for PAL/NTSC Scan Doubling,
which the AgaEXTENDER implements so that VGA monitors can also be used for
PAL/NTSC screens).

A brief description of its features and performance:

# ChipMem->RGBport bandwidth of more than 22Mb/sec, allowing e.g. these modes:
a) 24 bit (R,G,B byte based or 3 byteplanes) up to about 512*290 in PAL/DBLPAL
   or 512*580 PAL interlaced (with or without overscan).
b) 24 bit (A,R,G,B longword based) up to 384*290 in PAL/DBLPAL or 384*580 PAL
   interlaced (with or without overscan), becoming 768*580 with hardware
   antialiasing enabled (linear interpolation).
c) 15 bit (word based) resolution up to 1024*290 in PAL/DBLPAL or 512*580 PAL
   interlaced (with or without overscan).
d) YUV (8+8+8, byteplanes based) resolution up to about 512*290 in PAL/DBLPAL
   or 512*580 PAL interlaced (with or without overscan).
e) YUV (6+5+5, word based) resolution up to 1024*290 in PAL/DBLPAL or 1024*580
   PAL interlaced, or 512*580 DBLPAL (with or without overscan).
f) 8 bit (classic chunky mode) resolution up to 512*290 with 4 PlayFields.
g) 16 bit (YUV or RGB chunky mode) resolution up to 256*290 with 4 PlayFields.
h) 8 bit chunky mode for OS, resolution 768*600 31kHz 50Hz
Scale (Zoom) effects on the playfields. Full hardware smooth scroll support.
Many other video modes, completely programmable by skilled coders.
# Full OS support (draggable screens and AGAnormal/AGAextended together).
# Hardware completely programmable via 256 registers.
# HiRes-copper, for advanced effects/modes.
# Support for fast MPEG/JPEG display, due to built in YUV conversion and more.
# 16bit 3D audio, extremely high playback rate, 4+4+4+4 (or more) channels.
# Scan doubler. No bandwidth waste (unlike scan doubled DBLPAL/DBLNTSC).
# Extremely fast transparency effects.
# Antialiasing both horizontal and vertical.
# Fully programmable resolutions, independent for each playfield.


The device can be described as made of 3 principal parts:

 1ST STAGE: (FETCH)
 2ND STAGE: (AUDIO/VIDEO PROCESSING)
 3RD STAGE: (OUTPUT)

###############################################################################

 1ST STAGE: Fetch   (See the diagram picture STAGE1_Fetch.IFF)


NOTE: the genlock audio bit freezes/allows the operations of this stage.

The function of this stage is basically to fetch as much data as possible
from Lisa, and to sort and store it into the internal line-buffers of the
AgaEXTENDER.

To achieve full bandwidth usage, there are 3 samplers connected to the
analog RGB output. Resolution is 3+3+2 bits ( 3 red + 3 green + 2 blue ).
This choice is optimal because we can switch back to a normal Amiga
screen at any moment: since there is no time to reload 256 color registers,
we get the best generic color palette by limiting the resolution of one of
the 3 primary colors. Logic suggests blue because it contributes least to
the sensation of brightness, even though human vision discriminates colours
well in the blue range.

Thus we have this input from analog RGB, which is converted directly back
into the original 1..8 Lisa bitplane serial data streams with no problems,
provided we set up a special 256-color palette in Lisa's color registers. This
gives us up to 8 independent, extremely versatile DMA channels. Using the
32bit-wide and FastPage fetch modes is always advisable. The transfer rate is
set by the AGA pixelclock period selected in BPLCON0 (140/70/35 ns).
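
As an illustration of the "special 256 palette" idea, here is a minimal
Python sketch. It builds a palette in which every index drives the RGB output
to a distinct 3+3+2 level, and then models the three samplers recovering the
index. The bit assignment (bits 7..5 red, 4..2 green, 1..0 blue) and the
function names are assumptions made for this example; the text above does not
fix them.

```python
# Sketch: a 256-entry palette that lets 3+3+2 samplers recover the 8 bitplane bits.
# ASSUMPTION: index bits 7..5 = red, 4..2 = green, 1..0 = blue (not stated in the text).

def make_palette():
    """Return 256 (r, g, b) tuples, each component scaled to 0..255."""
    palette = []
    for index in range(256):
        r3 = (index >> 5) & 0b111
        g3 = (index >> 2) & 0b111
        b2 = index & 0b11
        palette.append((r3 * 255 // 7, g3 * 255 // 7, b2 * 255 // 3))
    return palette

def sample_back(rgb):
    """Model the three samplers: quantize the levels back to 3+3+2 bits."""
    r, g, b = rgb
    r3 = (r * 7 + 127) // 255
    g3 = (g * 7 + 127) // 255
    b2 = (b * 3 + 127) // 255
    return (r3 << 5) | (g3 << 2) | b2

palette = make_palette()
recovered = [sample_back(colour) for colour in palette]
```

With this palette every one of the 256 indices survives the round trip, so
the 8 bitplane bits of each pixel can be read back from the RGB port.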

For each of these 8 serial streams (coming directly from each AGA bitplane)
there's a shifter that delays the stream by 0..63 bits, then a register
that contains the bit length (bitlenght) of the chunky cell in memory, and a
bit-based mask that filters out any undesired bits in the stream, in a
continuous cycle. When active, this circuit wastes some bandwidth, but it is
very useful for speeding up CPU work when fixed point math is needed, e.g. in
special fading effects.

Examples:
1)  bitlenght=8, filtermask=%11111111
    byte based: no wasted bits.
2)  bitlenght=16, filtermask=%1111111100000000
    word based: upper 8 bits used, last 8 bits wasted for fixed point support.

NOTE: the mask is 64 bits long, although only the -bitlenght- least
significant bits are used.

So, although this wastes some ChipMem->RGB bandwidth, overall gfx performance
improves heavily whenever routines need low bits for fixed point: the
AgaEXTENDER is given raw data, and the CPU is freed from heavy video
processing work, which is performed internally in the AgaEXTENDER.

The implementation is simply a circuit that checks the mask register and
either provides a clock signal at its output or not, so that each bit of the
stream is either passed on or filtered out.
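
The bit filter can be modelled in a few lines of Python. This is only a
sketch: it assumes that the first bit of each cell in the stream is matched
against the most significant of the -bitlenght- low bits of the mask, which
agrees with the two examples above but is not spelled out by the design. The
helper names are invented for the example.

```python
def filter_bits(stream_bits, bitlenght, filtermask):
    """Keep only the stream bits whose mask bit is 1.

    stream_bits: iterable of 0/1 values in arrival order.
    ASSUMPTION: the first bit of each cell pairs with mask bit (bitlenght-1).
    """
    kept = []
    for position, bit in enumerate(stream_bits):
        slot = position % bitlenght            # position inside the current cell
        mask_bit = (filtermask >> (bitlenght - 1 - slot)) & 1
        if mask_bit:                           # clock enabled: the bit passes
            kept.append(bit)
    return kept

def to_bits(value, width):
    """Most significant bit first, as the stream would deliver it."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]

# Example 2 from the text: word based, upper 8 bits kept, lower 8 dropped.
words = to_bits(0xAB12, 16) + to_bits(0xCD34, 16)
kept = filter_bits(words, 16, 0b1111111100000000)
```

Here `kept` holds the bits of $AB followed by the bits of $CD: the low byte
of each word never reaches the distributor.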

At the output of this bit filter, a 64bit register is checked to distribute
the stream into one 64bit cell of a line-buffer, which allows a large number
of chunky modes and color resolutions.
NOTE: there are 4 line-buffers and 4 audio-buffers (explained later); the
output is sent to one of them or, less commonly, to several (for special FX
involving more than one playfield).

EXAMPLE:

...distributor register...
<---------------------------64 bits---------------------------->
RRRRRRRRDDDDDDDDZZIIIPPPPPPPPPPPaaaaaaaaRRRRRRRRGGGGGGGGBBBBBBBB
0000000000000000000000000000000000000000111110001111100011111000

(R=register address, D=immediate data) for HiRes-copper
(Z=Zbuf,I=interpolation,P=Palette,a=alpha,R=red,G=green,B=blue)

NOTE: The R and D fields are special: they don't belong to the cell and they
are *not* part of the line-buffer memory. Whenever all the 8+8 bits are
filled (in any cell of any line-buffer), their content (8 bits of address and
8 bits of immediate data) is used by the HiRes-copper to execute 1
instruction (MOVE), which simply copies the data into the addressed register.
Because of the way the HiRes-copper's instruction register is designed, one or
more channels can be used to fill it, allowing more or less bandwidth
for the HiRes-copper, as needed by the application.

In the example above, the circuit distributes the stream into a 15bit mode.
Of course (in this case) bitlenght=16, mask=%0111111111111111 to filter out
the first, unused bit of the word.

This example allows a 5+5+5 bits RGB chunky mode, word based in memory (16bit).
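
A possible reading of the distributor, sketched in Python: the filtered bits
are deposited, in arrival order, into the cell positions where the distributor
register holds a 1, scanning from bit 63 down to bit 0. That scatter order is
an assumption drawn from the example register above, and the function names
are made up for the illustration.

```python
DISTRIBUTOR_15BIT = 0x0000000000F8F8F8   # the register value shown in the example

def distribute(bits, distributor, cell=0):
    """Deposit bits into the positions of `cell` selected by `distributor`.

    ASSUMPTION: selected positions are filled from bit 63 down to bit 0.
    Bits of the cell that are not selected are left untouched.
    """
    incoming = iter(bits)
    for position in range(63, -1, -1):
        if (distributor >> position) & 1:
            bit = next(incoming)
            cell = (cell & ~(1 << position)) | (bit << position)
    return cell

def rgb_fields(cell):
    """Extract the 8bit R, G and B fields (the low 24 bits of the cell)."""
    return (cell >> 16) & 0xFF, (cell >> 8) & 0xFF, cell & 0xFF

# 15 filtered bits: R=%11111, G=%00000, B=%10101
pixel_bits = [1, 1, 1, 1, 1,  0, 0, 0, 0, 0,  1, 0, 1, 0, 1]
cell = distribute(pixel_bits, DISTRIBUTOR_15BIT)
```

Each 5bit component lands in the upper 5 bits of its 8bit field, so the
example pixel becomes R=$F8, G=$00, B=$A8 inside the cell.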

With the same method and using 3 of the 8 channels, we can have 3 BytePlanes,
all internally stored in the same line-buffer.

NOTE: in the usual horizontal-line based cycle, each channel has a start
position in the destination line-buffer at the start of the line, so that
more than one channel can be used to fill the same line-buffer. Example:

BPLDATA0          BPLDATA1          BPLDATA2          BPLDATA3       (channels)
a                 b                 c                 d           (line-buffer)

In this case a,b,c,d are the starting points to begin to fill the line-buffer.
There're 2 more registers:
1) write N cells, skip M cells. (read below about anti-aliasing and VGA modes).
2) number of cells to process, then halt. (to avoid overwrite).
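
The effect of the start position together with these two registers can be
sketched as an address generator. The semantics used here are assumptions:
the pattern is taken to be "write N cells, skip M cells" repeated from the
start position, and the cell count is taken to count written cells only.

```python
def cell_addresses(start, write, skip, total):
    """List the line-buffer cell addresses one channel fills during a line.

    ASSUMPTIONS: `total` counts written cells, and the channel repeats
    'write `write` cells, then skip `skip` cells' from `start`.
    """
    addresses = []
    address = start
    written_in_run = 0
    while len(addresses) < total:
        addresses.append(address)
        address += 1
        written_in_run += 1
        if written_in_run == write:     # end of a run: jump over the gap
            address += skip
            written_in_run = 0
    return addresses

# Two channels interleaving their cells into the same line-buffer.
even = cell_addresses(start=0, write=1, skip=1, total=4)
odd = cell_addresses(start=1, write=1, skip=1, total=4)
```

With these settings one channel fills cells 0, 2, 4, 6 and the other fills
cells 1, 3, 5, 7, which is the kind of interleaving a ModeX style layout or
an interpolated playfield would need.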


Of course, the distributor doesn't change any bit in the line-buffer that is
not explicitly selected. This allows the constant fields (for example Z, I,
and/or any other) to be loaded first, or data to be multiloaded for even more
complex special FX.


Please note that although this stage may seem "complex", it is based on a
one-bit/pixelclock machine, and its excellent performance is the result
of its versatility, not of its complexity.


Q: So the AGA sprites bandwidth will be wasted?
A: No, because with the line-buffer we can use an extreme overscan (which
   kills AGA sprites) and so waste nothing of the AGA sprite bandwidth.

Q: Can I always have linear chunky mode?
A: Always; you just need proper modulo/pointer settings if you want to use
   more than 1 DMA channel. You can have much more though: the AgaEXTENDER
   registers are totally programmable and allow "strange" modes that will
   greatly simplify the CPU work, and thus speed up the Amiga gfx a lot.
   There's also a HiRes-copper that is as powerful as it is simple.

Q: Can I have hardware scroll as in the AGA chipset?
A: Yes, always, even in the strangest and most complex video modes.

Q: Do the programmable pixelrate and zoom functions make the device more
   complex?
A: No. They're simply fixed point registers added to an internal pointer in
   the 2nd STAGE that reads the line-buffers and sends the data to the screen.
   The clock is fixed: AGA's 28MHz, or x2 (56MHz) if Scan doubling is enabled.

Q: Why do we have the bit shifter/delayer in the first part?
A: Because it lets us scroll the playfield even when we use it for a
   background and want to exploit the bandwidth as much as possible by using
   an N bits based chunky mode instead of a byte/word/longword based one.
   This would not be good for drawing with the CPU, but for a fixed background
   it allows 100% use of bandwidth and smooth scroll at the same time, for
   example with an 11bit chunky mode used to display a perfectly hardware
   scrollable 2048 colors dithered background. All with no CPU time wasted.
   Of course, scrolls by multiples of 64 bits are made by changing the AGA
   bitplane pointers.
   E.g. in the case of an 8 bit chunky mode, we'll use the delay values
   0/8/16/24/32/40/48/56.
   Furthermore, since the 1st STAGE is a one-bit based machine, using a bit
   delayer instead of e.g. a byte delayer simplifies the project and gains
   versatility at the same time.

Q: Why the 2 registers to write/skip consecutive data?
A: To emulate VGA's ModeX for SVGA emulators, to get twice or more apparent
   bandwidth in a playfield using linear interpolation, to make clever tricks
   that speed up gfx routines, and to allow virtually any number of audio
   voices.

Q: Can I have sprites?
A: Although the AgaEXTENDER would be too complex with hardware sprites, its
   extreme versatility and clever design let you have sprites by programming
   up to 4 PlayFields with cross-transparency/priority effects (and much more)
   using the built-in HiRes-copper. Thus very complex sprites can be emulated.

Q: Planar modes were very useful too, do I lose them?
A: I could say that you can switch back at any moment to the Amiga hardware
   (worst case: in the next raster line) and thus have draggable screens
   that can be selected as Aga normal or Aga extended screen modes.
   But since the AgaEXTENDER has been designed to allow, with or without
   tricks, every screen mode imaginable, by studying it you will see that it
   is perfectly possible (with proper register settings) to have all
   configurations of planar modes and/or mixed chunky+planar at the same
   time. There are virtually no limits. With proper register settings you
   get a screen with an 8-bit chunky mode and an 8-bit alpha channel, which
   allows "fog/light" effects, transparency and much more, like a cockpit
   at the bottom of the screen that doesn't need CPU time to be rendered.
   If you want, this cockpit can be in high resolution while the picture
   being drawn is 160*128 in memory but looks 320*256 on the screen thanks
   to linear interpolation. The limits are those of the coder, not of the
   versatility of the AgaEXTENDER. The device is designed to optimize the use
   of bandwidth and to minimize CPU intervention for video effects, while
   still being a relatively simple external device connected to the RGB port.

Q: Aren't there "WAIT" and "SKIP" instructions in the HiRes-copper?
A: No, it's a very simple SIC (single instruction computer) and thus doesn't
   need an instruction opcode, only the 2 operands (address and data).
   Anyway, a "WAIT" instruction can be emulated using just "NOP" instructions
   (writes to the register address $FF). The lack of WAIT/SKIP instructions
   is not a disadvantage, since the stream of data from Lisa can't be
   stopped anyway. The HiRes-copper can "program" itself, redirecting part or
   all of its bandwidth to other purposes, on a horizontal-line basis.
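
A minimal Python model of this single-instruction machine follows. It assumes
that each instruction is a 16bit word with the register address in the high
byte and the immediate data in the low byte (matching the order of the R and D
fields in the cell), and that a write to $FF simply has no effect. The
function name and the sample program are hypothetical.

```python
NOP_ADDRESS = 0xFF   # writes here act as "NOP", as described in the text

def run_hires_copper(words, registers=None):
    """Execute a stream of 16bit words: high byte = address, low byte = data.

    The only instruction is MOVE (copy the data into the addressed register).
    ASSUMPTION: a write to $FF is ignored, which is what makes it a NOP.
    """
    registers = dict(registers or {})
    for word in words:
        address = (word >> 8) & 0xFF
        data = word & 0xFF
        if address == NOP_ADDRESS:
            continue               # burns one slot; a run of these emulates WAIT
        registers[address] = data
    return registers

# $08 = Delay_D0 and $09 = BitLenght_D0 in the example register map.
program = [0x0803, 0xFF00, 0xFF00, 0x0910]
regs = run_hires_copper(program)
```

The two NOPs in the middle delay the second MOVE by two instruction slots,
which is how a "WAIT" would be built from the position of data in the stream.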

Q: Why does the AgaEXTENDER speed up MPEG decompression?
A: First, it allows free YUV conversion (any format).
   Second, it "decompresses" each component in hardware using linear
   interpolation, so we can have e.g. a YUV screen that looks 320*256 1x1
   but is (in memory) 160*128 for the Y component and 80*64 for U,V, all
   perfectly interpolated for best gfx quality and minimum CPU usage
   (multiloading data). This requires only 30K bytes/frame to output a 24
   bits image, while the CPU->chipmem bandwidth is more than 100K bytes/frame.
   This effectively makes the AGA chipset, with all its bandwidth problems,
   still much faster than SVGA chips for Video applications. We can still
   overlay non-interpolated gfx for text, at any different resolution. The
   combinations are infinite. Moreover, another kind of compression is
   possible: the "I" field can be used as an information channel to display
   two consecutive pixels (in memory) as 2^(I+1) interpolated ones on the
   screen. This allows more video compression for free.
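
The 30K bytes/frame figure can be checked with a short calculation, assuming
one byte per sample for each of the Y, U and V components (the function name
is only illustrative):

```python
def yuv_frame_bytes(y_size, uv_size):
    """Bytes per frame for one 8bit Y plane plus 8bit U and V planes."""
    y_width, y_height = y_size
    uv_width, uv_height = uv_size
    return y_width * y_height + 2 * uv_width * uv_height

stored = yuv_frame_bytes((160, 128), (80, 64))   # what the CPU has to write
shown = 320 * 256 * 3                            # the equivalent 24bit image

# Pixels produced from two stored pixels for each value of the 3bit "I" field.
interpolated_pixels = [2 ** (i + 1) for i in range(8)]
```

`stored` is 30720 bytes (20480 for Y plus 2 * 5120 for U and V), against
245760 bytes for the same picture written as plain 24bit pixels, i.e. one
eighth of the data.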

Q: Why the transparency support?
A: CrossFading (transparency among images) is a good effect for MultiMedia
   applications as Scala, and extremely impressive for videogames. Moreover,
   AgaEXTENDER's clever design allows also priorities among playfields, and
   the HiRes-copper can extend nearly infinitely the possibilities of the
   AgaEXTENDER.

Q: Isn't the line-buffer too complex (simultaneous access of up to 8 channels)?
A: With clever engineering it should be possible to simplify it a lot.
   A hint: each cell coming from the distributor of one of the 8 channels can
   be held temporarily in a 64bit wide register and, once completed, stored
   into one 64bit cell of the line-buffer. The triple buffering means 3 times
   more static ram for the line-buffers, but it's simple, single access ram.
   The triple buffering is done just by "rotating" (addressing) the 3
   images of the line-buffer: one for the next line, one for the currently
   displayed line and one for the previously displayed line (for vertical
   anti-aliasing).
   Again, compared with incredibly complex devices such as the ones found in
   the PSX or Saturn, the AgaEXTENDER, although "apparently complex", should
   be easy to implement cheaply, being in the end just a simple one-bit
   sequential (and pipelined) device.

Q: So you give away another part of the video bandwidth to audio?
A: No. You exploit the unused vertical blank lines to get the bandwidth
   required for advanced audio. Example: in a PAL screen there are 56 unused
   lines (312-256); if we use them for audio, we get up to 4 MegaBytes/sec.
   How? We can have 8 bitplanes at an AGA resolution of 1472*56, meaning 82432
   bytes in a frame, meaning 4,121,600 bytes/sec for audio! And all this is
   "for free": it uses the video bandwidth exactly *when* it would otherwise
   have been wasted (vertical blanking).
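
The arithmetic of this answer, written out as a small Python function. The
default values are the ones quoted in the text; the function name is just for
the example.

```python
def vbl_audio_bandwidth(pixels_per_line=1472, bitplanes=8,
                        total_lines=312, visible_lines=256, fps=50):
    """Return (unused lines, bytes per frame, bytes per second)."""
    unused_lines = total_lines - visible_lines
    # Each bitplane contributes one bit per pixel, 8 bits make a byte.
    bytes_per_frame = pixels_per_line * bitplanes * unused_lines // 8
    return unused_lines, bytes_per_frame, bytes_per_frame * fps

unused, per_frame, per_second = vbl_audio_bandwidth()
```

This gives 56 lines, 82432 bytes per frame and 4121600 bytes per second, the
same numbers as above.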

#######################################

Audio Buffers part.

There are also four 64bit audio buffers, allowing up to 4 16bit stereo3D
channels (meaning 16 16bit voices in total). To allow a fast transfer of data
during vertical blank, each of these buffers must have a length of at least,
e.g., (8*44100)/50fps=7056 bytes for CD quality and 3D stereo sound.
If technology doesn't allow so much ram in the device, there's a
different solution that is based on horizontal lines for audio too, as for
video, and requires much less ram for audio buffers.
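
The buffer size above follows from 8 bytes per sample position (four 16bit
voices: front left, front right, rear left, rear right) multiplied by the
samples needed for one frame. A short sketch, with invented parameter names:

```python
def audio_buffer_bytes(sample_rate=44100, fps=50, voices=4, bytes_per_sample=2):
    """Bytes one audio buffer must hold to cover a whole video frame."""
    return voices * bytes_per_sample * sample_rate // fps

cd_quality = audio_buffer_bytes()                    # 44100 Hz
dat_quality = audio_buffer_bytes(sample_rate=48000)  # 48000 Hz
```

At 44100 Hz this is the 7056 bytes quoted above; at 48000 Hz the same
buffer would need 7680 bytes.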

The registers work in the same way as for line-buffers, you've only to select
(for each of the 8 channels from AGA bitplanes) the destination as one of the
4 audio buffers instead of one of the 4 line-buffers.

Using the same tricks with the AgaEXTENDER registers, you may load mono data
and then use the same data field for both left and right channels, front and
rear (still having independent volume control for each of them, and thus
allowing real 3D sound with minimum CPU and memory usage), and/or you may
load e.g. 8 bit data, or 14 consecutive, or whatever you prefer.

Read also the 2nd and 3rd STAGE part (about audio).


Q: What is 3D sound?
A: Imagine a 3D game like AlienBreed3D, where you are near a monster.
   If you calculate the amplitude of the sound from Front-Left,Front-Right,
   Rear-Left,Rear-Right directions, and set these values into the volume
   registers of the AgaEXTENDER, you'll have the 3D space surround sensation
   and thus feel the 3D direction where the sound is coming from. All with
   minimum CPU and memory usage. 3D sound can be used not only for "space
   sound", but also for excellent stereo surround music/sound effects.

Q: How many channels do I get?
A: The maximum is four, but they can be stereo or mono, 3D or 2D, and up to
   16 bits each.

Q: What is the maximum sample rate?
A: It can be *extremely* high, e.g. in only about 20 lines during VBL you can
   get 1536000 bytes/sec, enough for four 3D-stereo-16bit tracks at 48000 Hz!
   It would also be possible to implement hardware ADPCM decompression
   to get four times this performance and save a lot of memory.
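
As a rough check of these figures, the sketch below computes the data rate of
four 3D-stereo 16bit tracks at 48000 Hz and the number of vertical blank
lines needed to carry it. It assumes the 1472 pixel, 8 bitplane lines used in
the earlier vertical blank example; other line widths would change the
result.

```python
import math

def vbl_lines_needed(bytes_per_sec, pixels_per_line=1472, bitplanes=8, fps=50):
    """Lines per frame needed to carry `bytes_per_sec` of audio data."""
    bytes_per_line = pixels_per_line * bitplanes // 8
    return math.ceil(bytes_per_sec / (bytes_per_line * fps))

# 4 tracks * 4 voices (3D stereo) * 2 bytes (16bit) * 48000 samples/sec
required = 4 * 4 * 2 * 48000
lines = vbl_lines_needed(required)
```

`required` is 1536000 bytes/sec, and under the stated assumption it fits in
21 lines per frame, so the line count given in the answer is best read as an
approximate figure.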

Q: Do I get variable sampling rate?
A: Yes, it's all independent and asynchronous. Moreover, the 3rd STAGE
   (described later) can interpolate the audio data to limit aliasing
   distortion in case you use low sampling rates.

Q: Why only 4 channels nowadays, even if they're stereo surround and 16bit?
A: First, this is only an extension to the AGA chipset, so it should be kept
   simple to keep costs low. This will not limit performance, because the
   design of the AgaEXTENDER hides many possibilities that the best coders
   will discover or invent. Example: we can get a much higher number of voices
   if we use the skip-registers and multiload data, which will be smoothed and
   thus apparently mixed in the 3rd STAGE (this can also be used for realtime
   CPU-free echo/reverb effects). Again, the only limits are those of the
   coder; the hardware can allow an "unlimited" number of voices using tricks.

###############################################################################


2nd STAGE: Video Elaboration   (See the diagram picture STAGE2_Elaboration.IFF)


Due to clever design, the line-buffers can be normal ram, single access.
The horizontal interpolation doesn't need 2 simultaneous accesses, because
it's sequential and thus a register can contain the last fetched pixel.

The selectable YUV->RGB converter can be placed at the end of the 1st stage
or at the beginning of the 2nd. Because of its complexity, only one converter
will be used (on the first of the four line-buffers).

The Z field in the video cells is used for 3D (display priority) effects,
and a register enables or disables the use of the Z mode. Two bits are
sufficient for the Z field, since there are 4 playfields.

The Alpha fields can be used for cross-transparency effects among the
playfields; if 4 circuits can't be implemented (due to technology limits)
at least 2 circuits will be used, giving this feature to the first and second
line-buffers. This will allow realtime MultiMedia/Video effects such as cross
fading (a picture fades into another, with cross transparency) with no CPU
usage (everything is done by the AgaEXTENDER, using chipram). No SVGA chip
can deliver such Video performance in realtime, and the CPU->chipram bandwidth
problem does not arise here or in many other MultiMedia applications.

The 11bit palette field can be used partially or totally as one channel.
Loading an immediate value into the RGB field of the same cell, for example an
RGB value of 50%,50%,50% (for fog effects) or 0%,0%,0% (for a "darkness"
effect), allows CPU-free shading with no wasted bandwidth. The RGB and palette
outputs are mixed together (this is selectable using the Mode registers) by
an amount proportional to the Alpha channel. Thus, in this case, if we fill
the line-buffers with 1 word/pixel (8 bits for palette and 8 bits for Alpha,
or 2 byteplanes if necessary), we get shading effects for free.
All the heavy computation is done by the AgaEXTENDER, speeding up the gfx
routines of the CPU, and beating the PC hardware in TextureMapping+Shading.
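
The mixing of the palette output with the immediate RGB value can be sketched
as a per-component blend. The direction of the blend (alpha=255 showing only
the palette colour) and the rounding are assumptions for this illustration;
the design only says that the amount is proportional to the Alpha channel.

```python
def mix(palette_rgb, immediate_rgb, alpha):
    """Blend two RGB triples according to an 8bit alpha value.

    ASSUMPTION: alpha=255 shows only the palette colour and alpha=0 only the
    immediate RGB value; the /255 scaling with rounding is illustrative.
    """
    return tuple((p * alpha + c * (255 - alpha) + 127) // 255
                 for p, c in zip(palette_rgb, immediate_rgb))

texel = (200, 100, 50)        # colour looked up through the palette field
fog = (128, 128, 128)         # immediate 50%,50%,50% value in the RGB field
darkness = (0, 0, 0)          # immediate 0%,0%,0% value

half_dark = mix(texel, darkness, 128)
```

With the "darkness" value and alpha=128 the texel (200, 100, 50) comes out as
(100, 50, 25), i.e. shaded to about half brightness without the CPU touching
the pixel.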

Before each palette table, there's a mask+or register to select a palette
bank to allow fast color switching in OS's AgaEXTENDED draggable screens,
when palettes with less than 11bits are used.
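
One way to picture the mask+or register in front of a palette table (a
sketch only; the order of the two operations and the 11bit width of the
result are assumptions based on the description above):

```python
def palette_index(p_field, mask, or_value):
    """Select a palette bank: keep the masked bits, then OR in the bank bits."""
    return ((p_field & mask) | or_value) & 0x7FF   # 11bit palette address

# An 8bit screen (256 colours) placed in bank 3 of the 2048 entry palette.
index = palette_index(0x7A5, mask=0x0FF, or_value=0x300)
```

Here the pixel value $A5 is looked up at entry $3A5, so each draggable screen
can keep its own 256 colour bank and no colour registers need reloading when
the display switches from one screen to another.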

The "scaling" and programmable resolution are performed simply by using a
fixed point register that is added to the pointer on every 28/56MHz clock
cycle. Thus the AgaEXTENDER always has a fixed resolution (SHRES) whose pixels
are subdivided into N (not integer, fixed point) parts to provide any apparent
resolution.
Of course, using negative step values gives a horizontally flipped
playfield (with no CPU usage and no bandwidth waste).
This allows any resolution, always with complete bandwidth usage, and
advanced effects for MultiMedia and games, including realtime CPU-free Zoom,
parallax effects for free, X Flip for free (Y Flip and 180 degree rotations
are obtained by handling the Y Flip with AGA pointers and modulo) and many
other effects limited only by the imagination of coders. It also allows a
programmable resolution for WB screens, with the same resolution scalable to
overscan or not.
Imagine (in Scala Multimedia) zoom, CrossFading (transparency among images),
fast animation decompression, shading effects (on a per-pixel basis, not only
on the whole screen), and many other special effects: all without needing a
powerful CPU and all implemented by the AgaEXTENDER. This practically removes
the problem of CPU->chipram bandwidth, because the small amount of data that
the CPU must copy into chipram (animations and images) is compressed, then
cleverly decompressed internally by the AgaEXTENDER. Once again, this shows
the superiority of clever, versatile and far-sighted custom solutions over
standard sad solutions such as the ones used by PC clones.
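
The fixed point stepping used for scaling can be sketched as follows. The
number of fraction bits is an assumption (the design does not give the fixed
point format), and the wrap-around of the pointer is only there to keep the
example safe.

```python
FRACTION_BITS = 8   # ASSUMPTION: the fixed point format is not specified

def scan_line(line_buffer, step, output_pixels, start=0):
    """Read `output_pixels` pixels, adding `step` to the pointer each clock.

    `step` and `start` are fixed point values with FRACTION_BITS fraction
    bits. Indices wrap around the buffer only to keep this sketch safe.
    """
    pointer = start
    out = []
    for _ in range(output_pixels):
        out.append(line_buffer[(pointer >> FRACTION_BITS) % len(line_buffer)])
        pointer += step
    return out

buffer = [10, 20, 30, 40]
zoomed = scan_line(buffer, step=128, output_pixels=8)          # step 0.5: 2x zoom
shrunk = scan_line(buffer, step=512, output_pixels=2)          # step 2.0
flipped = scan_line(buffer, step=-256, output_pixels=4, start=3 << 8)
```

A step of 0.5 repeats every pixel twice, a step of 2.0 drops every other
pixel, and a step of -1.0 starting from the last pixel gives the X Flip.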

NOTE: The antialiasing (linear interpolation) part must be discussed with an
engineer, to adapt it to the technology being used. Anyway, to clear up doubts
about its complexity, I'll show an example.
To double the apparent resolution, a 3 state machine is sufficient:
1) Output Pixel A (the one previously fetched, and stored in a register)
2) Output Pixel A+B, shifted one position to the right (this is the
   antialiasing)
3) Output Pixel B (currently fetched) and swap registers
IMPORTANT: The antialiasing is performed in this stage, not in the fetch one.
NOTE: the 3bit "I" field allows compression of up to 128 times for each pixel,
by selecting the amount of antialiasing (linear interpolation) per pixel.
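
One reading of the 3 state machine, sketched in Python for a single colour
component: between every two fetched pixels A and B the machine outputs
(A+B) shifted right by one bit. How the states chain from one pixel pair to
the next is an interpretation made for this example, not something the
description fixes.

```python
def double_resolution(pixels):
    """Insert the average of each neighbouring pair between its two pixels."""
    out = []
    previous = None                      # register holding pixel A
    for current in pixels:               # pixel B, just fetched
        if previous is not None:
            out.append((previous + current) >> 1)   # A+B shifted right by one
        out.append(current)
        previous = current               # "swap registers": B becomes A
    return out

line = double_resolution([0, 100, 50])
```

Three stored pixels become five displayed ones: 0, 50, 100, 75, 50. Only one
adder and one register per component are needed, since the previous pixel is
always at hand.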

The 2nd STAGE is the one that will be simplified depending on the limits of
technology; it basically "elaborates" and mixes the data fetched and sorted
in the first stage.
There are too many ideas to illustrate them all here: some will be
implementable and others will not (fully or partially), depending on the
technology being used. Here I would like comments from a VLSI/ULSI engineer,
to discuss the limits of the technology that could be used and to steer my
hardware/software imagination towards getting the best from it.
Whatever technology is used, AGA performance can be improved a lot.

###############################################################################

3RD STAGE: Output    (See the diagram picture STAGE3_Output.IFF)

NOTE: the genlock audio bit enables/disables a circuit to pass through the
      AgaEXTENDER to disable the device.

Video part:

The Vertical Synch from Lisa is unchanged.
The Horizontal Synch is independent, to allow scan-doubling for free (AGA
spends twice the bandwidth in this case). NOTE: this independence only means
single or double speed, to double the horizontal frequency of scan modes
such as PAL/NTSC without using doubled bandwidth. In this case, the
line buffer loads a 15kHz line and outputs it twice (at 30kHz each time).
NOTE: the horizontal centering is performed by selecting the blank time and
the starting position of pixels from the OUTPUT buffer.

NOTE about 2nd STAGE, 3rd STAGE and related graphic diagrams:
The "OUTPUT BUFFER" is an unnecessary part that has been included only to
simplify the explanation; it can be removed to simplify the final hardware.

The device should provide digital output rather than analog for 2 reasons:
1) Using an external DAC (such as the VideoHybrid) will simplify the IC.
2) Providing RGB and Alpha digital outputs will be useful in SetTopBoxes, or
   in MultiMedia multi-gfxchip applications. This way the AGA+AgaEXTENDER
   technology could be sold to provide high performance parallel solutions,
   using a digital mixer, with extremely versatile programmability,
   low costs, and experienced programmers and development tools available.

#######################################

Audio part:


For each of the 4 DACs, there will be a register of 20 (or more) bits, with a
reset input and a data input (automatically added to the previous content,
using signed integer arithmetic). This audio machine has a period of 2^N
cycles: it begins with a reset, continues by adding new sample data
coming from the audio buffers, and at the end shifts the sum right by N bits,
so that all the virtual audio channels are digitally mixed into one, ready for
a single DAC conversion.
NOTE: Each pointer has a fixed point part, to allow any apparent sampling
frequency. The pointer points to an address in one audio buffer, and on every
clock cycle of the audio subsystem a fixed point delta (DDA) is added to
the pointer.
NOTE: You can select which of the 16bit parts to use, thus allowing mono modes
(useful for ambient sounds, where only programming the 4 independent volumes
is necessary, thus saving much ram).
NOTE: Of course, every "reset" of the state machine also marks the beginning
of a new machine cycle, and outputs the previously computed 16 bits to the
DAC.
---

Four 16bit DAC's. (Or use multiplexed output to use four external 16bit DAC's).

---

We have 2 input connectors from the Amiga audio, which will be mixed with the
2 front audio outputs of the AgaEXTENDER. 2 other connectors will be used for
the 2 rear speakers, or mixed with the front ones using an external switch if
a four speaker system (absolutely recommended) is not available. 8)

###############################################################################

NOTE: since the AgaEXTENDER registers can be written only by the HiRes-copper,
the HiRes-copper is automatically started with a standard configuration at
every VerticalBlank if the AgaEXTENDER is enabled (through the audio genlock
bit).

This is an example of how the 256 internal registers could be mapped:


NOTE: EVERY REGISTER IS BYTE SIZE.

N.  Name         Function

$00=StartPtA_D0  ;15..8 addr. 64bit cell in the line buff to store data
$01=StartPtB_D0  ;7..0 addr. 64bit cell in the line buff to store data
$02=StartRstA_D0 ;15..8 addr. "" cell in the line buff to restart storing data
$03=StartRstB_D0 ;7..0 addr. "" cell in the line buff to restart storing data
$04=FilterS_D0   ;select the byte to start storing in the bitfilter reg. 0..7
$05=FilterD_D0   ;store a byte (with postincrement) in the 64bits bitfilter
$06=SortS_D0     ;select the byte to start storing in the distributor reg. 0..7
$07=SortD_D0     ;store a byte (with postincrement) in the 64bits distributor
$08=Delay_D0     ;bitdelayer: value is from 0 to 63 (multiples of 64 thru AGA)
$09=BitLenght_D0 ;from 1 (planar) to 64 (full RDziPaRGB)
$0A=Total_D0     ;number of cells to process before halt (till next HorizSynch)
$0B=Write_D0     ;number of consecutive cells to write into the buffer
$0C=Skip_D0      ;number of consecutive cells to skip of the buffer
$0D=Dest_D0      ;bitflags to select to write or not in each line/audio buffer
$0E=Mode0_D0     ;bitmapped register
$0F=Mode1_D0     ;bitmapped register

[...]

$70=StartPtA_D7  ;15..8 addr. 64bit cell in the line buff to store data
$71=StartPtB_D7  ;7..0 addr. 64bit cell in the line buff to store data
$72=StartRstA_D7 ;15..8 addr. "" cell in the line buff to restart storing data
$73=StartRstB_D7 ;7..0 addr. "" cell in the line buff to restart storing data
$74=FilterS_D7   ;select the byte to start storing in the bitfilter reg. 0..7
$75=FilterD_D7   ;store a byte (with postincrement) in the 64bits bitfilter
$76=SortS_D7     ;select the byte to start storing in the distributor reg. 0..7
$77=SortD_D7     ;store a byte (with postincrement) in the 64bits distributor
$78=Delay_D7     ;bitdelayer: value is from 0 to 63 (multiples of 64 thru AGA)
$79=BitLenght_D7 ;from 1 (planar) to 64 (full RDziPaRGB)
$7A=Total_D7     ;number of cells to process before halt (till next HorizSynch)
$7B=Write_D7     ;number of consecutive cells to write into the buffer
$7C=Skip_D7      ;number of consecutive cells to skip in the buffer
$7D=Dest_D7      ;bitflags to select to write or not in each line/audio buffer
$7E=Mode0_D7     ;bitmapped register
$7F=Mode1_D7     ;bitmapped register

$80=AudPerA_A0   ;Frequency, high byte (it's only the fixedpoint delta to add)
$81=AudPerB_A0   ;Frequency, low byte (it's only the fixedpoint delta to add)
$82=AudVolFL_A0  ;0..255, Volume of front left channel
$83=AudVolFR_A0  ;0..255, Volume of front right channel
$84=AudVolRL_A0  ;0..255, Volume of rear left channel
$85=AudVolRR_A0  ;0..255, Volume of rear right channel
$86=AudMode_A0   ;bitflags
$87=AudNLen_A0   ;audio machine: N length (to allow mixing of N virtual voices)
$88=AudMixL_A0   ;mixer/smoother: 0=audio_off 1=normal 2=TwoCells 3=Three...
$89=AudMixR_A0   ;mixer/smoother: 0=audio_off 1=normal 2=TwoCells 3=Three...

[...]

$B0=AudPerA_A3   ;Frequency, high byte (it's only the fixedpoint delta to add)
$B1=AudPerB_A3   ;Frequency, low byte (it's only the fixedpoint delta to add)
$B2=AudVolFL_A3  ;0..255, Volume of front left channel
$B3=AudVolFR_A3  ;0..255, Volume of front right channel
$B4=AudVolRL_A3  ;0..255, Volume of rear left channel
$B5=AudVolRR_A3  ;0..255, Volume of rear right channel
$B6=AudMode_A3   ;bitflags
$B7=AudNLen_A3   ;audio machine: N length (to allow mixing of N virtual voices)
$B8=AudMixL_A3   ;mixer/smoother: 0=audio_off 1=normal 2=TwoCells 3=Three...
$B9=AudMixR_A3   ;mixer/smoother: 0=audio_off 1=normal 2=TwoCells 3=Three...

$C0=PalWriteA    ;store alpha into current palette register
$C1=PalWriteR    ;store red into current palette register
$C2=PalWriteG    ;store green into current palette register
$C3=PalWriteB    ;store blue into current palette register
$C4=PalStartH    ;write here the starting entry, high byte of address
$C5=PalStartL    ;write here the starting entry, low byte of address
$C6=PalAInc      ;increment current palette register by N value

[...]

control registers

[...]


$FF=No-Op        ;does nothing (still requires an unused data operand).
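
The example map is regular enough that register numbers can be computed
instead of looked up. The following is only a sketch: it assumes that each
data channel (D0..D7) and each audio channel (A0..A3) owns a block of 16
register numbers, which is what the $00/$70 and $80/$B0 entries suggest, and
the helper names (data_reg, audio_reg, split16) are illustrative, not part of
the design.

```python
# Sketch of the register numbering implied by the example map.
# ASSUMPTION: every channel block is 16 register numbers wide.
# All function and constant names here are illustrative only.

DATA_BASE = 0x00        # $00 = StartPtA_D0
AUDIO_BASE = 0x80       # $80 = AudPerA_A0
CHANNEL_STRIDE = 0x10   # inferred from $00..$70 (D0..D7) and $80..$B0 (A0..A3)

DATA_REGS = ["StartPtA", "StartPtB", "StartRstA", "StartRstB",
             "FilterS", "FilterD", "SortS", "SortD",
             "Delay", "BitLenght", "Total", "Write",
             "Skip", "Dest", "Mode0", "Mode1"]

AUDIO_REGS = ["AudPerA", "AudPerB", "AudVolFL", "AudVolFR", "AudVolRL",
              "AudVolRR", "AudMode", "AudNLen", "AudMixL", "AudMixR"]


def data_reg(name, channel):
    """Register number of a data-channel register, e.g. ("Delay", 0) -> $08."""
    if not 0 <= channel <= 7:
        raise ValueError("data channel must be 0..7")
    return DATA_BASE + channel * CHANNEL_STRIDE + DATA_REGS.index(name)


def audio_reg(name, channel):
    """Register number of an audio-channel register."""
    if not 0 <= channel <= 3:
        raise ValueError("audio channel must be 0..3")
    return AUDIO_BASE + channel * CHANNEL_STRIDE + AUDIO_REGS.index(name)


def split16(value):
    """Split a 16bit value into the (high, low) bytes of an A/B register pair."""
    return (value >> 8) & 0xFF, value & 0xFF
```

Since every register is byte sized, a 16bit cell address or audio delta is
always written as two separate bytes, high byte to the "A" register and low
byte to the "B" register.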

NOTE: If the device is too complex, another solution is:
      After every HorizSynch the data coming from BITPLANE0 is assumed to
      contain commands. This way it can change registers or, with a register
      write, switch the BITPLANE0 input back to Video or Audio until the next
      HorizSynch.
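
The command mode described in this note can be modelled in a few lines. This
is a sketch under stated assumptions, not the defined protocol: commands are
taken to be (register, data) byte pairs, which the No-Op's "unused data
operand" suggests, and the register that switches BITPLANE0 back to
Video/Audio data is given the made-up number $F0, because the control
registers are not listed in the map.

```python
# Model of one scanline of BITPLANE0 input in "command mode".
# ASSUMPTION: each command is a (register, data) byte pair.
# HYPOTHETICAL: SWITCH_TO_DATA ($F0) stands in for the unlisted control
# register that ends command mode; the real number is not specified.

NOP = 0xFF
SWITCH_TO_DATA = 0xF0


def run_line(bitplane0_bytes, regs):
    """Apply commands to `regs` (a 256-entry list) and return the bytes that
    follow the switch command, i.e. the Video/Audio data for this line."""
    i = 0
    while i + 1 < len(bitplane0_bytes):
        reg = bitplane0_bytes[i]
        data = bitplane0_bytes[i + 1]
        i += 2
        if reg == NOP:
            continue                      # operand is fetched but ignored
        regs[reg] = data
        if reg == SWITCH_TO_DATA:
            return bytes(bitplane0_bytes[i:])
    return b""                            # the whole line was commands
```

In this model the command mode is re-armed simply by calling run_line again
for the next line, which corresponds to the next HorizSynch.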

###############################################################################

Q: But aren't SVGA chips still better than AGA+AgaEXTENDER?
A: But what is the best solution to this problem? Improving nothing at all?
   Buying an SVGA card and putting it into a 40MHz 68030 Amiga wouldn't help
   unless we had a 200MHz 68060 or a fast PPC604e, and even such fast CPUs
   wouldn't make the Amiga much better than a PC: why should users and
   developers take a risk on it while the PC market is so safe, just to get
   an only slightly better OS? Windows will improve too, and after the
   hardware, the software of the PC will also make us nearly forget the
   advantages of AmigaOS. The AgaEXTENDER is *not* only a device to allow
   24bit images, it's much more.
   The AgaEXTENDER allows any planar+chunky video mode, which will speed up
   everything and allow realtime effects such as transparency, free rotation
   around the pitch axis (3D), zoom/flip on 2D screens, 3D effects (parallax
   and sprites), accelerated MPEG decoding, multiple playfields, versatile
   16bit audio (as many virtual voices as the 4Mb/sec (or more if we want) of
   free bandwidth allows), and much more, being completely programmable. I
   don't believe that in 1997 everyone will throw out their A1200/Walker and
   immediately spend a lot of money on a PowerAmiga. The AgaEXTENDER is the
   best and only realistic solution to give new life to old A1200s and
   Walkers, to let them compete with Pentiums, and to not betray the old
   users, being a clever add-on to the existing AGA architecture.
   I would also add that the PowerAmiga would be much more powerful and
   versatile if, instead of buying in SVGA chips, it offered something with
   the versatility of the AgaEXTENDER plus the same old AGA, but with a x2/x4
   clock and the possibility to use VideoRam / EDO Ram and/or a 64bit bus.
   Otherwise the PowerAmiga will be an ugly copy of PC hardware, with only a
   different CPU (PowerPC is one of the worst RISCs around, and the
   multiprocessing capabilities of the 80686 hardly suggest that MsDos/Windows
   PCs are destined to die) and a good OS with a huge and painful port to
   make and an uncertain future.
   AGA already exists and it's cheap; improving its bandwidth while otherwise
   keeping it as it is wouldn't cost much, and adding the versatility of a
   device like the AgaEXTENDER would let it outperform any standard chip that
   could be bought from an external manufacturer. Amiga Technologies could
   itself become a seller of MultiMedia chipsets, selling AGA (with or without
   improved memory access from the CPU and improved bandwidth from chipmem to
   screen) together with the versatility of the AgaEXTENDER: who would instead
   buy an SVGA chip that can only offer a standard 8bit chunky screen, or
   little more, and needs a 200MHz CPU just to fade part of a screen or to
   move a sprite?
   None of the A1200s out there can be upgraded unless we use a device like
   this.

Q: Perhaps the *best* solution is to give all Amigas an SVGA chipset?
A: Please understand that someone who programs both realtime 2D/3D games
   (exploiting every possibility of the Amiga hardware) and OS applications
   is well placed to suggest what both the more original coders and the OS
   programmers may need to make the "software miracles" that the Amiga needs
   today and in the future. We need a *global* vision that includes not only
   the needs of today's games ( = Doom) but also those that will arise in the
   near future. That means thinking about the best WorkBench performance, and
   about giving the best, so far unimagined, possibilities to software such as
   Scala. It means versatility: allowing coders to invent tomorrow what the
   hardware can't directly offer or *foresee* today, while still using the
   same fully programmable hardware.
   It's not as with the Pentium, which can use its MIPS to hide the problems
   of SVGA's simple chunky structure: the standard Amiga has only a 40-50MHz
   68030, which can't do everything by itself if it wants to compete with
   Pentiums. Moreover, PCs have 200MHz Pentium CPUs to hide the huge
   limitations of the standard SVGA gfx chips (only the simplest chunky mode),
   while the Walker will have a 40MHz 68030, and there will never be a low-end
   Amiga with a top of the range CPU: if you put an SVGA chipset into the
   Walker, you only get the equivalent of a 50MHz 80386. Did the A500 use CGA
   screens? To the coders who know both VGA and AGA+AgaEXTENDER chips: tell me
   which is better.

###############################################################################

That's it. If an engineer examines the device and lets me know which parts
can't be implemented, and why, I can adapt my ideas to what the technology
being used makes available, and add new specific ideas, always with the
imagination of a coder (of both games and OS applications). With the
AgaEXTENDER I am sure I've shown that, given a simple and generic hardware
structure, skilled coders can make software miracles.

Amiga Technologies has the power to keep costs low, both for new Amigas
(with the AgaEXTENDER built in) and for older Amigas with an AgaEXTENDER
plugged into the RGB port, and to set a fair price (the Graffiti is
ridiculously overpriced, and is practically useless).

As for the problem of ChipMem bandwidth, this obviously can't be solved by the
AgaEXTENDER, but since a chip write proceeds in parallel with the execution of
some cached CPU instructions, and the AgaEXTENDER allows simplifications of
the programmer's routines like no other gfx board does, I believe that in
the end ChipMem bandwidth will not be a big problem anymore.
Remember that nowadays PC software, too, renders graphics into main ram and
then copies them to video memory. But the AgaEXTENDER will be able to use
complex screen modes immediately after the fastest possible CPU longword
copy from fastram. The AgaEXTENDER can receive highly compressed video through
a fast CPU->chipram copy, and then decompress it internally.
A dual-port ram for the new AGA Amigas would eliminate every CPU->chipram
limitation where present, making the AgaEXTENDER even more useful, while
still keeping compatibility with older AGA Amigas.

The audio part is simply stunning, despite the lack of a DSP (to keep costs
low). With simple programming tricks it is possible to get more audio voices,
as well as realtime reverb/echo effects that would extend the sense of space
given by the 3D stereo sound, all while saving as much data ram as possible,
with minimum CPU usage and with no waste of Video bandwidth.
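
To make the idea of virtual voices concrete, here is a small software model of
what the audio machine would do with the AudPer delta and the four AudVol
registers. It is a sketch only. The map just says "fixedpoint delta", so the
8.8 split used here is an assumption, as are the linear volume/255 scaling,
the looping of the sample and the clamping to signed 16bit; all names are
illustrative.

```python
# Software model of mixing N virtual voices onto four speakers.
# ASSUMPTIONS: 8.8 fixed-point delta, samples loop, volume scales as v/255,
# the sum is clamped to the signed 16bit range of the DACs.


def render_voice(samples, delta_8_8, count):
    """Step through `samples` with a phase accumulator, `count` outputs."""
    phase = 0                       # 8.8 fixed-point position in the sample
    out = []
    for _ in range(count):
        out.append(samples[(phase >> 8) % len(samples)])
        phase += delta_8_8          # "it's only the fixedpoint delta to add"
    return out


def clamp16(value):
    """Clamp to the signed 16bit range."""
    return max(-32768, min(32767, value))


def mix(voices, count):
    """voices: list of (samples, delta_8_8, (vol_fl, vol_fr, vol_rl, vol_rr)).
    Returns `count` frames, each a (FL, FR, RL, RR) tuple."""
    frames = [[0, 0, 0, 0] for _ in range(count)]
    for samples, delta, volumes in voices:
        for n, sample in enumerate(render_voice(samples, delta, count)):
            for speaker in range(4):
                frames[n][speaker] += sample * volumes[speaker] // 255
    return [tuple(clamp16(v) for v in frame) for frame in frames]
```

A delta of $0100 plays the sample at its original rate in this model, $0200
plays it one octave higher, and $0080 one octave lower. Placing a voice
between front and rear speakers is then only a matter of choosing the four
volumes.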

I wrote this doc in a few days, so if it can become reality, I'll work on it
and improve it as much as the technology to be used allows; in any case I
think that current technology allows all the features of this project.
I can write a complete emulator of the AgaEXTENDER (not realtime of course)
in optimized 680x0 code, which outputs a complete timing diagram for both
video and audio signals, to study the device.
If the AgaEXTENDER is too complex, I can simplify it as much as needed to
fit the technology being used: from the 3+3+2 bit sampler to the
line-buffer, it could almost be built with TTL circuits, so I believe that
even if a simplification is needed, it will not need to be a big one.
I have a lot of ideas for an agaextender.library allowing completely OS legal
use of the device, while still exposing all of its hidden potential and
keeping full compatibility with any future hardware (the library would accept
a description in a "descriptor language" and program the AgaEXTENDER, the
AGA's copper and BitPlanes, and every low-level resource in the best way. When
new hardware becomes available, a new library will do the same for it).
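
As a rough illustration of the "descriptor" idea, the sketch below turns a
hardware-independent description of one data channel into the list of byte
writes for the example register map. Everything about the descriptor is
hypothetical: its keys, its dictionary form and the function name are
invented here only to show the principle, and only the register offsets come
from the example map.

```python
# HYPOTHETICAL descriptor format for one data channel; only the register
# offsets ($00, $01, $08..$0C within a 16-register channel block) are taken
# from the example map. A library for new hardware would replace this
# function and keep the descriptors unchanged.

OPTIONAL_REG_OFFSETS = {
    "Delay": 0x08,
    "BitLenght": 0x09,
    "Total": 0x0A,
    "Write": 0x0B,
    "Skip": 0x0C,
}


def compile_descriptor(desc):
    """Return the (register, byte) writes that program one data channel.

    `desc` must contain "channel" (0..7) and "start" (16bit cell address in
    the line buffer); the keys of OPTIONAL_REG_OFFSETS are optional."""
    base = desc["channel"] * 0x10
    writes = [
        (base + 0x00, (desc["start"] >> 8) & 0xFF),   # StartPtA: bits 15..8
        (base + 0x01, desc["start"] & 0xFF),          # StartPtB: bits 7..0
    ]
    for name, offset in OPTIONAL_REG_OFFSETS.items():
        if name in desc:
            writes.append((base + offset, desc[name] & 0xFF))
    return writes
```

The point of the design is that an application only ever states what it
wants (for example an 8bit chunky channel of 40 cells), while the generated
writes, and the copper list that carries them, remain private to the library.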

I *HOPE* Amiga Technologies will consider this project with extreme attention:
the old designers had too many good ideas rejected for no good reason, and the
result is that the Amiga has lost many precious opportunities to become great.

      *** We don't need expensive hardware, we need clever solutions ***

If the device is too complex (it all depends on how much static ram current
technology can fit into a chip), there's a new version designed to remove 95%
of the audio buffers and/or keep only one (but improved) video buffer, while
keeping most of the AgaEXTENDER features. The choice is between a lot of
single-access ram or less ram with multiple access. It all depends on the
technology being used; I can adapt the AgaEXTENDER project once I know the
limits and qualities of the available technology.

I really believe in this project: it has no compromises, only advantages,
both technical and commercial. I also believe that this device would attract
thousands of enthusiastic developers to the Amiga again, as in 1985-1987.
What would a PowerAmiga with standard PC chips offer to attract developers?
What new features? What could give birth to new "enthusiasm"? In an era where
the PC's future looks even healthier than its present, such a PowerAmiga
could only keep some of the application users and developers of the old
Amigas. And if someone believes in a new standard system, then the BeBox will
attract his attention more than any other system available.
The more standard a solution is, the less powerful it is.
The PowerAmiga OS will allow the use of a wide range of standard gfx boards
that have only the simplest of chunky modes in common. It's as if the
government of a society decided that, since some people are illiterate, all
public information must be given by voice, not in writing. This way
"compatibility" is guaranteed, but such a society will belong to the
4th world. The lack of updates to the Amiga's custom chips has created the
need to buy Gfx Boards, and this in turn has distorted some users' sense of
versatility. Standard chips can certainly handle static images and WorkBench
screens well, and some of them have a blitter to move windows; this makes the
WB much nicer but completely kills any Video/MultiMedia and Games
application. If a generic user wants only a WB, he can buy a PowerMac, a
BeBox or a Unix PC. The Amiga is aimed at a market with varied needs, from
games, to WB, to advanced Video applications. The path the PowerAmiga seems
to have taken will allow a nice WB, but no games or decent Video applications.
Furthermore, only high-end PowerAmigas will run gfx based software at a decent
speed; if SVGA chips need a 200MHz Pentium (which is NOT a slow CPU) banging
the hardware to get 1/10 of what a 3DO-M2 can do, what can the other CPUs
and OSes do?

If the next Amiga generation uses SVGA GfxChips and a PowerPC CPU, then I
believe that professional game developers will find it much more convenient to
learn to program the same GfxChip (SVGA) and a new CPU (80686 instead of
PowerPC), because the 80686 will have a solid market while the PowerAmiga will
not, even though both are simple PC clones.
Applications such as Scala will look sad on a PowerAmiga with a standard
chipset and everything done through the OS: how much more competitive than
PCs will that be?

The Amiga is originality, the Amiga is creativity, the Amiga is clever
solutions; the Amiga can't be just another PC clone with an OS that cannot
compete with that of the BeBox, nor with all the other stable OSes around.

The only wise Amiga future is a powerful and innovative custom chipset, and the
AgaEXTENDER design could be the bridge between these two hardware generations.

If the Amiga hardware becomes a PC clone, then investing money and development
in a standard 80x86 PC will be a much wiser choice. If the Amiga is perhaps
less powerful than the most modern PC, but has the clever solutions that
allow its enthusiastic developers to push its custom hardware to the limit,
then the Amiga will be reborn and will have a valid commercial reason to
exist, for both hardware and software.

Many other developers and I gave up big earnings from the PC market,
accepting the extremely difficult economic conditions of the Amiga market, to
support the rebirth of the Amiga, because of the enthusiasm we had for it.
But if AT makes its hardware a PC clone, then we have no reason to support a
PC clone with no market rather than an 80x86 PC with a huge and reliable
market.

I hope that nobody will forget what the Amiga really was, and in which way
it should evolve.

Best Regards,


------------------------------------------------------------------------------
|                                                                            |
|    Fabio "Maverick" Bizzetti - bizzetti@mbox.vol.it - Maverick* at IRC     |
|              The maker of "CyberMan" and "Virtual Karting"                 |
|   working on "Virtual Rally" and "StarFighter", the 3D game that will      |
|                        bring the Amiga to the top                          |
|                                                                            |
------------------------------------------------------------------------------