          Motorola DSP Benchmark Descriptions 
 
This set of programs was chosen for a combination of low memory 
usage and high speed.  In almost all cases, speed can be improved 
at the cost of higher memory usage, or memory usage can be reduced 
at the cost of a slower algorithm.  Additional programs and 
information are available on the Motorola Dr. BuB electronic 
bulletin board (212A - (512)440-3771 or V.22 - (512)440-3772; 
format: 7 data bits, even parity, 1 stop bit).  The voice 
telephone number for Motorola DSP Applications is (512)440-2030. 
 
The programs have been named according to their order in the 
following list (which is numbered in hexadecimal on the disk to 
make the programs sort correctly).  Additionally, the programs
are separated by the DSP chip family they were written for.  For
example, program 7-56.asm is the seventh program listed (Dot
Product) and is written for the DSP56000/1 chip.  The programs
for the DSP56000/1 have all been assembled and executed.  The
programs for the DSP96001/2 have not been executed because
silicon is not available as of this writing.  Six additional 
programs have been provided for the DSP56000/1 and DSP96001/2.
They are port to memory FFTs that use a complex input with the 
imaginary input set to zero.  These programs are named D2-56.ASM, 
E2-56.ASM, F2-56.ASM, D2-96, E2-96, and F2-96.    
 
The following is a numbered list of the benchmark programs 
written for the Motorola DSP56000/1 and DSP96001/2 chips.  
 
1.  20 Tap FIR Filter - The first three benchmarks are Finite
Impulse Response (FIR) filter routines that take data from an
external parallel port, filter the sample and then send the
result to a different external parallel port.  The external data 
converters are assumed to be 16-bit and able to convert faster 
than the DSP can complete the filter, so the next sample is 
available without waiting.  The filter characteristics are not 
important for this benchmark.  
 
2.  64 Tap FIR Filter - same as #1 except with 64 taps. 
 
3.  67 Tap FIR Filter - same as #1 except with 67 taps.  
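
As a rough illustration of the per-sample work in benchmarks #1 
through #3, the following Python sketch filters one sample at a 
time through a delay line.  It is not the DSP56000/1 code on the 
disk: the names make_fir and step, the floating-point arithmetic, 
and the 4-tap moving-average coefficients are assumptions chosen 
to keep the example short.

```python
from collections import deque


def make_fir(coeffs):
    """Return a function that filters one sample at a time (hypothetical helper)."""
    taps = list(coeffs)
    # Delay line holding the most recent len(taps) inputs, newest first.
    history = deque([0.0] * len(taps), maxlen=len(taps))

    def step(sample):
        # Pushing on the left drops the oldest sample off the right end.
        history.appendleft(sample)
        # One multiply-accumulate per tap.
        return sum(c * x for c, x in zip(taps, history))

    return step


# Assumed 4-tap moving average standing in for the 20/64/67-tap cases.
fir = make_fir([0.25, 0.25, 0.25, 0.25])
outputs = [fir(x) for x in [1.0, 1.0, 1.0, 1.0, 0.0]]
print(outputs)  # [0.25, 0.5, 0.75, 1.0, 0.75]
```

The work per output sample grows linearly with the number of 
taps, which is why the same routine is timed at three lengths.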
  
4.  8 Pole Cascaded Four Coefficient Canonic Biquad IIR Filter (4X) 
- This benchmark is an Infinite Impulse Response (IIR) filter 
routine that takes data from a memory mapped external parallel port, 
filters the sample and then sends the result to a different 
memory mapped external parallel port.  Three filter types are 
benchmarked, and all are 8-pole filters made by cascading 2-pole 
sections.  In the first (benchmark #4), the filter is made by
cascading canonic four coefficient biquad sections.  The second 
filter (benchmark #5) is made by cascading five coefficient canonic 
biquad sections.  The third filter (benchmark #6) is made by 
cascading transpose five coefficient biquad sections. 
In each case, the data enters the DSP via a memory mapped 
external parallel port and leaves via a different memory mapped 
external parallel port.  The external data converters are assumed
to be 16-bit and convert data faster than the DSP such that the 
next sample is available without waiting.  The filter 
characteristics are not important for this benchmark.  
  
5.  8 Pole Cascaded Five Coefficient Canonic Biquad IIR Filter (5X) 
- please read the text with #4.  
  
6.  8 Pole Cascaded Five Coefficient Transpose Biquad IIR Filter - 
please read the text with #4 above.  
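
The difference between the canonic and transpose sections in 
benchmarks #4 through #6 can be sketched in Python as follows.  
This is an illustration under assumptions, not the benchmark 
code: it uses the textbook direct form II and transposed direct 
form II structures, floating-point arithmetic, and made-up 
coefficients (b0, b1, b2, a1, a2).  It also assumes that the 
four coefficient section corresponds to fixing b0 = 1, which the 
text above does not state.

```python
def canonic_biquad(x, state, b0, b1, b2, a1, a2):
    """Direct form II (canonic) section; state = [w(n-1), w(n-2)]."""
    w = x - a1 * state[0] - a2 * state[1]
    y = b0 * w + b1 * state[0] + b2 * state[1]
    state[1] = state[0]
    state[0] = w
    return y


def transpose_biquad(x, state, b0, b1, b2, a1, a2):
    """Transposed direct form II section; state = [s1, s2]."""
    y = b0 * x + state[0]
    state[0] = b1 * x - a1 * y + state[1]
    state[1] = b2 * x - a2 * y
    return y


def cascade(section, coeffs, samples):
    """Run samples through one section per coefficient set, in series."""
    states = [[0.0, 0.0] for _ in coeffs]
    out = []
    for x in samples:
        for c, s in zip(coeffs, states):
            x = section(x, s, *c)
        out.append(x)
    return out


# Assumed stable coefficients; four 2-pole sections give 8 poles.
coeffs = [(0.5, 0.25, 0.125, -0.5, 0.25)] * 4
impulse = [1.0] + [0.0] * 15
y_canonic = cascade(canonic_biquad, coeffs, impulse)
y_transpose = cascade(transpose_biquad, coeffs, impulse)
```

Both structures realize the same transfer function, so the two 
impulse responses agree to within rounding; they differ in how 
many state updates and multiplies each sample requires.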
  
7.  Dot Product - This benchmark performs a dot product of two 
vectors and gives a scalar result.  Each vector is represented by
two points in 2-D (x,y).  The original vectors are already in 
memory and the result goes to memory.  The vectors can be 
destroyed during the calculation, which reduces the memory 
required.       
    
8.  Matrix Multiply (2x2 times 2x2) - This benchmark multiplies 
one matrix by another.  Both matrices are already in memory and 
the result goes to memory.  The original matrices can be 
destroyed during the multiplication.  The routine multiplies a 
2x2 matrix by a 2x2 matrix.  
  
9.  Matrix Multiplication (3x3 times 3x1) - This benchmark 
multiplies a matrix by a vector.  Both the matrix and the vector 
are already in memory and the result goes to memory.  The original 
matrix and vector can be destroyed during the multiplication.  The routine 
multiplies a 3x3 matrix by a 3x1 vector.  
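
The arithmetic behind benchmarks #7 through #9 can be sketched in 
a few lines of Python.  The function names dot, mat_vec and 
mat_mul and the integer test data are assumptions made for this 
illustration; the benchmark programs work on data held in DSP 
memory and may overwrite their inputs, which this sketch does not 
attempt to model.

```python
def dot(u, v):
    """Sum of element-by-element products of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(matrix, vector):
    """Each output element is the dot product of one row with the vector."""
    return [dot(row, vector) for row in matrix]


def mat_mul(a, b):
    """Each output element is the dot product of a row of a and a column of b."""
    columns = list(zip(*b))
    return [[dot(row, col) for col in columns] for row in a]


print(dot([3, 4], [1, 2]))                            # 11
print(mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))    # [[19, 22], [43, 50]]
print(mat_vec([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [1, 0, -1]))  # [-2, -2, -2]
```

A 2x2 times 2x2 product needs 8 multiplies and a 3x3 times 3x1 
product needs 9, so all three benchmarks reduce to short runs of 
multiply-accumulate operations.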
  
10. M-to-M FFT (64 Point) - This benchmark performs a complex, 
radix-2 FFT with complex data that is already available in 
memory.  The FFT's results are a set of complex values whose 
addresses are normally ordered (i.e. bit reversal is included in 
the benchmark).  Three sets of data are processed, a 64- (benchmark 
#10), a 256- (benchmark #11), and a 1024-point set (benchmark #12). 
Note that the assembler reports a warning during assembly of 
benchmarks A-56.ASM, B-56.ASM, and C-56.ASM.  This is because the 
sinusoid peaks generated by sincos.asm equal plus and minus one, 
which causes an overflow.  Saturation limiting forces the result 
to the closest fractional number, preventing a significant error 
and allowing the warning to be ignored. 
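
To see why the warning is harmless, the following Python sketch 
quantizes table values to a signed fraction with saturation.  It 
assumes a 24-bit fractional word (1 sign bit and 23 fraction 
bits); the helper name to_q23 is made up for this example and 
does not describe how the assembler itself is implemented.

```python
FRACTION_BITS = 23  # assumed: 24-bit word with 23 fraction bits


def to_q23(value):
    """Quantize a real number to a signed fraction, saturating at the limits."""
    scaled = round(value * (1 << FRACTION_BITS))
    largest = (1 << FRACTION_BITS) - 1    # just below +1.0
    smallest = -(1 << FRACTION_BITS)      # exactly -1.0
    return max(smallest, min(largest, scaled))


print(to_q23(1.0))    # 8388607  (saturated: +1.0 is not representable)
print(to_q23(-1.0))   # -8388608 (exactly representable)
print(to_q23(0.5))    # 4194304

# Error introduced by saturating +1.0, as a fraction of full scale.
error = 1.0 - to_q23(1.0) / (1 << FRACTION_BITS)
print(error == 2.0 ** -23)  # True
```

Under these assumptions the saturated value differs from +1.0 by 
one least significant bit, which is the same size as ordinary 
rounding error in the rest of the table.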

11. M-to-M FFT (256 Point) - please read the text with #10 above. 
  
12. M-to-M FFT (1024 Point) - please read the text with #10 
above.   
  
13. P-to-M FFT (64 Point) - This benchmark (and benchmarks #14 
and #15) performs a complex, radix-2 FFT with real data acquired 
from a memory mapped external parallel port.  The results of the 
FFT are a set of real values in normal order (i.e. bit reversal 
is included in the benchmark).  Three sets of data will be 
processed, a 64- (benchmark #13), 256- (benchmark #14), and a
1024-point set (benchmark #15).   
 
A real data stream enters the system via an external parallel 
port.  The external A/D converter is 16-bit and converts data 
faster than needed by the DSP, so the DSP does not need to
wait for data.   
  
14. P-to-M FFT (256 Point) - please read the text with #13 above. 
  
15. P-to-M FFT (1024 Point) - please read the text with #13 
above.   
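
For readers unfamiliar with the algorithm timed in benchmarks #10 
through #15, here is a minimal Python sketch of a complex, 
radix-2 FFT that returns its results in normal order.  It is an 
illustration only: it assumes a decimation in time structure with 
the bit reversal applied to the input, uses floating-point 
complex numbers, and computes twiddle factors on the fly rather 
than from a lookup table.  The function names are hypothetical.  
Real input is handled as in the port to memory case, by treating 
each sample as a complex value with a zero imaginary part.

```python
import cmath
import math


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft_radix2(samples):
    """Radix-2 decimation in time FFT; output is in normal order."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Reorder the input so the in-place butterflies leave the output in order.
    data = [complex(samples[bit_reverse(i, bits)]) for i in range(n)]
    size = 2
    while size <= n:
        half = size // 2
        step = cmath.exp(-2j * cmath.pi / size)  # twiddle increment
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                a = data[start + k]
                b = data[start + k + half] * w
                data[start + k] = a + b          # butterfly
                data[start + k + half] = a - b
                w *= step
        size *= 2
    return data


# One cycle of a cosine over 8 points: energy lands in bins 1 and 7.
real_input = [math.cos(2 * math.pi * k / 8) for k in range(8)]
spectrum = fft_radix2(real_input)
```

An N-point transform of this kind performs (N/2)*log2(N) 
butterflies, so the 64-, 256- and 1024-point cases need 192, 
1024 and 5120 butterflies respectively.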

Note: Several of the FFT benchmark programs use "include" directives 
      in assembly code to include some standard macros in the benchmarks 
      during assembly.  These macros are provided as individual files 
      on this disk.  They are:

      sinegen.asm -  generates "points" samples of a sinewave of arbitrary
                     amplitude, frequency and phase.  It uses the macro call 
                     "sinegen points,amplitud,freq,phase" where
                     points  -  number of points (1-65536)
                     amplitud-  maximum amplitude of sinewave (0.0-1.0)
                     freq    -  frequency w.r.t. sampling frequency (0.0-1.0)
                     phase   -  starting phase in radians (0.0-6.28).
             

      sincos.asm  -  generates sine and cosine coefficient lookup tables 
                     for Decimation in Time FFT twiddle factors.  It uses 
                     the macro call "sincos  points,coef" where 
                     points - number of points (2 - 32768, power of 2) and 
                     coef   - base address of sine/cosine table
                              negative cosine value in X memory
                              negative sine value in Y memory.

      wbh4m.asm   -  generates a Blackman-Harris 4 Term Minimum Sidelobe 
                     Window.  It uses the macro call "wbh4m   points" where
                     points - number of points (1 - 65536).

      magsqr.asm  -  generates the magnitude squared of a complex number.
                     The macro call is magsqr and the pass parameters are:  
                     x1 = real input,
                     x0 = imaginary input, and
                     a = real*real + imag*imag   (unrounded, double 
                     precision output).
     
      sqrt3.asm   -  is a full 23 bit precision square root routine using
                     a successive approximation technique.  The macro call 
                     is "sqrt3" and the pass parameters are:
                     y  = double precision (48 bit) positive input number and
                     b  = 24 bit output root.
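
The idea of a successive approximation square root can be 
sketched in Python as follows.  This is a generic bit-by-bit 
method shown for illustration, not a transcription of sqrt3.asm: 
the function names are hypothetical, and the scaling (an input 
with 46 fraction bits giving a root with 23 fraction bits) is an 
assumption chosen so that the integer arithmetic stays exact.

```python
def sqrt_successive(value, bits=23):
    """Largest integer root with root * root <= value, built one bit at a time."""
    root = 0
    for bit in reversed(range(bits)):
        trial = root | (1 << bit)     # tentatively set the next lower bit
        if trial * trial <= value:    # keep the bit if the square still fits
            root = trial
    return root


def sqrt_fraction(y):
    """Square root of a fraction 0 <= y < 1, truncated to 23 fraction bits."""
    scaled = int(y * (1 << 46))       # assumed input scaling: 46 fraction bits
    return sqrt_successive(scaled) / (1 << 23)


print(sqrt_successive(144))   # 12
print(sqrt_fraction(0.25))    # 0.5
print(abs(sqrt_fraction(0.5) - 0.5 ** 0.5) < 2.0 ** -23)  # True
```

Each pass of the loop fixes one bit of the result, so 23 passes 
give 23 bits of precision regardless of the input value.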
