Motorola DSP Benchmark Descriptions

This set of programs was chosen for a combination of low memory usage and high speed. In almost all cases, speed can be improved at the cost of higher memory usage, or memory usage can be reduced at the cost of slower algorithms.

The programs are named according to their order in the following list, which is numbered in hexadecimal on the disk so that the programs sort correctly. For example, program 7-56.asm is the seventh program listed (Dot Product) and is written for the DSP56000/1 chip, and benchmark #10 is A-56.ASM. The programs for the DSP56000/1 have all been assembled and executed.

Additional programs are provided for the DSP56000/1. They are port-to-memory FFTs that use a complex input with the imaginary part set to zero. These programs are named D2-56.ASM, E2-56.ASM, and F2-56.ASM.

The following is a numbered list of the benchmark programs written for the Motorola DSP56000/1 and DSP96001/2 chips.

1. 20 Tap FIR Filter - The first three benchmarks are Finite Impulse Response (FIR) filter routines that take data from an external parallel port, filter the sample, and then send the result to a different external parallel port. The external data converters are assumed to be 16-bit and to convert faster than the DSP can complete the filter, so the next sample is always available without waiting. The filter characteristics are not important for this benchmark.

2. 64 Tap FIR Filter - Same as #1, except with 64 taps.

3. 67 Tap FIR Filter - Same as #1, except with 67 taps.

4. 8 Pole Cascaded Four Coefficient Canonic Biquad IIR Filter (4X) - This benchmark is an Infinite Impulse Response (IIR) filter routine that takes data from a memory-mapped external parallel port, filters the sample, and then sends the result to a different memory-mapped external parallel port. Three filter types are implemented, all of them 8-pole filters built by cascading four 2-pole sections.
In the first filter (benchmark #4), the filter is built by cascading canonic four-coefficient biquad sections. The second filter (benchmark #5) cascades canonic five-coefficient biquad sections, and the third (benchmark #6) cascades transpose five-coefficient biquad sections. In each case, the data enters the DSP via a memory-mapped external parallel port and leaves via a different memory-mapped external parallel port. The external data converters are assumed to be 16-bit and to convert data faster than the DSP, so the next sample is available without waiting. The filter characteristics are not important for this benchmark.

5. 8 Pole Cascaded Five Coefficient Canonic Biquad IIR Filter (5X) - See the description for #4.

6. 8 Pole Cascaded Five Coefficient Transpose Biquad IIR Filter - See the description for #4.

7. Dot Product - This benchmark computes the dot product of two vectors, giving a scalar result. Each vector is two-dimensional, with components (x, y). The original vectors are already in memory, and the result goes to memory. The vectors may be destroyed during the calculation, which reduces the memory required.

8. Matrix Multiply (2x2 times 2x2) - This benchmark multiplies a 2x2 matrix by another 2x2 matrix. Both matrices are already in memory, and the result goes to memory. The original matrices may be destroyed during the multiplication.

9. Matrix Multiplication (3x3 times 3x1) - This benchmark multiplies a 3x3 matrix by a 3x1 vector. Both the matrix and the vector are already in memory, and the result goes to memory. The original matrix and vector may be destroyed during the multiplication.

10. M-to-M FFT (64 Point) - This benchmark performs a complex, radix-2 FFT on complex data that is already in memory. The FFT results are a set of complex values stored in normal (not bit-reversed) address order; that is, bit reversal is included in the benchmark.
Three data sets are processed: a 64-point set (benchmark #10), a 256-point set (benchmark #11), and a 1024-point set (benchmark #12). Note that the assembler reports a warning while assembling benchmarks A-56.ASM, B-56.ASM, and C-56.ASM. This is because the sinusoid peaks generated by sincos.asm equal plus and minus one, and +1.0 lies just outside the DSP's signed fractional range, which causes an overflow. Saturation limiting forces the result to the closest representable fractional value, which prevents a significant error and allows the warning to be ignored.

11. M-to-M FFT (256 Point) - See the description for #10.

12. M-to-M FFT (1024 Point) - See the description for #10.

13. P-to-M FFT (64 Point) - This benchmark and benchmarks #14 and #15 perform a complex, radix-2 FFT on real data acquired from a memory-mapped external parallel port. The results of the FFT are a set of real values in normal order (i.e., bit reversal is included in the benchmark). Three data sets are processed: a 64-point set (benchmark #13), a 256-point set (benchmark #14), and a 1024-point set (benchmark #15). The real data stream enters the system via an external parallel port. The external A/D converter is 16-bit and converts data faster than the DSP needs it, so the DSP never has to wait for data.

14. P-to-M FFT (256 Point) - See the description for #13.

15. P-to-M FFT (1024 Point) - See the description for #13.

Note: Several of the FFT benchmark programs use "include" directives to pull standard macros into the benchmarks during assembly. These macros are provided as individual files on this disk:

sinegen.asm - Generates "points" samples of a sine wave of arbitrary amplitude, frequency, and phase. Macro call: "sinegen points,amplitud,freq,phase", where:
    points   - number of points (1-65536)
    amplitud - maximum amplitude of the sine wave (0.0-1.0)
    freq     - frequency relative to the sampling frequency (0.0-1.0)
    phase    - starting phase in radians (0.0-6.28)
sincos.asm - Generates sine and cosine coefficient lookup tables for decimation-in-time (DIT) FFT twiddle factors. Macro call: "sincos points,coef", where:
    points - number of points (2-32768, a power of 2)
    coef   - base address of the sine/cosine table; negative cosine values are stored in X memory and negative sine values in Y memory

wbh4m.asm - Generates a 4-term minimum-sidelobe Blackman-Harris window. Macro call: "wbh4m points", where:
    points - number of points (1-65536)

magsqr.asm - Computes the magnitude squared of a complex number. Macro call: "magsqr"; the pass parameters are:
    x1 - real input
    x0 - imaginary input
    a  - real*real + imag*imag (unrounded, double-precision output)

sqrt3.asm - A full 23-bit-precision square root routine that uses a successive approximation technique. Macro call: "sqrt3"; the pass parameters are:
    y - double-precision (48-bit) positive input number
    b - 24-bit output root
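The per-sample work in the FIR benchmarks (#1-#3) can be sketched in Python as below. This is a minimal floating-point sketch, assuming a direct-form FIR with a circular delay line that plays the role of the DSP's modulo addressing; the names FirFilter and run_fir are hypothetical, and the port reads and writes are replaced by a Python list.

```python
from collections.abc import Iterable, Sequence


class FirFilter:
    """Direct-form FIR filter with a circular delay line (hypothetical helper)."""

    def __init__(self, coeffs: Sequence[float]) -> None:
        if not coeffs:
            raise ValueError("need at least one tap")
        self.coeffs = list(coeffs)        # h[0] .. h[N-1]
        self.delay = [0.0] * len(coeffs)  # the last N input samples
        self.pos = 0                      # slot that receives the next sample

    def step(self, sample: float) -> float:
        """Consume one input sample and return one output sample."""
        n = len(self.coeffs)
        self.delay[self.pos] = sample
        acc = 0.0
        # Multiply-accumulate: h[k] pairs with the sample from k steps ago.
        for k in range(n):
            acc += self.coeffs[k] * self.delay[(self.pos - k) % n]
        # Advance the write pointer modulo N, as modulo addressing would on the DSP.
        self.pos = (self.pos + 1) % n
        return acc


def run_fir(coeffs: Sequence[float], samples: Iterable[float]) -> list[float]:
    """Filter a stream one sample at a time (stands in for port in / port out)."""
    fir = FirFilter(coeffs)
    return [fir.step(x) for x in samples]


if __name__ == "__main__":
    taps = [1.0 / 20] * 20  # 20 taps as in benchmark #1; the response itself does not matter
    print(run_fir(taps, [1.0] * 25)[-1])
```

The circular buffer means each new sample overwrites only one slot instead of shifting the whole delay line, which is the same trade the DSP makes with modulo address registers.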
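The three section types behind the IIR benchmarks (#4-#6) can be sketched as follows. This assumes every section has the transfer function H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2) and reads "four coefficient" as a canonic section with b0 fixed at 1 (overall gain handled elsewhere). That reading, the coefficient signs, and all names (Biquad, cascade, make_filter, COEFFS) are assumptions for illustration; the assembly may store coefficients with different signs or scaling. Four sections in cascade give the 8 poles.

```python
from dataclasses import dataclass


@dataclass
class Biquad:
    """One 2-pole section, H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2).

    form: "canonic4" (b0 treated as 1), "canonic5", or "transpose5".
    """
    b0: float
    b1: float
    b2: float
    a1: float
    a2: float
    form: str = "canonic5"
    s1: float = 0.0  # canonic: w[n-1]; transpose: first partial sum
    s2: float = 0.0  # canonic: w[n-2]; transpose: second partial sum

    def step(self, x: float) -> float:
        if self.form in ("canonic4", "canonic5"):
            # Direct form II (canonic): poles and zeros share one delay line w.
            w = x - self.a1 * self.s1 - self.a2 * self.s2
            b0 = 1.0 if self.form == "canonic4" else self.b0
            y = b0 * w + self.b1 * self.s1 + self.b2 * self.s2
            self.s1, self.s2 = w, self.s1
            return y
        if self.form == "transpose5":
            # Transposed direct form II: the states hold partial sums of future outputs.
            y = self.b0 * x + self.s1
            self.s1 = self.b1 * x - self.a1 * y + self.s2
            self.s2 = self.b2 * x - self.a2 * y
            return y
        raise ValueError(f"unknown form {self.form!r}")


def cascade(sections: list[Biquad], samples: list[float]) -> list[float]:
    """Run each sample through every section in turn (4 sections -> 8 poles)."""
    out = []
    for x in samples:
        for section in sections:
            x = section.step(x)
        out.append(x)
    return out


# Example (b0, b1, b2, a1, a2) values: arbitrary stable sections, all with b0 = 1.
COEFFS = [
    (1.0, 2.0, 1.0, -0.5, 0.25),
    (1.0, 2.0, 1.0, -0.4, 0.20),
    (1.0, -1.0, 0.5, 0.3, 0.10),
    (1.0, 0.0, -1.0, -0.2, 0.05),
]


def make_filter(form: str) -> list[Biquad]:
    """Build a fresh 4-section cascade of the given form from COEFFS."""
    return [Biquad(*c, form=form) for c in COEFFS]
```

Because every example section has b0 = 1, the "canonic4" and "canonic5" cascades give identical results, and "transpose5" agrees with them up to floating-point rounding. The canonic form needs two state words per section; the transpose form also needs two but updates them from the input and output rather than from an internal node.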
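The dot product and matrix benchmarks (#7-#9) reduce to chains of multiply-accumulate operations, one chain per output value. A minimal Python sketch, with hypothetical names dot2 and matmul and ordinary Python numbers in place of the DSP's fractional words:

```python
Vector2 = tuple[float, float]
Matrix = list[list[float]]


def dot2(u: Vector2, v: Vector2) -> float:
    """Dot product of two 2-D vectors (x, y): two multiply-accumulates."""
    return u[0] * v[0] + u[1] * v[1]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Row-by-column product; covers 2x2 times 2x2 and 3x3 times 3x1."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("columns of a must equal rows of b")
    cols = len(b[0])
    # Each output element is one multiply-accumulate chain of length `inner`.
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


if __name__ == "__main__":
    print(dot2((1, 2), (3, 4)))
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    print(matmul([[1, 0, 0], [0, 2, 0], [0, 0, 3]], [[4], [5], [6]]))
```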
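A minimal Python sketch of the FFT benchmarks (#10-#15), assuming a textbook radix-2 decimation-in-time FFT that reads its twiddle factors from tables laid out as the sincos.asm description states (negative cosine in one table, negative sine in the other). The sketch bit-reverses the input so that the output comes out in normal order; the assembly may reorder at a different stage using the DSP's reverse-carry addressing. The names sincos_tables, bit_reverse_permute, fft_radix2, and magsqr are hypothetical, and the arithmetic is floating point rather than 24-bit fractional.

```python
import math


def sincos_tables(points: int) -> tuple[list[float], list[float]]:
    """Hypothetical analogue of sincos.asm: x_tab holds -cos, y_tab holds -sin,
    for angles 2*pi*k/points with k = 0 .. points/2 - 1."""
    half = points // 2
    x_tab = [-math.cos(2 * math.pi * k / points) for k in range(half)]
    y_tab = [-math.sin(2 * math.pi * k / points) for k in range(half)]
    return x_tab, y_tab


def bit_reverse_permute(data: list[complex]) -> None:
    """Swap data[i] and data[rev(i)] in place, where rev reverses log2(n) index bits."""
    n = len(data)
    bits = n.bit_length() - 1
    for i in range(n):
        j = 0
        for b in range(bits):
            if (i >> b) & 1:
                j |= 1 << (bits - 1 - b)
        if j > i:
            data[i], data[j] = data[j], data[i]


def fft_radix2(data: list[complex], x_tab: list[float], y_tab: list[float]) -> list[complex]:
    """Radix-2 decimation-in-time FFT with outputs in normal order."""
    n = len(data)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of 2, at least 2")
    out = list(data)
    bit_reverse_permute(out)  # reorder inputs so the butterflies leave outputs in normal order
    size = 2
    while size <= n:
        half, stride = size // 2, n // size
        for start in range(0, n, size):
            for k in range(half):
                # Rebuild W = cos - j*sin = exp(-2j*pi*k/size) from the negated tables.
                w = complex(-x_tab[k * stride], y_tab[k * stride])
                top, bot = out[start + k], out[start + k + half] * w
                out[start + k], out[start + k + half] = top + bot, top - bot
        size *= 2
    return out


def magsqr(z: complex) -> float:
    """real*real + imag*imag, the quantity the magsqr macro is described as computing."""
    return z.real * z.real + z.imag * z.imag


if __name__ == "__main__":
    n = 64
    samples = [math.cos(2 * math.pi * 3 * m / n) for m in range(n)]
    spectrum = fft_radix2([complex(s, 0.0) for s in samples], *sincos_tables(n))
    power = [magsqr(z) for z in spectrum]
    print(max(range(n // 2), key=power.__getitem__))  # strongest bin in the lower half (bin 3 here)
```

For the port-to-memory case, one way to get real-valued results is shown in the usage lines: the real input is fed in as complex values with a zero imaginary part, and magsqr turns each output bin into a real power value.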
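The assembler warning mentioned for A-56.ASM, B-56.ASM, and C-56.ASM comes from the 24-bit signed fractional format of the DSP56000/1, whose range is -1.0 to 1 - 2^-23. The sketch below converts a fraction to that format with saturation, assuming round-to-nearest before clamping; to_q23 and to_hex24 are hypothetical helpers, not assembler functions.

```python
Q23_MAX = (1 << 23) - 1  # $7FFFFF, i.e. 1 - 2**-23
Q23_MIN = -(1 << 23)     # $800000 as a signed word, i.e. exactly -1.0


def to_q23(value: float) -> int:
    """Convert a fraction to a signed 24-bit Q23 integer, saturating out-of-range values."""
    scaled = round(value * (1 << 23))
    # Saturation limiting: clamp to the closest representable fractional value.
    return max(Q23_MIN, min(Q23_MAX, scaled))


def to_hex24(word: int) -> str:
    """Show a signed Q23 word as a 24-bit two's-complement hex constant."""
    return f"${word & 0xFFFFFF:06X}"


if __name__ == "__main__":
    for v in (1.0, 0.5, -1.0):
        print(v, to_hex24(to_q23(v)))
```

In this sketch +1.0 saturates to $7FFFFF, an error of 2^-23, while -1.0 maps exactly to $800000, which is why only the positive peak overflows.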
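The sinegen parameters can be read as below. This sketch assumes the macro produces amplitud * sin(2*pi*freq*n + phase) for n = 0 .. points-1; the exact formula, any scaling to fractional words, and the Python function itself are assumptions, not the macro's code.

```python
import math


def sinegen(points: int, amplitud: float, freq: float, phase: float) -> list[float]:
    """Hypothetical analogue of sinegen: amplitud * sin(2*pi*freq*n + phase), n = 0..points-1.

    freq is a fraction of the sampling frequency and phase is in radians.
    """
    if not 1 <= points <= 65536:
        raise ValueError("points must be in 1..65536")
    if not 0.0 <= amplitud <= 1.0:
        raise ValueError("amplitud must be in 0.0..1.0")
    return [amplitud * math.sin(2 * math.pi * freq * n + phase) for n in range(points)]


if __name__ == "__main__":
    print(sinegen(8, 0.5, 0.125, 0.0))  # one full cycle in 8 samples
```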
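A sketch of the window that wbh4m.asm describes, using the published 4-term minimum-sidelobe Blackman-Harris coefficients (0.35875, 0.48829, 0.14128, 0.01168). It assumes the symmetric form with denominator points - 1; the macro may instead use the periodic form (denominator points), and returning [1.0] for points == 1 is an assumed convention. The function name mirrors the macro but is hypothetical.

```python
import math

# Published 4-term minimum-sidelobe Blackman-Harris coefficients.
A0, A1, A2, A3 = 0.35875, 0.48829, 0.14128, 0.01168


def wbh4m(points: int) -> list[float]:
    """Symmetric 4-term Blackman-Harris window of length points (assumed form)."""
    if not 1 <= points <= 65536:
        raise ValueError("points must be in 1..65536")
    if points == 1:
        return [1.0]  # degenerate length; a common convention, assumed here
    m = points - 1
    return [
        A0
        - A1 * math.cos(2 * math.pi * n / m)
        + A2 * math.cos(4 * math.pi * n / m)
        - A3 * math.cos(6 * math.pi * n / m)
        for n in range(points)
    ]
```

The coefficients sum to 1.0, so the centre sample of an odd-length window is 1.0, while the end samples are only about 6e-5.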
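The successive approximation idea behind sqrt3.asm can be sketched as a bit-by-bit search: set each result bit, MSB first, and keep it only if the trial root squared does not exceed the input. The fixed-point layout here (a Q47 input, a Q23 result with 23 magnitude bits) and the function name are assumptions about how the 48-bit input and 24-bit root map to fractions; the assembly's register usage is not modeled.

```python
def sqrt_successive_approx(y: int) -> int:
    """Bit-by-bit square root on fixed-point words (assumed formats).

    y is a positive 48-bit fraction in Q47 (value = y / 2**47); the result is a
    24-bit Q23 fraction (value = r / 2**23) with 23 magnitude bits.
    """
    if not 0 <= y < (1 << 47):
        raise ValueError("input must be a positive 48-bit fraction")
    root = 0
    for bit in range(22, -1, -1):  # 23 magnitude bits, MSB first
        trial = root | (1 << bit)
        # A fractional Q23 x Q23 multiply yields a Q47 product equal to 2 * trial**2.
        if 2 * trial * trial <= y:
            root = trial
    return root


if __name__ == "__main__":
    r = sqrt_successive_approx(1 << 46)  # Q47 encoding of 0.5
    print(r, r / (1 << 23))
```

For example, the Q47 encoding of 0.25 (1 << 45) yields 1 << 22, the Q23 encoding of 0.5. One trial multiply and compare per result bit gives 23 iterations for the full 23-bit precision.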