SPECTROGRAM (2.2)

1. PRINCIPLES OF OPERATION

Most ordinary sounds are complex combinations of individual frequency components, or harmonics, which cover a wide frequency range and vary in intensity over time. A spectrogram is simply a plot of the frequency content of such an audio signal as a function of time. In this program, digital audio recordings (PCM format) are analyzed to produce a plot of frequency versus time, with harmonic intensity represented by a variable color scale. These spectrograms reveal the fascinating hidden frequency structure of audio signals and can be used for identifying or classifying particular sounds. When used to analyze recorded voice, spectrograms have also been known as 'voice prints'.

Spectrogram uses a mathematical Fast Fourier Transform (FFT) in performing the frequency analysis. FFTs are usually specified by the number of input data points used in each calculation. For a sampling rate of F (Hz), an N input point FFT will produce a frequency analysis over the range from 0 Hz to F/2. Signal amplitude will be calculated at N/2 frequency increments in this range. All this means is that for a digital signal sampled at 10000 Hz, a 512 point FFT will calculate signal amplitude at 256 frequency increments from 0 Hz to 5000 Hz. This will become clear as you calculate and observe different spectrograms.

Contrary to popular opinion, higher sampling rates are not always necessary for high fidelity recording. The choice of sampling rate depends on the highest frequencies in the audio signal. The rule of thumb is to use a sampling rate that is at least twice the highest frequency in the audio signal. That is, if you expect to have no frequency components above 11 kHz, then a sampling rate of 22 kHz is adequate.
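
The arithmetic in the two paragraphs above can be checked with a few lines of Python. This is only an illustrative sketch, not part of Spectrogram itself: the function names are hypothetical, and the spacing between frequency increments (F/N) is derived here from the stated range (F/2) and count (N/2) rather than quoted from the program.

```python
def fft_bins(sample_rate_hz, fft_points):
    """Return (frequency increments, spacing in Hz, top of range in Hz).

    Hypothetical helper: follows the F/2 range and N/2 increments
    described in the text; the spacing F/N is derived from those two.
    """
    n_bins = fft_points // 2                  # N/2 frequency increments
    spacing_hz = sample_rate_hz / fft_points  # (F/2) / (N/2) = F/N
    max_freq_hz = sample_rate_hz / 2          # analysis covers 0 Hz to F/2
    return n_bins, spacing_hz, max_freq_hz


def min_sample_rate(highest_freq_hz):
    """Rule of thumb from the text: sample at twice the highest frequency."""
    return 2 * highest_freq_hz


print(fft_bins(10000, 512))    # (256, 19.53125, 5000.0)
print(min_sample_rate(11000))  # 22000
```

For the example in the text (10000 Hz, 512 points) this gives 256 increments up to 5000 Hz, spaced a little under 20 Hz apart.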
If you examine a spectrogram and see that all of the signal is concentrated in the lower frequency components at the bottom of the display, then it is a good bet that the recording was sampled at too high a rate, wasting a significant amount of memory. This program produces the highest quality spectrograms from digital recordings that have been sampled at an appropriate rate.

2. SYSTEM REQUIREMENTS

Spectrogram will run on any Windows 3.1 equipped machine. However, the intensive calculations required to develop the frequency spectrum demand the fastest processor available. In addition, large sound files will require much memory for analysis and display, so the more memory the better.

Spectrogram will process any 8 or 16 bit audio data in PCM format, including ".wav" files or raw data files. Spectrogram cannot process the compressed audio data found occasionally in large .wav files.

3. COMPUTING AND DISPLAYING A SPECTROGRAM

Choose "Open" from the "File" menu to load a digital sound sample file. Once a file has been selected, Spectrogram will present the "Analysis Options" dialog box, where you will specify the parameters of the frequency analysis. To select the default values, just press the space bar. To tailor the calculations to your own preferences, see below.

a. SAMPLE CHARACTERISTICS

You may enter any value of sample rate from 8000 Hz to 44100 Hz. If you have selected a .wav file, the sample rate displayed will be the rate used in the original recording. If you have selected a raw data file, a sample rate of 11025 Hz will be initially assumed, and you should enter the correct value if necessary.

You may also select the beginning and ending location in the selected file (in bytes) to be analyzed. Initially, the starting and ending location of the entire file will be displayed. If you make no change here, the entire file will be analyzed.

You also have a choice of 8 bit or 16 bit data resolution. Pick the value which you know corresponds to the data file you are analyzing.
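
A .wav file records its sample rate and data resolution in its header; a raw data file carries no such information. The sketch below uses Python's standard `wave` module to read those values and to convert a byte location into a time. It is an illustration only: the helper names are hypothetical, a single-channel recording is assumed, and byte locations are assumed to be counted from the start of the audio data, which may not match how Spectrogram counts them.

```python
import io
import wave


def describe_wav(file_obj):
    """Read sample rate, resolution and data length from a PCM .wav header."""
    with wave.open(file_obj, "rb") as w:
        return {
            "sample_rate_hz": w.getframerate(),
            "bits": 8 * w.getsampwidth(),
            "channels": w.getnchannels(),
            "data_bytes": w.getnframes() * w.getsampwidth() * w.getnchannels(),
        }


def byte_offset_to_ms(offset_bytes, sample_rate_hz, bits, channels=1):
    """Convert a byte location in the audio data to milliseconds.

    Assumption: the offset is measured from the first audio byte.
    """
    bytes_per_frame = (bits // 8) * channels
    return 1000.0 * (offset_bytes // bytes_per_frame) / sample_rate_hz


# Build one second of 16 bit silence at 11025 Hz in memory, then read it back.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 2 bytes per sample = 16 bit data
    w.setframerate(11025)
    w.writeframes(b"\x00\x00" * 11025)
buf.seek(0)

info = describe_wav(buf)
print(info["sample_rate_hz"], info["bits"], info["data_bytes"])
print(byte_offset_to_ms(info["data_bytes"], info["sample_rate_hz"], info["bits"]))
```

For a raw data file the same conversion applies, but the sample rate and resolution have to be supplied by hand.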
If you are loading a .wav file, the correct value will already be shown. If this is a raw data file, 16 bit data will be assumed, but it is up to you to specify the correct value.

b. FFT Selection

You have a choice of 512, 1024, or 2048 point FFTs for the frequency analysis. Use 512 points routinely. Use 1024 and 2048 point FFTs only for high resolution analysis. The higher resolution FFTs require more time to compute the spectrogram. For this reason, if increased frequency resolution is needed, it is sometimes preferable to decrease the sampling rate rather than to use a higher resolution FFT.

c. Horizontal Scale Selection

You may select a horizontal scale of 2, 4, 8, or 16 ms per line. Each vertical line in the spectrogram display represents the output of one FFT calculation. The FFT data input window is stepped sequentially through the data, performing an FFT calculation at each step. The horizontal scale selected determines the length of the step between each FFT and thus the total number of FFTs required. Experiment with these values to pick the horizontal scale you prefer.

d. Display Threshold Selection

You are also given a choice of display threshold in order to reduce clutter in noisy digital recordings. A threshold of -3 dB or -6 dB reduces the input signal level to eliminate background clutter. Use a threshold of 0 dB regularly, and select signal reduction only if necessary to reduce clutter.

e. Color Palette Selection

And finally, you have a choice of color or grayscale display. For a color display, red represents the highest signal levels and dark blue the lowest. For a grayscale display, the darker the display, the higher the signal level.

Once you are satisfied with the Analysis Options, click "OK" to begin processing and display of a spectrogram of the audio data file. The program will step sequentially through the audio file, calculate an FFT at each step, and display the results in the Spectrogram window.
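
The stepping process described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Spectrogram's actual code: it uses a plain recursive floating point FFT rather than the program's fast integer FFT, applies no window shaping to each block of input (the text does not say what the program does), and the function names and the rounding of the step size are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spectrogram(samples, sample_rate_hz, fft_points=512, ms_per_line=8):
    """Return one list of N/2 amplitudes per vertical display line."""
    # Assumption: the horizontal scale (ms per line) sets the step in samples.
    step = max(1, round(sample_rate_hz * ms_per_line / 1000))
    lines = []
    for start in range(0, len(samples) - fft_points + 1, step):
        block = samples[start:start + fft_points]
        spectrum = fft(block)
        # Keep the N/2 increments covering 0 Hz up to half the sample rate.
        lines.append([abs(c) for c in spectrum[:fft_points // 2]])
    return lines


# Usage: a 1000 Hz tone sampled at 8000 Hz, 512 point FFT, 8 ms per line.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(2048)]
cols = spectrogram(tone, rate, fft_points=512, ms_per_line=8)
peak_bin = max(range(256), key=lambda k: cols[0][k])
print(len(cols), peak_bin * rate / 512)  # 25 1000.0
```

Halving the horizontal scale halves the step and so roughly doubles the number of FFTs, which is why finer horizontal scales take longer to compute.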
You can stop the process at any time by clicking the "Stop" button.

4. The Spectrogram Display

The spectrogram display reveals the digital signal as a frequency versus time plot, with signal amplitude at each frequency represented by intensity (or color). A continuous readout of cursor position in frequency (Hz) and time (milliseconds) is displayed at the bottom left of the window. A coordinate grid can also be added by clicking the "Toggle Grid" button.

The width of the spectrogram display is limited only by the display screen. Maximizing the spectrogram window will expand the display horizontally to fill the screen. If the spectrogram width is greater than the screen width, you can use the horizontal scroll bar at the bottom of the display to position the spectrogram side-to-side.

The height of the spectrogram display is limited by the size of the FFT chosen for analysis. Only 256 vertical display points are needed for a 512 point FFT. The 1024 and 2048 point FFTs require 512 and 1024 points respectively. Maximizing the spectrogram window will expand the display vertically to the size required by the FFT, if not limited by the screen height. If the spectrogram height is greater than the screen height, use the vertical scroll bar at the right of the window to position the spectrogram top-to-bottom.

5. Modifying Spectrograms

Once you have computed a spectrogram, you may want to make changes to its length, vertical or horizontal scale, threshold, or color to improve the frequency analysis. The menu bar across the top of the display gives options for FFT size, horizontal scale, display threshold, and color palette. Choosing any of these options will cause the spectrogram to be recomputed with the new value you have chosen. If you want to change more than one parameter before recomputing the spectrogram, choose "Modify" from the File Menu to bring up another Analysis Options dialog box to make your selections.
Frequently you will want to select a portion of the entire spectrogram for recomputation rather than recompute the entire length. You can drag-select this section from the spectrogram display. Position the mouse pointer at the desired starting point, press the left mouse button, drag the mouse to the desired ending point, and then release the mouse button. The Analysis Options dialog box will then appear with the starting and ending locations filled in according to your selection.

6. Direct Recording and Analysis

If you have a Windows compatible sound card installed, you will be able to directly record and analyze an audio sample through a microphone attached to your sound card. Choose "Record New" from the File menu to initiate recording. You will again be presented with the Analysis Options dialog box to select the parameters of the frequency analysis. When recording is complete, computation of the spectrogram will begin.

7. Spectrogram Playback

If you have a Windows compatible sound card installed, you will also be able to play back the spectrogram by clicking the 'Play' or 'Play Wdw' buttons. The Play button plays back the entire length of the .wav file, while the Play Wdw button plays back only that portion of the spectrogram which is visible in the Spectrogram Window.

8. Saving Audio and Bitmap Files

You can save a .wav file of the digital audio of your spectrogram by choosing "Save Wave" from the File Menu. You can also save a bitmap of the visible portion of the Spectrogram Window by choosing "Save Bitmap" from the File Menu.

9. Problem Reporting

Programs can only be improved if users provide feedback to the author. I can be reached at the following addresses to report any bugs or to provide comments or feedback. I encourage anyone with a question to contact me at:

DELPHI - RSHORNE
INTERNET - RSHORNE@DELPHI.COM

10. DISTRIBUTION

Spectrogram is Copyright 1994 by R.S. Horne and may be distributed as freeware.

11. CREDITS

So many interested Internet users have given good comments and suggestions that I can't list everyone. However, the contribution of Philip VanBaren, who provided the fast integer FFT code, has been vital to the improved performance of this update. Greg Walker and Henrik Clausen provided invaluable suggestions and debugging help.