SOFTWARE-BASED DIGITAL AUDIO ON PCs

by David T. Chappell

Copyright (C) 1991 ACM. All Rights Reserved

This article originally appeared in the Proceedings of the 1991 ACM Computer Science Conference, March 1991. Copying is by permission of the Association for Computing Machinery.

ABSTRACT

Digital audio techniques were investigated with special concentration on computer applications. Experimentation showed that the Intel 8253 programmable interval timer, found on all IBM PCs and compatibles, can output digitized sound. The relationship between pulse amplitude modulation and pulse width modulation signals was found to be significant to this process. Feeding digitized sound data to the chip results in an inherent transfer from pulse amplitude modulation to pulse width modulation encoding. The relationship in signal modulation allows IBM PCs to play digitized sound without the use of extra hardware.

Biography: David T. Chappell is a senior in the computer science curriculum at North Carolina State University. In addition, he participates in the co-op program and works for IBM. His interests are in the areas of advanced input/output, technological research, and scientific applications. David plans to attend graduate school in computer science or a related field.

INTRODUCTION

In the past few years, the computer industry has been slowly gaining interest in computer-generated speech and sound. In this area, the Apple Macintosh and Commodore Amiga provide built-in sound control hardware so that their machines can play digitized recordings with little effort from the programmer. IBM, however, has chosen not to include advanced sound capabilities in its line of personal computers. The market has shown that relatively few people will buy speech or sound add-ons from either IBM or third parties. Until such hardware is standardized, sound software will encounter great difficulty in gaining acceptance.
It is in this regard, however, that mathematics, engineering, and software can come to the rescue: it is possible for a PC to play good quality sound without additional hardware.

BACKGROUND

Sound is transmitted through the air as a longitudinal wave. The source of the sound compresses the air in one area and this air compresses the air next to it while it moves back to its original position. As each air molecule is displaced from and returns to its normal position, the wave travels through the air. The movement that results from these repeated compressions and rarefactions is called the propagation of the wave [1].

It is often convenient to use graphs to visually represent sound. Most pictures show a graph of the wave amplitude vs. time, where the amplitude represents the displacement of molecules from their original positions. [See figure 1.] Regular waves, such as sine waves, need not be depicted in such a manner but can be represented by just a frequency (or wavelength), amplitude (volume), and duration. For example, the SOUND statement in Microsoft BASIC has the parameters of frequency and duration. (Volume is not needed because the PC has only one volume.) The irregular waves that make up most of the sound we hear are more complicated and need to be represented by giving specific details on the amplitude at each point in time.

Digitized sounds consist of the numerical values for the amplitude at regular time intervals. [See figure 2.] Along with the amplitude data, the number of samples taken per second must be recorded. By reproducing the amplitude changes at the original rate, the sound can then be played back. Much research has gone into digitized sound, and it is now possible to produce digitized recordings which, to the human ear, are identical to their analog counterparts.

Once recorded, digital signals can be stored in a variety of formats. Pulse amplitude modulation (PAM) is the standard method of representing analog data. [See figure 3.]
In this method, each piece of data represents the amplitude at one instant in time. Pulse width modulation (PWM) treats each piece of information as the duration of a pulse which starts at a regular frequency. [See figure 4.] In pulse code modulation (PCM), each bit records whether the amplitude is high or low at each point in time. [See figure 5.] PCM is the most common method in digital audio recording and is used by audio compact discs to achieve high-quality sound. Pulse position modulation (PPM) records the position of a brief pulse by giving the time duration before the pulse occurs [1]. [See figure 6.]

Several brands of personal computers include dedicated sound chips. The Macintosh, Amiga, Atari ST, and other computers can easily produce high-quality sound by using dedicated hardware. On these computers, many programs are enhanced by the addition of sound and speech. Likewise, IBM and a multitude of other companies have produced boards that allow PCs to record and play back digitized sound. A lack of standards, however, limits the use of these PC boards, and few programs use them.

Over the years, several attempts have been made to play digitized sound on PCs without special hardware. A number of these have appeared in the public domain, but few are coherent. Commercial software has had greater success, but most of these programs produce rough speech. With the recent rise of interest in audio and video, a few developers have even produced good, intelligible speech and sounds. The work presented here rivals that of the best public domain and commercial successes.

MATERIALS AND METHODS

The Intel 8253 programmable interval timer, found on all IBM PCs and compatibles, is a flexible counter chip. It has three 16-bit channels. Each channel produces an output signal based on an input signal and a programmed 16-bit number. The chip's six modes can produce varying types of output. Table 1 summarizes the six modes.
Refer to Rosch [2] or Sargent and Shoemaker [3] for more details.

Table 1: Intel 8253 Operating Modes

    0 - interrupt on terminal count
    1 - programmable one-shot
    2 - rate generator
    3 - square-wave generator
    4 - software-triggered strobe
    5 - hardware-triggered strobe

On IBM PCs, the 8253's first channel increments the time-of-day clock, the second channel refreshes the DRAMs, and the third channel sends sound to the speaker. The timer/counter of AT class machines is based on the Intel 8254-2 chip and is functionally equivalent to the 8253. The PS/2 line has an 8253 that is used the same way, except that a separate chip refreshes memory, and the second channel of the 8253 is used for diagnostics or is unassigned [2].

The 8253 can only give two possible output states: 0 and 1. Possibly because of this limitation, the chip's use for sound production in PCs has typically been limited to being a square-wave generator (mode 3). When functioning as a square-wave generator, the chip's output is equal to its input frequency (1.193 MHz on PCs) divided by a 16-bit number that is input to the chip by the programmer. The output is a square wave whose high and low periods are equal. The range of the chip falls between 18.2 hertz and 1.193 megahertz [4]. Mode 3 is used by channel 0 for the time-of-day clock and by channel 2 to produce the beeps and whistles found in many DOS programs.

Despite the two-state output limitation, the 8253 can also play digitized sound. When put in mode 0, the output defaults to high. When a 16-bit number is then programmed into the chip, the output goes low for the duration of the specified number of input pulses, after which the output returns high. The net result is that the output wave is in the form of PWM. [See figure 7.]

To program PWM sound output on the 8253, several steps must be taken. First, the programmable peripheral interface (PPI) chip must be initialized to the desired mode. Then the 8253 must be initialized and data sent to it.
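The mode 0 behavior described above can be illustrated with a small simulation that runs on any machine. The sketch below is not part of the original program: the function names are hypothetical, the sample period is treated as a fixed number of timer clocks, and the extra clock that real hardware needs to load a count is ignored. It shows that the time-averaged output level falls linearly as the programmed count rises, which is why amplitude (PAM) data written to the timer comes out as pulse widths (PWM).

```c
#include <assert.h>

/* Hypothetical helper: render one sample period of mode 0 output into
   out[0..period-1]. The output is 0 (low) while the counter runs for
   `count` clocks and 1 (high) for the rest of the period. */
void pwm_render(unsigned count, unsigned period, unsigned char *out)
{
    unsigned t;

    assert(count <= period);
    for (t = 0; t < period; t++)
        out[t] = (t < count) ? 0 : 1;
}

/* Hypothetical helper: time-averaged output level (0.0 to 1.0) over one
   sample period for a given count. */
double pwm_average_level(unsigned count, unsigned period)
{
    assert(period > 0 && count <= period);
    return (double)(period - count) / (double)period;
}
```

With a period of 128 clocks, a count of 64 gives an average level of 0.5, a count of 0 leaves the output high for the whole period, and a count of 128 holds it low throughout.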
Table 2 summarizes chip ports.

Table 2: Chip I/O Port Addresses

    Port                 Dec   Hex
    PPI chip              97    61
    8253 Channel 0        64    40
    8253 Channel 1        65    41
    8253 Channel 2        66    42
    8253 Control Word     67    43

To allow the 8253 to control the speaker, the PPI must be set correctly. The two lowest bits of port 97 must be turned on. The other bits of this port should remain untouched since they are used for other purposes [4].

Port 67 is the control word register which initializes the 8253. Each counter is initialized by sending this port one control byte. Table 3 shows the meanings of the bits in the control word.

Table 3: Meaning of 8253 Control Word Register

    bit 0     = 0,  count in binary
              = 1,  count in Binary Coded Decimal
    bits 1-3  = mode number (0 to 5 in binary)
    bits 4,5  = 00, latch current count for reading
              = 01, read/load low byte
              = 10, read/load high byte
              = 11, read/load low byte, then high byte
    bits 6,7  = counter number (0 to 2 in binary)

    (Bit 0 is least significant; 7 is most significant)

When reading both bytes of the 16-bit value, a latch command prevents the count from changing between reading the high byte and the low byte. Latching is not needed when reading only a single byte. For example, to set the chip to generate musical tones in mode 3, the control word is 182 (B6 hex). For digital audio via mode 0, use 176 (B0 hex).

Ports 64, 65, and 66 are used to read and write to timers 0, 1, and 2 respectively. Data sent to these ports becomes the 16-bit number used to affect output. If only one byte is sent, the other byte retains its previous value.

Listing 1 shows the general algorithm for 8-bit digital audio output with the 8253. 8-bit quality is achieved by leaving the high byte constant and sending data only to the low byte.
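As a cross-check on the control word values quoted above, the bit layout of Table 3 can be packed with a few shifts. This is a minimal sketch rather than code from the original program; the function name and the RW_* constants are hypothetical labels for the bits 4,5 settings in Table 3.

```c
#include <assert.h>

/* Hypothetical names for the bits 4,5 field of Table 3. */
enum { RW_LATCH = 0, RW_LOW = 1, RW_HIGH = 2, RW_LOW_HIGH = 3 };

/* Pack an 8253 control word following Table 3:
   bits 6,7 = counter, bits 4,5 = read/load, bits 1-3 = mode, bit 0 = BCD. */
unsigned char pit_control_word(unsigned counter, unsigned rw,
                               unsigned mode, unsigned bcd)
{
    assert(counter <= 2 && rw <= 3 && mode <= 5 && bcd <= 1);
    return (unsigned char)((counter << 6) | (rw << 4) | (mode << 1) | bcd);
}
```

For channel 2, pit_control_word(2, RW_LOW_HIGH, 3, 0) gives 182 (B6 hex) and pit_control_word(2, RW_LOW_HIGH, 0, 0) gives 176 (B0 hex), matching the values in the text. Selecting the low byte only in mode 0, pit_control_word(2, RW_LOW, 0, 0), gives 144 (90 hex), the value used to prepare the timer for 8-bit sample data.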
Listing 1: Algorithm for Digital Audio

    Load Digital Audio Data;
    value = InPort(61h);            -- Initialize PPI
    OutPort(61h, value OR 3);
    OutPort(43h, B0h);              -- Initialize 8253
    OutPort(42h, 00h);
    OutPort(42h, 00h);
    OutPort(43h, 90h);
    loop until end of data          -- Play sound
        OutPort(42h, Data);
        Wait Until Data Passes;
    OutPort(43h, B6h);              -- Restore 8253

Listing 2 gives an example program written for Turbo C. In the sample program, the user must specify whether the input data is signed. Some forms of digital audio storage, including PCM, inherently use unsigned variables. Other forms, including PWM and PAM, can be stored as signed variables, and notably the Amiga microcomputer stores digitized sound data on a scale from -128 to 127. Since the playback method using the 8253 must use unsigned data, a scaling factor of 128 must be added to all signed data [5].

Listing 2: Turbo C Code for Digital Audio

    #include <stdio.h>
    #include <stdlib.h>
    #include <conio.h>
    #include <dos.h>
    #include <io.h>

    /* SOUND.C
       Author:  David Chappell
       Version: 1.46d
       Date:    24 June 1990
       Method:  8253 PWM method */

    FILE *soundfile;            /* input data file */
    unsigned long size;         /* size of input file */
    int wait;                   /* time to wait between sending samples out */
    unsigned char offset;       /* change signed samples to unsigned */
    unsigned int vol1, vol2;    /* adjusts range of data; int so 256 fits */

    void error(char message[])
    /* Purpose: handles errors */
    {
        fprintf(stderr, "\nERROR: %s\n", message);
        exit(-1);
    }

    void playfile(void)
    /* Purpose: loads file and plays digitized sound */
    {
        int pause;
        unsigned int count, temp;
        unsigned char min, max, data;
        char *inputbuffer;

        /* a 16-bit size_t and loop counter cannot address more than this */
        if (size > 65535UL)
            error("File too large to load");
        if ((inputbuffer = (char *) calloc((size_t) size, sizeof(char))) == NULL)
            error("Not enough memory to load file");
        fread(inputbuffer, (size_t) size, 1, soundfile);

        /* scale data: find the range of the unsigned samples */
        min = 255;
        max = 0;
        for (count = 0; count < size; count++)
        {
            data = *(inputbuffer + count) + offset;
            if (data < min) min = data;
            if (data > max) max = data;
        }
        vol1 = 64;
        vol2 = max - min + 1;   /* scale from 0 to 64 */
        offset -= min;          /* move lowest point to zero */

        disable();              /* disable interrupts */
        for (count = 0; count < size; count++)
        {
            data = *(inputbuffer + count) + offset;
            temp = data * vol1 / vol2;
            data = temp + 1;
            outp(66, data);     /* send sample to 8253 channel 2 */
            for (pause = 0; pause < wait; pause++);
        }
        enable();               /* enable interrupts */
        free(inputbuffer);
    }

    void startspeaker(void)
    /* Purpose: initialize speaker for output */
    {
        outp(97, inp(97) | 3);  /* set PPI */
        outp(67, 176);          /* send initial data to timer */
        outp(66, 0);
        outp(66, 0);
        outp(67, 144);          /* prepare timer chip to receive data */
    }

    void openfile(void)
    /* Purpose: gets information from user and opens input file */
    {
        char filename[128];     /* file name typed by user */
        char choice;            /* key hit by user */

        clrscr();               /* clear screen */
        puts("What file do you want to hear?");
        if (scanf("%127s", filename) != 1 ||
            (soundfile = fopen(filename, "rb")) == NULL)
            error("Unable to open sound file");
        fseek(soundfile, 0, SEEK_SET);
        size = filelength(fileno(soundfile));
        printf("\nFile size = %lu bytes\n\n", size);
        printf("What delay time do you want in FOR counter? ");
        scanf("%d", &wait);
        printf("Is the data signed? ");
        choice = getche();
        if ((choice == 'Y') || (choice == 'y'))
            offset = 128;
    }

    void stopsound(void)
    /* Purpose: resets speaker to stop sound */
    {
        outp(67, 182);              /* restore timer to mode 3 */
        outp(97, inp(97) & 252);    /* clear two low PPI bits: speaker off */
        fclose(soundfile);
    }

    int main(void)
    {
        openfile();
        startspeaker();
        playfile();
        stopsound();
        return 0;
    }

RESULTS

The method described thus far has several limitations when put into practice on PCs. The 16-bit quality of the chip reduces to 7 bits at most sample rates. Also, a background tone is produced along with the desired sound because of the use of PWM. Both of these difficulties arise from timing problems.

The method described thus far has the capability of yielding high-quality 16-bit sound. The code given, however, can only produce approximately 7-bit sound. Although the 8253 has the ability to play 16-bit data, the timing limitations of the PC restrict the length of each pulse. In order to produce sound at the rate of about 8-13 kHz, only about six or seven bits of data are processed before the next piece of data must begin output.
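The bit counts quoted here follow from dividing the timer's input frequency by the sample rate: the quotient is the longest pulse, in timer clocks, that fits in one sample period. The sketch below works through that arithmetic. The function names are hypothetical, and PIT_CLOCK_HZ uses the commonly quoted nominal value of 1,193,182 Hz for the 1.193 MHz input, which is an assumption not stated in this article.

```c
#include <assert.h>

/* Nominal PC timer input frequency (about 1.193 MHz); assumed value. */
#define PIT_CLOCK_HZ 1193182UL

/* Largest pulse length, in timer clocks, that fits in one sample period. */
unsigned long max_pulse_count(unsigned long sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return PIT_CLOCK_HZ / sample_rate_hz;
}

/* Whole bits of resolution available: floor(log2(max_pulse_count)). */
unsigned usable_bits(unsigned long sample_rate_hz)
{
    unsigned long n = max_pulse_count(sample_rate_hz);
    unsigned bits = 0;

    while (n > 1) {
        n >>= 1;
        bits++;
    }
    return bits;
}
```

At 8 kHz the longest pulse is 149 clocks, or 7 whole bits; at 13 kHz it is 91 clocks, or 6 bits. At 4 kHz the figure rises to 298 clocks, or 8 bits.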
At a slower sample rate of about 4-7 kHz, seven to eight bits of accuracy can be achieved. A higher input frequency would resolve this difficulty; however, the frequency cannot be changed in PCs but could be modified in other applications of the 8253.

The maximum data size (volume) possible can be calculated mathematically. The 8253 input rate divided by the output sample rate yields the number of time periods that pass before the next sample begins play:

    Maximum value = 1.193 MHz / sample rate

For example, the maximum volume for an 8 kHz sample is about 149. The 1.193 MHz input frequency that feeds the 8253 limits the chip's sound capabilities.

As an annoying side-effect, the provided algorithm creates a background tone. Due to the nature of PWM, at the beginning of each piece of data, the output changes state as it goes from high to low. [See figure 7.] This periodic oscillation produces a pitch equal to the frequency at which the sound is played. For example, an 8 kHz sample will produce a background tone of 8 kHz. The resulting tone overlays the digitized sound output. The human ear cannot detect pitches beyond 22 kHz, and most people cannot hear a pitch of 18 kHz or higher. Thus, any sample played at or above this frequency will not produce an audible background tone. If a given sample is not of high enough frequency, this problem can be alleviated by playing each piece of data multiple times in rapid succession so that the background tone is of such a high frequency that it is inaudible. For example, by playing each datum of a 9.5 kHz sample twice, the resulting pitch will be 19 kHz. The first problem, however, becomes dramatic when a moderate-speed sample is sent repeatedly: in order to maintain the original sampling rate, the 8253 has time to process fewer and fewer bits for each datum [5].

DISCUSSION

Several other methods were attempted before the above method was discovered.
Although all of these other methods yield sound of varying quality, most do not produce recognizable results, and none match the PWM method. With some work, any of the following techniques might be able to give better output.

One idea is to look at each piece of data, compare it to the midpoint, and set the speaker bit appropriately. For example, the midpoint for 8-bit data is alternately 0 or 128 (depending on whether a sign bit is used), so any values above 128 (or 0) could be set as 1 and all other values set as 0. Modifying bit 1 of I/O port 97 (61 hex) changes the state of the speaker's position. This idea could never yield very good results, however, as it gives only 1-bit accuracy.

A heretofore ineffective method is based on PCM encoding. Each individual bit of input directly determines the state of the speaker. Manipulating bit 1 of port 97 yields the two states.

Another possibility is to use mode 3 of the 8253 to play a pitch which is either directly or inversely proportional to the data. Although this method yields audible sound, the results are not as good as those of the main PWM method.

A final idea is to use the 8253 based on pulse position modulation. Mode 4 of the 8253 should produce PPM output, but trials so far have given negative results.

The given algorithm can be further improved by allowing the program to automatically determine the speed of the computer. In the current method, a delay time must be entered manually. By incrementing a counter while watching the system clock, a ratio of instructions to time can be calculated. Under PC-DOS and OS/2, the 18.2 Hz frequency of the system clock limits the accuracy of this idea, however, unless channel 0 of the 8253 is used to change the system timer.

A better timing method would be to use an algorithm that does not rely on loops for timing delays. Under PC-DOS, interrupts can be modified so that data is played after each clock tick.
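A tick-driven scheme of this kind needs a reload value for channel 0 that makes the clock tick once per sample. The arithmetic can be sketched as follows. The function names are hypothetical, PIT_CLOCK_HZ is the commonly quoted nominal value for the 1.193 MHz input, and the idea of passing every Nth tick to the original handler so that the time-of-day clock keeps its 18.2 Hz pace is an assumption about how such a routine would usually be written, not something taken from this article.

```c
#include <assert.h>

/* Nominal PC timer input frequency (about 1.193 MHz); assumed value. */
#define PIT_CLOCK_HZ 1193182UL

/* Channel 0 reload value that yields roughly one tick per sample. */
unsigned pit_divisor_for_rate(unsigned long sample_rate_hz)
{
    /* below 19 Hz the divisor would no longer fit in 16 bits */
    assert(sample_rate_hz >= 19 && sample_rate_hz <= PIT_CLOCK_HZ);
    return (unsigned)(PIT_CLOCK_HZ / sample_rate_hz);
}

/* Number of fast ticks per standard 18.2 Hz tick (reload value 65536),
   i.e. how often the original tick handler would need to be called. */
unsigned long ticks_per_standard_tick(unsigned divisor)
{
    assert(divisor > 0);
    return 65536UL / divisor;
}
```

For an 8 kHz sample the reload value is 149, and the original handler would be called once every 439 fast ticks.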
By increasing the frequency of clock ticks and changing the timer interrupt, the computer can output each piece of data at a regular interval. Other operating systems, however, do not allow such interrupt modification.

The development of this digital audio playback method has several implications and possibilities. A variety of applications, from games to word processors, can use voice and sound. On multi-tasking operating systems, sounds can easily be played in the background. Other computers could use the same ideas for audio output. When combined with extra hardware, this method can form a complete audio I/O system. When a PC acts as a terminal, this procedure will allow mainframes and minicomputers to play digitized sound.

Speech interaction is currently put to several uses. For example, IBM's SpeechViewer helps deaf children and adults improve pronunciation. In addition, several programs use spoken words to help people learn to read and write. For blind users, IBM's ScreenReader program can vocally relate the text that appears on the monitor. There are numerous instances of disabled users benefitting from talking computers. In addition, many musicians use computers to produce and mix sounds. Computers can now produce rich tones that rival musical instruments.

By reducing the need for extra hardware, digitized sound can easily be added to other programs. As an obvious example, games can use sound for both special effects and general entertainment. Useful applications, from word processors to spreadsheets, can speak to help visually impaired users. Inexperienced users would find a computer to be much friendlier if it could speak to them. A user interface that includes speech can help bring computers to the level of interaction that humans use with each other. Thus, nearly all types of software can benefit from the addition of speech and sound capabilities [5].

When running under PC-DOS, only one program can be run at a time.
The only way to allow the computer to play recorded sounds while continuing other work is to modify interrupts, as mentioned above. Under multi-tasking operating systems, such as OS/2 and Unix, the 8253 could be continually fed data in a background task while the main program continues. Playing sound in the background gives more flexibility. For example, a communications package could verbally report an error while continuing to receive data, or a graphical demo could play music in the background while displaying pictures on the monitor.

The data storage method used here is compatible with many others. A huge number of digitized samples are available from Macintosh, Amiga, and Atari ST computers. Several PC expansion boards also use the same storage method. Data recorded on any of this hardware can be played back on an ordinary PC. Whether the original sample is in the form of PAM, PCM, or PWM, it can play through the PC speaker. Furthermore, by purchasing an available expansion board or building one, sound recording is possible on PCs.

As the use of speech technology grows, speech can be added to larger machines. According to IBM's long-range plan, all host computers will eventually be accessed via a PS/2 running OS/2. Although mainframes and minicomputers do not typically have sound capabilities, they could use the PS/2's speaker for speech output. Thus, by using PCs as terminals, the full range of computers can handle digital audio.

The algorithm presented here can be used in settings other than in a PC. The same method could be used in any computer with an 8253 chip, and a hardware expansion using the 8253 can be added to other computers. More importantly, any hardware configuration capable of producing output similar to PWM can, when connected to a speaker, produce digitized sound. Similarly, any system able to produce pulses similar to any digital recording method can output digitized sound.
As a result, hardware with only two output states can play sound, and a digital-to-analog converter, such as found in the Amiga, is not required.

CONCLUSION

Over the past several decades, engineers have searched for ways to make computers both talk and play high-quality music. One solution to both problems, digitization of sound using pulse modulation, requires little processing time and is thus appropriate for microcomputers. Although previous usage of digitized sound has been limited to computers with specialized hardware, it is possible for a standard PC to play good quality sounds without extra hardware. As the computer world strides deeper into multimedia and other sound-based applications despite a lack of hardware standards for sound output on PCs, this method may prove to be invaluable in bringing sound to the masses. With minimal effort, any program can add a new dimension with speech, music, and sound effects.

REFERENCES

[1] Pohlmann, Ken C. Principles of Digital Audio. H. W. Sams, Indianapolis (1985).

[2] Rosch, Winn L. The Winn Rosch Hardware Bible. Simon & Schuster, New York (1988).

[3] Sargent, Murray, III and Richard L. Shoemaker. The IBM Personal Computer from the Inside Out. Addison-Wesley (1984).

[4] Norton, Peter. The Peter Norton Programmer's Guide to the IBM PC. 1st ed. Microsoft, Redmond, WA (1985).

[5] Chappell, David T. "Achieving Inexpensive Digital Audio on PCs for Educational Purposes". Proceedings of the Southeastern Small College Computing Conference. (1990).

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.