ACHIEVING INEXPENSIVE DIGITAL AUDIO ON PCS FOR EDUCATIONAL PURPOSES

David T. Chappell
Department of Computer Science
North Carolina State University
Raleigh, NC 27695

INTRODUCTION

In the past few years, the computer industry has been slowly gaining interest in computer-generated speech and sound. In this area, the Apple Macintosh and Commodore Amiga provide built-in sound control so that their machines can play digitized recordings with little effort from the programmer. IBM, however, has chosen not to include advanced sound capabilities in its line of personal computers. The market has shown that relatively few people will buy speech or sound add-ons from either IBM or third parties. Until IBM makes such hardware standard, speech software will encounter great difficulty in gaining acceptance. It is in this regard, however, that mathematics, engineering, and software can come to the rescue: it is possible for a PC to play good-quality sound without additional hardware.

BACKGROUND

Digitized sounds consist of the numerical values of the amplitude at regular time intervals. [See figure 1.] Along with the amplitude data, the number of samples taken per second must be recorded. By reproducing the amplitude changes at the original rate, the sound can then be played back. Much research has gone into digitized sound, and it is now possible to produce digitized recordings which, to the human ear, are identical to their analog counterparts.

Once recorded, digital signals can be stored in a variety of formats. Pulse amplitude modulation (PAM) is the standard method of representing analog data. [See figure 2.] In this method, each piece of data represents the amplitude at one instant in time. Pulse width modulation (PWM) treats each piece of information as the duration of a pulse; the pulses start at regular intervals. [See figure 3.] Pulse code modulation (PCM) is the most common method in digital audio recording. [See figure 4.]
In PCM, the amplitude at each sampling instant is quantized and stored as a binary code word, so each bit of a sample is either high or low. [1]

Several brands of personal computers include dedicated sound chips. The Macintosh, Amiga, and other computers can easily produce high-quality sound by using dedicated hardware. On these computers, many programs are enhanced by the addition of sound and speech. Likewise, IBM and a multitude of other companies have produced boards that allow PCs to record and play back digitized sound. A lack of standards, however, limits the use of these boards, and few commercial programs use them.

Over the years, several attempts have been made to play digitized sound on PCs without special hardware. A number of these have appeared in the public domain. Commercial software has had greater success, but most of these programs produce rough-sounding speech. With the recent rise of interest in sound and pictures, a few developers have produced good, intelligible speech and sounds. The work presented here surpasses past attempts and adds to that of other successes.

MATERIALS AND METHODS

The Intel 8253 programmable interval timer, found on all IBM PCs and compatibles, is a flexible counter chip. It has three 16-bit channels. Each channel produces an output signal based on an input signal and an arbitrary 16-bit number. The chip's six modes can produce varying types of output. Table 1 summarizes the six modes. Refer to Rosch [2] or Sargent and Shoemaker [3] for more details.

Table 1: Intel 8253 Operating Modes

    Mode  Description
    0     interrupt on terminal count
    1     programmable one-shot
    2     rate generator
    3     square-wave generator
    4     software-triggered strobe
    5     hardware-triggered strobe

On IBM PCs, the 8253's first channel increments the time-of-day clock, the second channel refreshes the DRAMs, and the third channel sends sound to the speaker. The timer/counter of AT-class machines is based on the Intel 8254-2 chip and is functionally equivalent to the 8253.
The PS/2 line has an 8253 that is used the same way, except that a separate chip refreshes memory, and the second channel of the 8253 is used for diagnostics or is unassigned. [2]

The 8253 can only give two possible output states: 0 and 1. Possibly because of this limitation, the chip's use for sound production in PCs has typically been limited to being a square-wave generator (mode 3). When functioning as a square-wave generator, the chip's output frequency is equal to its input frequency (1.193 MHz on PCs) divided by a 16-bit number that is sent to the chip by the programmer. The output is a square wave whose high and low periods are equal. The range of the chip falls between 18.2 hertz and 1.193 megahertz [4]. Mode 3 is used by channel 0 for the time-of-day clock and by channel 2 to generate musical tones.

Despite the two-state output limitation, the 8253 can also play digitized sound. When put in mode 0, the output is low for the duration of a number of input pulses equal to a programmed 16-bit number, after which the output goes high. The resulting wave is therefore a form of PWM. [See figure 5.]

To program PWM sound output on the 8253, several steps must be taken. First, the programmable peripheral interface (PPI) must be initialized to the desired mode. Then the 8253 must be initialized before the sound data can be sent to it. Table 2 summarizes chip ports.

Table 2: Chip I/O Port Addresses

    Port                 Dec   Hex
    PPI chip              97    61
    8253 Channel 0        64    40
    8253 Channel 1        65    41
    8253 Channel 2        66    42
    8253 Control Word     67    43

The PPI must be initialized to allow the 8253 to control the speaker. The two lowest bits of port 97 must be turned on. The other bits of this port should remain untouched since they are used for other purposes. [4]

Port 67 is the control word register, which initializes the chip. Each counter is initialized by sending this port one control byte. Table 3 shows the meanings of the bits in the control word.
Table 3: Meaning of Control Word Register

    bit 0     = 0,  count in binary
              = 1,  count in Binary Coded Decimal
    bits 1-3  = mode number (0 to 5 in binary)
    bits 4,5  = 00, latch current count for reading
              = 01, read/load low byte
              = 10, read/load high byte
              = 11, read/load low byte, then high byte
    bits 6,7  = counter number (0 to 2 in binary)

    (Bit 0 is least significant; 7 is most significant)

When reading both bytes of the 16-bit value, a latch command prevents the count from changing between reading the high byte and the low byte. Latching is not needed when reading only a single byte. For example, to set the chip to generate musical tones in mode 3, the control word is 182 (B6 hex). For digital audio via mode 0, use 176 (B0 hex).

Ports 64, 65, and 66 are used to read and write to timers 0, 1, and 2 respectively. Data sent to these ports becomes the 16-bit number used to affect output. If only one byte is sent, the other byte retains its previous value.

Listing 1 shows the general algorithm for 8-bit digital audio output with the 8253. 8-bit quality is achieved by leaving the high byte constant and sending data only to the low byte.

Listing 1: Algorithm for Digital Audio

    Load Digital Audio Data;
    value = InPort(61h);             -- Initialize PPI
    OutPort(61h, value OR 3);
    OutPort(43h, B0h);               -- Initialize 8253
    OutPort(42h, 00h);
    OutPort(42h, 00h);
    OutPort(43h, 90h);
    loop until end of data           -- Play sound
        OutPort(42h, Data);
        Wait Until Data Passes;
    OutPort(43h, B6h);               -- Restore 8253

Listing 2 gives an example program written for Turbo C. Note that the user must specify whether the input data is signed. Since the playback method using the 8253 must use unsigned data, an offset of 128 must be added to all signed data.
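Before the full program, the two small calculations described above can be checked on their own. The following is only a sketch in portable C, not part of the original program: it packs the Table 3 fields into a control byte and shifts a signed 8-bit sample into the unsigned range by adding 128. The names pit_control_word, pit_access, and to_unsigned_sample are hypothetical and chosen here for illustration; no port I/O is performed.

```c
#include <assert.h>
#include <stdint.h>

/* Values for bits 4-5 of the control word (Table 3).
   These enum names are illustrative, not from the paper. */
enum pit_access {
    PIT_LATCH     = 0,  /* 00: latch current count for reading */
    PIT_LOW_BYTE  = 1,  /* 01: read/load low byte only         */
    PIT_HIGH_BYTE = 2,  /* 10: read/load high byte only        */
    PIT_LOW_HIGH  = 3   /* 11: low byte, then high byte        */
};

/* Pack the Table 3 fields into the single byte written to port 43h:
   bits 6-7 = counter, bits 4-5 = access, bits 1-3 = mode, bit 0 = BCD. */
uint8_t pit_control_word(unsigned counter, enum pit_access access,
                         unsigned mode, unsigned bcd)
{
    assert(counter <= 2 && mode <= 5 && bcd <= 1);
    return (uint8_t)((counter << 6) | ((unsigned)access << 4) |
                     (mode << 1) | bcd);
}

/* Shift a signed 8-bit sample (-128..127) into the unsigned range
   0..255 by adding 128, as the 8253 playback method requires. */
uint8_t to_unsigned_sample(int8_t sample)
{
    return (uint8_t)(sample + 128);
}
```

With these definitions, pit_control_word(2, PIT_LOW_HIGH, 3, 0) gives B6 hex (182), pit_control_word(2, PIT_LOW_HIGH, 0, 0) gives B0 hex (176), and pit_control_word(2, PIT_LOW_BYTE, 0, 0) gives 90 hex (144), which are the three control bytes that appear in the listings.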
Listing 2: Turbo C Code for Digital Audio

    #include <stdio.h>
    #include <stdlib.h>
    #include <conio.h>
    #include <dos.h>
    #include <io.h>

    /* SOUND.C
       Author:  David Chappell
       Version: 1.46c
       Date:    24 June 1990
       Method:  8253 PWM method */

    FILE *soundfile;        /* input data file */
    unsigned long size;     /* size of input file */
    int wait;               /* time to wait between sending samples out */
    unsigned char offset;   /* change signed samples to unsigned */

    void error(char message[]);
    void playfile(void);
    void startspeaker(void);
    void openfile(void);
    void stopsound(void);
    void main(void);

    void error(char message[])
    /* Purpose: handles errors */
    {
        fprintf(stderr,"\nERROR: %s\n",message);
        exit(-1);
    }

    void playfile(void)
    /* Purpose: loads file and plays digitized sound */
    {
        unsigned int count, pause;
        char curr;
        char *inputbuffer;

        /* the 16-bit loop counter and allocator cannot handle more */
        if (size > 65535UL)
            error("File too large to load");
        if ((inputbuffer=(char*) calloc((unsigned int) size,sizeof(char))) == NULL)
            error("Not enough memory to load file");
        fread(inputbuffer, (unsigned int) size, 1, soundfile);
        disable();                  /* disable interrupts */
        for (count = 0; count < size; count++)
        {
            curr = *(inputbuffer+count) + offset;
            outp(66,curr);          /* low byte of channel 2 count */
            for (pause = 0; pause < wait; pause++);
        }
        enable();                   /* enable interrupts */
        free(inputbuffer);
    }

    void startspeaker(void)
    /* Purpose: initialize speaker for output */
    {
        outp(97,inp(97) | 3);   /* set PPI */
        outp(67,176);           /* send initial data to timer */
        outp(66,00);
        outp(66,00);
        outp(67,144);           /* prepare timer chip to receive data */
    }

    void openfile(void)
    /* Purpose: opens input file */
    {
        char choice;            /* key hit by user */
        char filename[80];      /* name of input file */

        clrscr();
        puts("What file do you want to hear?");
        if (scanf("%79s",filename) != 1 ||
            (soundfile = fopen(filename,"rb")) == NULL)
            error("Unable to open sound file");
        fseek(soundfile,0,SEEK_SET);
        size = filelength(fileno(soundfile));
        printf("\nFile size = %lu bytes\n\n",size);
        printf("What delay time do you want in FOR counter? ");
        scanf("%d",&wait);
        printf("Is the data signed? ");
        choice=getche();
        if ((choice=='Y') || (choice=='y'))
            offset=128;
    }

    void stopsound(void)
    /* Purpose: resets speaker to stop sound */
    {
        outp(67,182);           /* restore timer to mode 3 */
        outp(66,51);            /* set channel 2 to power-on value */
        outp(66,05);
        nosound();
        fclose(soundfile);
    }

    void main(void)
    {
        openfile();
        startspeaker();
        playfile();
        stopsound();
    }

RESULTS

The method described thus far has several limitations when put into practice on PCs. The 16-bit quality of the chip reduces to 7 bits at most sample rates. Also, a background tone is produced along with the desired sound because of the use of PWM. Both of these difficulties arise from timing problems.

The method described thus far has the capability of yielding high-quality 16-bit sound. The code given, however, can only produce approximately 7-bit sound. Although the 8253 has the ability to play 16-bit data, the timing limitations of the PC restrict the length of each pulse. In order to produce sound at the rate of about 8-13 kHz, only about six or seven bits of data are processed before the next piece of data must begin output. At a slower sample rate of about 4-7 kHz, seven to eight bits of accuracy can be achieved. A higher input frequency would resolve this difficulty; however, this hardware change cannot easily be made in PCs but would be feasible in other applications of the 8253.

The maximum data size (volume) possible can be calculated mathematically. The 8253 input rate divided by the output sample rate yields the number of time periods that pass before the next sample begins play:

    Maximum value = 1.193 MHz / sample rate

For example, the maximum volume for an 8 kHz sample is about 149 (1,193,000 / 8,000). The 1.193 MHz input frequency that feeds the 8253 limits the chip's sound capabilities.

As an annoying side-effect, the provided algorithm creates a background tone. Due to the nature of PWM, at the beginning of each piece of data, the output goes from low to high. [See figure 4.]
This periodic oscillation produces a pitch equal to the frequency at which the sound is played. For example, an 8 kHz sample will produce a background tone of 8 kHz. The resulting tone overlays the digitized sound output.

A pitch of 18 kHz or greater is high enough that the human ear cannot detect it. Thus, any sample of this frequency will not produce an audible background tone. If a given sample is not of high enough frequency, this problem can be alleviated by outputting each piece of data multiple times in rapid succession so that the background tone is of such a high frequency that it is inaudible. For example, by playing each datum of an 8 kHz sample three times, the resulting pitch will be 24 kHz. The first problem, however, becomes dramatic when a moderate-speed sample is sent repeatedly: in order to maintain the original sampling rate, the 8253 has time to process fewer and fewer bits for each datum.

DISCUSSION

The development of this digital audio playback method has several implications and possibilities. A variety of applications, from games to word processors, can use voice and sound. On multi-tasking operating systems, sounds can easily be played in the background. Other computers could use the same ideas for audio output. When combined with extra hardware, this method can form a complete audio I/O system. When a PC acts as a terminal, this procedure will allow mainframes and minicomputers to play digitized sound.

Speech interaction is currently put to several uses, especially to help disabled users. For example, IBM's SpeechViewer helps deaf children and adults improve pronunciation. A microphone and speech recognition allow the personal computer to understand the user's voice. For blind users, IBM's ScreenReader program can vocally relate the text that appears on the monitor. Theoretical physicist Stephen Hawking uses a computer to talk despite his crippling disease.
There are numerous other instances of disabled users benefitting from talking computers.

There are also many computer programs that help people learn to read and write. Across the nation, children can listen to the computer talk as they use IBM's Writing to Read software. Illiterate adults gain an invaluable skill as a computer speaks and displays words on the screen.

In addition, many musicians use computers to produce and mix sounds. Computers can now produce rich tones that rival musical instruments. Musical synthesizers are actually specialized computers built for the purpose of producing sound. With the aid of digital signal processor boards, personal computers can make music equal in quality to the better synthesizers.

By reducing the need for extra hardware, digitized sound can easily be added to other programs. As an obvious example, games can use sound for both special effects and general entertainment. As multimedia becomes more popular, sound becomes a necessity. Useful applications, from word processors to spreadsheets, can speak to help visually impaired users. For example, allowing speech output from a word processor would do wonders to assist visually impaired writers and programmers.

Adding speech to personal computers would benefit new users. Inexperienced users would find a computer to be much friendlier if it could speak to them. By providing verbal output in important areas such as error handling, the computer can help new and disabled users. A user interface that includes speech can help bring computers to the level of interaction that humans use with each other. Thus, nearly all types of software can benefit from the addition of speech and sound capabilities.

When running under PC-DOS, only one program can be run at a time. The only way to allow the computer to play recorded sounds while continuing other work is to modify interrupts.
Under multi-tasking operating systems, such as OS/2 and Unix, the 8253 could be continually fed data in a background task while the main program continues. Playing sound in the background gives more flexibility. For example, a communications package could verbally report an error while continuing to receive data, or a demonstration could play music while displaying graphics.

The data storage method used here is compatible with many others. A huge number of digitized samples are available from Macintosh, Amiga, and Atari ST computers. Several PC expansion boards also use the same storage method. Data recorded on any of this hardware can be played back on an ordinary PC. Furthermore, purchasing an available expansion board or building one makes sound recording possible on PCs.

As the use of speech technology grows, speech can be added to other products. According to IBM's long-range plan, all host computers will eventually be accessed via a PS/2 running OS/2. Although mainframes and minicomputers do not typically have speakers, they could use the PS/2's speaker for speech output. Thus, by using PCs as terminals, the full range of computers can handle digital audio.

The algorithm presented here can be used in settings other than in a PC. The same method could be used in any computer with an 8253 chip, and a hardware expansion using the 8253 can be added to other computers. More importantly, any hardware configuration capable of producing output similar to PWM can, when connected to a speaker, produce digitized sound. Similarly, any system able to produce pulses similar to any digital recording method can output digitized sound. As a result, hardware with only two states can play sound, and a digital-to-analog converter is not needed. Most computers, such as the Amiga, use D/A converters, but this method shows that one is not required to play digitized sound.

Building an audio system based around the 8253 would be an excellent project for hardware students.
An expansion board that used an 8253 to play digitized sound (in place of a D/A converter) would be possible for many microcomputers. A small, self-contained system created specifically for the 8253 would also be a good project. In these cases, the student could design the hardware so that it does not have the limitations present in PCs.

CONCLUSION

Over the past several decades, engineers have searched for ways to make computers both talk and play high-quality music. One solution to both problems, digitization of sound using pulse modulation, requires little processing time and is thus appropriate for microcomputers. Although previous usage of digitized sound has been limited to computers with specialized hardware, it is possible for a standard PC to play good-quality sounds without extra hardware. As the computer world strides deeper into sound-based applications despite a lack of hardware standards for sound output on PCs, this method may prove to be invaluable in bringing sound to the masses. As people learn more from computers, their experiences will benefit from the addition of audio. Students and teachers can use this knowledge as they learn about computers and sound. With minimal effort, any program can add a new dimension with speech, music, and sound effects.

REFERENCES

[1] Pohlmann, Ken C. Principles of Digital Audio. H. W. Sams, Indianapolis (1985).
[2] Rosch, Winn L. The Winn Rosch Hardware Bible. Simon & Schuster, New York (1988).
[3] Sargent, Murray, III and Richard L. Shoemaker. The IBM Personal Computer from the Inside Out. Addison-Wesley (1984).
[4] Norton, Peter. The Peter Norton Programmer's Guide to the IBM PC. 1st ed. Microsoft, Redmond, WA (1985).

Copyright 1990 by the Consortium for Computing in Small Colleges.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the CCSC copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Consortium for Computing in Small Colleges. To copy otherwise requires a fee and/or specific permission. This article appeared in "Proceeding of the Fourth Annual Southeastern Small College Computing Conference", November 9-10, 1990. Reprinted with permission of the CCSC.