ACHIEVING INEXPENSIVE DIGITAL AUDIO ON PCS
                    FOR EDUCATIONAL PURPOSES


                       David T. Chappell
                 Department of Computer Science
                 North Carolina State University
                       Raleigh, NC  27695


                          INTRODUCTION

     In the past few years, the computer industry has been slowly
gaining interest in computer-generated speech and sound.  In this
area, the Apple Macintosh and Commodore Amiga provide built-in
sound control so that their machines can play digitized
recordings with little effort from the programmer.  IBM, however,
has chosen not to include advanced sound capabilities in its line
of personal computers.  The market has shown that relatively few
people will buy speech or sound add-ons from either IBM or third
parties.  Until IBM makes such hardware standard, speech software
will encounter great difficulty in gaining acceptance.  It is in
this regard, however, that mathematics, engineering, and software
can come to the rescue:  it is possible for a PC to play good
quality sound without additional hardware.


                           BACKGROUND

     Digitized sounds consist of the numerical values for the
amplitude at regular time intervals.  [See figure 1.]  Along with
the amplitude data, the number of samples taken per second must
be recorded.  By reproducing the amplitude changes at the
original rate, the sound can then be played back.  Much research
has gone into digitized sound, and it is now possible to produce
digitized recordings which, to the human ear, are identical to
their analog counterparts.
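     As a rough illustration of the two pieces of information
described above (the amplitude values and the sample rate), a
digitized sound might be held in C as sketched below.  The
structure and function names are hypothetical and are not taken
from the listings in this paper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for a digitized sound; the names are
   illustrative assumptions. */
struct sound_clip {
    const unsigned char *samples;  /* one unsigned 8-bit amplitude each */
    size_t count;                  /* number of samples stored */
    unsigned long rate_hz;         /* samples taken per second */
};

/* Playing time in milliseconds: number of samples / sample rate. */
unsigned long clip_duration_ms(const struct sound_clip *clip)
{
    assert(clip != NULL && clip->rate_hz > 0);
    return (unsigned long)((clip->count * 1000UL) / clip->rate_hz);
}
```

     Playing the samples back at any rate other than rate_hz
changes both the pitch and the duration of the sound, which is
why the rate must be stored with the data.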
     Once recorded, digital signals can be stored in a variety of
formats.  Pulse amplitude modulation (PAM) is the standard method
of representing analog data.  [See figure 2.]  In this method,
each piece of data represents the amplitude at one instant in
time.  Pulse width modulation (PWM) treats each piece of
information as the duration of a pulse, with pulses starting at
regular intervals.  [See figure 3.]  Pulse code modulation (PCM)
is the most common in digital audio recording.  [See figure 4.]
In PCM, the amplitude at each point in time is quantized and
recorded as a binary number. [1]

     Several brands of personal computers include dedicated sound
chips.  The Macintosh, Amiga, and other computers can easily
produce high-quality sound by using dedicated hardware.  On these
computers, many programs are enhanced by the addition of sound
and speech.  Likewise, IBM and a multitude of other companies
have produced boards that allow PCs to record and play back
digitized sound.  A lack of standards, however, limits the use of
these boards, and few commercial programs use them.
     Over the years, several attempts have been made to play
digitized sound on PCs without special hardware.  A number of
these have appeared in the public domain.  Commercial software
has had greater success, but most of these programs produce
rough-sounding speech.  With the recent rise of interest in sound
and pictures, a few developers have produced good, intelligible
speech and sounds.  The work presented here surpasses past
attempts and adds to that of other successes.


                      MATERIALS AND METHODS

     The Intel 8253 programmable interval timer, found on all IBM
PCs and compatibles, is a flexible counter chip.  It has three
16-bit channels.  Each channel produces an output signal based on
an input signal and an arbitrary 16-bit number.  The chip's six
modes can produce varying types of output.  Table 1 summarizes
the six modes.  Refer to Rosch [2] or Sargent and Shoemaker [3]
for more details.

              Table 1:  Intel 8253 Operating Modes

               0 - interrupt on terminal count
               1 - programmable one-shot
               2 - rate generator
               3 - square-wave generator
               4 - software-triggered strobe
               5 - hardware-triggered strobe

     On IBM PCs, the 8253's first channel increments the time-of-
day clock, the second channel refreshes the DRAMs, and the third
channel sends sound to the speaker.  The timer/counter of AT
class machines is based on the Intel 8254-2 chip and is
functionally equivalent to the 8253.  The PS/2 line has an 8253
that is used the same way, except that a separate chip refreshes
memory, and the second channel of the 8253 is used for
diagnostics or is unassigned. [2]
     The 8253 can only give two possible output states:  0 and 1.
Possibly because of this limitation, the chip's use for sound
production in PCs has typically been limited to being a square-
wave generator (mode 3).  When functioning as a square-wave
generator, the chip's output frequency is equal to its input
frequency (1.193 MHz on PCs) divided by a 16-bit number that is
input to the chip by the programmer.  The output is a square wave
whose high and low periods are equal.  The range of the chip
falls between 18.2 hertz and 1.193 megahertz [4].  Mode 3 is used
by channel 0 for the time-of-day clock and by channel 2 for
speaker tones.
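     The division described above can be sketched in C as
follows.  This is a minimal sketch, not part of the paper's
listings:  the helper name is hypothetical, and the exact clock
value of 1,193,182 Hz is an assumption standing in for the
rounded 1.193 MHz figure quoted in the text.

```c
#include <assert.h>

/* Assumed exact PC timer input clock; the text rounds it to 1.193 MHz. */
#define PIT_INPUT_HZ 1193182UL

/* Divisor (1..65536) giving the square wave nearest to tone_hz.
   Hypothetical helper for illustration only. */
unsigned long pit_divisor(unsigned long tone_hz)
{
    unsigned long divisor;

    assert(tone_hz > 0);
    divisor = (PIT_INPUT_HZ + tone_hz / 2) / tone_hz;  /* round to nearest */
    if (divisor < 1)
        divisor = 1;
    if (divisor > 65536UL)
        divisor = 65536UL;  /* largest count; gives about 18.2 Hz */
    return divisor;
}
```

     The largest divisor, 65536, corresponds to the 18.2 hertz
lower limit mentioned above.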
     Despite the two-state output limitation, the 8253 can also
play digitized sound.  When put in mode 0, the output is low for
the duration of a number of input pulses equal to a programmed
16-bit number, after which the output goes high.  The net result
is that the resulting wave is in the form of PWM.  [See figure
5.]
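     The mode 0 behavior can be modeled in software as shown
below.  This is an idealized sketch that assumes the output
changes exactly when the programmed count expires and ignores
any short delay in loading the count; the function names are
hypothetical.

```c
#include <assert.h>

/* Idealized model of mode 0 (assumption: load delay is ignored).
   Returns the output level a given number of input pulses after
   the count was loaded:  low while counting, high afterwards. */
int mode0_output(unsigned int count, unsigned int ticks_since_load)
{
    return ticks_since_load < count ? 0 : 1;
}

/* Input pulses the output spends high in one sample period. */
unsigned int mode0_high_ticks(unsigned int period_ticks, unsigned int count)
{
    assert(count <= period_ticks);
    return period_ticks - count;
}
```

     Because the sample period is fixed, a larger count means a
longer low pulse and a shorter high one, so the width of the
pulse carries the amplitude.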
     To program PWM sound output on the 8253, several steps must
be taken.  First, the programmable peripheral interface (PPI)
must be initialized to the desired mode.  Then the 8253 must be
initialized before the sound data can be sent to it.  Table 2
summarizes chip ports.

                Table 2:  Chip I/O Port Addresses

                Port                    Dec  Hex
                PPI chip                97   61
                8253 Channel 0          64   40
                8253 Channel 1          65   41
                8253 Channel 2          66   42
                8253 Control Word       67   43

     The PPI must be initialized to allow the 8253 to control the
speaker.  The two lowest bits of port 97 must be turned on.  The
other bits of this port should remain untouched since they are
used for other purposes. [4]
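     The bit manipulation involved can be sketched as follows.
The helpers are hypothetical and work on a value already read
from the port, so they perform no I/O themselves; the assertions
simply check that the other bits are left untouched.

```c
#include <assert.h>

#define SPEAKER_BITS 0x03u  /* two lowest bits of port 97 (61 hex) */

/* Value to write back so that the 8253 drives the speaker. */
unsigned char speaker_on(unsigned char port_value)
{
    unsigned char result = (unsigned char)(port_value | SPEAKER_BITS);

    assert((result & ~SPEAKER_BITS) == (port_value & ~SPEAKER_BITS));
    return result;
}

/* Value to write back to disconnect the speaker again. */
unsigned char speaker_off(unsigned char port_value)
{
    unsigned char result = (unsigned char)(port_value & ~SPEAKER_BITS);

    assert((result & ~SPEAKER_BITS) == (port_value & ~SPEAKER_BITS));
    return result;
}
```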
     Port 67 is the control word register which initializes the
chip.  Each counter is initialized by sending this port one
control byte.  Table 3 shows the meanings of the bits in the
control word.

           Table 3:  Meaning of Control Word Register

          bit 0     = 0, count in binary
                    = 1, count in Binary Coded Decimal
          bits 1-3  = mode number (0 to 5 in binary)
          bits 4,5  = 00, latch current count for reading
                    = 01, read/load low byte
                    = 10, read/load high byte
                    = 11, read/load low byte, then high byte
          bits 6,7  = counter number (0 to 2 in binary)

     (Bit 0 is least significant; 7 is most significant)

When reading both bytes of the 16-bit value, a latch command
prevents the count from changing between reading the low byte
and the high byte.  Latching is not needed when reading only a
single byte.  For example, to set the chip to generate musical
tones in mode 3, the control word is 182 (B6 hex).  For digital
audio via mode 0, use 176 (B0 hex).
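     The layout in Table 3 can be checked with a small helper
such as the one below.  The function and enumeration names are
hypothetical; only the bit positions come from the table.

```c
#include <assert.h>

/* Access codes for bits 4,5 of the control word (Table 3). */
enum pit_access {
    PIT_LATCH     = 0,  /* latch current count for reading */
    PIT_LOW_BYTE  = 1,  /* read/load low byte */
    PIT_HIGH_BYTE = 2,  /* read/load high byte */
    PIT_LOW_HIGH  = 3   /* read/load low byte, then high byte */
};

/* Assemble a control word from its four fields. */
unsigned char pit_control_word(unsigned int counter, enum pit_access access,
                               unsigned int mode, int bcd)
{
    assert(counter <= 2 && mode <= 5);
    return (unsigned char)((counter << 6) | ((unsigned int)access << 4) |
                           (mode << 1) | (bcd ? 1u : 0u));
}
```

     With counter 2, low-then-high access, and binary counting,
mode 3 gives 182 (B6 hex) and mode 0 gives 176 (B0 hex), which
agrees with the values quoted above.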
     Ports 64, 65, and 66 are used to read and write to timers 0,
1, and 2 respectively.  Data sent to these ports becomes the 16-
bit number used to affect output.  If only one byte is sent, the
other byte retains its previous value.
     Listing 1 shows the general algorithm for 8-bit digital
audio output with the 8253.  8-bit quality is achieved by leaving
the high byte constant and sending data only to the low byte.

             Listing 1:  Algorithm for Digital Audio

          Load Digital Audio Data;
          value = InPort(61h);     -- Initialize PPI
          OutPort(61h, value OR 3);
          OutPort(43h,B0h);        -- Initialize 8253
          OutPort(42h,00h);        -- Channel 2 low byte
          OutPort(42h,00h);        -- Channel 2 high byte
          OutPort(43h,90h);        -- Low byte only from now on
          loop until end of data   -- Play sound
               OutPort(42h,Data);
               Wait Until Data Passes;
          OutPort(43h,B6h);        -- Restore 8253

     Listing 2 gives an example program written for Turbo C.
Note that the user must specify whether the input data is signed.
Since the playback method using the 8253 must use unsigned data,
an offset of 128 must be added to all signed data.

           Listing 2:  Turbo C Code for Digital Audio

#include <conio.h>
#include <dos.h>
#include <io.h>
#include <stdio.h>
#include <stdlib.h>

/*   SOUND.C

     Author:   David Chappell
     Version:  1.46c
     Date:     24 June 1990
     Method:   8253 PWM method
*/

FILE *soundfile;    /* input data file */
unsigned long size; /* size of input file */
int wait;        /* time to wait between sending samples out */
unsigned char offset;   /* change signed samples to unsigned */

void error(char message[]);
void playfile(void);
void startspeaker(void);
void openfile(void);
void stopsound(void);
void main(void);

void error(char message[])
/* Purpose:  handles errors */
     {
     fprintf(stderr,"\nERROR:  %s\n",message);
     exit(-1);
     }

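void playfile(void)
/* Purpose:  loads file and plays digitized sound */
     {
     unsigned int count, pause, length;
     unsigned char curr;
     unsigned char *inputbuffer;

     /* unsigned int and size_t are 16 bits in Turbo C */
     if (size > 65535UL)
          error("File too large to load");
     length = (unsigned int) size;
     if ((inputbuffer=(unsigned char*) calloc(length,1)) == NULL)
          error("Not enough memory to load file");
     if (fread(inputbuffer, 1, length, soundfile) != length)
          error("Unable to read sound file");
     disable();              /* disable interrupts */
     for (count = 0; count < length; count++) {
        /* adding offset wraps modulo 256: signed becomes unsigned */
        curr = (unsigned char) (inputbuffer[count] + offset);
        outp(66,curr);       /* low byte of channel 2 count */
        for (pause = 0; pause < (unsigned int) wait; pause++);
        }
     enable();               /* enable interrupts */
     free(inputbuffer);
     }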

void startspeaker(void)
/* Purpose:  initialize speaker for output */
     {
     outp(97,inp(97) | 3);    /* set PPI */
     outp(67,176);       /* send initial data to timer */
     outp(66,00);
     outp(66,00);
     outp(67,144);       /* prepare timer chip to receive data */
     }

void openfile(void)
/* Purpose:  opens input file */
     {
     char choice;        /* key hit by user */
     char filename[80];  /* name typed by user */

     clrscr();
     puts("What file do you want to hear?");
     if (scanf("%79s",filename) != 1)
        error("No file name given");
     if ((soundfile = fopen(filename,"rb")) == NULL)
        error("Unable to open sound file");
     size = filelength(fileno(soundfile));
     printf("\nFile size = %lu bytes\n\n",size);
     printf("What delay time do you want in FOR counter? ");
     if (scanf("%d",&wait) != 1 || wait < 0)
        error("Invalid delay time");
     printf("Is the data signed? ");
     choice=getche();
     if ((choice=='Y') || (choice=='y'))
        offset=128;
     else
        offset=0;
     }

void stopsound(void)
/* Purpose:  resets speaker to stop sound */
     {
     outp(67,182);       /* restore timer to mode 3 */
     outp(66,51);        /* set channel 2 to power-on value */
     outp(66,05);
     nosound();
     fclose(soundfile);
     }

void main(void)
   {
   openfile();
   startspeaker();
   playfile();
   stopsound();
   }
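
     The signed-to-unsigned conversion that Listing 2 performs
with its offset variable can be isolated as in the sketch below.
The function name is hypothetical, and the code assumes 8-bit
samples stored one per byte.

```c
#include <assert.h>
#include <stddef.h>

/* Convert signed 8-bit samples (-128..127) to unsigned (0..255)
   in place by adding an offset of 128. */
void signed_to_unsigned(unsigned char *buffer, size_t length)
{
    size_t i;

    assert(buffer != NULL || length == 0);
    for (i = 0; i < length; i++)
        buffer[i] = (unsigned char)(buffer[i] + 128u);  /* wraps modulo 256 */
}
```

     The most negative sample (80 hex) becomes 0, silence (00)
becomes 128, and the most positive sample (7F hex) becomes 255.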


                             RESULTS

     The method described thus far has several limitations when
put into practice on PCs.  The 16-bit quality of the chip reduces
to 7 bits at most sample rates.  Also, a background tone is
produced along with the desired sound because of the use of PWM.
Both of these difficulties arise from timing problems.
     The method described thus far has the capability of yielding
high-quality 16-bit sound.  The code given, however, can only
produce approximately 7-bit sound.  Although the 8253 has the
ability to play 16-bit data, the timing limitations of the PC
restrict the length of each pulse.  In order to produce sound at
the rate of about 8-13 kHz, only about six or seven bits of data
are processed before the next piece of data must begin output.
At a slower sample rate of about 4-7 kHz, seven to eight bits of
accuracy can be achieved.  A higher input frequency would resolve
this difficulty; however, such a hardware change cannot easily be
made in PCs but would be feasible in other applications of the
8253.
     The maximum data size (volume) possible can be calculated
mathematically.  The 8253 input rate divided by the output sample
rate yields the number of time periods that pass before the next
sample begins play:

     Maximum value = 1.193 MHz / sample rate

For example, the maximum volume for an 8 kHz sample is 149.  The
1.193 MHz input frequency that feeds the 8253 limits the chip's
sound capabilities.
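     The formula above, together with the number of whole bits it
leaves, can be worked out as in the sketch below.  The helper
names are hypothetical, and 1,193,182 Hz is assumed as the exact
value of the 1.193 MHz input clock.

```c
#include <assert.h>

#define PIT_INPUT_HZ 1193182UL  /* assumed exact value of the 1.193 MHz clock */

/* Largest pulse length (in input pulses) that fits in one sample period. */
unsigned long max_sample_value(unsigned long sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return PIT_INPUT_HZ / sample_rate_hz;
}

/* Whole bits of resolution:  largest n with 2^n <= max_value. */
unsigned int usable_bits(unsigned long max_value)
{
    unsigned int bits = 0;

    while (max_value > 1) {
        max_value >>= 1;
        bits++;
    }
    return bits;
}
```

     For sample rates of 8000 and 13000 Hz this gives 7 and 6
whole bits respectively, in line with the six or seven bits
quoted above; at 4000 Hz it gives 8.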
     As an annoying side-effect, the provided algorithm creates a
background tone.  Due to the nature of PWM, at the beginning of
each piece of data, the output goes from low to high. [See figure
4.]  This periodic oscillation produces a pitch equal to the
frequency at which the sound is played.  For example, an 8 kHz
sample will produce a background tone of 8 kHz.  The resulting
tone overlays the digitized sound output.  A pitch of 18 kHz or
greater is high enough that the human ear cannot detect it.
Thus, any sample of this frequency will not produce an audible
background tone.  If a given sample is not of high enough
frequency, this problem can be alleviated by outputting each
piece of data multiple times in rapid succession so that the
background tone is of such a high frequency that it is inaudible.
For example, by playing each datum of an 8 kHz sample three
times, the resulting pitch will be 24 kHz.  The first problem,
however, becomes dramatic when a moderate-speed sample is sent
repeatedly:  in order to maintain the original sampling rate, the
8253 has time to process fewer and fewer bits for each datum.
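     The trade-off just described can be quantified with the
sketch below.  The function names are hypothetical, the 18 kHz
threshold is passed in by the caller, and 1,193,182 Hz is assumed
as the exact value of the 1.193 MHz input clock.

```c
#include <assert.h>

#define PIT_INPUT_HZ 1193182UL  /* assumed exact value of the 1.193 MHz clock */

/* Smallest number of times each datum must be sent so that the
   background tone reaches at least inaudible_hz. */
unsigned long repeats_needed(unsigned long sample_rate_hz,
                             unsigned long inaudible_hz)
{
    assert(sample_rate_hz > 0);
    return (inaudible_hz + sample_rate_hz - 1) / sample_rate_hz;  /* ceiling */
}

/* Largest pulse length left for each repeated output. */
unsigned long max_value_with_repeats(unsigned long sample_rate_hz,
                                     unsigned long repeats)
{
    assert(sample_rate_hz > 0 && repeats > 0);
    return PIT_INPUT_HZ / (sample_rate_hz * repeats);
}
```

     For an 8 kHz sample and an 18 kHz threshold, this yields
three repeats (a 24 kHz tone) and leaves a maximum value of 49,
or roughly five bits per datum.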


                           DISCUSSION

     The development of this digital audio playback method has
several implications and possibilities.  A variety of
applications, from games to word processors, can use voice and
sound.  On multi-tasking operating systems, sounds can easily be
played in the background.  Other computers could use the same
ideas for audio output.  When combined with extra hardware, this
method can form a complete audio I/O system.  When a PC acts as a
terminal, this procedure will allow mainframes and minicomputers
to play digitized sound.
     Speech interaction is currently put to several uses,
especially to help disabled users.  For example, IBM's
SpeechViewer helps deaf children and adults improve
pronunciation.  A microphone and speech recognition allow the
personal computer to understand the user's voice.  For blind
users, IBM's ScreenReader program can vocally relate the text
that appears on the monitor.  Theoretical physicist Stephen
Hawking uses a computer to talk despite his crippling disease.
There are numerous other instances of disabled users benefitting
from talking computers.
     There are also many computer programs that help people learn
to read and write.  Across the nation, children can listen to the
computer talk as they use IBM's Writing to Read software.
Illiterate adults gain an invaluable skill as a computer speaks
and displays words on the screen.
     In addition, many musicians use computers to produce and mix
sounds.  Computers can now produce rich tones that rival musical
instruments.  Musical synthesizers are actually specialized
computers built for the purpose of producing sound.  With the aid
of digital signal processor boards, personal computers can make
music equal in quality to the better synthesizers.
     By reducing the need for extra hardware, digitized sound can
easily be added to other programs.  As an obvious example, games
can use sound for both special effects and general entertainment.
As multimedia becomes more popular, sound becomes a necessity.
Useful applications, from word processors to spreadsheets, can
speak to help visually impaired users.  For example, allowing
speech output from a word processor would do wonders to assist
visually impaired writers and programmers.
     Adding speech to personal computers would benefit new users.
Inexperienced users would find a computer to be much friendlier
if it could speak to them.  By providing verbal output in
important areas such as error handling, the computer can help new
and disabled users.  A user interface that includes speech can
help bring computers to the level of interaction that humans use
with each other.  Thus, nearly all types of software can benefit
from the addition of speech and sound capabilities.
     When running under PC-DOS, only one program can be run at a
time.  The only way to allow the computer to play recorded sounds
while continuing other work is to modify interrupts.  Under
multi-tasking operating systems, such as OS/2 and Unix, the 8253
could be continually fed data in a background task while the main
program continues.  Playing sound in the background gives more
flexibility.  For example, a communications package could
verbally report an error while continuing to receive data, or a
demonstration could play music while displaying graphics.
     The data storage method used here is compatible with many
others.  A huge number of digitized samples are available from
Macintosh, Amiga, and Atari ST computers.  Several PC expansion
boards also use the same storage method.  Data recorded on any of
this hardware can be played back on an ordinary PC.  Furthermore,
by purchasing an available expansion board or building one, sound
recording is possible on PCs.
     As the use of speech technology grows, speech can be added
to other products.  According to IBM's long-range plan, all host
computers will eventually be accessed via a PS/2 running OS/2.
Although mainframes and minicomputers do not typically have sound
speakers, they could use the PS/2's speaker for speech output.
Thus, by using PCs as terminals, the full range of computers can
handle digital audio.
     The algorithm presented here can be used in settings other
than in a PC.  The same method could be used in any computer with
an 8253 chip, and a hardware expansion using the 8253 can be
added to other computers.  More importantly, any hardware
configuration capable of producing output similar to PWM can,
when connected to a speaker, produce digitized sound.  Similarly,
any system able to produce pulses similar to any digital
recording method can output digitized sound.  As a result,
hardware with only two states can play sound, and a digital-to-
analog converter is not needed.  Most computers, such as the
Amiga, use D/A converters, but this method shows that such is not
required to play digitized sound.
     Building an audio system based around the 8253 would be an
excellent project for hardware students.  An expansion board that
used an 8253 to play digitized sound (in place of a D/A converter)
would be possible for many microcomputers.  A small, self-
contained system created specifically for the 8253 would also be
a good project.  In these cases, the student could design the
hardware so that it does not have the limitations present in PCs.


                           CONCLUSION

     Over the past several decades, engineers have searched for
ways to make computers both talk and play high-quality music.
One solution to both problems, digitization of sound using pulse
modulation, requires little processing time and is thus
appropriate for microcomputers.  Although previous usage of
digitized sound has been limited to computers with specialized
hardware, it is possible for a standard PC to play good quality
sounds without extra hardware.  As the computer world strides
deeper into sound-based applications despite a lack of hardware
standards for sound output on PCs, this method may prove to be
invaluable in bringing sound to the masses.  As people learn more
from computers, their experiences will benefit from the addition
of audio.  Students and teachers can use this knowledge as they
learn about computers and sound.  With minimal effort, any
program can add a new dimension with speech, music, and sound
effects.




                         REFERENCES


[1] Pohlmann, Ken C.  Principles of Digital Audio.  H. W.
     Sams, Indianapolis (1985).

[2] Rosch, Winn L.  The Winn Rosch Hardware Bible.  Simon &
     Schuster, New York (1988).

[3] Sargent, Murray, III and Richard L. Shoemaker.  The IBM
     Personal Computer from the Inside Out.  Addison-Wesley
     (1984).

[4] Norton, Peter.  The Peter Norton Programmer's Guide to
     the IBM PC.  1st ed.  Microsoft, Redmond, WA (1985).


Copyright 1990 by the Consortium for Computing in Small Colleges.
Permission to copy without fee all or part of this material is granted
provided that the copies are not made or distributed for direct commercial
advantage, the CCSC copyright notice and the title of the publication and
its date appear, and notice is given that copying is by permission of the
Consortium for Computing in Small Colleges.  To copy otherwise requires
a fee and/or specific permission.

This article appeared in "Proceeding of the Fourth Annual Southeastern Small
College Computing Conference", November 9-10, 1990.  Reprinted with permission
of the CCSC.