SOFTWARE-BASED DIGITAL AUDIO ON PCs
          by David T. Chappell

Copyright (C) 1991  ACM.  All Rights Reserved

This article originally appeared in the Proceedings of the 1991 ACM 
Computer Science Conference, March 1991.  Copying is by permission of 
the Association for Computing Machinery.


ABSTRACT

	Digital audio techniques were investigated with special 
concentration on computer applications.  Experimentation showed that 
the Intel 8253 programmable interval timer, found on all IBM PCs and 
compatibles, can output digitized sound.  The relationship between 
pulse amplitude modulation and pulse width modulation signals was 
found to be significant to this process.  Feeding digitized sound 
data to the chip results in an inherent transfer from pulse amplitude 
modulation to pulse width modulation encoding.  The relationship in 
signal modulation allows IBM PCs to play digitized sound without the 
use of extra hardware.

Biography:
	David T. Chappell is a senior in the computer science curriculum 
at North Carolina State University.  In addition, he participates in 
the co-op program and works for IBM.  His interests are in the areas 
of advanced input/output, technological research, and scientific 
applications.  David plans to attend graduate school in computer 
science or a related field.


INTRODUCTION
	In the past few years, the computer industry has been slowly 
gaining interest in computer-generated speech and sound.  In this 
area, the Apple Macintosh and Commodore Amiga provide built-in sound 
control hardware so that their machines can play digitized recordings 
with little effort from the programmer.  IBM, however, has chosen not 
to include advanced sound capabilities in its line of personal 
computers.  The market has shown that relatively few people will buy 
speech or sound add-ons from either IBM or third parties.  Until such 
hardware is standardized, sound software will encounter great 
difficulty in gaining acceptance.  It is in this regard, however, that 
mathematics, engineering, and software can come to the rescue:  it is 
possible for a PC to play good quality sound without additional 
hardware.


BACKGROUND
	Sound is transmitted through the air as a longitudinal wave.  The 
source of the sound compresses the air in one area and this air 
compresses the air next to it while it moves back to its original 
position.  As each air molecule is displaced from and returns to its 
normal position, the wave travels through the air.  The movement that 
results from these repeated compressions and rarefactions is called 
the propagation of the wave [1].
	It is often convenient to use graphs to visually represent sound.  
Most pictures show a graph of the wave amplitude vs. time, where the 
amplitude represents the displacement of molecules from their original 
positions.  [See figure 1.]  Regular waves, such as sine waves, need 
not be depicted in such a manner but can be represented by just a 
frequency (or wavelength), amplitude (volume), and duration.  For 
example, the SOUND statement in Microsoft BASIC has the parameters of 
frequency and duration.  (Volume is not needed because the PC has only 
one volume.)  The irregular waves that make up most of the sound we 
hear are more complicated and need to be represented by giving 
specific details on the amplitude at each point in time.
	Digitized sounds consist of the numerical values for the 
amplitude at regular time intervals.  [See figure 2.]  Along with the 
amplitude data, the number of samples taken per second must be 
recorded.  By reproducing the amplitude changes at the original rate, 
the sound can then be played back.  Much research has gone into 
digitized sound, and it is now possible to produce digitized 
recordings which, to the human ear, are identical to their analog 
counterparts.
	Once recorded, digital signals can be stored in a variety of 
formats.  Pulse amplitude modulation (PAM) is the standard method of 
representing analog data.  [See figure 3.]  In this method, each piece 
of data represents the amplitude at one instant in time.  Pulse width 
modulation (PWM) treats each piece of information as the duration of a 
pulse which starts at a regular frequency.  [See figure 4.]  In pulse 
code modulation (PCM),  each bit records whether the amplitude is high 
or low at each point in time.  [See figure 5.]  PCM is the most common 
method in digital audio recording and is used by audio compact discs 
to achieve high-quality sound.  Pulse position modulation (PPM) 
records the position of a brief pulse by giving the time duration 
before the pulse occurs [1].  [See figure 6.]
	Several brands of personal computers include dedicated sound 
chips.  The Macintosh, Amiga, Atari ST, and other computers can easily 
produce high-quality sound by using dedicated hardware.  On these 
computers, many programs are enhanced by the addition of sound and 
speech.  Likewise, IBM and a multitude of other companies have 
produced boards that allow PCs to record and play back digitized sound.  
A lack of standards, however, limits the use of these PC boards, and 
few programs use them.
	Over the years, several attempts have been made to play digitized 
sound on PCs without special hardware.  A number of these have 
appeared in the public domain, but few are coherent.  Commercial 
software has had greater success, but most of these programs produce 
rough speech.  With the recent rise of interest in audio and video, a 
few developers have even produced good, intelligible speech and 
sounds.  The work presented here rivals that of the best public domain 
and commercial successes.


MATERIALS AND METHODS
	The Intel 8253 programmable interval timer, found on all IBM PCs 
and compatibles, is a flexible counter chip.  It has three 16-bit 
channels.  Each channel produces an output signal based on an input 
signal and a programmed 16-bit number.  The chip's six modes can 
produce varying types of output.  Table 1 summarizes the six modes.  
Refer to Rosch [2] or Sargent and Shoemaker [3] for more details.




Table 1:  Intel 8253 Operating Modes

			0 - interrupt on terminal count
			1 - programmable one-shot
			2 - rate generator
			3 - square-wave generator
			4 - software-triggered strobe
			5 - hardware-triggered strobe

	On IBM PCs, the 8253's first channel increments the time-of-day 
clock, the second channel refreshes the DRAMs, and the third channel 
sends sound to the speaker.  The timer/counter of AT class machines is 
based on the Intel 8254-2 chip and is functionally equivalent to the 
8253.  The PS/2 line has an 8253 that is used the same way, except 
that a separate chip refreshes memory, and the second channel of the 
8253 is used for diagnostics or is unassigned [2].
	The 8253 can only give two possible output states:  0 and 1.  
Possibly because of this limitation, the chip's use for sound 
production in PCs has typically been limited to being a square-wave 
generator (mode 3).  When functioning as a square-wave generator, the 
chip's output frequency is equal to its input frequency (1.193 MHz on 
PCs) divided by a 16-bit number that is input to the chip by the 
programmer.  The output is a square wave whose high and low periods 
are equal.  The range of the chip falls between 18.2 hertz and 1.193 
megahertz [4].  Mode 3 is used by channel 0 for the time-of-day clock 
and by channel 2 to produce the beeps and whistles found in many DOS 
programs.
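	The divisor arithmetic for the square-wave generator can be 
sketched as follows.  This is a minimal illustration, not part of the 
original program:  the helper name pit_divisor is hypothetical, and the 
input clock is assumed to be 1,193,182 Hz, which the text rounds to 
1.193 MHz.

```c
/* Assumed exact input clock of the PC's 8253 (about 1.193 MHz). */
#define PIT_INPUT_HZ 1193182UL

/* Hypothetical helper: the number to load into a channel running as a
   square-wave generator so that it outputs roughly tone_hz.
   The result is kept within 1..65536; a count of 65536 is the longest
   period the 16-bit counter can produce (about 18.2 Hz). */
unsigned long pit_divisor(unsigned long tone_hz)
{
	unsigned long divisor;

	if (tone_hz == 0)
		return 65536UL;		/* slowest possible rate */
	divisor = PIT_INPUT_HZ / tone_hz;
	if (divisor < 1UL)
		divisor = 1UL;
	if (divisor > 65536UL)
		divisor = 65536UL;
	return divisor;
}
```

Under these assumptions, a 440 Hz tone needs a divisor of 2711, which 
would be sent to the channel as a low byte followed by a high byte.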
	Despite the two-state output limitation, the 8253 can also play 
digitized sound.  When put in mode 0, the output defaults to high.  
When a 16-bit number is then programmed into the chip, the output goes 
low for the duration of the specified number of input pulses, after 
which the output returns high.  The net result is that the output wave 
is in the form of PWM.  [See figure 7.]
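	The pulse shape described above can be modeled in a few lines.  
The sketch below is a simulation only and touches no hardware;  the 
function names are hypothetical.  It assumes the output is low while 
the programmed count runs and high for the rest of the sample period.

```c
/* Modeled 8253 output level at a given input tick within one sample
   period: low (0) while the programmed count is running, high (1)
   once it has expired. */
int pwm_level(unsigned count, unsigned tick)
{
	return tick < count ? 0 : 1;
}

/* Number of input ticks spent high during one sample period that is
   period_ticks input pulses long. */
unsigned pwm_high_ticks(unsigned count, unsigned period_ticks)
{
	unsigned tick, high = 0;

	for (tick = 0; tick < period_ticks; tick++)
		high += pwm_level(count, tick);
	return high;
}
```

In this model the time spent high in each period varies linearly with 
the value written, which is the sense in which amplitude data is 
turned into pulse widths.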
	To program PWM sound output on the 8253, several steps must be 
taken.  First, the programmable peripheral interface (PPI) chip must 
be initialized to the desired mode.  Then the 8253 must be initialized 
and data sent to it.  Table 2 summarizes chip ports.

Table 2:  Chip I/O Port Addresses

			Port				Dec	Hex
			PPI chip			97	61
			8253 Channel 0		64	40
			8253 Channel 1		65	41
			8253 Channel 2		66	42
			8253 Control Word	67	43

	To allow the 8253 to control the speaker, the PPI must be set 
correctly.  The two lowest bits of port 97 must be turned on.  The 
other bits of this port should remain untouched since they are used 
for other purposes [4].
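	The read-modify-write on the PPI port can be sketched as pure bit 
arithmetic.  The helper names below are hypothetical;  the actual port 
access would still be done with the compiler's port I/O routines.

```c
/* The two lowest bits of PPI port 97 (61 hex). */
#define PPI_SPEAKER_BITS 0x03

/* Turn on the two lowest bits and leave the other six untouched. */
unsigned char ppi_speaker_on(unsigned char current)
{
	return (unsigned char)(current | PPI_SPEAKER_BITS);
}

/* Turn off the two lowest bits and leave the other six untouched. */
unsigned char ppi_speaker_off(unsigned char current)
{
	return (unsigned char)(current & ~PPI_SPEAKER_BITS);
}
```

With Turbo C, for instance, this might be used as 
outp(0x61, ppi_speaker_on(inp(0x61))).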
	Port 67 is the control word register which initializes the 8253.  
Each counter is initialized by sending this port one control byte.  
Table 3 shows the meanings of the bits in the control word.

Table 3:  Meaning of 8253 Control Word Register

		bit 0	= 0, count in binary
				= 1, count in Binary Coded Decimal
		bits 1-3	= mode number (0 to 5 in binary)
		bits 4,5	= 00, latch current count for reading
				= 01, read/load low byte
				= 10, read/load high byte
				= 11, read/load low byte, then high byte
		bits 6,7	= counter number (0 to 2 in binary)	

	(Bit 0 is least significant; 7 is most significant)

When reading both bytes of the 16-bit value, a latch command prevents 
the count from changing between reading the high byte and the low 
byte.  Latching is not needed when reading only a single byte.  For 
example, to set the chip to generate musical tones in mode 3, the 
control word is 182 (B6 hex).  For digital audio via mode 0, use 176 
(B0 hex).
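	As a check on Table 3, the control word can be assembled from its 
fields.  The enum and function names below are hypothetical and serve 
only to illustrate the bit layout.

```c
/* Values for bits 4,5 of the control word (see Table 3). */
enum pit_access {
	PIT_LATCH    = 0,	/* latch current count for reading */
	PIT_LOW      = 1,	/* read/load low byte */
	PIT_HIGH     = 2,	/* read/load high byte */
	PIT_LOW_HIGH = 3	/* read/load low byte, then high byte */
};

/* Hypothetical helper: pack the Table 3 fields into one control byte. */
unsigned char pit_control_word(unsigned counter, enum pit_access access,
                               unsigned mode, unsigned bcd)
{
	return (unsigned char)(((counter & 3u) << 6) |
	                       (((unsigned)access & 3u) << 4) |
	                       ((mode & 7u) << 1) |
	                       (bcd & 1u));
}
```

Counter 2 in mode 3 with both bytes loaded gives B6 hex, and counter 2 
in mode 0 gives B0 hex, matching the values quoted above.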
	Ports 64, 65, and 66 are used to read and write to timers 0, 1, 
and 2 respectively.  Data sent to these ports becomes the 16-bit 
number used to affect output.  If only one byte is sent, the other 
byte retains its previous value.
	Listing 1 shows the general algorithm for 8-bit digital audio 
output with the 8253.  8-bit quality is achieved by leaving the high 
byte constant and sending data only to the low byte.


Listing 1:  Algorithm for Digital Audio

			Load Digital Audio Data;
			value = InPort(61h);	-- Initialize PPI
			OutPort(61h, value OR 3);
			OutPort(43h,B0h);		-- Initialize 8253
			OutPort(42h,00h);
			OutPort(42h,00h);
			OutPort(43h,90h);
			loop until end of data	-- Play sound
				OutPort(42h,Data);
				Wait Until Data Passes;
			OutPort(43h,B6h);		-- Restore 8253

	Listing 2 gives an example program written for Turbo C.  In the 
sample program, the user must specify whether the input data is 
signed.  Some forms of digital audio storage, including PCM, 
inherently use unsigned variables.  Other forms, including PWM and 
PAM, can be stored as signed variables, and notably the Amiga 
microcomputer stores digitized sound data on a scale from -128 to 127.  
Since the playback method using the 8253 must use unsigned data, a 
scaling factor of 128 must be added to all signed data [5].
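	The signed-to-unsigned conversion amounts to a single addition.  
A minimal sketch, with a hypothetical function name:

```c
/* Shift a signed 8-bit sample (-128..127) onto the unsigned scale
   (0..255) by adding 128. */
unsigned char signed_to_unsigned(signed char sample)
{
	return (unsigned char)(sample + 128);
}
```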


Listing 2:  Turbo C Code for Digital Audio

#include <conio.h>
#include <dos.h>
#include <io.h>
#include <stdio.h>
#include <stdlib.h>

/*	SOUND.C

	Author:   David Chappell
	Version:  1.46d
	Date:     24 June 1990
	Method:   8253 PWM method
*/

FILE *soundfile;	/* input data file */
unsigned long size;	/* size of input file */
int wait;      /* time to wait between sending samples out */
unsigned char  offset,  /* change signed samples to unsigned */
    			vol1, vol2;  /* adjusts range of data */

void error(char message[])
/* Purpose:  handles errors */
	{
	fprintf(stderr,"\nERROR:  %s\n",message);
	exit(-1);
	}

void playfile(void)
/* Purpose:  loads file and plays digitized sound */
	{
	int pause;
	unsigned int count, temp;
	unsigned char min, max, data;
	char *inputbuffer;

	if ((inputbuffer=(char*) calloc(size,sizeof(char))) == NULL)
		error("Not enough memory to load file");
	fread(inputbuffer, size, 1, soundfile);
	/* scale data */
	min=255;  max=0;
	for (count = 0; count < size; count++) {
		data = *(inputbuffer + count)+offset;
		if (data<min)
			min=data;
		if (data>max)
			max=data;
		}
	vol1 = 64;
	vol2 = max-min+1;  	/* scale from 0 to 64 */
	offset -= min;     	/* move lowest point to zero */
	disable();		/* disable interrupts */
	for (count = 0; count < size; count++) {
		data = *(inputbuffer + count) + offset;
		temp = data * vol1 / vol2;
		data = temp + 1;
		outp(66,data);
		for (pause = 0; pause < wait; pause++);
		}
	enable();			/* enable interrupts */
	}

void startspeaker(void)
/* Purpose:  initialize speaker for output */
	{
	outp(97,inp(97) | 3);	/* set PPI */
	outp(67,176);		/* send initial data to timer */
	outp(66,00);
	outp(66,00);
	outp(67,144);		/* prepare timer chip to receive data */
	}

void openfile(void)
/* Purpose:  gets information from user and opens input file */
	{
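	char choice;		/* key hit by user */
	char filename[80];	/* name of input file */

	clrscr();			/* clear screen */
	puts("What file do you want to hear?");
	/* read the name into a real buffer rather than a NULL pointer */
	if (scanf("%79s", filename) != 1 ||
	    (soundfile = fopen(filename,"rb")) == NULL)
	   error("Unable to open sound file");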
	fseek(soundfile,0,SEEK_SET);
	size = filelength(fileno(soundfile));
	printf("\nFile size = %lu bytes\n\n",size);
	printf("What delay time do you want in FOR counter? ");
	scanf("%d",&wait);
	printf("Is the data signed? ");
	choice=getche();
	if ((choice=='Y') || (choice=='y'))
	   offset=128;
	}

void stopsound(void)
/* Purpose:  resets speaker to stop sound */
	{
	outp(67,182);		/* restore timer to mode 3 */
	fclose(soundfile);
	}

void main(void)
   {
   openfile();
   startspeaker();
   playfile();
   stopsound();
   }

RESULTS
	The method described thus far has several limitations when put 
into practice on PCs.  The 16-bit quality of the chip reduces to 7 
bits at most sample rates.  Also, a background tone is produced along 
with the desired sound because of the use of PWM.  Both of these 
difficulties arise from timing problems.
	In principle, the method is capable of yielding high-quality 
16-bit sound.  The code given, however, can only produce 
approximately 7-bit sound.  Although the 8253 has the ability to play 
16-bit data, the timing limitations of the PC restrict the length of 
each pulse.  In order to produce sound at the rate of about 8-13 kHz, 
only about six or seven bits of data are processed before the next 
piece of data must begin output.  At a slower sample rate of about 4-7 
kHz, seven to eight bits of accuracy can be achieved.  A higher input 
frequency would resolve this difficulty; however, the frequency can 
not be changed in PCs but could be modified in other applications of 
the 8253.
	The maximum data size (volume) possible can be calculated 
mathematically.  The 8253 input rate divided by the output sample rate 
yields the number of time periods that pass before the next sample 
begins play:

	Maximum value = 1.193 MHz / sample rate

For example, the maximum volume for an 8 kHz sample is 149.  The 1.193 
MHz input frequency that feeds the 8253 limits the chip's sound 
capabilities.
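	The calculation can be sketched in code as follows.  The function 
names are hypothetical, and the input clock is assumed to be 1,193,182 
Hz (rounded to 1.193 MHz in the text).

```c
/* Assumed exact input clock of the PC's 8253 (about 1.193 MHz). */
#define PIT_INPUT_HZ 1193182UL

/* Largest pulse width, in input ticks, that fits in one sample period. */
unsigned long max_pwm_value(unsigned long sample_rate_hz)
{
	if (sample_rate_hz == 0)
		return 0;		/* no meaningful answer */
	return PIT_INPUT_HZ / sample_rate_hz;
}

/* Whole bits of resolution available at a given sample rate:
   the largest b (capped at 16) such that 2^b <= maximum value. */
unsigned usable_bits(unsigned long sample_rate_hz)
{
	unsigned long max = max_pwm_value(sample_rate_hz);
	unsigned bits = 0;

	while (bits < 16 && (2UL << bits) <= max)
		bits++;
	return bits;
}
```

Under these assumptions, usable_bits gives 7 at 8 kHz and 6 at 13 kHz, 
in line with the six to seven bits quoted above for that range.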
	As an annoying side-effect, the provided algorithm creates a 
background tone.  Due to the nature of PWM, at the beginning of each 
piece of data, the output changes state as it goes from high to low.  
[See figure 7.]  This periodic oscillation produces a pitch equal to 
the frequency at which the sound is played.  For example, an 8 kHz 
sample will produce a background tone of 8 kHz.  The resulting tone 
overlays the digitized sound output.  The human ear can not detect 
pitches beyond 22 kHz, and most people can not hear a pitch of 18 kHz 
or greater frequency.  Thus, any sample played at such a rate will not 
produce an audible background tone.  If a given sample is not of high 
enough frequency, this problem can be alleviated by playing each piece 
of data multiple times in rapid succession so that the background tone 
is of such a high frequency that it is inaudible.  For example, by 
playing each datum of a 9.5 kHz sample twice, the resulting pitch will 
be 19 kHz.  The first problem, however, becomes dramatic when a 
moderate-speed sample is sent repeatedly:  in order to maintain the 
original sampling rate, the 8253 has time to process fewer and fewer 
bits for each datum [5].
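	The trade-off between the background tone and the available 
resolution can be put in numbers.  The sketch below uses hypothetical 
function names, takes the 18 kHz threshold from the text, and assumes 
an input clock of 1,193,182 Hz.

```c
#define PIT_INPUT_HZ 1193182UL	/* assumed exact 8253 input clock */
#define INAUDIBLE_HZ 18000UL	/* threshold taken from the text */

/* How many times each datum must be played so that the background
   tone reaches INAUDIBLE_HZ or more (rounds up). */
unsigned long repeats_needed(unsigned long sample_rate_hz)
{
	if (sample_rate_hz == 0)
		return 0;
	return (INAUDIBLE_HZ + sample_rate_hz - 1UL) / sample_rate_hz;
}

/* Largest pulse width, in input ticks, left for each repeated pulse. */
unsigned long max_value_with_repeats(unsigned long sample_rate_hz)
{
	unsigned long repeats = repeats_needed(sample_rate_hz);

	if (repeats == 0)
		return 0;
	return PIT_INPUT_HZ / (sample_rate_hz * repeats);
}
```

For a 9.5 kHz sample this gives two repeats and a maximum value of 62, 
so under these assumptions fewer than six bits remain for each pulse.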


DISCUSSION
	Several other methods were attempted before the above method was 
discovered.  Although all of these other methods yield sound of 
varying quality, most do not produce recognizable results, and none 
match the PWM method.  With some work, any of the following techniques 
might be able to give better output.
	One idea is to look at each piece of data, compare it to the 
midpoint, and set the speaker bit appropriately.  For example, the 
midpoint for 8-bit data is either 0 or 128 (depending on whether a 
sign bit is used), so any values above the midpoint could be set as 1 
and all other values set as 0.  Modifying bit 1 of I/O port 97 (61 
hex) changes the speaker's position.  This idea could 
never yield very good results, however, as it gives only 1-bit 
accuracy.
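	The comparison itself is trivial, as the following sketch shows.  
The function name is hypothetical, and unsigned data with a midpoint 
supplied by the caller is assumed.

```c
/* Reduce one unsigned 8-bit sample to a single speaker state:
   1 if the sample lies above the midpoint, 0 otherwise. */
int one_bit_level(unsigned char sample, unsigned char midpoint)
{
	return sample > midpoint ? 1 : 0;
}
```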
	A heretofore ineffective method is based on PCM encoding.  Each 
individual bit of input directly determines the state of the speaker.  
Manipulating bit 1 of port 97 yields the two states.
	Another possibility is to use mode 3 of the 8253 to play a pitch 
which is either directly or inversely proportional to the data.  
Although this method yields audible sound, the results are not as good 
as those of the main PWM method.
	A final idea is to use the 8253 based on pulse position 
modulation.  Mode 4 of the 8253 should produce PPM output, but trials 
so far have given negative results.

	The given algorithm can be further improved by allowing the 
program to automatically determine the speed of the computer.  In the 
current method, a delay time must be entered manually.  By 
incrementing a counter while watching the system clock, a ratio of 
instructions to time can be calculated.  Under PC-DOS and OS/2, the 
18.2 Hz frequency of the system clock limits the accuracy of this 
idea, however, unless channel 0 of the 8253 is used to change the 
system timer.
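	Once the number of loop iterations per clock tick has been 
measured, the delay value follows by arithmetic.  The sketch below 
covers only that last step;  the function name is hypothetical, the 
measurement itself is left to the caller, and the 18.2 Hz tick rate is 
taken from the text.

```c
/* Hypothetical helper: iterations of an empty delay loop to run
   between samples, given how many iterations were counted during one
   tick of the 18.2 Hz system clock.  Integer arithmetic only; very
   large tick counts could overflow the intermediate product. */
unsigned long loops_per_sample(unsigned long loops_per_tick,
                               unsigned long sample_rate_hz)
{
	unsigned long loops_per_second;

	if (sample_rate_hz == 0)
		return 0;
	loops_per_second = loops_per_tick * 182UL / 10UL;	/* 18.2 ticks/s */
	return loops_per_second / sample_rate_hz;
}
```

Because the result is truncated to a whole number of iterations, a 
slow machine with few iterations per tick would get a coarse delay, 
which reflects the accuracy limit mentioned above.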
	A better timing method would be to use an algorithm that does not 
rely on loops for timing delays.  Under PC-DOS, interrupts can be 
modified so that data is played after each clock tick.  By increasing 
the frequency of clock ticks and changing the timer interrupt, the 
computer can output each piece of data at a regular interval.  Other 
operating systems, however, do not allow such interrupt modification.
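	The numbers involved in raising the tick rate can be sketched as 
follows.  Both function names are hypothetical, and the input clock is 
assumed to be 1,193,182 Hz.  Passing on only some ticks to the 
original time-of-day handler is an assumption of this sketch, not 
something described in the text.

```c
#define PIT_INPUT_HZ 1193182UL	/* assumed exact 8253 input clock */

/* Divisor to load into channel 0 so that it ticks once per sample. */
unsigned long tick_divisor(unsigned long sample_rate_hz)
{
	unsigned long divisor;

	if (sample_rate_hz == 0)
		return 65536UL;		/* normal 18.2 Hz rate */
	divisor = PIT_INPUT_HZ / sample_rate_hz;
	if (divisor < 1UL)
		divisor = 1UL;
	if (divisor > 65536UL)
		divisor = 65536UL;
	return divisor;
}

/* Assumption: the time-of-day clock is kept roughly correct by passing
   one tick in every N to the original handler, where N is the ratio
   of the normal divisor (65536) to the new one. */
unsigned long fast_ticks_per_clock_tick(unsigned long divisor)
{
	if (divisor == 0)
		return 0;
	return 65536UL / divisor;
}
```

At 8 kHz, for example, the divisor would be 149 and the original 
handler would run on about every 439th tick.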

	The development of this digital audio playback method has several 
implications and possibilities.  A variety of applications, from games 
to word processors, can use voice and sound.  On multi-tasking 
operating systems, sounds can easily be played in the background.  
Other computers could use the same ideas for audio output.  When 
combined with extra hardware, this method can form a complete audio 
I/O system.  When a PC acts as a terminal, this procedure will allow 
mainframes and minicomputers to play digitized sound.
	Speech interaction is currently put to several uses.  For 
example, IBM's SpeechViewer helps deaf children and adults improve 
pronunciation.  In addition, several programs use spoken words to help 
people learn to read and write.  For blind users, IBM's ScreenReader 
program can vocally relate the text that appears on the monitor.  
There are numerous instances of disabled users benefitting from 
talking computers.
	In addition, many musicians use computers to produce and mix 
sounds.  Computers can now produce rich tones that rival musical 
instruments.
	By reducing the need for extra hardware, digitized sound can 
easily be added to other programs.  As an obvious example, games can 
use sound for both special effects and general entertainment.  Useful 
applications, from word processors to spreadsheets, can speak to help 
visually impaired users.  Inexperienced users would find a computer to 
be much friendlier if it could speak to them.  A user interface that 
includes speech can help bring computers to the level of interaction 
that humans use with each other.  Thus, nearly all types of software 
can benefit from the addition of speech and sound capabilities [5].
	When running under PC-DOS, only one program can be run at a time.  
The only way to allow the computer to play recorded sounds while 
continuing other work is to modify interrupts, as mentioned above.  
Under multi-tasking operating systems, such as OS/2 and Unix, the 8253 
could be continually fed data in a background task while the main 
program continues.  Playing sound in the background gives more 
flexibility.  For example, a communications package could verbally 
report an error while continuing to receive data, or a graphical demo 
could play music in the background while displaying pictures on the 
monitor.
	The data storage method used here is compatible with many others.  
A huge number of digitized samples are available from Macintosh, 
Amiga, and Atari ST computers.  Several PC expansion boards also use 
the same storage method.  Data recorded on any of this hardware can be 
played back on an ordinary PC.  Whether the original sample is in the 
form of PAM, PCM, or PWM, it can play through the PC speaker.  
Furthermore, by purchasing an available expansion board or building 
one, sound recording is possible on PCs.
	As the use of speech technology grows, speech can be added to 
larger machines.  According to IBM's long-range plan, all host 
computers will eventually be accessed via a PS/2 running OS/2.  
Although mainframes and minicomputers do not typically have sound 
capabilities, they could use the PS/2's speaker for speech output.  
Thus, by using PCs as terminals, the full range of computers can 
handle digital audio.
	The algorithm presented here can be used in settings other than 
in a PC.  The same method could be used in any computer with an 8253 
chip, and a hardware expansion using the 8253 can be added to other 
computers.  More importantly, any hardware configuration capable of 
producing output similar to PWM can, when connected to a speaker, 
produce digitized sound.  Similarly, any system able to produce pulses 
similar to any digital recording method can output digitized sound.  
As a result, hardware with only two output states can play sound, and 
a digital-to-analog converter, such as found in the Amiga, is not 
required.


CONCLUSION
	Over the past several decades, engineers have searched for ways 
to make computers both talk and play high-quality music.  One solution 
to both problems, digitization of sound using pulse modulation, 
requires little processing time and is thus appropriate for 
microcomputers.  Although previous usage of digitized sound has been 
limited to computers with specialized hardware, it is possible for a 
standard PC to play good quality sounds without extra hardware.  As 
the computer world strides deeper into multimedia and other sound-
based applications despite a lack of hardware standards for sound 
output on PCs, this method may prove to be invaluable in bringing 
sound to the masses.  With minimal effort, any program can add a new 
dimension with speech, music, and sound effects.


REFERENCES


[1] Pohlmann, Ken C.  Principles of Digital Audio.  H. W. Sams, 
Indianapolis (1985).

[2] Rosch, Winn L.  The Winn Rosch Hardware Bible.  Simon & Schuster, 
New York (1988).

[3] Sargent, Murray, III and Richard L. Shoemaker.  The IBM Personal 
Computer from the Inside Out.  Addison-Wesley (1984).

[4] Norton, Peter.  The Peter Norton Programmer's Guide to the IBM PC.  
1st ed.  Microsoft, Redmond, WA (1985).

[5] Chappell, David T.  "Achieving Inexpensive Digital Audio on PCs 
for Educational Purposes".  Proceedings of the Southeastern Small 
College Computing Conference.  (1990).



Permission to copy without fee all or part of this material is granted 
provided that the copies are not made or distributed for direct 
commercial advantage, the ACM copyright notice and the title of the 
publication and its date appear, and notice is given that copying is 
by permission of the Association for Computing Machinery.  To copy 
otherwise, or to republish, requires a fee and/or specific permission.



