From: bkarr@carson.u.washington.edu (Brian Karr)
Subject: Re: SCI: Three dimensional sound?
Date: Mon, 23 Nov 1992 01:19:25 GMT
Organization: Human Interface Technology Lab, Seattle


In article <1992Nov19.072149.29598@hitl.washington.edu> "Human Int.
Technology" <hlab@milton.u.washington.edu> writes:

>From: fsjdj1@acad3.alaska.edu
>Subject: Re: SCI: Three dimensional sound?
>Date: Wed, 18 Nov 1992 20:16:59 GMT
>Organization: University of Alaska Fairbanks
>
>In article <1992Nov16.095935.10365@u.washington.edu>, mcmains@unt.edu (Sean
> McMains) writes:
>>
>> What is the theory behind creating the illusion of a sound emanating
>> from a particular point in three dimensional space? With regard to
>> lateral motion, the amplitude and timing of the sounds entering each
>> ear could obviously be adjusted to create the desired effect. How
>> would one create the illusion of a sound coming from above or below
>> the listener? Or is this effect only possible through adjusting what
>> the listener hears as he moves his head?
>
>  [Mentions articles regarding bi- and multi-directional sound recording.]

_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_

Not sure if my postings are the ones you are referring to, but I will repost
the following info as it seems relevant to the question.  Much of it is
recycled, and the rest is related news since those postings.

-bk
(Brian Karr)

_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_

Here is a quick and dirty explanation of 3D sound filter functions:

Most of the cues needed for presenting a spatial audio image are
embedded in the 'earprint' or HRTF (Head Related Transfer Function).
An HRTF is a description of how the ears of a test subject filter
sound at various points on a sphere, the listener's head being at the
center.  This is derived by placing mics in the ears of the subject
and chirping pseudo-random noise (impulses) at them from the various
directions.  If a Fourier Transform is performed on what is picked up
by the mics, the resulting spectra show how the ears and head shape
sounds before they reach the eardrum.  The sampled signals therefore
contain the impulse response (amplitude and phase) of the ear at that
angle.  Phase is implied since the two ears are supposedly sampled
phase-coherently.  The earprint then is an array of these responses
which are used as filter coefficients for shaping the input signal to
be spatialized.  If we wish to hear a sound where there has been no
actual measurement, the nearest responses are interpolated.
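
As a rough sketch of that interpolation step (not how any particular
product does it), here is a linear blend between the two measured
responses that bracket the wanted direction.  The function name
interpolate_hrir, the azimuth-only layout, and the toy two-tap responses
are all assumptions made for illustration; a real earprint covers
elevation as well and a real system may interpolate more carefully than
a straight sample-by-sample blend.

```python
def interpolate_hrir(azimuths, hrirs, target):
    """Blend the two measured impulse responses that bracket `target`.

    azimuths: measurement angles in degrees (any order)
    hrirs:    one impulse response (list of floats) per angle
    target:   wanted angle in degrees
    """
    pairs = sorted(zip(azimuths, hrirs))
    n = len(pairs)
    target %= 360.0
    for i in range(n):
        a0, h0 = pairs[i]
        a1, h1 = pairs[(i + 1) % n]           # neighbour, wrapping past 360
        span = (a1 - a0) % 360.0 or 360.0     # angular gap between the two
        offset = (target - a0) % 360.0        # how far target is past a0
        if offset <= span:
            w = offset / span                 # 0 -> all h0, 1 -> all h1
            return [(1.0 - w) * x0 + w * x1 for x0, x1 in zip(h0, h1)]
    raise ValueError("no bracketing measurements found")


# Toy data: two made-up responses measured at 0 and 90 degrees.
measured_angles = [0.0, 90.0]
measured_hrirs = [[1.0, 0.0], [0.0, 1.0]]
halfway = interpolate_hrir(measured_angles, measured_hrirs, 45.0)
```

Here halfway is an equal mix of the two measured responses.  The same
call would be made once per ear.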

To implement a simulation of this, we effectively need two
time-dependent realtime filters for each sound source we wish to
localize.  As a frequency-domain analogy, imagine two graphic
equalizers whose sliders move to new positions whenever we want the
source to appear to move to a new location in space.  Spatial sound
systems use a mathematical version of this called convolution to
filter signals digitally.

So the major computation going on is interpolation of coefficient sets
and the convolution of the input signal to be localized with the
appropriate filter responses.
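
To make the convolution step concrete, here is a minimal sketch of
filtering one mono source with a left-ear and a right-ear impulse
response.  The names convolve and spatialize and the toy responses are
my own for illustration; a realtime system would use a much faster
method than this double loop and would swap or crossfade coefficient
sets as the source moves.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution: y[n] = sum over k of h[k] * x[n - k]."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def spatialize(mono, hrir_left, hrir_right):
    """Filter one mono source with the pair of responses for its direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy data (not measured): a source off to the listener's left.
mono = [1.0, 0.5, 0.25]
hrir_left = [1.0]               # near ear: passed through unchanged
hrir_right = [0.0, 0.0, 0.5]    # far ear: two samples later, half as loud
left, right = spatialize(mono, hrir_left, hrir_right)
```

With these toy responses the right channel is just the left channel
delayed and attenuated, which is the crude amplitude-and-timing cue from
the original question.  Measured responses add the spectral shaping that
carries the elevation cues.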

That provides a 'free field' spatial display, meaning there are no
environment cues since the earprint is usually derived in an
acoustically insulated booth of some kind.  Also, distance is
simulated by 1/distance-squared attenuation.  For the anechoic models
used here, this may be more like 1/d^0.5 since the reverberant energy
is no longer included.  This method is not entirely correct because
the ear tends to normalize volume of sounds that don't have an
intrinsic volume.  To give convincing distance and environment cues, a
reference wall or walls can be placed in the image by manipulating the
earprint (adding the responses of the reflections before convolving).
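
A small sketch of both ideas, under stated assumptions: distance_gain
applies a power-law rolloff with the exponent left as a parameter (2.0
for the 1/distance-squared rule, 0.5 for the alternative mentioned
above), and add_reflection folds one wall reflection into the earprint
before convolving.  The function names, the clamping at a reference
distance, and the toy numbers are mine, not taken from any of the
systems discussed here.

```python
def distance_gain(distance, exponent=2.0, reference=1.0):
    """Gain relative to a source at `reference`: (reference / d) ** exponent."""
    d = max(distance, reference)    # clamp so the gain never exceeds 1.0
    return (reference / d) ** exponent


def add_reflection(hrir_direct, hrir_reflected, delay_samples, gain):
    """Direct-path response plus one delayed, scaled reflection.

    hrir_reflected should be the response for the direction the
    reflection arrives from, which generally differs from the direct path.
    """
    length = max(len(hrir_direct), delay_samples + len(hrir_reflected))
    out = [0.0] * length
    for i, v in enumerate(hrir_direct):
        out[i] += v
    for i, v in enumerate(hrir_reflected):
        out[delay_samples + i] += gain * v
    return out


# Toy data: the reflection arrives 3 samples late at half amplitude.
combined = add_reflection([1.0, 0.0], [1.0], delay_samples=3, gain=0.5)
```

The combined response is then used in place of the plain earprint
entry, so the reflection costs nothing extra at convolution time apart
from the longer filter.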

For a far better description of all of this, see:

Blauert, Jens, 1983.  _Spatial Hearing: The Psychophysics of Human Sound
	Localization_, Cambridge, MA: MIT Press

Lehnert, Hilmar & Blauert, Jens, 1992.  'Aspects of Auralization in 
    	Binaural Room Simulation.'  AES Proceedings, 93rd Convention
    	October 1-4, 1992.

Moller, Henrik, 1992.  'Fundamentals of Binaural Technology.' _Applied 
    	Acoustics_, _36_, pp. 171-218. 

Wenzel, Elizabeth M.  'Localization in Virtual Acoustic Displays,'
	_Presence_ 1st issue.

Wightman, F.L. & Kistler, D.J. 1989a,b.  'Headphone Simulation of Free-field
	Listening I, II.'  _Journal of the Acoustical Society of America_,
	_85_, 858-878.


Hope this info helps.

-Brian

_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_


There are a few spatial audio display systems out these days that I am
aware of.  Some use speaker arrays and some use ordinary headphones.

There is of course surround sound, which requires a loudspeaker array
and previously encoded material.  Decoders are commonplace today while
encoders are not so ubiquitous and are expensive.  There is also the
'Ambisonic' system that uses a special 4-element microphone to record
and encode natural sound environments.  This also requires
loudspeakers (4 or 6).  There is also of course quadraphonic sound.
This is used in the headset of the Virtuality arcade games.  They use
two speakers at each ear.  This display gave me a good azimuth
impression, especially with the head tracking; elevation was not so
convincing.

Mannequin heads with microphones in the ears have been used to make
binaural recordings for decades.  Recording this way (or with real
folks' heads and tiny mics) gives an excellent 3D impression and can
really sound like you are _there_.

The benefit of using headphones and head tracking is that many people
in the same room can have different sonic environments, or cohabitate
the same environment from different perspectives, simultaneously.

The AL-100 was developed by the Air Force to take advantage of this.
The AL-100 was a coffin-sized box fitted with a binaural mannequin
head which was spun around in front of a loudspeaker with high-torque
motors.  This way, a sound could be made to travel around the listener
by moving the head appropriately.  Last I heard, this box is still
working.

This whole process has since been realized computationally in a number
of excellent systems.  Jens Blauert developed the first system in
Germany some years back.  His group is now working on a new version
of their 'Binaural Mixing Console,' and are doing some excellent work
with room simulation (localized reflections in addition to the direct
localized source).


Commercial Systems:

There are many systems that I know of available right now.  One is a
Macintosh II-based system called Focal Point (Gehring Inc., Bo Gehring)
that uses a special DSP card (Audiomedia, which has recently been
discontinued by Digidesign, alas) for each independently localized
sound.  This can be used with a CDEV interface or a MIDI application
right out of the box, and also comes with a Think C interface to use
in your own apps.  I also saw/heard an early version of this on the
NeXT, but a commercial version is not being pursued for the NeXT.

This system is also now available for the IBM-PC flavored machine
under the same product name.  Bo tells me that it has the added
benefit that it can alternatively run without the bus.  You simply
give it power and it wakes up spatializing whatever signal is at its
input.  Position commands can then be sent to a serial port built onto
the card.  Handy if you want to skip the PC host.  This card
spatializes two sounds simultaneously and independently.  Focal Point
3D Audio, Niagara Falls, NY.  Bo Gehring (716) 285-3930.

There is also the IBM-PC based Convolvotron (Crystal River
Engineering, Scott Foster) which localizes 4 independent sounds for
each 2-card set.  This system lets you switch 'earprints' (HRTFs) and
comes with a set of earprints, C-programming libraries and sample
programs.  This system also optionally includes a reflection package
which localizes reflections to give an impression of objects (walls)
present in the environment and adds another crucial distance cue.

This now has been implemented for the PC on a Turtle Beach DSP card.
CRE is calling this the 'Beachtron.'  This card spatializes two sounds
simultaneously and independently.  It has a sample-based synthesizer
on the card and a MIDI port (Yes!).  Multiple cards can be cascaded,
which avoids the need for mixing and the cabling plague.  CRE has
developed a protocol and software libraries that let you load your AT
or an AT backplane up with B-trons or C-trons and talk to it as if it
were an audio resource pool.  The code autosenses what is on the bus
and does the right thing.  This makes a lot of your code portable
between C-trons and B-trons.  The B-tron, however, does not do room
simulations.  This audio resource pool package is called the
'Acoustetron.'  Crystal River Engineering, Groveland, CA.  Scott
Foster (209) 962-6382.

VPL has worked with the CRE crowd to port this to a Mac-based card for
VPL's VR systems.  They are calling this the 'CosmTron,' for their
MicroCosm system.  VPL Research, Foster City, CA. (415) 361-1710.

There is also a pro-audio system called the Sound Space processor
(Roland Co., Curtis Chan) which is designed to give a 3D image using
two loudspeakers.  The idea is to compute sounds in their locations as
usual and then compensate for speaker cross-talk before the signal
goes to the speakers (this is called 'transaural processing').  The
result is a sweet spot which is actually a line of points
equidistant from both speakers.  Chances are you have probably
already heard this on the radio.  Bob Todrank is now the contact at
Roland for this machine.  RSS processor.  Roland Pro Audio/Video Group
(213) 685-5141.
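
For the curious, the cross-talk compensation idea can be sketched at a
single frequency like this.  It assumes a listener sitting symmetrically
between the speakers, so each ear hears its own-side speaker through
one complex transfer value and the opposite speaker through another.
The function names and the example values are made up for illustration;
this is the textbook 2x2 inversion, not a description of what the
Roland box actually computes.

```python
def crosstalk_canceller(same_side, cross_side):
    """Invert the speaker-to-ear matrix [[S, A], [A, S]] at one frequency.

    same_side  (S): speaker to the ear on its own side
    cross_side (A): speaker to the opposite ear (the cross-talk path)
    """
    det = same_side * same_side - cross_side * cross_side
    if abs(det) < 1e-12:
        raise ValueError("paths are indistinguishable at this frequency")
    return ((same_side / det, -cross_side / det),
            (-cross_side / det, same_side / det))


def apply_2x2(m, left, right):
    """Multiply a 2x2 matrix by the (left, right) pair."""
    return (m[0][0] * left + m[0][1] * right,
            m[1][0] * left + m[1][1] * right)


# Made-up transfer values for one frequency bin.
S, A = 1.0 + 0.0j, 0.5j
canceller = crosstalk_canceller(S, A)

# Binaural signal meant for the left ear only.
speakers = apply_2x2(canceller, 1.0 + 0.0j, 0.0j)
# What actually reaches the ears after the acoustic cross-talk.
ears = apply_2x2(((S, A), (A, S)), *speakers)
```

If the inversion is right, ears comes back as (very nearly) the
original left-only signal.  The sweet spot falls out of the symmetry
assumption: move off the centre line and S and A no longer match what
the canceller was built for.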

These systems (F.P., A,B,C-tron, RSS) require no decoding and the
signal can be stored on regular audio cassettes (preferably on DAT,
Hi-Fi VHS or MO).  Focal Point and the Convolvotron are designed for
headphones, while Roland's box is designed for headphones or speakers.
I heard some interesting 'effects' with speakers, though the spatial
image didn't always come across.

There are transaural processors available if you must use loudspeakers
with the personal computer based systems such as F.P. and the
A,B,C-tron.

Related stuff:

The 'Spatializer,' from Audio Intervisual Design is a system that
produces eight moving sources in azimuth only.  This is not binaural
processing in the sense of filtering with pinna responses but I
thought I would mention it for completeness.  Audio Intervisual
Design. (213) 845-1155.

The 'Intelliverb,' from RSP Technologies is an ultra-parameterized
reverb unit.  It can be configured to produce many standard effects
but the room simulation effects are most relevant here.  Variables
include room width, height, and depth, source position, listener
position, reverb ducking (can be used to change room absorption).
These parameters can all be controlled with MIDI (i.e. from your VR
code).  These variables affect the 'early reflections' in the
simulation.  The more diffuse late reflections are added in from a
selection of algorithms.  This box stands out from the vast array of
effects processors in my opinion because of the attention to external
control of the right variables in the delay/reverb, and some excellent
audio specs.  The early reflections are not, however, spatialized.
They are intended to be correct in time.  I have found reverb effects
to give an excellent enhancement of presence in VR and this type of
box seems to be a good alternative for completely correct room
simulations until they become real and affordable.  I will post a more
complete review of the box after the holidays.  RSP Technologies,
Rochester Hills, MI.  (313) 853-3055.
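
To show why room dimensions, source position, and listener position
are the right knobs for early reflections, here is a sketch of the
standard image-source method for a box-shaped room, first-order
reflections only.  I do not know what algorithm the Intelliverb uses
internally; the function names, the 343 m/s speed of sound, and the
example room are assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, roughly, at room temperature


def first_order_images(room, source):
    """Mirror the source in each of the six walls of a box-shaped room.

    room:   (width, height, depth) in metres, one corner at the origin
    source: (x, y, z) position inside the room
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]
            images.append(tuple(image))
    return images


def reflection_delays(room, source, listener):
    """Sorted extra arrival times (seconds) of the six first reflections,
    measured relative to the direct sound."""
    direct = math.dist(source, listener)
    return sorted((math.dist(image, listener) - direct) / SPEED_OF_SOUND
                  for image in first_order_images(room, source))


# Example: a 4 x 3 x 5 metre room, source and listener 2 metres apart.
delays = reflection_delays((4.0, 3.0, 5.0), (1.0, 1.0, 1.0), (3.0, 1.0, 1.0))
```

Each delay is when one wall's reflection arrives after the direct
sound.  Moving the source or the listener changes all six at once,
which is what makes these parameters worth driving from VR code.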

Any others?  I'm not sure.  It is of course possible to do it with
slow hardware in non-realtime so I wouldn't be surprised if many people
have developed spatial displays.  The critical part of the process is
getting a good earprint.  Much of the work being done today with room
simulation (manipulating impulse responses with raytraced reflections)
is being done off-line because of the computation needs.

_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_

I hope this is helpful to enough folks to justify the bandwidth.

-Brian

bk@hitl.washington.edu

_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_