From: mike@usdc.mew.mei.co.jp (Mike Taylor)
Subject: Re: 3-D Sound imaging systems (a clarification)
Date: 17 Dec 91 09:32:49 GMT
Organization: Matsushita Electric Works R & D Laboratory Inc., San Jose


Re: wex@pws.ma30.bull.com (Ren and Stimpy Live.)
    <1991Dec10.175217.11876@milton.u.washington.edu>

I've got two points to make in this posting....

POINT ONE--regarding the helicopter sounds of Focal Point

I recently saw a demo of Focal Point at Cyberarts and to the best of my
knowledge, the helicopter sounds displayed by Focal Point are really
binaural recordings.  After the helicopter sounds are displayed you then
hear a track that was processed by Focal Point.

POINT TWO---

Also, I've been following this thread for some time and I've decided to
throw in my own comments.....

I'll do this even though the original Convolvotron critic has decided that,
in fact, he wasn't even listening to one.


BASIC PROCESSING FUNCTIONS (HRTF'S)

My opinion is that the Convolvotron is a higher-end product and it does
provide better spatial resolution.  Here's why.  Both products use
convolution as their main processing tool and both products use a set of
filters that are called HRTF's (head-related transfer functions).  In
fact, I'm almost certain that both products use the same filter sets that
were measured at the University of Wisconsin by Fred Wightman and Doris
Kistler.  (This may not be true of Focal Point.)  These filters were measured
at a finite number of discrete spatial locations about a subject's head.

Depending upon the direction of sound, each unit chooses a pair of HRTF's
that represent how the sound should be filtered for the left and right
ears.  Each filter is an FIR filter that is the impulse response that was
recorded in a human ear for that particular direction (or close by).  In
fact, in the Convolvotron, these filters are spatially interpolated so that
there are no noticeable clicks or jumps in the sound as the virtual
source(s) and the user's head move.
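
To make the above concrete, here is a minimal Python sketch of the idea:
pick the HRTF pairs measured at two neighboring directions, blend them, and
convolve the mono source once per ear.  This is NOT the Convolvotron's
actual code.  The function names, the toy 4-tap filters and the simple
linear blend are all my own assumptions for illustration; real HRTF's are
128 taps or more.

```python
def fir(signal, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n-k], zero history."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def interpolate_taps(taps_a, taps_b, w):
    """Linear blend of two impulse responses: w=0 gives a, w=1 gives b."""
    return [(1.0 - w) * a + w * b for a, b in zip(taps_a, taps_b)]


def spatialize(mono, hrtf_a, hrtf_b, w):
    """hrtf_a and hrtf_b are (left_taps, right_taps) pairs measured at two
    neighboring directions; w says how far the source is from a toward b."""
    left = interpolate_taps(hrtf_a[0], hrtf_b[0], w)
    right = interpolate_taps(hrtf_a[1], hrtf_b[1], w)
    return fir(mono, left), fir(mono, right)


# Toy data (hypothetical, not measured HRTF's).
hrtf_0deg = ([1.0, 0.5, 0.0, 0.0], [1.0, 0.5, 0.0, 0.0])
hrtf_30deg = ([0.0, 0.6, 0.3, 0.0], [1.0, 0.2, 0.0, 0.0])
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]

# A source halfway between the two measured directions.
left, right = spatialize(impulse, hrtf_0deg, hrtf_30deg, 0.5)
```

Feeding in an impulse just reads back the blended filter for each ear,
which is a handy sanity check before trying real audio.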


# OF FIR TAPS, SAMPLING FREQUENCY

Here's the key: these filters are a minimum of 128 points long.  This is
roughly the number of 50 kHz samples that it takes to store the complete
impulse response of the human ear.  In fact, even 128 taps is cutting a
little of the response off.  The Convolvotron can properly filter 4
incoming signals with eight 128-tap filters at a 50 kHz sample rate.  And it
does this with about a 200 ms to 300 ms delay.  A filter shorter than 128
taps truncates the measured response, so its frequency resolution is
degraded relative to the original.  A sampling rate below 50 kHz also means
that higher-frequency sounds can't be displayed (although this isn't too
bothersome), because the sampling theorem limits the representable
bandwidth to half the sampling rate.
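
For anyone who wants the arithmetic behind those numbers, here is a small
Python sketch.  The tap count, filter count and sample rate come from the
text above; reading the 8 filters as 4 sources times 2 ears, and counting
one multiply-accumulate per tap per sample (a plain direct-form FIR), are
my assumptions.

```python
SAMPLE_RATE_HZ = 50_000
TAPS = 128
FILTERS = 8  # assumed: 4 incoming signals x 2 ears

# Time span covered by one 128-tap impulse response.
ir_length_ms = 1000.0 * TAPS / SAMPLE_RATE_HZ

# Sampling theorem: highest representable frequency is half the sample rate.
nyquist_hz = SAMPLE_RATE_HZ / 2

# Rough frequency resolution of an N-tap filter: sample rate / N.
resolution_hz = SAMPLE_RATE_HZ / TAPS

# Direct-form cost: one multiply-accumulate per tap, per filter, per sample.
macs_per_second = FILTERS * TAPS * SAMPLE_RATE_HZ

print(ir_length_ms, nyquist_hz, resolution_hz, macs_per_second)
```

Halving the tap count halves the cost, but it also doubles the spacing
between resolvable frequencies, which is the trade-off described above.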


FOCAL POINT HARDWARE

Focal Point uses a card from Digidesign that has a Motorola 56000 DSP.
This is a fine card but it can't match the Convolvotron in processing
power.  The 56001 specs show that a 128-tap FIR filter can be done at a
70 kHz rate, but this is a little deceiving.  This is because two filters must
be interpolated in space and the output of the filters must be interpolated
in time in order to remove the effects of switching filters.  This is
further complicated by the fact that after a filter switch occurs, you have
to wait for 127 bad samples before you get good data.  There is also the
issue of taking processor time to read in new data and output processed
data.  This means that to properly display sounds, the full 70 kHz sampling
power of the 56000 can't be used.  Now neither I nor anyone else, aside from
Bo Gehring and his close friends, knows exactly what Focal Point is doing.
We don't really know that he's using full 128-tap filters (probably not) and
we don't know the sampling rate of the filters (though the I/O claim for
the board is 44.1 kHz), so it's hard to comment on the exact details of
Focal Point.
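
To show why switching filters costs extra processor time, here is a
minimal Python sketch of interpolating the filter outputs in time.  It is
only one plausible way to do it, not what either product is known to do:
both the old and the new filter run over the same input, so the new
filter's history is already full when it is faded in, and the outputs are
blended linearly.  The names switch_with_crossfade, switch_at and fade_len
are hypothetical.

```python
def fir(signal, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n-k], zero history."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def switch_with_crossfade(signal, old_taps, new_taps, switch_at, fade_len):
    """Blend from the old filter's output to the new filter's output,
    starting at sample switch_at and lasting fade_len samples."""
    y_old = fir(signal, old_taps)  # both filters see the whole input,
    y_new = fir(signal, new_taps)  # so neither starts with empty history
    out = []
    for n in range(len(signal)):
        if n < switch_at:
            w = 0.0
        elif n >= switch_at + fade_len:
            w = 1.0
        else:
            w = (n - switch_at) / fade_len
        out.append((1.0 - w) * y_old[n] + w * y_new[n])
    return out


# Toy example: constant input, old gain 1.0, new gain 0.5.
dc = [1.0] * 12
out = switch_with_crossfade(dc, [1.0, 0.0], [0.5, 0.0], switch_at=4, fade_len=4)
```

The output slides from 1.0 to 0.5 over four samples instead of jumping.
The catch is that two filters are running during the fade, which is
exactly the kind of overhead that eats into the headline 70 kHz figure.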


LATEST CONVOLVOTRON WORK

But wait, there is more.  Recently we have been working with Crystal River
on actually synthesizing room acoustics.  We now have the capability to
simulate the direct path plus 6 reflections on one Convolvotron in real
time.  This means that we have implemented fifteen 128-point filters in real
time.  That's two filters per ear plus a filter for the material type of
the wall.  And we can change the material type of the walls, the location
of the walls, the location of the source and the location of the listener
all in real time.  I think there's a total delay of 300ms.
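
I haven't said how the reflections are located, so take the following as
an illustration only.  One common textbook approach is the image-source
method: mirror the source across each of the six walls of a box-shaped
room and treat each mirror image as an extra source.  The Python sketch
below assumes that method, a speed of sound of 343 m/s, a 50 kHz sample
rate and a made-up room; none of the names or dimensions come from the
actual Convolvotron software.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed, roughly room temperature
SAMPLE_RATE_HZ = 50_000


def first_order_images(source, room):
    """Mirror the source across each of the 6 walls of a box room that
    spans (0, 0, 0) to room = (width, depth, height)."""
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = list(source)
            img[axis] = 2.0 * wall - source[axis]
            images.append(tuple(img))
    return images


def path_delays_samples(source, listener, room):
    """Delay in samples for the direct path followed by 6 reflections."""
    paths = [tuple(source)] + first_order_images(source, room)
    return [
        round(math.dist(p, listener) / SPEED_OF_SOUND_M_S * SAMPLE_RATE_HZ)
        for p in paths
    ]


# Hypothetical 5 m x 4 m x 3 m room.
room = (5.0, 4.0, 3.0)
source = (1.0, 2.0, 1.5)
listener = (4.0, 2.0, 1.5)
delays = path_delays_samples(source, listener, room)
```

Each of the seven paths then gets its own direction (and so its own HRTF
pair), and moving a wall, the source or the listener just changes the
image positions and delays.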

The people doing the perceptual studies in this field have found that there
are no depth cues when listening to anechoic signals.  Thus, the only way
to provide this information is to simulate room acoustics and the Convolvotron
does this fairly nicely.


PERCEPTUAL ISSUES

It turns out that the natural filtering that goes on when a person hears a
sound is different from person to person.  If you hear synthesized sounds
with filters that are close to your own ears' then the perceived location
is better than from a filter set that is not close to your own.  Typically,
reproducing the z axis (elevation) is the most difficult.  Frequency
resolution is critically important in displaying elevation cues because
human perception of these is believed to be completely frequency dependent.


WHAT YOU GET

OK, the Convolvotron is expensive (by the way, the latest price is $15,000)
but you get a two-card set and you also get SOURCE CODE.  With Focal Point
you get a Digidesign card and an executable only (for a lot less, though).


I hope I have presented a fair description of both products.  There is a
slight chance that my description of Focal Point is a little inaccurate
because I don't know that product as well.  

<================================================>
R. Michael Taylor, Project Leader
mike@mew.mei.co.jp
Matsushita Electric Works, Virtual Reality Group                      
401 River Oaks Parkway            (408) 433-3386
San Jose, CA 95134
<================================================>