From: bkarr@carson.u.washington.edu (Brian Karr)
Subject: Re: SCI: Three dimensional sound?
Date: Mon, 23 Nov 1992 01:19:25 GMT
Organization: Human Interface Technology Lab, Seattle

In article <1992Nov19.072149.29598@hitl.washington.edu> "Human Int. Technology" writes:
>From: fsjdj1@acad3.alaska.edu
>Subject: Re: SCI: Three dimensional sound?
>Date: Wed, 18 Nov 1992 20:16:59 GMT
>Organization: University of Alaska Fairbanks
>
>In article <1992Nov16.095935.10365@u.washington.edu>, mcmains@unt.edu (Sean McMains) writes:
>>
>> What is the theory behind creating the illusion of a sound emanating
>> from a particular point in three dimensional space? With regard to
>> lateral motion, the amplitude and timing of the sounds entering each
>> ear could obviously be adjusted to create the desired effect. How
>> would one create the illusion of a sound coming from above or below
>> the listener? Or is this effect only possible through adjusting what
>> the listener hears as he moves his head?
>
> [Mentions articles regarding bi- and multi-directional sound recording.]

_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
I'm not sure whether my postings are the ones you are referring to, but I will repost the following info, as it seems relevant to the question. Much of it is recycled, and the rest is related news since those postings.
-bk (Brian Karr)
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_

Here is a quick and dirty explanation of 3D sound filter functions:

Most of the cues needed for presenting a spatial audio image are embedded in the 'earprint', or HRTF (Head Related Transfer Function). An HRTF is a description of how the ears of a test subject filter sound arriving from various points on a sphere, with the listener's head at the center. It is derived by placing mics in the ears of the subject and chirping pseudo-random noise (or impulses) at them from the various directions.

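The measurement step can be sketched as a deconvolution: if the test noise is played periodically, the mic signal is the circular convolution of the noise with the ear's impulse response, so dividing the recorded spectrum by the probe spectrum recovers that response. This is a minimal sketch assuming an ideal, noise-free, periodic measurement; the probe length, the seed and the four-tap `ear` response are hypothetical illustration values, not measured data, and real measurements average many repeats and window the result.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) discrete Fourier transform; fine for short illustrative signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def circular_convolve(probe, ir):
    """Model of the mic signal: one period of a periodic probe filtered by `ir`."""
    n = len(probe)
    return [sum(ir[k] * probe[(t - k) % n] for k in range(len(ir))) for t in range(n)]


def estimate_impulse_response(probe, recording, length):
    """Deconvolve: H[k] = Y[k] / X[k], then transform back and keep `length` taps."""
    spectrum = [y / x for x, y in zip(dft(probe), dft(recording))]
    return [v.real for v in idft(spectrum)[:length]]


# Hypothetical example: 64 samples of Gaussian pseudo-random noise as the probe.
rng = random.Random(1)
probe = [rng.gauss(0.0, 1.0) for _ in range(64)]
ear = [0.0, 0.9, 0.3, -0.2]  # made-up ear impulse response, not real HRTF data
mic = circular_convolve(probe, ear)
estimate = estimate_impulse_response(probe, mic, len(ear))
```

Gaussian noise is used here only because its spectrum is very unlikely to have a zero bin; a measurement system would typically use a sequence designed to have a flat spectrum.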
If a Fourier Transform is performed on what the mics pick up, the resulting spectra show how the ears and head shape sounds before they reach the eardrum. The recorded signals therefore contain the impulse response of the ear at that angle, which carries both amplitude and phase information. The relative phase between the ears is preserved because the two ears are supposed to be sampled phase-coherently. The earprint, then, is an array of these responses, which are used as filter coefficients for shaping the input signal to be spatialized. If we wish to hear a sound from a direction where no actual measurement was made, the nearest responses are interpolated.

To implement a simulation of this, we effectively need two time-varying realtime filters (one per ear) for each sound source we wish to localize. As a frequency-domain analogy, imagine two graphic equalizers whose sliders move to new positions whenever we want the source to appear to move to a new location in space. Spatial sound systems use a mathematical version of this, called convolution, to filter signals digitally. So the major computation going on is the interpolation of coefficient sets and the convolution of the input signal with the appropriate filter responses.

That provides a 'free field' spatial display, meaning there are no environment cues, since the earprint is usually derived in an acoustically insulated booth of some kind. Distance is simulated by 1/distance-squared (inverse-square) attenuation. For the anechoic models used here, this may be more like 1/d^.5, since the reverberant energy is no longer included. This method is not entirely correct, because the ear tends to normalize the loudness of sounds that don't have an intrinsic loudness. To give convincing distance and environment cues, a reference wall or walls can be placed in the image by manipulating the earprint (adding the responses of the reflections before convolving).

For a far better description of all of this, see:

Blauert, Jens, 1983.
_Spatial Hearing: The Psychophysics of Human Sound Localization_, Cambridge, MA: MIT Press.

Lehnert, Hilmar & Blauert, Jens, 1992. 'Aspects of Auralization in Binaural Room Simulation.' AES Proceedings, 93rd Convention, October 1-4, 1992.

Moller, Henrik, 1992. 'Fundamentals of Binaural Technology.' _Applied Acoustics_, _36_, pp. 171-218.

Wenzel, Elizabeth M. 'Localization in Virtual Acoustic Displays.' _Presence_, 1st issue.

Wightman, F.L. & Kistler, D.J., 1989a,b. 'Headphone Simulation of Free-field Listening I, II.' _Journal of the Acoustical Society of America_, _85_, pp. 858-878.

Hope this info helps.
-Brian
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_

There are a few spatial audio display systems out these days that I am aware of. Some use speaker arrays and some use ordinary headphones.

There is, of course, surround sound, which requires a loudspeaker array and previously encoded material. Decoders are commonplace today, while encoders are less common and are expensive.

There is also the 'Ambisonic' system, which uses a special 4-element microphone to record and encode natural sound environments. This also requires loudspeakers (4 or 6).

There is also, of course, quadraphonic sound. This is used in the headset of the Virtuality arcade games, which has two speakers at each ear. This display gave me a good azimuth impression, especially with head tracking, but elevation was not so convincing.

Mannequin heads with microphones in the ears have been used to make binaural recordings for decades. Recording this way (or with real folks' heads and tiny mics) gives an excellent 3D impression and can really sound like you are _there_.

The benefit of using headphones and head tracking is that many people in the same room can have different sonic environments, or share the same environment from different perspectives, simultaneously. The AL-100 was developed by the Air Force to take advantage of this.

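The filter-function explanation earlier in this post, combined with head tracking, boils down to a small per-source loop: subtract the head's yaw from the source direction, interpolate the nearest measured earprint pair, convolve, and scale for distance. This is a minimal sketch under loud assumptions: azimuth only (no elevation), angles in degrees measured in the same rotational sense for source and head, a hypothetical `hrirs` table mapping measured azimuths to (left, right) impulse responses of equal length, and plain linear blending of impulse responses, which is a simplification of what real systems do. The `exponent` parameter is also hypothetical: 1.0 makes amplitude fall as 1/d, which corresponds to the inverse-square law for intensity.

```python
def convolve(signal, ir):
    """Direct-form FIR convolution: out[n + k] accumulates signal[n] * ir[k]."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(ir):
            out[n + k] += x * h
    return out


def interpolate_hrir(hrirs, azimuth):
    """Linearly blend the two measured (left, right) pairs that bracket `azimuth`."""
    az = azimuth % 360.0
    angles = sorted(hrirs)
    # Nearest measured angles below and above, wrapping around 0/360 degrees.
    lower = max((a for a in angles if a <= az), default=angles[-1])
    upper = min((a for a in angles if a > az), default=angles[0])
    span = (upper - lower) % 360.0 or 360.0
    w = ((az - lower) % 360.0) / span
    (left_a, right_a), (left_b, right_b) = hrirs[lower], hrirs[upper]
    left = [(1.0 - w) * a + w * b for a, b in zip(left_a, left_b)]
    right = [(1.0 - w) * a + w * b for a, b in zip(right_a, right_b)]
    return left, right


def render(mono, hrirs, source_azimuth, head_yaw, distance, exponent=1.0):
    """Spatialize one mono source for a tracked head; returns (left, right) signals."""
    head_relative = source_azimuth - head_yaw  # head tracking: re-aim the filters
    left_ir, right_ir = interpolate_hrir(hrirs, head_relative)
    gain = 1.0 / max(distance, 1e-3) ** exponent  # distance cue
    return ([gain * s for s in convolve(mono, left_ir)],
            [gain * s for s in convolve(mono, right_ir)])


# Hypothetical two-tap earprints at four azimuths (90 = listener's right side).
hrirs = {
    0: ([1.0, 0.0], [1.0, 0.0]),
    90: ([0.2, 0.1], [1.0, 0.5]),
    180: ([0.5, 0.0], [0.5, 0.0]),
    270: ([1.0, 0.5], [0.2, 0.1]),
}
# Source on the right, but the head has turned to face it, so it renders as straight ahead.
left, right = render([1.0, 0.0], hrirs, source_azimuth=90, head_yaw=90, distance=2.0)
```

Setting `exponent=0.5` would give the gentler 1/d^.5 falloff mentioned above for anechoic models.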
The AL-100 was a coffin-sized box fitted with a binaural mannequin head, which was spun around in front of a loudspeaker by high-torque motors. This way, a sound could be made to travel around the listener by moving the head appropriately. Last I heard, this box is still working.

This whole process has since been realized computationally in a number of excellent systems. Jens Blauert developed the first system in Germany some years back. His group is now working on a new version of their 'Binaural Mixing Console,' and is doing some excellent work with room simulation (localized reflections in addition to the direct localized source).

Commercial Systems:

Here are the systems I know of that are available right now.

One is a Macintosh II-based system called Focal Point (Gehring Inc., Bo Gehring), which uses a special DSP card (the Audiomedia, which has recently been discontinued by Digidesign, alas) for each independently localized sound. It can be used with a CDEV (control panel) interface or a MIDI application right out of the box, and also comes with a Think C interface for use in your own apps. I also saw/heard an early version of this on the NeXT, but a commercial NeXT version is not being pursued. This system is now also available for IBM-PC compatible machines under the same product name. Bo tells me that the PC card has the added benefit that it can run without the host bus: you simply give it power and it wakes up spatializing whatever signal is at its input. Position commands can then be sent to a serial port built onto the card. Handy if you want to skip the PC host. This card spatializes two sounds simultaneously and independently. Focal Point 3D Audio, Niagara Falls, NY. Bo Gehring (716) 285-3930.

There is also the IBM-PC based Convolvotron (Crystal River Engineering, Scott Foster), which localizes 4 independent sounds for each 2-card set.
This system lets you switch 'earprints' (HRTFs) and comes with a set of earprints, C programming libraries and sample programs. It also optionally includes a reflection package, which localizes reflections to give an impression of objects (walls) present in the environment and adds another crucial distance cue.

This has now been implemented for the PC on a Turtle Beach DSP card; CRE is calling it the 'Beachtron.' This card spatializes two sounds simultaneously and independently. It has a sample-based synthesizer on the card and a MIDI port (Yes!). Multiple cards can be cascaded, which avoids the need for mixing and the cabling plague.

CRE has developed a protocol and software libraries that let you load up your AT (or an AT backplane) with B-trons or C-trons and talk to them as if they were an audio resource pool. The code autosenses what is on the bus and does the right thing. This makes a lot of your code portable between C-trons and B-trons. The B-tron, however, does not do room simulation. This audio resource pool package is called the 'Acoustetron.' Crystal River Engineering, Groveland, CA. Scott Foster (209) 962-6382.

VPL has worked with the CRE crowd to port this to a Mac-based card for VPL's VR systems. They are calling it the 'CosmTron,' for their MicroCosm system. VPL Research, Foster City, CA. (415) 361-1710.

There is also a pro-audio system called the Sound Space processor (Roland Co., Curtis Chan), which is designed to give a 3D image using two loudspeakers. The idea is to compute the sounds in their locations as usual and then compensate for speaker cross-talk before the signal goes to the speakers (this is called 'transaural processing'). The result is a sweet spot that is actually a line: all the points equidistant from both speakers. Chances are you have already heard this on the radio. Bob Todrank is now the contact at Roland for this machine, the RSS (Roland Sound Space) processor. Roland Pro Audio/Video Group (213) 685-5141.

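The cross-talk compensation behind transaural processing can be illustrated with a deliberately simplified model, which is an assumption and not Roland's actual processing: the listener sits on the symmetric line between the speakers, each speaker reaches the near ear unchanged, and it reaches the far ear `delay` samples later, scaled by `gain`. Inverting that model gives a recursive canceller, because each correction signal itself leaks to the opposite ear and must in turn be cancelled. Real processors replace the bare delay and gain with measured head-related filters.

```python
def crosstalk_cancel(binaural_left, binaural_right, gain, delay):
    """Speaker feeds that deliver the binaural pair to the ears under at_ears()'s model."""
    if delay < 1 or not 0.0 <= gain < 1.0:
        raise ValueError("need delay >= 1 sample and 0 <= gain < 1")
    n = len(binaural_left)
    feed_left, feed_right = [0.0] * n, [0.0] * n
    for t in range(n):
        leak_from_right = feed_right[t - delay] if t >= delay else 0.0
        leak_from_left = feed_left[t - delay] if t >= delay else 0.0
        # Pre-subtract what the opposite speaker will leak into this ear.
        feed_left[t] = binaural_left[t] - gain * leak_from_right
        feed_right[t] = binaural_right[t] - gain * leak_from_left
    return feed_left, feed_right


def at_ears(feed_left, feed_right, gain, delay):
    """Symmetric free-field model: direct path plus delayed, attenuated cross path."""
    n = len(feed_left)
    left = [feed_left[t] + (gain * feed_right[t - delay] if t >= delay else 0.0)
            for t in range(n)]
    right = [feed_right[t] + (gain * feed_left[t - delay] if t >= delay else 0.0)
             for t in range(n)]
    return left, right


# Hypothetical click meant for the left ear only; gain 0.5, 2-sample cross-path delay.
feeds = crosstalk_cancel([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0] * 6, gain=0.5, delay=2)
heard = at_ears(*feeds, gain=0.5, delay=2)
```

The right speaker emits an inverted, delayed copy to cancel the left speaker's leak into the right ear, and the left speaker then has to cancel that correction's own leak, which is why the recursion needs `gain < 1` to die out. The equal cross-path delays are what restrict the effect to the line of points equidistant from both speakers.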
These systems (F.P., A,B,C-tron, RSS) require no decoding, and the signal can be stored on regular audio cassettes (or preferably on DAT, Hi-Fi VHS or MO). Focal Point and the Convolvotron are designed for headphones, while Roland's box is designed for headphones or speakers. I heard some interesting 'effects' with speakers, though the spatial image didn't always come across. There are transaural processors available if you must use loudspeakers with personal-computer-based systems such as F.P. and the A,B,C-tron.

Related stuff:

The 'Spatializer,' from Audio Intervisual Design, is a system that produces eight moving sources in azimuth only. This is not binaural processing in the sense of filtering with pinna responses, but I thought I would mention it for completeness. Audio Intervisual Design. (213) 845-1155.

The 'Intelliverb,' from RSP Technologies, is an ultra-parameterized reverb unit. It can be configured to produce many standard effects, but the room simulation effects are the most relevant here. Variables include room width, height and depth, source position, listener position, and reverb ducking (which can be used to change room absorption). These parameters can all be controlled via MIDI (i.e. from your VR code). These variables affect the 'early reflections' in the simulation; the more diffuse late reflections are added in from a selection of algorithms. In my opinion, this box stands out from the vast array of effects processors because of its attention to external control of the right delay/reverb variables, and because of some excellent audio specs. The early reflections are not, however, spatialized; they are intended to be correct in time. I have found reverb effects to give an excellent enhancement of presence in VR, and this type of box seems to be a good alternative to fully correct room simulations until those become real and affordable. I will post a more complete review of the box after the holidays. RSP Technologies, Rochester Hills, MI. (313) 853-3055.

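To show what early reflections that are "correct in time" look like, here is a minimal first-order image-source sketch for a rectangular room, driven by the same kinds of variables the Intelliverb exposes (room dimensions, source and listener positions). It is not RSP's algorithm; it assumes a rigid box room with one corner at the origin, a single hypothetical `reflectivity` for all six walls, 1/distance amplitude spreading, and 343 m/s for the speed of sound. Like the Intelliverb's, these reflections are placed in time but not spatialized; running each tap through the earprint for its arrival direction before convolving is what the room-simulation systems mentioned earlier add.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly room-temperature air (assumed)


def first_order_images(room, source):
    """Mirror the source in each of the six walls of a box room with a corner at the origin."""
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]
            images.append(tuple(image))
    return images


def early_reflections(room, source, listener, reflectivity=0.8):
    """Sorted (delay_seconds, gain) taps: the direct path plus six first-order reflections."""
    paths = [(source, 1.0)] + [(img, reflectivity) for img in first_order_images(room, source)]
    taps = []
    for point, coeff in paths:
        d = math.dist(point, listener)
        taps.append((d / SPEED_OF_SOUND, coeff / max(d, 1e-3)))
    return sorted(taps)


def impulse_response(taps, sample_rate):
    """Drop each tap into a sample buffer at the nearest sample (no fractional delay)."""
    length = int(round(max(t for t, _ in taps) * sample_rate)) + 1
    ir = [0.0] * length
    for delay, gain in taps:
        ir[int(round(delay * sample_rate))] += gain
    return ir


# Hypothetical 10 m x 8 m x 3 m room, source and listener 4 m apart at ear height.
taps = early_reflections((10.0, 8.0, 3.0), (2.0, 4.0, 1.5), (6.0, 4.0, 1.5))
ir = impulse_response(taps, sample_rate=8000)
```

Convolving a dry signal with `ir` adds the reflections in one pass, which matches the idea above of adding the reflection responses before convolving; higher-order images and a diffuse tail would be needed for anything resembling a full reverb.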
Any others? I'm not sure. It is of course possible to do this with slow hardware in non-realtime, so I wouldn't be surprised if many people have developed spatial displays. The critical part of the process is getting a good earprint. Much of the work being done today on room simulation (manipulating impulse responses with raytraced reflections) is being done off-line because of the computation it requires.
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
I hope this is helpful to enough folks to justify the bandwidth.
-Brian
bk@hitl.washington.edu
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_