From: decwrl!well.sf.ca.us!well!lilj@uunet.UU.NET (Joshua Neil Rubin)
Subject: Re: VR/Video
Date: 20 Apr 91 02:58:22 GMT
Organization: Whole Earth 'Lectronic Link, Sausalito, CA


Let's put aside for a minute the problem of hidden surfaces.  I
readily concede that a single stereopair has insufficient information
to let you synthesize alternate perspectives of such surfaces.

Taking solely the information from a single stereopair of an object
with no hidden surfaces, you can synthesize *any* new view of the
object from *any* perspective you might wish, using only technology
that is 100 years old.

Take the surface of Mars as an example:

Assume that 100 years ago you had a stereopair looking straight down 
onto a bumpy, craggy, mountainous part of Mars.  The only really 
unusual thing about this stretch of terrain is that every bit of 
surface was in direct line of sight with each of the two cameras 
taking the stereopair.  (This eliminates the hidden-surface problem.)

Using only these two photos, some basic principles of stereoscopic
arithmetic that have been known since at least the time of Wheatstone
in the 1830s (predating even the public announcement of photography,
actually), an accurate ruler, a calculator, and some clay, you could
easily (albeit tediously) build an accurate three-dimensional model of
that terrain, limited only by the precision of your measurements.  And
you could look at it from any angle you chose.
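For the curious, the "stereoscopic arithmetic" can be sketched in
modern form with the parallax equation from aerial photogrammetry.
This is a minimal sketch under stated assumptions: both photos are
truly vertical and taken from the same height H above a reference
("base") point, and the ruler measures each point's x-coordinate from
each photo's centre along the line joining the two camera stations.
The function names and the numbers are hypothetical, and the textbook
formula is a stand-in, not necessarily the exact method Wheatstone's
contemporaries used.

```python
def parallax(x_left, x_right):
    """Absolute parallax of a point, in photo units (e.g. mm).

    Assumes x is measured along the flight line (the line joining the
    two camera stations), from each photo's own centre."""
    return x_left - x_right


def height_above_base(flying_height, base_parallax, point_parallax):
    """Height of a point above the base point, via the standard parallax
    equation h = H * dp / (p_b + dp), where dp = p - p_b.

    flying_height is the camera height H above the base point, in ground
    units; the result comes out in the same ground units."""
    dp = point_parallax - base_parallax
    return flying_height * dp / (base_parallax + dp)


# Hypothetical ruler readings (mm), cameras 1000 m above the base point.
p_base = parallax(40.0, -50.0)   # a point on the valley floor: 90.0 mm
p_peak = parallax(45.0, -55.0)   # a crag: 100.0 mm
print(height_above_base(1000.0, p_base, p_peak))  # → 100.0 (metres)
```

Repeat that for enough points and you have a table of heights to
sculpt the clay from.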

As I see it, the problem in quickly synthesizing a new computer-generated
virtual perspective of a scene from a single stereopair isn't that you
need skillabytes of data.  The problem is that you need sophisticated
object-recognition programs to find what stereographers call the
"homologous points" in the two images that make up the stereopair.
These are, as the name implies, the two points, one per image, that
represent the same location in actual space.  (You derive depth
information from a stereopair by comparing the distance between two
points on one image with the distance between the two homologous
points in the other image; that difference, called differential
parallax, grows with the difference in depth between the two points.)
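The recognition our brains do is far beyond a short program, but a
much cruder stand-in, area-based matching, shows what "finding
homologous points" means computationally.  This sketch assumes the
stereopair has been rectified so that homologous points lie on the
same row, and that the right camera sits to the right of the left one,
so a point's match in the right image is shifted leftward.  It slides
a small window along that row and keeps the position with the smallest
sum of squared differences (SSD).  The function and parameter names
are hypothetical, and a real matcher must cope with noise, featureless
regions, and occlusions that this one ignores.

```python
def find_homologous_point(left, right, row, col, half_window=1, max_disparity=10):
    """Return the column in `right` (same row) whose window best matches
    the window centred at (row, col) in `left`, by minimum SSD.

    Images are equal-sized lists of rows of numbers.  The caller must keep
    the window inside the image: half_window <= row < height - half_window
    and col + half_window < width."""
    def window(img, r, c):
        # Flatten the (2*half_window + 1)^2 neighbourhood around (r, c).
        return [img[r + dr][c + dc]
                for dr in range(-half_window, half_window + 1)
                for dc in range(-half_window, half_window + 1)]

    reference = window(left, row, col)
    best_col, best_cost = None, float("inf")
    for d in range(max_disparity + 1):
        c = col - d  # candidate position, shifted leftward by disparity d
        if c - half_window < 0:
            break    # window would fall off the left edge
        cost = sum((a - b) ** 2 for a, b in zip(reference, window(right, row, c)))
        if cost < best_cost:
            best_col, best_cost = c, cost
    return best_col


# Synthetic test pair: the right image is the left one shifted 3 columns.
width, height, shift = 12, 5, 3
left = [[c * c + 3 * r for c in range(width)] for r in range(height)]
right = [[left[r][c + shift] if c + shift < width else 0 for c in range(width)]
         for r in range(height)]
print(find_homologous_point(left, right, row=2, col=6))  # → 3
```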

Humans can pick out homologous points easily enough.  In fact, we do
it automatically whenever we use depth perception.  Computers currently
have a harder time than we do parsing scenes into objects and
recognizing correspondences between imperfectly matched patterns.
Once the homologous points have been identified, however, it is a
simple matter to do the arithmetic required to reconstruct the
relative depths of the various points in the scene.
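That "simple matter" of arithmetic, plus the "look at it from any
angle" step, can be sketched for an idealized pair of cameras.  The
assumptions are mine, not necessarily how any real system works: two
identical pinhole cameras with parallel optical axes, focal length f
expressed in pixels, the right camera displaced a distance B (the
baseline) to the right of the left one, and image coordinates measured
from each image's centre.  Under those assumptions depth is
Z = f * B / d, where the disparity d is the left x-coordinate minus
the right one.  A new view is then just the recovered points
re-projected through a hypothetical third camera.  All names here are
illustrative.

```python
import math


def triangulate(x_left, x_right, y, focal_length, baseline):
    """Recover (X, Y, Z), in the left camera's frame, for one pair of
    homologous points in a rectified stereopair."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("homologous points must have positive disparity")
    z = focal_length * baseline / disparity          # Z = f * B / d
    return (x_left * z / focal_length, y * z / focal_length, z)


def view_from(point, camera_position, yaw_degrees, focal_length):
    """Project a 3-D point into a hypothetical new pinhole camera at
    camera_position, turned yaw_degrees about the Y axis (positive yaw
    swings the optical axis toward +X).  Returns image (u, v), or None
    if the point is behind the new camera."""
    a = math.radians(yaw_degrees)
    dx, dy, dz = (p - c for p, c in zip(point, camera_position))
    # Rotate the camera-to-point offset into the new camera's frame.
    xc = math.cos(a) * dx - math.sin(a) * dz
    zc = math.sin(a) * dx + math.cos(a) * dz
    if zc <= 0:
        return None
    return (focal_length * xc / zc, focal_length * dy / zc)


f, b = 500.0, 0.25
point = triangulate(50.0, 25.0, 10.0, focal_length=f, baseline=b)
print(point)                                       # about (0.5, 0.1, 5.0)
print(view_from(point, (0.0, 0.0, 0.0), 0.0, f))   # about (50, 10): the left view again
print(view_from(point, (1.0, 0.0, 2.0), 15.0, f))  # an entirely new perspective
```

Re-projecting from the original camera positions is a handy sanity
check: the left camera gets back (50, 10), and a camera at the right
camera's position (0.25, 0, 0) gets back the right image's
x-coordinate of 25.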

I'll grant you that we're talking about immense amounts of computing 
speed and power and memory to do all of that object recognition so 
fast.