From: John Costella <jpc@tauon.ph.unimelb.edu.au>
Subject: PAPER: Galilean Antialiasing for VR, Part 04/04
Date: Mon, 26 Oct 92 5:19:38 EET


%  File 4 of 4.  NOTE: All four files MUST be concatenated
%                before they can be LaTeXed.
%
%
%  ... Continuation of "Galilean Antialiasing for Virtual Reality Displays"
%  
%  (The following line *must* be left blank.)

The reason for this waste of resolution, of course, is that we have
tried to stretch a planar device across our field of view.
What is perhaps not so obvious is the fact that no amount of
technology, no amount of optical trickery, can remove this problem:
\e{it is an inherent flaw in trying to use a planar display as a
flat window on the virtual world}.
This point is so important that it shall be repeated in a slightly
different form: \e{Any rendering algorithm that assumes a regularly-spaced
planar display device will create a central pixel 33 times the size of a
peripheral pixel under \VR\ conditions.}
This is not open for discussion; it is a mathematical fact.

Let us, however, consider whether we might not, ultimately, avoid
this problem by simply producing a higher-resolution display.
Of course, one can always compensate for the factor of four 
degradation by increasing the linear resolution of the device
in each direction by this factor.
However, there is a more disturbing 
psychological property of the planar display:
pixels near the centre of view
seem chunkier than those further out; it becomes psychologically
preferable to \e{look askew} at objects, using the eye muscles
together with the increased outer resolution to get a better view.
This property is worrying, and could, potentially, cause eye-strain.
To avoid this side-effect, we would need to have the
\e{entire} display of such a high resolution that even the central
pixel (the worst one) is below foveal resolution.
We earlier showed that a $512$-pixel-wide planar display produces
a central pixel angular length of about $1.27\degrees$.
Foveal resolution, however, is about 1~arc-minute at best---about what
one can get from a Super-VGA display at normal viewing distance.
To achieve this,
we could simply ``scale down'' our pixel size, providing 
a greater number of total pixels to maintain the full-field-of-vision
coverage; 
performing this calculation tells us that we would need 
\[
512\times\f{1.27}{1/60}\approx39000
\]
pixels in each direction for this purpose.
No problems---just give me a $39000\times39000$ head-mountable display
device, and I'll shake your hand (if I can find it, behind the
4.5~GB of display memory necessary for even 24-bit colour, with
no $z$-buffering or \Galn\ \anti ing\ldots).
And all this just to get roughly Super-VGA resolution\ldots do you
think you have a market?
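
(As a rough check of the memory figure just quoted, assuming
3~bytes per pixel for 24-bit colour and no other buffers, one finds
\[
39000^2\times3\approx4.56\times10^9\;\txt{bytes},
\]
which is consistent with the 4.5~GB stated above.)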

The clear message from this line of thought is the following:
\e{planar-display rendering has no future in \VR}.
Assumption of a planar view-plane is, in traditional computer graphics
applications, convenient: 
the perspective view of a line is again a line; the perspective view
of a polygon is also a polygon.
Everything is cosy for algorithm-inventors; procedures can be optimised
to a great extent.
But its performance is simply unacceptable for wrap-around display
devices.
We must farewell an old friend.

How, then, are we to proceed?
Clearly, designers of current \VR\ systems have not been hamstrung by
this problem: there must be a good trick or two that makes everything
reasonable again, surely?
And, of course, there is: one must map (using optics)
the rectangular display devices that electronics manufacturers
produce in an \e{intelligent} way onto the solid angle of the human
visual field.
One must not, however, be fooled into thinking that any
sophisticated amount of optics will ever ``solve''
the problem by itself.
The mapping will also transform the \e{logical} rectangular pixel grid
that the \e{rasterisation software} uses, in such a way that (for
example) polygons in physical space will \e{not} be polygons on the
transformed display device.
(The only way to have a polygon stay a polygon is to have a planar
display, which we have already rejected.)

Let us now consider how we should like to map the physical display
device onto the eye's field of vision.
Our orange helps us here.
All points on the surface within the red circle should be at
\e{maximum solid-angle resolution}, as our foveal vision can point in
any direction inside this circle.
However, look at the skewed strip of solid angle that this leaves
behind (\ie\ those solid angles that are seen in peripheral vision,
but not in foveal vision; the area between the red and blue circles): 
it is not overwhelming.
Would there be much advantage in inventing a transformation that left 
a \e{lower}-resolution coverage in this out-lying area, to simulate
more closely what we can actually perceive?
Perhaps; but it is the opinion of the author that simply removing
the distortions of planar display viewing should be one's first
consideration.
Let us therefore simply aim to achieve a \e{uniform solid-angle 
resolution in the entire field of view}.
Note carefully that this is \e{not} at all the same
as simply viewing a planar display directly that itself has
a uniform resolution across its planar surface---as has been
convincingly illustrated above.
Rather, we must obtain some sort of smooth (and, hopefully, simple) 
\e{mapping} of the one to the
other, which we will implement physically with optics, and which must
be taken into account mathematically in the rendering software.

How, then, does one map a planar surface onto a spherical one?
Cartographers have, of course, been dealing with this problem
for centuries, although usually 
with regard to the converse: how do you
map the surface of the earth onto a flat piece of paper?
Of the hundreds of cartographical projections that have been devised
over the years,
we can restrict our choices immediately, with some simple considerations.
Firstly, we want the mapping to be smooth everywhere, out to the
limits of the field of view, so that we can implement it with optical
devices;
thus, ``interrupted'' projections (\eg\ those
with slice-marks that allow the continents to be shown with less
distortion at the expense of the oceans) can be eliminated immediately.
Secondly, we want the projection to be an \e{equal-area} one: equal
areas on the sphere should map to equal areas on the plane.
Why?
Because this will then mean that \e{each} pixel on the planar display
device will map to the \e{same} area of solid angle---precisely what
we have decided that we want to achieve.

OK, then, the cartographers can provide us with a large choice of
uninterrupted, equal-area projections.
What's the catch? 
This seems too easy!
The catch is, of course, that while all of the square pixels 
\e{will} map to
an equal area of solid angle, they will \e{not} (and, indeed, 
mathematically \e{cannot}) all map to square shapes.
Rather, all but a small subset of these pixels will be distorted
into diamond-like or rectangular-like shapes (the suffix \e{-like} being
used here because the definition of these quantities on a curved surface
is a little complicated; but for small objects like pixels
one can always take
the surface to be locally flat).
Now, if our rendering software were to think that it was still 
rendering for a \e{planar} device, this distortion would indeed
be real: objects would be seen by the \VR\ participant to be warped 
and twisted, and not really what would be seen in normal perspective
vision.
However, if we have suitably briefed our rendering software about
the transformation, then the image \e{can} be rendered free of distortion,
by simply ``undoing'' the effect of the mapping.
Again we ask: what \e{is} the catch?

The catch---albeit a more subtle and less severe one now---is that the
directional
resolution of the device will not be homogeneous or isotropic.
For example, if a portion of solid angle is covered by a stretched-out
rectangular pixel, 
the local resolution in the direction of the shorter dimension is 
higher than that in the longer direction.
We have, however, insisted on an equal-area projection; therefore,
the ``lengths'' of the long and short dimensions must multiply together
to the same product as any other pixel.
This means that the \e{geometric mean} of the 
local resolutions in each of 
these two directions is a constant, independent of where we are in the
solid angle almost-hemisphere, \ie\ the square-root of \{the 
pixels-per-radian in one direction\} times \{the pixels-per-radian in the
orthogonal direction\}, evaluated at \e{any} angle of our 
field of view, will be some constant number, 
that characterises the resolution quality
of our display system.
This is what an equal-area projection gives us.
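
To make this concrete, consider a simple sketch in which a small
pixel is treated as locally flat and rectangular, with angular
side-lengths $a$ and $b$ (in radians) and solid angle $\Delta\Om$;
these symbols are introduced here for illustration only.
The equal-area property fixes $ab=\Delta\Om$ for every pixel, and
the local resolutions in the two directions are $1/a$ and $1/b$
pixels per radian, so that
\[
\sqrt{\f{1}{a}\cdot\f{1}{b}}=\f{1}{\sqrt{ab}}=\f{1}{\sqrt{\Delta\Om}}.
\]
A pixel stretched by a factor of two in one direction, and hence
squeezed by a factor of two in the other, has its resolution halved
in the first direction and doubled in the second; the geometric mean
is unchanged.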

OK then, we ask, of all the uninterrupted equal-area projections that
the cartographers have devised, is there any one that does \e{not}
stretch shapes of pixels in this way?
The cartographer's answer is, of course, no: 
that is the nature of mapping between
surfaces of different intrinsic curvature; you can only get rid of
some problems, but not all of them.
However, while there is \e{no} way to obtain a distortion-free projection,
there are, in fact, an \e{infinite} number of ways we could implement
simply an equal-area projection.
To see that, it is sufficient to consider the \coord s of the
physical planar display device,
which we shall call $X$, $Y$ and $Z$ (where $Z$-buffering is employed),
as functions of the spherical \coord s $r$, $\th$ and $\ph$.
For reasons that will become clear, we define the spherical
\coord s in terms of a \e{physical} (virtual-world)
Cartesian three-space set of \coord s, $u$, $v$ and $w$, in
a slightly non-standard way, namely
\beqnarr{SphericalCoords}
\tb\tb u=r\cos\th\sin\ph, \nline
\tb\tb v=r\sin\th, \nline
\tb\tb w=r\cos\th\cos\ph.
\eeqnarr
(We will specify precisely the meaning of the $(u,v,w)$ \coord\ system
shortly.)
Now, the equal-area criterion states that the area that an
infinitesimally small object covers in $(X,Y)$ 
space should be the \e{same} as that
contained by the solid-angle area of its mapping in $(\th,\ph)$ 
space, up to some constant that is independent of position.
The solid angle of an infinitesimally small object of extent 
$(dr,d\th,d\ph)$ centred on the position $(r,\th,\ph)$ in spherical
\coord s is given by
\[
d\Om=\cos\th\,d\th\,d\ph
\]
(and is of course independent of $r$ or $dr$);
thus, using the Jacobian of the transformation between $(X,Y)$ and
$(\th,\ph)$, namely,
\[
\f{\pard\brac{X(\th,\ph),Y(\th,\ph)}}{\pard\brac{\th,\ph}}\id
  \modsign{\,\det\!\paren{
    \begin{array}{cc}
      \f{\pard X(\th,\ph)}{\pard\th} &
      \f{\pard X(\th,\ph)}{\pard\ph} 
    \\[0.2cm]
      \f{\pard Y(\th,\ph)}{\pard\th} &
      \f{\pard Y(\th,\ph)}{\pard\ph}
   \end{array}
  }}
\]
(which relates infinitesimal
areas in $(X,Y)$ space to their counterparts 
in $(\th,\ph)$ space), the equal-area criterion can be written
mathematically as
\beqn{EqualAreaCrit}
\modsign{\f{1}{\cos\th}\braces{\f{\pard X(\th,\ph)}{\pard\th}
  \f{\pard Y(\th,\ph)}{\pard\ph}-\f{\pard X(\th,\ph)}{\pard\ph}
  \f{\pard Y(\th,\ph)}{\pard\th}}}=\txt{const.}
\eeqn
This is, of course, 
only \e{one} relation between the two functions $X(\th,\ph)$ and
$Y(\th,\ph)$; it is for this reason that we are still free to choose,
from an infinite number of projections, the one that we would like to
use.
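
As an illustration that equation~\eq{EqualAreaCrit} does not single
out one mapping, consider (purely as an example, with $k$ an
arbitrary constant that is not part of the notation used elsewhere)
the cylindrical mapping
\[
X=k\ph, \qquad Y=k\sin\th,
\]
which cartographers know as Lambert's cylindrical equal-area
projection.
Here $\pard X/\pard\th=0$, $\pard X/\pard\ph=k$,
$\pard Y/\pard\th=k\cos\th$ and $\pard Y/\pard\ph=0$, so that the
left-hand side of \eq{EqualAreaCrit} is
\[
\modsign{\f{1}{\cos\th}\braces{0\cdot0-k\cdot k\cos\th}}=k^2,
\]
a constant, as required.
This mapping satisfies the criterion, but it stretches pixels ever
more severely towards the Poles; it is not the one pursued below.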

Let us, therefore, consider again the human side of the equation.
Our visual systems have evolved in an environment 
somewhat unrepresentative
of the Universe as a whole: 
gravity pins us to the surface of the planet, and drags everything
not otherwise held up downwards; our evolved
anatomy requires that we are, most of the time, in the same
``upright'' position against gravity;
our primary sources of
illumination (Sun, Moon, planets) were always in the ``up'' direction.
It is therefore not surprising that our visual senses do not
interpret the three spatial directions in the same way.
In fact, we often tend to view things in a somewhat ``$2\half$-dimensional'' 
way: the two horizontal directions are treated symmetrically, but
logically distinct from the vertical direction; we effectively
``slice up'' the world, in our heads, into horizontal planes.

Consider, furthermore, the motion of our head and eyes.
The muscles in our eyes are attached to pull in either the horizontal
or vertical directions; of course, two may pull at once, providing
a ``diagonal'' motion of the eyeball, but we most frequently look
either up--down \e{or} left--right, as a rough rule.
Furthermore, our necks have been designed to allow easy rotation
around the vertical axis (to look around) and an axis parallel to
our shoulders (to look up or down); we can also cock our heads
of course, but this is less often used; and we can combine all three
rotations together, although this may be a bone-creaking (and,
according to current medical science, dangerous) pastime.

Thus, our \e{primary} modes of visual movement are left--right (in
which we expect to see a planar-symmetrical world) and up--down
(to effectively ``scan along'' the planes).
Although this is a terribly simplified description, it gives us
enough information to make at least a reasonable choice of equal-area
projection for \VR\ use.
Consider building up such a mapping pixel-by-pixel.
Let us start in the \e{centre} of our almost-hemispherical
solid area of vision
(\ie\ the Makassar Strait on our orange), which is close 
to---but not \coin cident with---the
direction of straight-ahead view (Singapore on our orange).
Imagine that we place a ``central pixel'' there, 
\ie\ in the Makassar Strait.
(By ``placing a pixel'' we mean placing the mapping of the 
corresponding square pixel of the planar display.)
In accordance with our horizontal-importance philosophy, let us simply
continue to place pixels around the Equator, side by side, so
that there is \e{an equal, undistorted density of pixels} around it.
This is what our participant would see if looking directly left or
right from the central viewing position; it would seem 
(at least along that line) nice and
regular.
(We need only stack them as far as $80\degrees$ in either direction, of
course, but it will be convenient to carry this around a full
$90\degrees$ to cover a full hemisphere, to simplify the description.)
On the $(X,Y)$ display device, this corresponds to using the pixels along
the $X$-axis for the Equator, \e{with equal longitudinal spacing} 
(which was \e{not} the case for the flat planar display).

Now let us do the same thing in the \e{vertical} direction, stacking
up a single-pixel wide column starting at the Makassar Strait, and
heading towards the North Pole; and similarly towards the South Pole.
(Again, we can stop short $10\degrees$ from either Pole in practice,
but consider for the moment continuing right to the Poles.)
This corresponds to the pixels along the $Y$-axis of the physical
planar display device, \e{at equally spaced latitudes}; such
equal angular spacing was
\e{also} not true for the planar device.
The
Equator and Central Meridian 
have therefore been mapped linearly to the $X$ and $Y$
axes respectively; pixels along these lines are \e{distortion-free}.

Can we continue to place any more pixels in a distortion-free way?
Unfortunately, we cannot; we have used up all of our choices of 
distortion-free lines.
How then do we proceed now?
Let us try, at least, to maintain our philosophy of splitting the
field of view into \e{horizontal planes}.
Consider going around from the Makassar Strait, placing square pixels,
as best we can, in a ``second row'' above the row at the Equator
(and, symmetrically, a row below it also). 
This will not, of course, be $100\%$ successful: 
the curvature of the surface means there must 
be gaps, but let us try as best we can.
How many pixels will be needed in this row?
Well, to compute this roughly, let us approximate the field of
view, for the moment, by a complete hemisphere; we can cut off the
extra bits later.
Placing a second row of pixels on top of the first amounts to traversing
the globe at a \e{constant latitude}, \ie\ travelling along a Parallel
of latitude.
This Parallel
is \e{not} the shortest surface
distance between two points, of course; it is
rather the intersection between the spherical surface and a \e{horizontal
plane}---precisely the object we are trying to emulate.
Now, how long, in terms of surface distance, is this Parallel
that our pixels are traversing?
Well, some simple solid geometry and
trigonometry shows that the length of the
(circular) perimeter of 
the Parallel of latitude $\th$ is simply $C\cos\th$, where $C$ is the
circumference of the sphere.
Thus, in some sort of ``average'' way, the number of pixels we need
for the second row of pixels will be 
$\cos\th$ times smaller than for the
Equator, if we are looking at a full hemisphere of view.
This corresponds, on the $(X,Y)$ device, to only extending 
a distance roughly $\cos\th$ \e{shorter} in the $X$ direction for the 
horizontal line
of pixels cutting the $Y$-axis at the value $Y=1$ pixel, than 
was the case for the Equatorial pixels (which mapped to the line $Y=0$).
It is clear that the shape we are filling up on the $(X,Y)$ device is
\e{not} a rectangle; this point will be returned to shortly.

We can now repeat the above procedure again and again, 
placing a new row of pixels 
(as best will fit)
above and below the Equator at successively more polar latitudes; eventually,
we reach the Poles, and only need a single pixel on each Pole itself.
We have now completely covered the 
entire forward hemisphere of field of view, with pixels
smoothly mapped from a regular physical display, according to our 
chosen design principles. 
What is the precise mathematical relationship between $(X,Y)$ and
$(\th,\ph)$? 
It is clear that the relations are simply
\beqnarr{XYFromThetaPhi}
\tb\tb X=\f{N_\txt{pixels}}{\p}\ph\cos\th,  \nline
\tb\tb Y=\f{N_\txt{pixels}}{\p}\th,
\eeqnarr
where $N_\txt{pixels}$ is simply the number of pixels the display device
possesses in the shorter of its dimensions.
The relation for $Y$ follows immediately from our construction:
lines parallel to the $X$-axis simply correspond to Parallels of 
latitude; they are equally spaced (as we have filled the latitudes
with pixels one plane at a time); $\th$ is simply the latitude in
spherical \coord s (which should always be measured
in \e{radians}, not degrees,
\e{except} perhaps when computing a final ``quotable'' number); 
and the scaling factor $N_\txt{pixels}/\p$ simply 
ensures that the poles are mapped to the edges of the display
device: $Y(\th=\pm\p/2)=\pm\half N_\txt{pixels}$.
(To later ``slice off'' the small part of the hemisphere not visible,
we will simply increase the \e{effective} $N_\txt{pixels}$
of the device by the factor $90/80=1.125$, rather than trying to
change
relations \eq{XYFromThetaPhi} themselves; this practice
will maintain
conformity between different designers who choose a slightly
different maximum field of view---say, $75\degrees$ rather than 
$80\degrees$.)
The relation for $X$ follows by noting that the $\ph$-th Meridian 
cuts the $\th$-th Parallel at a surface distance $R\ph\cos\th$ 
from the Central Meridian (where $R$ is the radius of the sphere,
and the distance is measured along the Parallel itself);
since we have stacked pixels along the Parallel, this surface
distance measures $X$ on the display device;
and the normalisation constant ensures that, at the Equator,
$X(\th=0,\ph=\pm\p/2)=\pm\half N_\txt{pixels}$.
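
As a worked illustration of relations \eq{XYFromThetaPhi}, take
(as an assumed example) $N_\txt{pixels}=512$ and the direction
$\th=\p/3$ ($60\degrees$), $\ph=\p/2$ ($90\degrees$), which lies on
the edge of the hemisphere:
\[
X=\f{512}{\p}\cdot\f{\p}{2}\cdot\cos\f{\p}{3}=128, \qquad
Y=\f{512}{\p}\cdot\f{\p}{3}\approx170.7.
\]
The row of pixels at this latitude thus extends only half as far in
$X$ as the Equatorial row (which reaches $X=256$), in accordance
with the $\cos\th$ shortening described above.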

Now, our method above \e{should} have produced an equal-area
mapping---after all, we built it up by notionally placing display
pixels directly on the curved surface!
But let us nevertheless verify mathematically that the
equal-area criterion, equation~\eq{EqualAreaCrit}, \e{is}
indeed satisfied by the transformation equations~\eq{XYFromThetaPhi}.
Clearly, on partial-differentiating equations~\eq{XYFromThetaPhi}, 
we obtain $\pard X/\pard\th=-\g\ph\sin\th$, 
$\pard X/\pard\ph=\g\cos\th$,
$\pard Y/\pard\th=\g$, and    
$\pard Y/\pard\ph=0$, where 
we have defined the convenient constant $\g\id N_\txt{pixels}/\p$.
The equal-area criterion, \eq{EqualAreaCrit}, then becomes
\[
\modsign{\f{1}{\cos\th}\braces{-\g\ph\sin\th\cdot0-\g\cos\th\cdot\g
  }}\id\g^2=\txt{const.},
\]
which is, indeed, satisfied.
Thus, we have not made any fundamental
blunders in our construction.

The transformation \eq{XYFromThetaPhi} is a relatively simple one.
The forward hemisphere of solid angle of view is mapped to a portion
of the display device whose shape is simply a \e{pair of back-to-back
sinusoids} about the $Y$-axis, as may be verified by plotting all of the
points in the $(X,Y)$ plane
corresponding to the edge of the hemisphere, namely,
those corresponding to $\ph=\pm90\degrees$, for all $\th$.
And, as could have been predicted, our cartographer colleagues have used
this projection for a long time: it is known as the \e{Sinusoidal} or
\e{Sanson--Flamsteed projection}.
Now, considering that the term ``sinusoidal'' is used relatively
abundantly in \VR\ applications, we shall, for sanity, actively
avoid it in this particular context,
and instead give Messrs Sanson and Flamsteed their due credit whenever 
discussing this projection.

The astute reader may, by now, have asked the question, ``Isn't it
silly to just use a sinusoidal swath of our display device---we're
wasting more than 36\% of the pixels on the display!'' (or 
possibly more,
if the physical device is rectangular, rather than square).
Such wastage does, indeed, on the surface of it,
seem like a bad idea.
However, it is necessary to recall that we are here balancing between
various evils.
We have now, indeed, corrected for the anomalously bad central
resolution of a planar device: resolution is constant over all solid
angle, rather than bad where we want it and good where we don't;
the central pixel \e{will} now be four times smaller in each direction
(or sixteen times smaller in area) than before, as our rough ``stretch
the display around'' estimate suggested should be the case.
We are further \e{insisting} on full coverage of the field of view 
as a fundamental \VR\ design
principle; 
cutting off ``edges'' of solid angle to squeeze use out of the extra 
$36\%$ of unused pixels (which requires a slice at roughly
$55\degrees$ in each direction, rather than the full 
$75\degrees$--$80\degrees$ that we want)
is likely to cause a greater negative
psychological effect than those extra pixels could possibly portray
by slightly increasing the linear resolution (only
by about $40\%$, in fact) of the remaining, mutilated solid angle.
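
The figure of $36\%$ can be checked directly.
Writing $N$ for $N_\txt{pixels}$, relations \eq{XYFromThetaPhi}
place the edge of the hemisphere ($\ph=\pm\p/2$) at
$X=\pm\half N\cos(\p Y/N)$, so that the area of the display
actually used is
\[
\int_{-N/2}^{N/2}N\cos\!\paren{\f{\p Y}{N}}dY
  =\f{2}{\p}N^2\approx0.637\,N^2,
\]
leaving a fraction $1-2/\p\approx0.363$ of a square display of side
$N$ unused.
(Equivalently, this area is $\g^2$ times the solid angle $2\p$ of a
hemisphere.)
Similarly, taking the 512-pixel example used earlier together with
the effective increase by the factor $1.125$, each pixel on the
Equator or Central Meridian subtends
$160\degrees/512\approx0.31\degrees$, about one quarter of the
$1.27\degrees$ found for the planar display.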

There is, however, a more subtle reason why, in fact, not using $36\%$
of the display can be a \e{good} thing.
Consider a consumer electronics manufacturer fabricating small, light,
colour LCD displays, for such objects as camcorders.
Great advances are being made in this field regularly; it is likely
that ever more sophisticated displays will be the norm for some time.
Consider what happens when a \e{single LCD pixel} is faulty upon
manufacture (more likely with new, ground-breaking technology): 
the device is of no further commercial use, because
all consumer applications for small displays need the full rectangular
area.
There is, however, roughly a $36\%$ chance that this faulty pixel
falls \e{outside} the area necessary for a \VR\ head-mounted display---and
is thus again a viable device!
If the electronics manufacturer is simultaneously developing commercial
\VR\ systems, here is a source of essentially free LCD displays: the
overall bottom line of the corporation is improved.
Alternatively, other \VR\ hardware manufacturers may negotiate a
reasonable price with the manufacturer for purchasing these 
faulty displays; this both cuts costs of the 
\VR\ display hardware (especially when using the newest, most
expensive high-resolution display devices---that will most likely
have the highest pixel-fault rate), as well as providing income to the
manufacturer for devices that would otherwise have been scrapped.
Of course, this mutually beneficial arrangement relies on the
supply-and-demand fact
that the consumer market for small LCD display devices is huge
compared to that of the current \VR\ industry, and, as such, will not
remain so lucrative when \VR\ hardware outstrips the camcorder
market; nevertheless, it may be a 
powerful fillip to the \VR\ industry in the fledgling form
that it is in today.

It is now necessary to consider the full transformation from the 
physical space of the virtual world, measured in the Cartesian
\coord\ system $(x,y,z)$, to the space of the physical planar display
device, $(X,Y,Z)$ (where $Z$ will now be used for the $Z$-buffering
of the physical display and its video controller).
In fact, the only people who will ever care about the intermediate
spherical \coord\ system, $(r,\th,\ph)$, are the designers of the
optical lensing systems necessary to bring the light from the
planar display into the eye at the appropriate angles.
(It should be noted, in passing, that even a \e{reasonably} accurate
physical replication of the mapping \eq{XYFromThetaPhi} in optical
devices would be sufficient to convince the viewer of reality;
however, considering the time-honoured and highly advanced status of
the field of optics [Hubble Space Telescope notwithstanding], 
there is no doubt that the optics will not be a serious problem.)

What, then, is the precise relationship between $(X,Y,Z)$ space and
$(x,y,z)$ space?
To determine this, we need to have defined 
conventions for the (physically fixed)
$(x,y,z)$ axes themselves in the first place.
But axes that are \e{fixed} in space are not very convenient
for \e{head-mounted} systems;
let us, therefore, define a \e{second} set of Cartesian axes 
$(u,v,w)$, whose (linear) transformation from the $(x,y,z)$ space
consists of the translation 
and rotation from the participant's head position to the fixed 
\coord\ system.

At this point, however, we note
a considerable complication: the line of vertical symmetry
in the $(X,Y)$ plane has (necessarily) 
been taken to be through the Makassar 
Strait direction---which is in a \e{different} 
physical direction for each eye.
Therefore, let us define, not one, but \e{two} new intermediate
sets of Cartesian axes in (virtual) physical space,
$(u_L,v_L,w_L)$ and $(u_R,v_R,w_R)$, with the following properties:
$(u_L,v_L,w_L)$ is used for the left eye, $(u_R,v_R,w_R)$ for the right;
the origins of these \coord\ systems lie at the effective optical
centres of the respective eyes of the participant;
when the head is vertically fixed, the $u_L$--$v_L$ plane 
is normal to the line connecting the centre of the left eye and its
Makassar-Strait direction; $w_L$ measures positions in this normal
direction, increasing towards the \e{rear} of the participant (so that
all visible $w_L$ values are in fact negative---chosen for historical
reasons); $u_L$ measures ``horizontally'' in the $u_L$--$v_L$ plane with  
respect to the viewer's head, increasing to the right; 
$v_L$ measures ``vertically'' in the $u_L$--$v_L$ plane in the same
sense, increasing as one moves upwards; and the $(u_R,v_R,w_R)$ 
axes are defined in exactly the same way but with respect to the
right eye's position and Makassar Strait direction.

With these conventions, the $(u_L,v_L,w_L)$ and $(u_R,v_R,w_R)$
\coord\ systems are Cartesian systems in (virtual) physical three-space,
whilst still being rigidly attached to the participant's head (and,
as a consequence, the display devices themselves).
We can now use the definitions \eq{SphericalCoords} directly,
for each eye separately, namely
\beqnarr{UVWFromSpherical}
\tb\tb u_e=r_e\cos\th_e\sin\ph_e, \nline
\tb\tb v_e=r_e\sin\th_e, \nline
\tb\tb w_e=r_e\cos\th_e\cos\ph_e,
\eeqnarr
where $e$, the \e{eye index}, is equal to $L$ or $R$ as appropriate.
Similarly, we need an $(X,Y,Z)$ \coord\ system for each eye, which
will thus be subscripted by $e=L$ or $R$ as appropriate.

There is still, however, 
the question of deciding what functional form $Z_e$
will take, in terms of the spherical \coord s $(r_e,\th_e,\ph_e)$.
It should be clear that this should only be a function 
of $r_e$, so that 
the $Z_e$-buffer of the physical display device 
\e{does} actually refer to the physical distance from the (virtual) object
in question to the $e$-th eye. 
It will prove convenient to define this distance \e{reciprocally}, so that
\[
Z_e=\f{\be}{r_e},
\]
where $\be$ is a constant chosen so that the (usually integral)
values of the $Z_e$-buffer are best distributed for the objects in
the virtual world in question; for instance, objects closer than
about 50~mm in distance cannot be focused anyway,
so the maximum $Z_e$ may as well be clamped to the value corresponding
to this distance.
This reciprocal definition carries with it the great advantage that
it does not break down for objects very far away (although it may,
of course, round off sufficiently far distances to $Z_e=0$); and
it takes account of the fact that a given change in distance is, 
in fact, more
important visually the \e{closer} that distance is to the 
observer.
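
As an illustration only, suppose (an assumption made here for the
sake of a number, not a recommendation) that the $Z_e$-buffer holds
16-bit unsigned integers, with the maximum value $65535$ assigned to
$r_e=50$~mm.
Then $\be=65535\times50\;\txt{mm}\approx3277$~m, and
\[
Z_e(1\;\txt{m})\approx3277, \qquad Z_e(100\;\txt{m})\approx33,
\]
while, if values are rounded to the nearest integer, distances beyond
about $2\be\approx6.6$~km round off to $Z_e=0$.
Some $95\%$ of the available values are thus devoted to objects
within 1~m of the eye, which is the weighting towards nearby
distances described above.
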
We then have, using \eq{XYFromThetaPhi},
\beqnarr{XYZFromSpherical}
X_e\tb=\tb\f{N_\txt{pixels}}{\p}\ph_e\cos\th_e,  \nline
Y_e\tb=\tb\f{N_\txt{pixels}}{\p}\th_e,  \nline  
Z_e\tb=\tb\f{\be}{r_e}.
\eeqnarr
We can now use the relations \eq{UVWFromSpherical} and 
\eq{XYZFromSpherical} to eliminate the intermediate spherical \coord s
$(r_e,\th_e,\ph_e)$ completely.
Inversion of relations \eq{UVWFromSpherical} yields
\beqnarr{SphericalFromUVW}
r_e\tb=\tb\sqrt{u_e^2+v_e^2+w_e^2}, \nline
\th_e\tb=\tb\arsin\!\paren{\f{v_e}{\sqrt{u_e^2+v_e^2+w_e^2}}}, \nline  
\ph_e\tb=\tb\artan\!\paren{\f{u_e}{w_e}}.  
\eeqnarr
Inserting these results into \eq{XYZFromSpherical}, and noting the
identity 
\[
\cos\!\brac{\arsin(a)}\id\sqrt{1-a^2}, 
\]
we thus obtain the
desired one-step transformations
\beqnarr{XYZFromUVW}
X_e\tb=\tb\f{N_\txt{pixels}}{\p}\sqrt{\f{u_e^2+w_e^2}{u_e^2+v_e^2+w_e^2}}
  \cdot\artan\!\paren{\f{u_e}{w_e}}, \nline
Y_e\tb=\tb\f{N_\txt{pixels}}{\p}\,\arsin\!
  \paren{\f{v_e}{\sqrt{u_e^2+v_e^2+w_e^2}}}, \nline
Z_e\tb=\tb\f{\be}{\sqrt{u_e^2+v_e^2+w_e^2}}.
\eeqnarr
Of course, in practice, there are not as many operations involved
here as there appear at first sight: many quantities are used more
than once, but only need to be computed once.
Nevertheless, this transformation from physical space to display
space is much more computationally expensive than is the case for
conventional planar computer graphics---an unavoidable price that
must be paid, however, if equal rendering time and display resolution
are to be devoted to all equivalent portions of solid angle in the field of
view.
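
As a numerical check of \eq{XYZFromUVW}, consider (as an assumed
example) $N_\txt{pixels}=512$ and a point at $u_e=v_e=w_e=1$ in
arbitrary units, with $w_e$ taken positive, as relations
\eq{UVWFromSpherical} give for $\modsign{\ph_e}<\p/2$.
Then $u_e^2+w_e^2=2$ and $u_e^2+v_e^2+w_e^2=3$, so that
\begin{eqnarray*}
X_e&=&\f{512}{\p}\sqrt{\f{2}{3}}\cdot\artan(1)
  =128\sqrt{\f{2}{3}}\approx104.5, \\
Y_e&=&\f{512}{\p}\,\arsin\!\paren{\f{1}{\sqrt{3}}}\approx100.3, \\
Z_e&=&\f{\be}{\sqrt{3}}\approx0.577\,\be.
\end{eqnarray*}
The two sums of squares, and the square root of the second, each
appear more than once and need only be evaluated once per point.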

Finally,
linking the transformations \eq{XYZFromUVW} 
to the virtual-world \e{fixed}
physical \coord\ system, $(x,y,z)$, requires a transformation from the
Ma\-kas\-sar-cen\-tred \coord\ system $(u_e,v_e,w_e)$.
This transformation is, however, a standard one, consisting of
an arbitrary three-displacement coupled with the three Euler angles
specifying the rotational orientation of the head; as there is
nothing new to be added to this transformation, we shall not go into
further explicit and voluminous details here.
By carefully taking the temporal derivatives of these transformations,
one may obtain the relationships between the \e{true} physical motional 
derivatives of
objects in virtual three-space, and the corresponding time derivatives
of the \e{apparent} motion---\ie, the derivatives of the motion as 
described
in terms of the physical display device \coord s, $(X_e,Y_e,Z_e)$; 
this information is needed for \Galn\ \anti ing to be implemented.
Again, these formulas may simply
be obtained by direct differentiation;
we shall not derive them here.
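
As a partial sketch of what such differentiation yields, consider
the simplest case, in which the head is held fixed and only the
object moves in the $(u_e,v_e,w_e)$ system (this restriction is an
assumption made here for brevity).
With $r_e$ as given in \eq{SphericalFromUVW}, one has
\[
\f{dr_e}{dt}=\f{1}{r_e}\paren{u_e\f{du_e}{dt}+v_e\f{dv_e}{dt}
  +w_e\f{dw_e}{dt}},
\]
and the relations for $Z_e$ and $Y_e$ in \eq{XYZFromUVW} then give
\begin{eqnarray*}
\f{dZ_e}{dt}&=&-\f{\be}{r_e^2}\f{dr_e}{dt}, \\
\f{dY_e}{dt}&=&\f{N_\txt{pixels}}{\p}\cdot
  \f{1}{r_e\sqrt{u_e^2+w_e^2}}
  \paren{r_e\f{dv_e}{dt}-v_e\f{dr_e}{dt}}.
\end{eqnarray*}
The corresponding expression for $X_e$ follows in the same way, but
is lengthier; the general case, with the head in motion, requires in
addition the derivatives of the head transformation itself.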

A final concern for the use of the Sanson--Flamsteed projection 
(or, indeed, any other projection)
in \VR\ is to devise efficient rendering algorithms for everyday
primitives such as lines and polygons.
Performing such algorithmic 
optimisation is an artform; the author would not presume
to intrude on this intricate field.
However, a rough idea of how such non-linear mappings of lines
and polygons might be handled is to note that \e{sufficiently small}
sections of such objects can always be reasonably approximated
by lines, parabolas, cubics, and so on.
A detailed investigation of the most appropriate and efficient
approximations, for the various parts of the solid angle mapped by
the projection in question (which would, incidentally, become
almost as geographically ``unique'', in the minds of algorithmicists, 
as places on the real earth),
would only need be done once, in the
research phase, for a given practical range of display
resolutions; rendering algorithms could then be implemented that
have this information either hard-wired or hard-coded.
It may well be useful to slice long lines and polygons into smaller
components, so that each component can be handled to pixel-resolution
approximation accurately, yet simply.
All in all, the problems of projective, perspective rendering
are not insurmountable; they simply require sufficient
(preferably non-proprietary-restricted) research and development.

If you thought the science of
computer graphics was a little warped before,
then you ain't seen nothing yet.

\newssect{LocalUpdate}{Local-Update Display Philosophy}
Having diverted ourselves for a brief and fruitful (sorry)
interlude on head-mounted
display devices, we now return to the principal topic of this
\typeofdoc: \Galn\ \anti ing, and its implementation in practical
systems.
We shall, in this final section,
critically investigate the \e{basic philosophy} underlying
current \VR\ image generation---which was accepted unquestioningly
in the minimal implementation described in 
section~\sect{MinimalImplementation}, but which we must 
expect to be \e{itself}
fundamentally influenced by the use of \Galn\ \anti ing.

Traditionally, perspective 3-dimensional computer graphics has been
performed on the ``clean slate'' principle: one erases the frame
buffer, draws the image with whatever sophistication is 
summonable from the unfathomable depths of ingenuity,
and then projects the result onto a physical device, 
evoking immediate
spontaneous applause and calls of ``Encore! Encore!''.
This approach has, to date, 
been carried across largely unmodified into the \VR\ environment,
but with the added imperative: get the bloody image out within 
100~milliseconds!
This is a particularly important yet onerous requirement: if the 
\VR\ participant is continually moving around (the general case), the view
is continually changing, and it must be updated regularly if the
participant is to function effectively in the virtual world at all.

With \Galn\ \anti ing, however, our image generation philosophy may be
profitably shifted a few pixels askew.
As outlined in section~\sect{MinimalImplementation},
a \e{Galilean} update of the image on the display gives the video
controller
sufficient information to move that object reasonably
accurately for a certain period of time.
This has at least one vitally important software ramification:
\e{the display processor no longer needs to worry about churning
out complete images simply to simulate the effect of motion};
to a greater or lesser extent, objects will ``move themselves''
around on the display, ``unsupervised'' by the display processor.
This suggests that the whole philosophy of the image generation 
procedure be subtly changed, but changed all the way to its roots:
\e{Only objects whose self-propagating
displayed images are significantly out-of-date should be updated}.
Put another way, we can now organise the image generation procedure
in the same way that we (usually!) organise our own lives:
urgent tasks should be done NOW; important tasks should be done \e{soon};
to-do-list tasks should be picked off when other things aren't so
hectic.

How, then, would one go about implementing such a philosophy, if one
were building a brand-new \VR\ system from the ground up?
Firstly, the display-processor--video-controller interface should be
designed so that updates of only \e{portions} of the display can be
cleanly \e{grafted} onto the existing self-propagating image; in other
words, \e{local updates} must be supported.
Secondly, this interface between the display processor and the
video controller---and, indeed, the whole software side of the
image-generation process---must have a reliable method of 
ensuring \e{timing and synchronisation}.
Thirdly, the display processor must be 
redesigned for \e{reliable rendering} of
local-update views: the laxity of the conventional
``clean the slate, draw the lot'' computer graphics
philosophy must be weeded out. 
Fourthly, it would be most useful if updates for 
\e{simple motions of the
participant herself} could be catered for automatically, 
by specialised additions to the video controller hardware, so that
entire views need not be regenerated simply because the participant
has started jerking around a bit.
Fifthly, some sort of \e{caching} should be employed on the
\Galn\ pixelated display image, to further reduce strain on the
process of generating fresh images.
Finally, and ultimately most challengingly, the core \VR\ operating
system, and the applications it supports,
must be structured to exploit this
new hardware philosophy to the full.

Let us deal with these aspects in turn.
We shall not, in the following discussion, prescribe
solutions to these problems in excessive technical detail, as
performing this task optimally can only be done by the \VR\ designer
of each particular implementation.

First on our list of requirements is that \e{local updates}
of the display be possible.
To illustrate the general problem most clearly, imagine the
following scenario:
A \VR\ participant is walking down the footpath of
a virtual street, past virtual buildings, looking down
the alley-ways between them as she goes.
Imagine that she is walking down the left-hand footpath of 
the street, \ie\ on the same side of the road as a car would
(in civilised countries, at least).
Now imagine that she is passing the front door of a large, red-brick
building.
She does not enter, however; rather, she continues to stroll past.
As she approaches the edge of the building's facade, she starts to
turn her head to the left,
in anticipation of looking down the alley-way next to the building.
At the precise instant the edge of the facade passes her\ldots press
an imaginary ``pause'' button, and consider the session to date from  
the \VR\ system's point of view. 

Clearly, as our participant was passing the front 
of the building, its facade
was slipping past her view smoothly; its apparent motion was not 
very violent at all; we could quite easily redraw most of it at
a fairly leisurely rate, relying on \Galn\ \anti ing to make it
move smoothly---and the extra time could be used to apply a
particularly convincing set of Victorian period textures to the
surfaces of the building (which would propagate along with the
surfaces they are moulded to).
We are, of course, 
here relying on the relatively mundane motion of the virtual objects
in view as a \e{realism lever}: these objects 
can be rendered less frequently,
but more realistically.
And this is, indeed, precisely what one \e{does} want from the
system: a casual
stroll is usually a good opportunity to ``take a good look at the
scenery'' (although mankind must pray that, for the good of all,
\noone\ ever creates a Virtual Melbourne Weather module).

Now consider what happens at the point in time at which
we freeze-framed our
session.
Our participant is just about to look down an alley-way: she doesn't 
know what is down there; passing by the edge of the facade will
let her have a look.
The only problem is that
\e{the video controller doesn't know what's down there
either}: the facade of the building Galileanly moves out of 
the way, leaving\ldots well, leaving nothing; the video
controller, for want of something better, 
falls back to $\Gal{0}$ technology, and simply leaves the debris of
the \e{previous} frame on the display---hoping (if a
video controller is capable of such emotions) that the display
processor gets its proverbial into gear and gives it a picture to show 
quick smart.

So what \e{should} be visible when our participant looks down the alley?
Well, a whole gaggle of objects may just be coming into view: 
the side wall of the building; the windows in the building;
the pot-plants on the windowsills; 
a Japanese colleague of our participant who has virtu-commuted 
to her virtual building
for a later meeting, who is right now sipping a coffee
and happily waving out the window at her;
and so on.
But none of this is yet known to the video controller: a massive
number of objects simply do not exist in the frame buffer at all.
We shall say that these objects \e{have gone information-critical}:
they are Priority One: something needs to be rendered, and 
rendered NOW.

How does the \VR\ system carry out this task?
To answer this, it is necessary to examine, in a high-level form,
how the entire system will work as a whole.
To see what would occur in a \e{well-designed} system
at the moment we have freeze-framed, 
we need to wind back the clock by about a quarter of a second.
At that earlier time, the operating system, constantly
projecting the participant's trajectory forward in time, had 
realised that the 
right side wall of building \#147 would probably go critical in
approximately 275 milliseconds.
It immediately instigated a Critical Warning sequence, informing all
objects in the system that they may shortly lose a significant 
fraction of display 
processing power, and should take immediate action to ensure that
their visual images are put into stable, conservative motion
as soon as possible.
The right wall of building \#147 is informed of its Critical Warning
status, as well as the amount of extra processing power allocated to it;
the wall, in turn,
proceeds to carry out its pre-critical tasks:
a careful monitoring of the participant's extrapolated motion 
and critical time estimate;
a determination of
just precisely which direction of approach has triggered this
Critical Warning; a computation of estimates of the trajectories of the key
control points of the wall and its associated objects; and so on.
By the time 150~milliseconds have passed, most objects in the system
have stabilised their image trajectories.
The right wall of building
\#147 has decided that it will now definitely go critical in
108~milliseconds, and requests the operating system
for Critical Response
status.
The operating system concurs, and informs the wall that there are
no other objects undergoing Critical Response, and only 
one other object on low-priority Warning status;
massive display-processing power is authorised for the wall's use,
distributed over the next 500 milliseconds.
The wall immediately refers to the adaptive system performance
table, and makes a conservative estimate of how much 
visual information
about the wall and associated objects it can generate in less
than 108 milliseconds.
It decides that it can comfortably render the gross features of all
visible objects with cosine-shaded polygons; and immediately
proceeds to instruct the display processor with the relevant
information---not the positions, velocities, accelerations, colours
and colour derivatives
of the objects as they are \e{now}, but rather where they will
be \e{at the critical time}.
It time-stamps this image composition information with the
projected critical time, which is by this time 93 milliseconds into the
future; and then goes on to consider how best
it can use its authorised \e{post-critical} resources to render a
more realistic view.
While it is doing so---and while the other objects in the system monitor
their own status, mindful of the Critical Response in progress---the
critical time arrives.
The pixelated wall---with simple shaded polygons for windows,
windowsills, pot-plants, colleagues---which the display 
processor completed rendering about
25 milliseconds ago,
is instantaneously grafted onto the building by the video 
controller;
the participant looks around the corner and sees\ldots a wall!
200 milliseconds later---just as she is getting a good look---the
video controller grafts on a new rendering of the wall and its
associated objects: important fine details are now present; objects are now
Gouraud-shaded; her colleague is recognisable; the coffee cup has
a handle.
And so she strolls on\ldots what an uneventful and 
relaxing day this is, she thinks.

Let us now return to our original question: namely, what are the
additional
features our display processor and video controller need to 
possess to make
the above scenario possible.
Clearly, we need to have a way of \e{grafting} a galpixmap frame
buffer onto the existing image being propagated by the video
controller.
This is a similar problem (as, indeed, much of the above scenario is)
to that encountered in \e{windowed} operating systems.
There, however, all objects to be grafted are simply rectangles, 
or portions
thereof. 
In such a situation, one can code very efficiently the shape of the
area to be grafted by specifying a coded sequence of corner-points.
However, our \VR\ scenario is much more complex: how do you encode the
shape of a wall?
The answer is: you don't; rather, you (or, more precisely,
your display processor)
use the following more subtle procedure:
Firstly, the current frame buffer that has been allocated to the
display processor for rendering purposes is cleared.
How do we want to ``clear'' it?
Simple: set the \e{debris indicator} of each galpixel in the frame
buffer to be true.
Secondly, the display processor proceeds to render 
only those objects that it is instructed to---clearing the
debris indicators of the galpixels it writes; leaving the other
debris indicators alone.
Thirdly, when the rendering has been completed,
the display processor informs the video controller that its
pizza is ready, and when it should deliver it;
the display processor goes on to cook another meal.
When the stated time arrives, the video controller grafts
the image onto its current version of the world, simply \e{ignoring}
any pixels in the new image that are marked as debris.
It is in this way that a wall can be ``grafted'' onto a virtual building 
without having
to bulldoze the whole building and construct it from scratch.
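
The three steps above can be sketched in a few lines of Python.
This is a toy model under stated assumptions: a galpixel is reduced to
a colour and a debris flag (the real thing would also carry depth,
velocity, acceleration and so on), and every name in the sketch is
hypothetical.
\begin{verbatim}
```python
from dataclasses import dataclass

@dataclass
class Galpixel:
    colour: int = 0
    debris: bool = True   # True means "nothing was rendered here"

def make_buffer(width, height):
    """Allocate a buffer of galpixels, all initially debris."""
    return [[Galpixel() for _ in range(width)] for _ in range(height)]

def clear(buffer):
    """Step 1: 'clear' the buffer by marking every galpixel as debris."""
    for row in buffer:
        for px in row:
            px.debris = True

def render(buffer, pixels):
    """Step 2: write only the requested pixels, clearing their debris flags.

    pixels maps (x, y) to a colour; other galpixels are left alone.
    """
    for (x, y), colour in pixels.items():
        buffer[y][x].colour = colour
        buffer[y][x].debris = False

def graft(display, buffer):
    """Step 3: copy across every non-debris galpixel, ignoring the rest."""
    for drow, brow in zip(display, buffer):
        for dpx, bpx in zip(drow, brow):
            if not bpx.debris:
                dpx.colour = bpx.colour
                dpx.debris = False
```
\end{verbatim}
Because the shape of the grafted region is carried implicitly by the
debris flags, no explicit outline of the wall ever needs to be encoded.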

It should be obvious that it is necessary to come up with some sort of
terms
that portray the difference between the frame buffers that
the video controller uses to \Galn ly-propagate the display
from frame to frame, and the frame buffers that the display
processor uses to generate images that will be ``grafted on'' at the
appropriate time.
To this end, we shall simply continue to refer to the
video controller's propagating frame buffers as ``frame buffers''---or,
if a distinction is vital, as ``Galilean frame buffers''.
Buffers that the display processor uses to compose its 
grafted images
will, on the other hand, be referred to as 
\e{\FS\ buffers}. (``Meet you under the clocks at \FS\ Station at
3~o'clock'' being the standard rendezvous arrangement in this city.)
Clearly, for the display processor to be able to run at full speed,
it should have at least two---and preferably more---\FS\ buffers,
so that once it has finished one it can immediately get working
on another, even if the rendezvous time of the first has not yet
arrived.

It is also worthwhile considering the parallel nature of the 
\VR\ system when designing the display processor and the operating
system that drives it.
At any one point in time, there will in general be a number of
objects (possibly a very large number) all passing image-generation
information to the display processor for displaying.
Clearly, this cannot occur directly: the display processor would
not know whether it was Arthur or Martha, with conflicting signals
coming from all directions.
Rather, the operating system must handle image-generation requests
in an organised manner.
In general, the operating system will \coord\ the requests of 
various objects, using its intelligence to decide on when the
``next train from \FS'' will be leaving.
Just as with the real \FS\ Station, image generation requests will
be pooled, and definite display rendezvous times scheduled; the
operating system then informs each requesting object of the 
on-the-fly timetable, and each object must compute its control
information \e{as projected to the most suitable scheduled
rendezvous time}.
Critical Warning and Critical Response situations are, however, a little
different, being much like the Melbourne Cup and AFL Grand Final
days: the whole timetable revolves around these events;
other objects may be told that, regrettably, 
there is now no longer any room for them;
they may be forced to re-compute their control points
and board a later train.

These deliberations bring us to the
second of our listed points of consideration for our
new image-generation philosophy: \e{timing and synchronisation}.
The following phrase may be repeated over and over by \VR\ designers
while practising Transcendental Meditation: ``Latency is my enemy.
Latency is my enemy. Latency is my enemy\ldots.''
The human mind is simply not constructed to deal with latency.
Echo a time-delayed version of one's words into one's own ears
and you'll end up in a terrible tongue-tied tangle
(as prominently illustrated by 
the otherwise-eloquent former Prime Minister Bob Hawke's experience
with a faulty satellite link in an interview with a US network).
Move your head around to a visual 
world that lags behind you by half a second
and you'll end up sick as a dog.
Try to smack a virtual wall with your hand, 
and have it smack you back a second later, and you'll probably
feel like you're fighting with an animal, not testing out
architecture.
Latency simply doesn't go down well.

It is for this reason that the above scenario 
(and, indeed, the \Galn\ \anti ing technique itself)
is rooted very firmly in the philosophy of \e{predictive} control.
We are not generating the sterile, static world of traditional
computer graphics: one \e{must} extrapolate in order to be
believable.
If it takes 100 milliseconds to do something, then you should find
a good predictive ``trigger'' for that event that is reasonably
accurate 100 milliseconds into the future.
Optimising such triggers for a particular piece of
hardware may, of course, involve significant research and testing.
But if a suitably reliable trigger for an event \e{cannot} be found
with the hardware at hand, then either get yourself some
better hardware, or else think up 
a less complicated (read: quicker) response; otherwise,
you're pushing the proverbial uphill.
``Latency is my enemy\ldots.''

With this principle in mind, the above description of a
\FS\ buffer involves the inclusion of a \e{rendezvous time}.
This is the precise time (measured in frame periods) 
at which the video controller
is to graft the new image from the \FS\ buffer to the appropriate 
\Galn\ buffer.
As noted above, some adaptive system performance analysis must be
carried out by the operating system for this to work at all---so 
that objects
have a reasonably good idea of just \e{what} they can get done in
the time allocated to them.
Granted this approximate information, image-generation instructions
sent to the display processor should then be such that it \e{can},
in fact, generate the desired image before the rendezvous time.
The time allowed the display processor should be conservative; after
all, it can always start rendering another image, 
into another of its \FS\ buffers, if it finishes the first one
early. But it is clear that being \e{too} conservative is not
wise: objects will unnecessarily underestimate the amount of detail
renderable in the available time; overall performance will suffer.
There must be an experienced balance between these two considerations.
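
One way to picture this balance is as a single safety factor applied to
the estimates in the adaptive performance table.
The Python sketch below assumes a table of (name, estimated time)
pairs ordered from coarsest to finest; the table layout, the function
name and the value of the safety factor are all hypothetical.
\begin{verbatim}
```python
def choose_detail(levels, time_available_ms, safety_factor=1.25):
    """Pick the most detailed level expected to finish before the rendezvous.

    levels: list of (name, estimated_ms), ordered from coarsest to finest.
    safety_factor > 1 makes the choice conservative; its value is a guess.
    Returns None if not even the coarsest level is expected to fit.
    """
    chosen = None
    for name, estimated_ms in levels:
        if estimated_ms * safety_factor <= time_available_ms:
            chosen = name   # later (finer) levels overwrite earlier ones
    return chosen
```
\end{verbatim}
Raising the safety factor makes missed rendezvous rarer at the cost of
coarser images; lowering it does the reverse.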

In any practical system, however, Murphy's Law will always hold true:
somewhere, some time, most probably while the boss is inspecting
your magnificent creation, 
the display processor will not finish rendering
an image before the rendezvous time.
It is important that the system be designed to handle this situation
gracefully.
Most disastrous of all would be for the operating system to think
the complete image \e{was} in fact generated successfully: accurate
information about the display's status is crucial in the control
process.
Equally disastrous would be for the display processor to not
pass forward the image at all, or to pass it through ``late'':
the former for the same reason as before; the latter because images
of objects would then continue to propagate ``out-of-sync'' with 
their surroundings.

One simple resolution 
of this scenario is for the display processor to \e{finish
what it can} before the rendezvous time; it then relinquishes
control of the \FS\ buffer (with a call of ``Stand clear, please;
stand clear''), and the partial graft is applied by the video 
controller.
The display processor must then \e{pass a high-priority message back
to the operating system that it did not complete in time}, with
the unfulfilled, or partially-fulfilled, instructions simultaneously
passed back in a stack.
The operating system must then process this message with the highest
priority, calling on the objects in question for a Critical Response;
or, if these objects subsequently indicate that the omission 
is not critical,
a regular re-paint operation can be queued at the appropriate
level of priority.
Of course, Incomplete Display Output events should in practice 
be a fairly
rare occurrence, if the adaptive performance analysis system is
functioning correctly; nevertheless, their non-fatal treatment
by the operating system means that performance can be ``tweaked''
closer to the limits than would otherwise be prudent.
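
The hand-back described above might be modelled as follows.
This Python sketch is only a schematic: rendering costs are supplied
as a made-up lookup table rather than measured, and the callback by
which an object declares an omission critical is a hypothetical
interface.
\begin{verbatim}
```python
def render_until(instructions, cost_ms, budget_ms):
    """Render instructions in order until the time budget runs out.

    Returns (done, unfinished). A non-empty 'unfinished' list stands for
    the stack handed back with the Incomplete Display Output message.
    cost_ms maps each instruction to an assumed rendering cost.
    """
    done, spent = [], 0.0
    for i, instr in enumerate(instructions):
        if spent + cost_ms[instr] > budget_ms:
            return done, list(instructions[i:])
        spent += cost_ms[instr]
        done.append(instr)
    return done, []

def handle_incomplete(unfinished, is_critical):
    """Operating-system side: route each unfinished instruction.

    is_critical is a hypothetical callback through which the owning
    object says whether the omission needs a Critical Response.
    Returns (critical, repaint) lists.
    """
    critical = [i for i in unfinished if is_critical(i)]
    repaint = [i for i in unfinished if not is_critical(i)]
    return critical, repaint
```
\end{verbatim}
The essential point is that the partial result is still grafted on time,
while the operating system always learns exactly what was left undone.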

There is another side to the question of timing and synchronisation
that we must now address.
In section~\sect{MinimalImplementation}, we assumed that 
the apparent motion of an object is reasonably well described by
a quadratic expression in time.
This is, of course, a reasonable approximation for global-update
systems (in which the inter-update time cannot be left too long 
anyway)---and is, of course, a vast improvement over current
$\Gal{0}$ technology.
However, with the introduction of \e{local} update systems we
must be more careful.
To see why this is the case, consider an object that is
undergoing \e{rotation}.
Now, ignoring the fact that more rotating objects seem to populate
\VR\ demonstrations than exist in the entire visible Universe,
it is clear that such objects pose a potential problem.
This is because a quadratic expression only \e{approximately} describes
the motion of any point in such an object.
As a rule of thumb, once an object rotates by more than $45\degrees$
about a non-trivial axis, this parabolic approximation starts to
break down badly.
It is therefore important that the object in question \e{knows} 
about this
limited life of its \Galn\ image: it is useful for such objects
to label their visual images with a \e{use-by date}.
The operating system should treat use-by date expirations as a \e{high
priority} event: rotating objects can rapidly fly apart if left any
longer, scattering junk all over the display like
an out of control Skylab---junk
that may \e{not} be
fully removed by simply re-rendering the object.
As a \e{bare minimum}, in situations of Critical Warning,
such objects should replace their visual images by a rough
representation of the object, significantly blurred, and with an assigned
velocity and acceleration close to zero.
This will prevent the object from ``self-destructing'' while
the system copes with its critical period;
once the crisis is over, a more accurate (but again less stable)
view can be 
regenerated.
Of course, such intelligent actions must be programmed into
the operating system and the object itself; this is one reason that 
the Critical Warning period was specified above, so that such
evasive actions can be taken while there is still time.

This brings us most naturally to the general question of just \e{how}
each object in a virtual world
can decide how badly out-of-date its image is.
This is a question that must, ultimately, be answered by
extensive research and experience; a simple method will, however,
now be proposed as a starting point.
Each object knows, naturally, what information it sends to the
display processor---typically, this consists of \e{control information}
(such as polygon vertices, velocities, \etc), rather than a pixel-by-pixel
description of the object.
The object also knows just \e{how} the $\Gal{2}$ display system
will propagate these control points forward in time---namely, via
the propagation equations derived in sections~\sect{BasicPhilosophy}
and~\sect{MinimalImplementation} (and, importantly, with finite-accuracy
arithmetic).
On the other hand, the object can also compute where these control
points \e{should} be, based on the exact virtual-world equations
of motion, and also taking
into account the relative jerk of the object and the participant since
the last update (but subtracting off the
effects of specialised global updates, which shall
be discussed shortly).
By simply taking the ``error'' between the displayed and correct
positions, and making an intelligent estimate, based on the projection
system in use, about just how ``bad'' the 
visual effects of each of these
errors are, an object can fairly simply come up with a relative
estimate of how badly it is in need of an update.
If all of the objects in the virtual world can agree on a protocol for
quantifying this relative need for an update, then the operating
system can use this information intelligently when
prioritising the image update scheduling process.
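
As a starting point, the comparison just described might look like the
following Python sketch.
It propagates each control point with the second-order rule in exact
floating-point arithmetic (so the finite-accuracy effects mentioned
above, and the subtraction of global updates, are ignored), and it
collapses the ``how bad does it look'' judgement into a single weight.
The function names, the data layout and the callback supplying the
true positions are all hypothetical.
\begin{verbatim}
```python
def propagate(p0, v0, a0, t):
    """Second-order Galilean propagation of one coordinate over time t."""
    return p0 + v0 * t + 0.5 * a0 * t * t

def update_need(control_points, true_position, t, weight=1.0):
    """Largest weighted displacement between displayed and true positions.

    control_points: list of ((x0, y0), (vx, vy), (ax, ay)) as sent at t = 0.
    true_position:  hypothetical callback (index, t) -> (x, y) giving the
                    exact virtual-world position in display coordinates.
    weight:         stand-in for the visual-badness estimate.
    """
    worst = 0.0
    for i, (p, v, a) in enumerate(control_points):
        shown_x = propagate(p[0], v[0], a[0], t)
        shown_y = propagate(p[1], v[1], a[1], t)
        true_x, true_y = true_position(i, t)
        err = ((shown_x - true_x) ** 2 + (shown_y - true_y) ** 2) ** 0.5
        worst = max(worst, weight * err)
    return worst
```
\end{verbatim}
Taking the worst control point rather than an average is a design
choice of the sketch: a single badly misplaced vertex is usually more
visible than many slightly misplaced ones.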

Another exciting prospect for the algorithmics of image update
priority computation is the technique of 
\e{foveal tracking}, whereby the direction of foveal view is
tracked by a physical transducer.
A \VR\ system might employ this information most fruitfully
by having a \e{foveal input device driver} (like a mouse driver on
a conventional computer) which gathers information about the 
foveal motion and relays it to the operating system to use it 
as it sees fit.
Our foveal view, of course, tends to flick quickly between the several
most ``interesting'' areas of our view, regularly returning to 
previously-visited objects to get a better look.
By extracting a suitable overview of this ``grazing'' information
and time-stamping it, the foveal device driver can leave it up
to the operating system to decipher the information if it sees fit.
In times of Critical Response, of course, such information will
simply be ignored (or stored for later use) by the operating system;
much more important processes are taking place.
However, at other times, where the participant is ``having a good
look around'', this foveal information may (with a little detective
work) be traced back to the objects that occupied those positions
at the corresponding times; these objects may be boosted in priority
over others for the purposes of an 
improved rendering or texturing process; however, one must
be careful not to play ``texture tag'' with the participant by
relying too exclusively on this (fundamentally historical) information.

We now turn to our third topic of consideration listed above, namely,
ensuring that the display processor can reliably generate \FS\ buffer
images in the first place.
This is \e{not} a trivial property; it requires a careful \coo peration
between the display processor and the image-generation software.
This is because, in traditional global-update environments,
the display processor can be sure that \e{all} visible objects will 
be rendered in any update; $z$-buffering can therefore be employed
to ensure correct hidden surface removal.
The same is \e{not} true, however, in local-update systems: only
\e{some} of the objects may be in the process of being updated.
If there are un-updated objects partially \e{obscuring} the objects
that \e{are} being updated, hidden surface removal must still somehow
be done.

One approach to this problem might be to define one or more 
``clip areas'', that completely
surround the objects requiring updating, and simply render all
objects in these (smaller than full-screen) areas.
This approach is unacceptable on a number of counts.
Firstly, it violates the principles of local updating: we do \e{not}
wish to re-render all objects in the area (even if it is, admittedly,
smaller than full-screen); rather, we only want to re-render the
objects that have requested it.
Secondly, it carries with it the problem of \e{intra-object
mismatch}: if one part of an object is rendered with a low-quality
technique, and another part of the same object 
(which may ``poke into'' some arbitrary clip area)
with a \e{high}-quality
technique, then the strange and unnatural 
``divide'' between the two areas will be a more
intriguing visual feature than the object itself;
the participant's consciousness will start to wander back
towards the world of \RR.

Our approach, therefore, shall be a little subtler.
In generating an image, we will consider two classes of objects.
The first class will be those objects 
that have actually been scheduled for re-rendering.
The second class of objects will 
consist of all of those objects, not of the first class,
which are known to \e{obscure} one or more of the objects of the
first class (or, in practice, \e{may} obscure an object of the
first class, judged by whatever simpler informational subset 
optimises the
speed of the overall rendering procedure).
Objects of the first class shall be rendered in the normal fashion.
Objects of the \e{second} class, on the other hand, will be
\e{shadow-rendered}: their $z$-buffer information will be stored,
where appropriate, \e{but their corresponding colour information
will flag them as debris}.
``Clear-frame'' debris (the type used to wipe clear the \FS\ buffer in
the first place), on the other hand, will both be marked as debris,
and $z$-depth-encoded to be as far away as possible.
Shadow-rendering is clearly vastly quicker than full rendering:
no shading or texturing is required; no \Galn\ motion information
need be generated; all that is required are the pixels of
obscuration of the object: if the galpixel at one such position 
in the \FS\ buffer is currently ``further away'' than the corresponding
point of the second class object, then that pixel is turned to debris
status, and its $z$-buffer value updated to that of the second-class
object; otherwise, it is left alone.
In this way, we can graft complete objects into an image, \e{without} 
affecting any other objects, with much
smaller overheads than are required to re-render the entire 
display---or even, for that matter, an arbitrary selected ``clip
window'' that surrounds the requesting objects.
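
The two rendering classes can be illustrated with a small Python sketch.
It assumes that objects arrive already broken into per-pixel fragments,
that a smaller $z$ means closer to the participant, and that a galpixel
holds only a depth, a debris flag and a colour; these representations
and all names are hypothetical.
\begin{verbatim}
```python
FAR = float("inf")  # depth given to 'clear-frame' debris

def clear_buffer(width, height):
    """Every galpixel starts as debris at maximal depth."""
    return [[{"z": FAR, "debris": True, "colour": None}
             for _ in range(width)] for _ in range(height)]

def render_full(buffer, fragments):
    """First-class objects: ordinary z-buffered rendering.

    fragments: iterable of (x, y, z, colour).
    """
    for x, y, z, colour in fragments:
        px = buffer[y][x]
        if z < px["z"]:
            px.update(z=z, debris=False, colour=colour)

def render_shadow(buffer, fragments):
    """Second-class objects: depth only; covered pixels revert to debris.

    fragments: iterable of (x, y, z); no colour or motion data is needed.
    """
    for x, y, z in fragments:
        px = buffer[y][x]
        if z < px["z"]:
            px.update(z=z, debris=True, colour=None)
```
\end{verbatim}
Since both routines apply the same depth test, the result does not depend
on whether the obscuring objects are shadow-rendered before or after the
objects being updated.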

We now turn to the fourth topic of consideration above: 
namely, the inclusion of
specialised hardware, over and above that needed 
for \Galn\ \anti ing, that can modify the \e{entire} contents of a 
\Galn\ frame buffer to take into account the 
perspectival effects of the motion of the participant
herself (as distinct from the \e{proper motion}---\ie\ with respect to 
the laboratory---of any of the virtual objects).
Such hardware is, in a sense, at the other end of the spectrum
from the local updating just performed: we now wish to be able
to perform \e{global updates} of the galpixmap---but only of
its motional information.
This goal is based on the fact that small changes of acceleration (both
linear and rotational) of
the participant's head will of themselves 
jerk the entire display; and all of the information necessary
to compute these shifts (relying on the $z$-buffer information 
that is required for \Galn\ \anti ing anyway) is already there 
in the \Galn\ frame buffer.
Against this, however, is the fact that the 
mathematical relations describing such transformations are nowhere
near as strikingly simple as those required for \Galn\ \anti ing
itself
(which can, recall, be hard-wired with great ease); rather, we 
would need some sort of maths coprocessor (or a number of them)
to effect these computations.
The problem is that these computations \e{must} be performed
during a \e{single} frame scan, and added to the acceleration of each
galpixel as it is computed (where, recall, standard $\Gal{2}$
\anti ing simply copies across a constant acceleration
for each galpixel in the absence of more information);
whether such a high-speed computational system will be feasible is
questionable.
However, were such systems to become feasible (even if only being
able to account for suitably small jerks, say), the performance of
\VR\ systems employing this technique would be further boosted, as
the display processor would be relieved of the need to re-render objects
whose overall acceleration has gone out of date merely because
the participant has jerked her head a little.
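
The role played by the $z$-buffer in such a global update can be seen
in one deliberately simple case.
Assume, purely for illustration, an idealised planar perspective
projection in which a point at lateral position $X$ and depth $z$
(both measured relative to the participant's head) appears at display
position $x=fX/z$, where $f$ is a fixed scale factor; the symbols
$X$, $f$ and $\delta a$ are introduced here only for this example.
If the lateral acceleration of the head changes by a small amount
$\delta a$, the relative acceleration $\ddot X$ of every point
changes by $-\delta a$, and, since $\ddot x$ depends on $\ddot X$
only through the term $f\ddot X/z$, the apparent acceleration of each
galpixel must be corrected by
\[
\delta\ddot x = -\,\frac{f}{z}\,\delta a .
\]
The correction therefore differs from galpixel to galpixel according to
depth: nearby objects are jerked the most, and very distant ones hardly
at all.
Changes in rotational acceleration, or in linear acceleration along the
line of sight, lead to corrections that involve the display position of
the galpixel, and are not attempted here.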

We now turn to the fifth and penultimate topic of consideration
posed above, that of \e{obscuration caching}.
The principles behind this idea are precisely the same as those
behind the highly successful technique of \e{memory-caching} that is
employed in many processor environments today.
The basic idea is simple: one can make good use of galpixels
that have been recently obscured if they again become unobscured.
The principal situation in which this occurs is where an object
close to the participant ``passes in front of'' a more distant
one, due to parallax.
Without obscuration caching, the closer object ``cuts a swath''
through the background as it goes, which thus requires regular 
display processor updates in order to be regenerated.
On the other hand, if, when two galpixels come to occupy the same
display position, the closer one is displayed, and the farther one is
not discarded, but rather
relegated to a \e{cache \Galn\ frame buffer}, this galpixel can
be ``promoted'' back from the cache to the main display frame buffer
if its current 
position in the main frame buffer becomes unoccupied.
This requires, of course, that the cache have its own ``propagator''
circuitry to propel it from frame to frame, 
in accordance with the principles of
$\Gal{2}$ \anti ing, and thus requires at least two frame
buffers of its own; and it requires an increased complexity in
memory access and comparison procedures between the cache and
main displays; nevertheless, the increased performance that
an obscuration cache would provide may make this a viable
proposition.
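
The bookkeeping involved can be sketched as follows.
Let $M(p)$ and $C(p)$ denote the galpixels held at display position
$p$ in the main and cache frame buffers respectively, with
$\emptyset$ marking an unoccupied position; this notation is assumed
here for illustration only.
When propagation brings two galpixels $g_{\rm near}$ and $g_{\rm far}$
to the same position $p$, the closer one is displayed and the farther
one is relegated:
\[
M(p) \longleftarrow g_{\rm near}, \qquad C(p) \longleftarrow g_{\rm far}.
\]
When, on a later frame, propagation leaves $M(p)=\emptyset$ while
$C(p)\neq\emptyset$, the cached galpixel is promoted:
\[
M(p) \longleftarrow C(p), \qquad C(p) \longleftarrow \emptyset .
\]
This sketch says nothing about what should happen when a galpixel is
relegated to a cache position that is already occupied; that policy is
left open.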

Another, simpler form of caching, \e{out-of-view buffering}, may also
be of use in \VR\ systems.
With this approach, one builds \e{more memory} into each frame buffer
than is required to hold the galpixel information for the 
corresponding display device: the extra memory is used for the 
logical galpixels \e{directly surrounding} the displayable area.
An out-of-view buffer may be used in one of two ways.
In \e{cache-only out-of-view
buffering}, the display processor still renders
only to the displayable area of memory; the out-of-view buffer
acts in a similar way to an obscuration cache.
Thus, if the participant rotates her head slightly to the right,
the display galpixels move to the left; those galpixels that
would have otherwise ``fallen off the edge'' of the memory array
are instead shifted to the out-of-view buffer, up to the extent of
this buffer.
If the viewer is actually in the process of \e{accelerating} back to the
\e{left} when this sweep begins, then in a short time these out-of-view
galpixels will again come back into view (as long as they are propagated
along with the rest of the display), and thus magically
``reappear'' by themselves, without having to be
regenerated by the display processor.

On the other hand, in \e{full out-of-view buffering}, the entire
out-of-view buffer is considered to be 
a part of the ``logical display'', of which
the physical display device only displays a smaller subset.
Objects are rendered into the out-of-view buffer just as to any other
part of the display device.
This approach can be useful for \e{preparative buffering}, especially
when specialised head-motion-implementing hardware is present: 
views of those parts of the virtual world 
\e{just outside} the display area may be rendered in advance, 
so that if the
participant happens to quickly move her head in that direction, 
then at least \e{something} (usually a low-quality rendition)
is already there, and the response by the operating system need not
be so critical.
The relative benefits of out-of-view buffering depend to a great 
extent on the specific configuration of the system and the virtual
worlds that it intends to portray; however, 
at least a \e{modest} surrounding
area of out-of-view buffer is prudent on any \VR\ system: as the 
participant rotates her head, this small buffer area can be used
to consecutively load ``scrolling area'' grafts
from the \FS\ buffers
a chunk at a time, so 
that, at least for modest rotation rates, the edge of the display
device proper never goes highly critical.
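
The memory cost of such a margin is easily estimated.
As a hypothetical example, suppose that the displayable area is
$W\times H$ galpixels and that the out-of-view buffer extends $m$
galpixels beyond it on every side (the symbols and the numbers below
are chosen purely for illustration).
The fractional increase in frame buffer memory is then
\[
\frac{(W+2m)(H+2m)-WH}{WH}
= \frac{2m}{W}+\frac{2m}{H}+\frac{4m^2}{WH}.
\]
For $W=H=512$ and $m=32$ this gives
$0.125+0.125+0.015625\approx 0.27$, or roughly a quarter more memory
per frame buffer.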

Finally, we must consider in a fundamental way
the operating system and application
software itself in a \VR\ system, 
if we wish to apply a local-update philosophy at all.
Building the appropriate operating environments, 
and powerful applications to suit, will be an enormously
complicated task---but one that will, ultimately, yield 
spectacular riches.
And herein will lie the flesh and blood of every virtual world,
whether it be large or small; sophisticated or simple;
a simulation of \RR, or a completely fictitious fantasy world.
Regrettably, the author must decline any impulse to speculate further
on the direction that this development will take, and will leave this
task to those many experts that are infinitely more able to do so.
He would, nevertheless, be most interested in visiting any
such virtual worlds that may be offered for his sampling.

And thus, in conclusion, we come to one small request by the 
author---the ultimate goal, it may be argued, of the work presented
in this \typeofdoc:
Could the software masters at ORIGIN please make a \VR\ version
of \e{Wing Commander}?
I can never look out the side windows 
of my ship without fumbling my fingers
and sliding head-first into the Kilrathi\ldots.

\newsect{Ack}{Acknowledgments}
Many helpful discussions with A.\ R.\ Petty and R.\ E.\ Behrend
are gratefully acknowledged.
This \typeofdoc, and the supporting software developed to assist in 
this work, were written on an IBM PS/2 Model 70-386 equipped with an Intel 
i387 maths coprocessor, running MS-DOS 5.0
and Microsoft Windows 3.1, and employing VGA graphics.
The software was developed using the Microsoft C/C++ 7.0 Compiler.
The patient assistance rendered by Microsoft Australia Product Support is 
greatly appreciated.

This work was supported in part by an Australian Postgraduate Research
Allowance, provided by the Australian Commonwealth Government.

IBM and PS/2
are registered trademarks of International Business Machines
Corporation.

386 and 387 are trademarks, and Intel is a registered trademark,
of Intel Corporation.

Wing Commander and ORIGIN are trademarks of ORIGIN Systems, Inc.

Windows is a trademark, and Microsoft and MS-DOS are registered
trademarks, of Microsoft Corporation.

Microsoft Worlds may well be a trademark of Microsoft Corporation
real soon now.

Galileo should not be trademarked by anybody.

Historical phrases used
in this document that have a sexist bias syntactically, but 
which are commonly understood by English-speakers to refer 
unprejudicially to members of either sex, 
have not been considered by the author to be in need of 
linguistic mutilation.
This approach in no way reflects the views of the University of Melbourne, 
its office-bearers, or the Australian Government.
The University of Melbourne is an Affirmative Action \e{(sic)} 
and Equal Opportunity employer.
Choice of gender for hypothetical participants in described thought
experiments has been made arbitrarily, and may be changed globally
using a search-and-replace text editor if so desired, without affecting
the intent of the text in any way.

Copyright \copyright~1992 John P.\ Costella.
Material in this work unintentionally encroaching on existing patents,
or patents pending, will be removed on request.
The remaining concepts are donated without reservation 
to the public domain.
The author retains copyright to this document, but
hereby grants permission for its duplication for research 
or development purposes,
under the condition that it is duplicated in its entirety and unmodified
in any way, apart from the abovementioned gender reassignment. 

Queries or suggestions are welcome, and
should be addressed to the author; preferably
via the electronic mail address \verb+jpc@tauon.ph.unimelb.edu.au+;
or, failing this, to the postal address listed at the beginning of
this \typeofdoc.

Printed in Australia on acid-free unbleached recycled virtual paper.
\end{document}

%  End of file 4 of 4. The concatenated document should LaTeX
%                      without any reported errors whatsoever.