From: John Costella
Subject: PAPER: Galilean Antialiasing for VR, Part 01/04
Date: Mon, 26 Oct 92 5:17:23 EET

[Co-mod (Mark): John Costella (jpc@tauon.ph.unimelb.edu.au) recently sent me
this paper to peruse and post, and I have found it very enlightening...
Hopefully, you will as well! This paper has been divided into 4 sections.
To create the document, save these 4 messages to files, trim off the headers,
and then concatenate the files with your favorite command.
( cat file1 file2 file3 file4 > file.tex, or use an editor )
Then, run latex on the resulting file, *twice*, in order to process all of
the references correctly. Or, if you don't feel like going through that,
just anonymously ftp to ftp.u.washington.edu, cd to directory public/virtual-worlds/
currentPapers, and you'll find the .tex, .dvi, and .ps versions of this paper.
Feel free to mail me at deloura@cs.unc.edu if you are unfamiliar with
anonymous ftp. I hope you enjoy it! ---Mark ]

% File 1 of 4. NOTE: All four files MUST be concatenated
% before this document can be LaTeXed.
%
%
% Galilean Antialiasing for Virtual Reality Displays
% --------------------------------------------------
% John P. Costella, School of Physics, The University of Melbourne
%
% Abstract:
% --------
% In this paper, a method is described that improves the perceived
% "smoothness" of motion depicted on rasterised Virtual Reality
% displays, by utilising the powerful information already contained
% in the virtual-world engine. Practical implementation of this
% method requires a slight realignment of one's view of the nature
% of a rasterised display, together with modest modifications to
% current image generation and rasterisation hardware. However, the
% resulting improvement in the quality of the perceived real-time
% image is obtained for only modest computational and hardware cost
% --offering the possibility of increasing the *apparent* graphical
% capabilities of existing technology by up to an order of magnitude.
%
%
% Copyright (C) 1992 John P. Costella.
%
% NOTE: The body of this document is written in the LaTeX format.
% It can be "read" reasonably with a text editor, but for full
% formatting should be LaTeXed, and thence viewed or printed.
%
%
% Bibliographical information:
% ---------------------------
% Title: "Galilean Antialiasing for Virtual Reality Displays"
% Author: John P. Costella
% Institution: School of Physics, The University of Melbourne,
% Parkville, Victoria 3149, Australia
% E-mail: jpc@tauon.ph.unimelb.edu.au
% Telephone: +61 3 543-7795 (voice); +61 3 347-4783 (fax)
% Length of LaTeXed document: 92 pages
% Submitted to: Usenet Sci.virtual-worlds (electronic)
% Internal revision number: 1.0
% Date of revision 0.0: 16 October 1992
% Date of this revision: 25 October 1992
%
% -------------------------------------------------------------------
%
% The following commands are macros that simplify and improve the
% consistency of LaTeX papers. If using a text editor, skip to the
% next ``horizontal line'' (like the one above) for the body of the
% paper proper.
%
%
% Document type.
%
\documentstyle{article}
%
% Shorter versions of newcommands.
%
\newcommand{\nc}{\newcommand}
\nc{\rnc}{\renewcommand}
%
% Type of document.
%
\nc{\typeofdoc}{paper}
\nc{\Typeofdoc}{Paper}
%
% Equation, section and reference macros.
%
\nc{\chname}{}
\nc{\beqn}[1]{\begin{equation}\label{Eqn:\chname#1}}
\nc{\eeqn}{\end{equation}}
\nc{\beqnarr}[1]{\begin{eqnarray}\label{Eqn:\chname#1}}
\nc{\eeqnarr}{\end{eqnarray}}
\nc{\beqnarrn}{\begin{eqnarray}}
\nc{\eeqnarrn}{\end{eqnarray}}
\nc{\beqnarrnn}{\begin{eqnarray*}}
\nc{\eeqnarrnn}{\end{eqnarray*}}
\nc{\nline}{\nonumber \\}
\nc{\eql}[1]{\label{Eqn:\chname#1}}
\nc{\eq}[1]{(\ref{Eqn:\chname#1})}
\nc{\fareq}[2]{(x.x)}
\nc{\fArEqsUb}[2]{(\ref{Eqn:#1-#2})}
\nc{\dummy}{\mbox{}}
%
\nc{\newchap}[2]{\chapter{#2}\label{Chap:#1}\rnc{\chname}{#1-}}
\nc{\chap}[1]{\ref{Chap:#1}}
%
\nc{\newsect}[2]{\section{#2}\label{Sec:\chname#1}}
\nc{\sect}[1]{\ref{Sec:\chname#1}}
\nc{\farsect}[2]{x.x}
\nc{\fArsEctsUb}[2]{\ref{Sec:#1-#2}}
%
\nc{\newssect}[2]{\subsection{#2}\label{SubSec:\chname#1}}
\nc{\ssect}[1]{\ref{SubSec:\chname#1}}
\nc{\farssect}[2]{x.x}
\nc{\fArssEctsUb}[2]{\ref{SubSec:#1-#2}}
%
% Latin and other italicised phrases.
%
\nc{\e}[1]{\/{\em #1\/}}
\nc{\newterm}[1]{\e{#1}}
\nc{\ie}{\e{i.e.}}
\nc{\eg}{\e{e.g.}}
\nc{\viz}{\e{viz.}}
\nc{\etal}{\e{et al.}}
\nc{\etc}{\e{etc.}}
\nc{\apriori}{\e{a priori}}
%
% Put accents and diereses back in to some words.
%
\nc{\coord}{co\"{o}rdinate}
\nc{\noone}{no\"{o}ne}
\nc{\role}{r\^{o}le}
\nc{\debacle}{d\'{e}b\^{a}cle}
\nc{\naive}{na\"{\i}ve}
\nc{\coin}{co\"{\i}n}
\nc{\coo}{co\"{o}}
\nc{\rei}{re\"{\i}}
\nc{\ree}{re\"{e}}
\nc{\rea}{re\"{a}}
%
% Set up some mathematical symbols that either aren't done
% or not done too well or are inconvenient with native LaTeX.
%
\nc{\tb}{&\!\!\!\!\dummy}
\nc{\tbnd}{&\!\!\!\!}
\nc{\paren}[1]{\left(#1\right)}
\nc{\leftparen}[1]{\left(#1\right.}
\nc{\rightparen}[1]{\left.#1\right)}
\nc{\brac}[1]{\left[#1\right]}
\nc{\leftbrac}[1]{\left[#1\right.}
\nc{\rightbrac}[1]{\left.#1\right]}
\nc{\braces}[1]{\left\{#1\right\}}
\nc{\leftbrace}[1]{\left\{#1\right.}
\nc{\rightbrace}[1]{\left.#1\right\}}
\nc{\modsign}[1]{\left|#1\right|}
\nc{\rightmod}[1]{\left.#1\right|}
\nc{\leftmod}[1]{\left|#1\right.}
%
\nc{\txt}[1]{{\rm#1}}
\nc{\vdot}{\!\cdot\!}
\nc{\br}[1]{\overline{#1}}
\nc{\dotpr}[2]{(#1\vdot#2)}
\nc{\littlefrac}[2]{{\scriptstyle\frac{#1}{#2}}}
\nc{\f}[2]{{\displaystyle\frac{#1}{#2}}}
\nc{\vect}[1]{\mbox{\boldmath{$#1$}}}
\nc{\vcapdot}[1]{\dot{\vect{#1}\,}\!}
\nc{\vcapddot}[1]{\ddot{\vect{#1}\,}\!}
\nc{\gcapdot}[1]{\dot{#1\,}\!}
\nc{\gcapddot}[1]{\ddot{#1\,}\!}
\nc{\gcap}[1]{{\it#1}}
%
\nc{\ten}[1]{10^{#1}}
\nc{\byten}[1]{\times10^{#1}}
\nc{\degrees}{^{\circ}}
\nc{\cross}{\!\times\!}
\nc{\half}{\frac{1}{2}}
\nc{\quarter}{\frac{1}{4}}
\nc{\parenfracpower}[3]{\paren{\f{#1}{#2}}^{\!\!#3}}
\nc{\pard}{\partial}
\nc{\scr}[1]{{\cal#1}}
\nc{\id}{\equiv}
\nc{\dual}[1]{\widetilde{#1}}
\nc{\what}[1]{\widehat{#1}}
\nc{\del}{\nabla}
\nc{\dash}{\prime}
\nc{\dAlem}{\Box^2}
\nc{\artan}{\txt{artan}}
\nc{\arsin}{\txt{arsin}}
%
% Some short-cuts for greek letters. Capitals are also italicised.
%
\nc{\al}{\alpha}
\nc{\be}{\beta}
\nc{\g}{\gamma}
\nc{\de}{\delta}
\nc{\eps}{\varepsilon}
\nc{\epsil}{\epsilon}
\nc{\z}{\zeta}
\nc{\et}{\eta}
\nc{\th}{\theta}
\nc{\varth}{\vartheta}
\nc{\io}{\iota}
\nc{\k}{\kappa}
\nc{\la}{\lambda}
\nc{\m}{\mu}
\nc{\n}{\nu}
\nc{\x}{\xi}
\nc{\p}{\pi}
\nc{\varp}{\varpi}
\nc{\r}{\rho}
\nc{\varr}{\varrho}
\nc{\s}{\sigma}
\nc{\vars}{\varsigma}
\nc{\ta}{\tau}
\nc{\ups}{\upsilon}
\nc{\ph}{\phi}
\nc{\varph}{\varphi}
\nc{\ch}{\chi}
\nc{\ps}{\psi}
\nc{\om}{\omega}
%
\nc{\G}{\gcap{\Gamma}}
\nc{\D}{\gcap{\Delta}}
\nc{\Th}{\gcap{\Theta}}
\nc{\La}{\gcap{\Lambda}}
\nc{\X}{\gcap{\Xi}}
\nc{\Py}{\gcap{\Pi}}
\nc{\Si}{\gcap{\Sigma}}
\nc{\Ups}{\gcap{\Upsilon}}
\nc{\Ph}{\gcap{\Phi}}
\nc{\Ps}{\gcap{\Psi}}
\nc{\Om}{\gcap{\Omega}}
%
% Reference list commands.
%
\nc{\bib}[1]{\bibitem{#1}}
%
\nc{\paper}[5]{#1, {\it #2\/}\ {\bf #3} (#4) #5}
\nc{\book}[7]{#1, {\it #2\/}#3\ (#4, #5, #6)#7}
\nc{\booknoyr}[6]{#1, {\it #2\/}#3\ (#4, #5)#6}
\nc{\booknocity}[6]{#1, {\it #2\/}#3\ (#4, #5)#6}
%
% Paper-specific macros.
%
\nc{\Galn}{Galilean}
\nc{\Gal}[1]{G^{(#1)}}
\nc{\Anti}{Antialias}
\nc{\anti}{antialias}
\nc{\VR}{Virtual Reality}
\nc{\RR}{Real Reality}
\nc{\FS}{Flinders Street}
\nc{\vx}{\vect{x}}
\nc{\vv}{\vect{v}}
\nc{\vi}{\vect{i}}
\nc{\vj}{\vect{j}}
\nc{\va}{\vect{a}}
\nc{\vb}{\vect{b}}
\nc{\vn}{\vect{n}}
\nc{\floor}{\mbox{floor}}
%
% --------------------------------------------------------------------
%
% ----> The LaTeX-formatted paper begins here... <----
%
\begin{document}
%
\title{\Galn\ \Anti ing for \VR\ Displays}
\author{John P.\ Costella \\
{\small\it School of Physics, The University of Melbourne, Parkville, Vic.\ 3052, Australia}}
\date{25 October 1992}
\maketitle
%
\begin{abstract}
In this \typeofdoc, a method is described that improves the perceived ``smoothness'' of motion depicted on rasterised \VR\ displays, by utilising the powerful information already contained in the virtual-world engine.
Practical implementation of this method requires a slight \rea lignment of one's view of the nature of a rasterised display, together with modest modifications to current image generation and rasterisation hardware. However, the resulting improvement in the quality of the perceived real-time image is obtained for only modest computational and hardware cost---offering the possibility of increasing the \e{apparent} graphical capabilities of existing technology by up to an order of magnitude.
\end{abstract}
%
\newsect{Intro}{Introduction}
Numerous definitions of the term \e{\VR} abound these days. However, they all share one common thread: a successful \VR\ system \e{convinces} you that you are somewhere other than where you really are. Despite engineers' wishes to the contrary, just \e{how} convincing this experience is seems to depend only weakly on raw technical statistics like megabits per second; rather, more important is how well this information is ``matched'' to the expectations of our physical senses. While our uses for \VR\ might encompass virtual worlds bearing little resemblance to the real world, our physical senses nevertheless still expect to be stimulated in the same ``natural'' ways that they have for millions of years.

Two particular areas of concern for designers of current \VR\ systems are the \e{latency} and \e{update rate} of the visual display hardware employed. Experience has shown that poor performance in either of these two areas can quickly destroy the ``realness'' of a \VR\ session, even if all of the remaining technical specifications of the hardware are impeccable. This is perhaps not completely surprising, given that the images of objects that humans interact with in the natural world are never delayed by more than fractions of milliseconds, nor are they ever ``sliced'' into discrete time frames (excluding the intervention of the technology of the past hundred years).
On the other hand, the relative psychological importance of latency and update rate can be used to great advantage: with suitably good performance on these two fronts, the ``convinceability factor'' of a \VR\ system can withstand relatively harsh degradation of its other capabilities (such as image resolution). ``If it \e{moves} like a man-eating beast, it probably \e{is} a man-eating beast---even if I didn't see it clearly'' is no doubt a hard-wired feature of our internal image processing subsystem that is responsible for us still being on the planet today.

A major problem facing the \VR\ system designer, however, is that presenting a sufficiently ``smooth'' display to fully convince the viewer of ``natural motion'' often seems to require an unjustifiably high computational cost. As an order-of-magnitude estimate, a rasterised display update rate of around 100 updates per second is generally fast enough that the human visual system cannot distinguish it from continuous vision. But the actual amount of information gleaned from these images by the viewer in one second is nowhere near the amount of information containable in 100 static images---as can be simply verified by watching a ``video montage'' of still photographs presented rapidly in succession. The true rate of ``absorption'' of detailed visual information is probably closer to 10 updates per second---or worse, depending on how much actual detail is taken as a benchmark. Thus, providing a completely ``smooth'' display takes roughly an order of magnitude more effort than is ultimately appreciated---not unlike preparing a magnificent dinner for twelve and then having \noone\ else turn up.
For this reason, many \VR\ designers make an educated compromise between motion smoothness and image sophistication, by choosing an update rate that is somewhere between the $\sim\!10$ updates per second rate that we absorb information at, and the $\sim\!100$ updates per second rate needed for smooth apparent motion. Choosing a rate closer to 10 updates per second requires that the participant mentally ``interpolate'' between the images presented---not difficult, but nevertheless requiring some conscious processing, which seems to leave fewer ``brain cycles'' for appreciating the virtual-world experience. On the other hand, choosing a rate closer to 100 updates per second results in natural-looking motion, but the reduced time available for each update reduces the sophistication of the graphics---fewer ``polygons per update''. The best compromise between these two extremes depends on the application in question, the expectations of the participants, and, probably most importantly, the opinions of the designer. In the remaining sections of this \typeofdoc, we outline enhancements to current rasterised display technology that permit the motion of objects to be consistently displayed at a high update rate, while allowing the image generation subsystem to run at a lower update rate, and hence provide more sophisticated images. Section~\sect{BasicPhilosophy} presents an overview of the issues that are to be addressed, and outlines the general reasoning behind the approach that is taken. Following this, in section~\sect{MinimalImplementation}, detailed (but platform-independent) information is provided that would allow a \VR\ designer to ``retrofit'' the techniques outlined in section~\sect{BasicPhilosophy} to an existing system. For these purposes, as much of the current image generation design philosophy as possible is retained, and only those minimal changes required to implement the techniques immediately are described. 
However, it will be shown that the full benefit of the methods described in this \typeofdoc, in terms of the specific needs of \VR, will most fruitfully be obtained by subtly changing the way in which the image generation process is currently structured. These changes, and more advanced topics not addressed in section~\sect{MinimalImplementation}, are considered in section~\sect{Enhancements}. \newsect{BasicPhilosophy}{The Basic Philosophy} We begin, in section~\ssect{CurrentRasters}, by reviewing the general methods by which current rasterised displays are implemented, to appreciate more fully why the problems outlined in section~\sect{Intro} are present in the first place, and to standardise the terminology that will be used in later sections. Following this, in section~\ssect{Motion}, we review some of the fundamental physical principles underlying our understanding of motion, to yield further insight into the problems of depicting it accurately on display devices. These deliberations are used in section~\ssect{GalAnti} to pinpoint the shortcomings of current rasterised display technology, and to formulate a general plan of attack to rectify these problems. A brief introduction to the terminology used for the general structures required to carry out these techniques is given in section~\ssect{Galpixels}---followed, in section~\ssect{GalpixmapStructure}, by a careful consideration of the level of sophistication required for practical yet reliable systems. Specific details about the hardware and software modifications required to implement the methods outlined are deferred to sections~\sect{MinimalImplementation} and~\sect{Enhancements}. \newssect{CurrentRasters}{Overview of Current Rasterised Displays} Rasterised display devices for computer applications, while ubiquitous in recent years, are relatively new devices. 
Replacing the former \e{vector display} technology, they took advantage of the increasingly powerful memory devices that became available in the early 1970s, to represent the display as a digital matrix of pixels, a \e{raster} or \e{frame buffer}, which was scanned out to the CRT line-by-line in the same fashion as the by then well-proven technology of \e{television}. The electronic circuitry responsible for scanning the image out from the frame buffer to the CRT (or, these days, to whatever display device is being used) is referred to as the \e{video controller}---which may be as simple as a few interconnected electronic devices, or as complex as a sophisticated microprocessor. The \e{refresh rate} may be defined as the reciprocal of the time required for the video controller to refresh the display from the frame buffer, and is typically in the range 25--120~Hz, to both avoid visible flicker, and to allow smooth motion to be depicted. (Interlacing is required for CRTs at the lower end of this range to avoid visible flicker; for simplicity, we assume that the display is \e{non-interlaced} with a suitably high refresh rate.) Each complete image, as copied by the video controller from the frame buffer to the physical display device, is referred to as a \e{frame}; this term can also be used to refer to the time interval between frame refreshes, \ie\ the reciprocal of the refresh rate. Most \VR\ systems employ two display devices to present a stereoscopic view to the participant, one display for each eye. Physically, this may be implemented as two completely separate display subsystems, with their own physical output devices (such as a twin-LCD head-mounted display). 
Alternatively, hardware costs may be reduced by interleaving the two video signals into a single physical output device, and relying on physically simpler (and sometimes less ``face-sucking'') demultiplexer techniques to separate the signals, such as time-domain switching (\eg\ electronic shutter glasses), colour-encoding (\eg\ the ``3-D'' coloured glasses of the 1950s), optical polarisation (for which we are fortunate that the photon is a vector boson, and that we have only two eyes), or indeed any other physical attribute capable of distinguishing two multiplexed optical signals. For the purposes of this \typeofdoc, however, we define a \e{logical display device} to be \e{one} of the video channels in a twin-display device (say, the left one), or else the corresponding \e{effective} monoscopic display device for a multiplexed system. For example, a system employing a single (physical) 120-Hz refresh-rate CRT, time-domain-multiplexing the left and right video channels into alternate frames, is considered to possess two \e{logical} display devices, each running at a 60~Hz refresh rate. In general, we ignore completely the engineering problems (in particular, \e{cross-talk}) that multiplexed systems must contend with. Indeed, for most of this \typeofdoc, we shall ignore the stereoscopic nature of \VR\ displays altogether, and treat each video channel separately; therefore, all references to the term ``display device'' in the following sections refer to \e{one} of the two logical display devices, with the understanding that the other video channel simply requires duplicating the hardware and software of the first. We have described above how the video controller scans frames out from the frame buffer to the physical display device. The frame buffer, in turn, receives its information from the \e{display processor} (or, in simple systems, the CPU itself---which we shall also refer to as ``the display processor'' when acting in this \role). 
Data for each pixel in the frame buffer is retained unchanged from one frame to the next, unless it is overwritten by the display processor in the intervening time. There are many applications for computer graphics for which this ``sample-and-hold'' nature of the frame buffer is very useful: a background scene can be painted into the frame buffer once, and only those objects that change their position or shape from frame to frame need be redrawn (together with repairs to the background area thus uncovered). This technique is often well-suited to traditional computer hardware environments---namely, those in which the display device is physically fixed in position on a desk or display stand---because a constant background view accords well with the view that we are using the display as a (static) ``window'' on a virtual world. However, this technique is, in general, ill-suited to \VR\ environments, in which the display is either affixed to, or at least in some way ``tracks'', the viewer, and thus must change even the ``background'' information constantly as the viewer moves her head. There are several problems raised by this requirement of constant updating of the entire frame buffer. Firstly, if the display processor proceeds to write into the frame buffer at the same time as the video controller is refreshing the display, the result will often be an excessive amount of visible flicker, as partially-drawn (and, indeed, partially-erased) objects are ``caught with their pants down'' during the refresh. Secondly, once the proportion of the image requiring regular updating rises to any significant fraction, it becomes more computationally cost-effective to simply erase and redraw the entire display than to erase the individual objects that need redrawing. 
Unless the new scene can be redrawn in significantly less time than a single frame (untrue in any but the most trivial situations), the viewer will see not a succession of complete views of a scene, but rather a succession of scene-building drawing operations. This is often acceptable for ``non-immersive'' applications such as CAD (in which this ``building process'' can indeed often be most informative); it is not acceptable, however, for any convincing ``immersive'' application such as \VR. The standard solution to this problem is \e{double buffering}. The video controller reads its display information from one frame buffer, whilst at the same time a second frame buffer is written to by the display processor. When the display processor has finished rendering a complete image, the video controller is instructed to switch to the second frame buffer (containing the new image), simultaneously switching the display processor's focus to the first frame buffer (containing the now obsolete image that the video controller was formerly displaying). With this technique, the viewer sees one constant image for a number of frames, until the new image has been completed. At that time, the view is instantaneously switched to the new image, which remains in view until yet another complete image is available. Each new, completed image is referred to as an \e{update}, and the rate at which these updates are forthcoming from the display processor is the \e{update rate}. It is important to note the difference between \e{refresh} rate and \e{update} rate, and the often-subtle physical interplay between the two. The refresh rate is the rate at which the video controller reads images from the frame buffer to the display device, and is typically a constant for a given hardware configuration (\eg\ 70 Hz). 
The update rate, on the other hand, is the rate at which complete new images of the scene in question are rendered; it is generally lower than the refresh rate, and usually depends to a greater or lesser extent on the complexity of the image being generated by the display processor.

It is often preferable to change the video controller's focus only \e{between} frame refreshes to the physical display device---and not mid-frame---especially if the display processor's update rate is comparable to the refresh rate. This is because switching the video controller's focus mid-frame ``chops'' those objects in the particular scan line that is being scanned out by the video controller at the time, which leads to visible discontinuities in the perceived image. On the other hand, this restriction means that the display processor must wait until the end of a frame to begin drawing a new image (unless a third frame buffer is employed), effectively forcing the refresh-to-update rate ratio up to the next integer value. This is most detrimental when the update rate is already high; for example, if an update takes only 1.2 refresh periods to be drawn, ``synchronisation'' with the frame buffer means that the remaining 0.8 of a refresh period is unusable for drawing operations.

Regardless of whether the frame-switching circuitry is synchronised to the frame rate or not, if the update rate of the display processor is in fact slower than the refresh rate (the usual case), then the same static image persists on the display device for a number of frames, until a new image is ready for display. For \e{static} objects on the display, this ``sample-and-hold'' technique is ideal: the image's motion (\ie\ no motion at all!) is correctly depicted at the (high) refresh rate, even though the image itself is only being generated at the (lower) update rate.
This phenomenon, while appearing quite trivial in today's rasterised-display world, is in fact a major advance over the earlier vector-display technology: the video controller, utilising the frame buffer, effectively \e{fills in the information gaps} between the images supplied by the display processor. Recognition of the remarkable power afforded by this feat of ``interpolation''---and, more importantly, a critical assessment of how this ``interpolation'' is currently carried out---is essential to appreciating the modifications that will be suggested shortly.

As mentioned in section~\sect{Intro}, the \e{latency} (or ``time lag'') of a \VR\ system in general, and the display system in particular, is crucial for the experience to be convincing (and, indeed, non-nauseous). There are many potential and actual sources of latency in such systems; in this \typeofdoc, we are concerned only with those introduced by the image generation and display procedures themselves. Already, the above description of a double-buffered display system contains a number of potential sources of lag. Firstly, if the display processor computes the apparent positions of the objects in the image based on positional information valid at the \e{start} of its computations, these apparent positions will already be slightly out of date by the time the computations are complete. Secondly, the rendering and scan-conversion of the objects takes more time, and is based on the (already slightly outdated) positional information. Finally---and perhaps most subtly---the very ``sample-and-hold'' nature of the video controller's frame buffer leads to a significant average time lag itself, equal to \e{half the update period}. While a general mathematical proof of this figure is not difficult, a ``hand-waving'' argument is easily constructed.
For simplicity, assume that all other lags in the graphical pipeline are magically removed, so that, upon the first refresh of a new update, it describes the virtual environment at that point in time accurately. By the time of the second refresh of the same image, it is now one frame out-of-date; by the third refresh, it is two frames out-of-date; and likewise for all remaining refreshes of the same image until a new update is provided. By the ``hand-waving'' argument of simply averaging the out-of-datedness of each refresh across the entire update period, one obtains \[ \left<\ta_\txt{\,lag}\right>\sim\f{1}{\ta_\txt{update}} \int_0^{\ta_\txt{update}}t\,dt =\half\ta_\txt{update}, \] where $\ta_\txt{update}$ is the update period. Thus, a long update period not only affects the \e{smoothness} of the perceived display, but also its \e{latency}---thus rendering it a particularly insidious enemy of real-time \VR\ systems, and a doubly worthy target of our attention. It is this undesirable feature of conventional display methodology that we will aim to remove in this \typeofdoc. However, to provide suitable background for the approach we shall take, and to put our later specifications into context, we first review some quite general considerations on the nature of physical motion. \newssect{Motion}{The Physics of Motion} As noted in section~\sect{Intro}, while our applications for \VR\ technology may encompass virtual worlds far removed from the laws of physics, our physical senses nevertheless expect to be stimulated more or less in the same way that they are in the real world. It is therefore useful to review briefly the evolution of man's knowledge about the fundamental nature of motion, and note how well these views have or have not been incorporated into real-time computer graphics. Some of the earliest questions about the nature of motion that have survived to this day are due to Zeno of Elea. 
His most \e{famous} paradox---that of Achilles and the Tortoise---is amusing to this day, but is nevertheless more a question of mathematics than physics. More interesting is his paradox of the Moving Arrow: At any instant in time, an arrow occupies a certain position. At the next instant of time, the arrow has moved forward somewhat. His question, somewhat paraphrased, was: How does the arrow know how to get to this new position by the very next instant? It cannot be ``moving'' at the first instant in time, because an instant has no duration---and motion cannot be measured except over some duration.

Let us leave aside, for the moment, the flaws that can be so quickly pointed out in this argument by anyone versed in modern physics. Consider, instead, what Zeno would say to us if we travelled back in time in our Acme Time Travel Machine, and showed him a television receiver displaying a broadcast of an archery tournament. (Ignore the fact that, had television programmes been in existence two and a half thousand years ago, Science as we know it would probably not exist.) Zeno would no doubt be fascinated to find that the arrows that moved so realistically across the screen were, in fact, a \e{series of static images} provided in rapid succession---in full agreement (or so he would think) with his ideas on the nature of motion. The question that would then spring immediately to his lips: \e{How does the television know how to move the objects on the screen?} Our response would, no doubt, be that the television \e{doesn't} know how to move the objects; it simply waits for the next frame (from the broadcasting station) which shows the objects in their new positions. Zeno's follow-up: How does the \e{broadcasting station} know how to move them? Answer: It doesn't either; it just sends whatever images the video camera measures. And eventually we return to Zeno's original question: How does the real arrow itself ``know'' how to move?
Ah, well, that's a question that even television cannot answer. Ignoring for the moment the somewhat ridiculous nature of this hypothetical exchange, consider Virtual Zeno's first question from first principles. Why \e{can't} the television move the objects by itself? Surely, if the real arrow somehow knows how to move, then it is not unreasonable that the television might obtain this knowledge too. The only task then, is to determine this information, and tell it to the television! Of course, this is a little simplistic, but let us fast-forward our time machine a little and see what answers we obtain. Our next visit would most likely be to Aristotle. Asking him about Zeno's arrow paradox would yield his well-known answer---that would, in fact, be regarded as the ``right answer'' for the next 2300 years: Zeno wrongly assumes that indivisible ``instants of time'' exist at all. Granting Aristotle this explanation of Zeno's mistake, what would his opinions be regarding ``teaching'' the television how to move the objects on its own? His response, no doubt, would be to explain that every object has its \e{natural place}, and that its \e{natural motion} is such that it moves towards its natural place, thereafter remaining at rest (unless subsequently subjected to \e{violent motions}). Heartened by this news, we ask him for a mathematical formula for this natural motion, so that we can teach it to our television. ``Ah, well, I don't think too much of mathematical formul\ae,'' he professes, engrossed in a re-run of \e{I Love Lucy}, ``although I can tell you that heavier bodies fall faster than light ones.'' So much for an Aristotelian solution to our problem. Undaunted, we tweak our time machine forward somewhat---2000 years, in fact. Here, we find the ageing Galileo Galilei ready and willing to answer our questions. On asking about Zeno's arrow paradox, we find a general agreement with Aristotle's explanation of Zeno's error. 
On the other hand, on enquiring how a television might be taught how to move objects on its own, we obtain these simple answers: If the body is in \e{uniform motion}, it moves according to $x=x_0+vt$; if it is \e{uniformly accelerated}, it moves according to $x=x_0+v_0t+ \half at^2$. Furthermore, \e{gravity} acts as a uniform acceleration---changing the velocity of an object smoothly (and not discontinuously, as earlier propounded); and, what is more, this rate of acceleration is a constant for every object. To this, one must add accelerations other than gravity (such as the force of someone in throwing a ball) into the equation. If we teach these principles to our television, he explains, it \e{will} then know how to move objects by itself. And thus, from the first modern physicist, we get the information we desire---meanwhile leaving him fascinated by images of small white projectiles following parabolic paths, subtitled ``British Open Highlights''. The tale woven in this section is, admittedly, a little fanciful, but nevertheless illustrates most clearly the thinking behind the methods to be expounded. Very intriguing, but omitted from this account, is the fact that Aristotle's solution to Zeno's arrow paradox, which remained essentially unchanged throughout the era of Galilean relativity and Newtonian mechanics, suffered a mortal blow three-quarters of a century ago. We now know that, ultimately, the ``smooth'' nature of space-time recognised by Galileo and Newton, and which underwent a relatively benign ``warping'' in Einstein's classical General Relativity, must somehow be fundamentally composed of quantum mechanical ``gravitons''; unfortunately, \noone\ knows exactly how. Zeno's very question, ``How does anything move at all?'', is again \e{the} unsolved problem of physics. But that is a story for another place. Let us therefore return to the task at hand, and utilise the method we have gleaned from seventeenth century Florence. 
\newssect{GalAnti}{\Galn\ \Anti ing} Consider the rasterised display methodology reviewed in section~\ssect{CurrentRasters}. How does its design philosophy fit in with the above historical figures' views on motion? It is apparent that the ``slicing up'' in time of the images presented on the display device, considered simplistically, only fits in well with Zeno's ideas on motion. However, we have neglected the human side of the equation: clearly, if frames are presented at a rate exceeding the viewer's visual system's temporal resolution, then the effective integration performed by the viewer's brain combines with the ``time-sampled'' images to reproduce continuous motion---that is, at least for motion that is slow enough for us to follow visually. Consider now the ``interpolation'' procedure used by the video processor and frame buffer. Is this an optimal way to proceed? Aristotle would probably have said ``no''---the objects, in the intervening time between updates, should seek their ``natural places''. Galileo, on the other hand, would have quantified this criticism: the objects depicted should move with either constant velocity if free, or constant acceleration if they are falling; if subject to ``violent motion'', this would also have to be programmed. Instead, the sample-and-hold philosophy of section~\ssect{CurrentRasters} keeps each object at one certain place on the display for a given amount of time, and then makes it \e{spontaneously jump} by a certain distance; and so on. In a sense, the pixmap \e{has no inertial properties}. As noted, this \e{is} the ideal behaviour for an object that is not moving at all; its manifest incorrectness for a moving object is even more simply revealed by simple \Galn\ mechanics: Consider how an object travelling at constant apparent velocity $\vv$, with respect to the display, is depicted with this system. 
Mathematically, the trajectory displayed by the video processor is \beqn{SampleAndHoldConstV} \vx(t)=\vx(0)+\vv\,\floor(t), \eeqn where $\vx(0)$ is the two-dimensional pixel-position vector at $t=0$, $t$ is measured in units of frame-periods, $\vv$ is the (constant) velocity of the real object being simulated (in units of pixels per frame period), and $\floor(y)$ returns the greatest integer that is smaller than or equal to $y$. Now consider applying a \Galn\ transformation of velocity $\vv$ to the \e{viewer} of the system, in the same direction that the object is moving (\eg\ by having the viewer standing on a ``moving footpath'' purloined from LA International Airport, which travels past the display device). The new trajectory seen by this moving viewer, $\vx'(t)$, is obtained from the stationary-viewer trajectory $\vx(t)$ by the Galilean transformation \beqn{GalXfn} \vx'(t)=\vx(t)-\vv t. \eeqn The \e{correct} trajectory of the object, of course, should simply be \[ \vx'(t)=\vx'(0)\id\vx(0), \] \ie\ a stationary object. On the other hand, application of the transformation \eq{GalXfn} to the video-controller trajectory \eq{SampleAndHoldConstV} yields \beqn{SpuriousConstV} \vx'(t)=\vx'(0)+\vv\braces{\floor(t)-t}. \eeqn The function $f(t)\id\floor(t)-t$ appearing here can be recognised as simply a ``saw-tooth'' function, ramping linearly from $f(0)=0$ down to $f(1^-)=-1^+$, at which instant it jumps back to $f(1^+)=0^-$; it then ramps linearly back down to $f(2^-)=-1^+$, and jumps back up to $f(2^+)=0^-$; and so on. \e{It is this spurious motion, and this motion alone, that causes the sample-and-hold display philosophy to perform poorly for uniformly moving objects.} The basic idea of a frame buffer is not the problem: rather, the fault lies with the \naive\ way in which it is used. It is also seen why a longer update period worsens the effect: the object ``wanders'' further---and for a longer time---before ``jumping'' back to its correct position. 
It is not surprising that such an effect is nauseous; the amount of inebriation required to simulate this effect in \RR\ is more than enough to separate the participant from his or her last meal. This spurious motion can also be viewed in another light. If one draws a \e{space-time diagram} of the trajectory of the object as depicted by the sample-and-hold video display, one obtains a staircase-shaped path. The \e{correct} path in space-time is, of course, a straight line. The saw-tooth error function derived above is the difference between these two trajectories; the ``jumping'' is the exact spatio-temporal analogue of \e{the jaggies}---the (spatial) ``staircase'' effect observable when straight lines are rendered in the simplest way on bitmapped (or rectangular-grid-sampled) displays. The mathematical description of this general problem with sampled signals is \e{aliasing}; in rough terms, high-frequency components of the original image ``masquerade as'', or \e{alias}, low-frequency components when ``sampled'' by the bitmapping procedure, rendering the displayed image a subtly distorted and misleading version of the original. As is well-known, however, aliasing \e{can} be avoided in a sampled signal, by effectively filtering out the high-frequency components of the original signal before they get aliased by the sampling procedure. This technique, applied to any general sampled signal, is termed \e{\anti ing}; in the field of computer graphics, reference is often made to \e{spatial \anti ing} techniques used to remove ``the jaggies'' from scan-converted images. (This is often shortened, in that field, to the unqualified term ``\anti ing''; we shall reject this trend and \rei nstate the adjective ``spatial''.) For the same reasons, the ``jerky motion'' of sample-and-hold video controllers is thus most accurately referred to as \e{spatio-temporal aliasing}; any method seeking to remove or reduce it is \e{spatio-temporal \anti ing}. 
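To make the staircase picture concrete, consider a one-dimensional sketch under assumptions introduced here purely for illustration: let $t$ be measured in refresh periods, let the image be updated only once every $N$ refresh periods, and let the object move at $v$ pixels per refresh period (the symbol $N$ and the numbers below are hypothetical, and are not taken from any particular device). The sample-and-hold staircase, and its residual as seen by the co-moving viewer, would then read \[ x(t)=x(0)+vN\,\floor(t/N), \qquad x'(t)-x'(0)=v\braces{N\,\floor(t/N)-t}. \] For $v=3$ and $N=4$, say, the residual at $t=0,1,2,\ldots$ is \[ x'(t)-x'(0)=0,-3,-6,-9,0,-3,-6,-9,\ldots, \] whereas for $N=2$ it is $0,-3,0,-3,\ldots$. In this sketch the depicted object lags the true one by $v(N-1)$ pixels at the last refresh before each update, and by almost $vN$ pixels at the instant just before the update itself, so that the peak excursion grows in proportion to the update period. 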
One form of spatio-temporal antialiasing is performed every time we view standard television images. Generally, television cameras have an appreciable \e{shutter time}: any motion of an object in view during the time the (electronic) shutter is ``open'' results in \e{motion blur}. That such blur is in fact a \e{good} thing---and not a shortcoming---may be surprising to those unfamiliar with sampling theory. However, the fact that the human eye easily detects the weird effects of spatio-temporal aliasing if motion blur is \e{not} present, even at the relatively high field rate of 50~Hz (or 60~Hz in the US), can be appreciated by viewing any footage from a modern sporting event, such as the Barcelona Olympics. To improve the quality of the now-ubiquitous slow-motion replay (for which motion blur is stretched to an unnatural-looking extent), such events are usually shot with cameras equipped with \e{high-speed} electronic shutters, \ie\ electronic shutters that are only ``open'' for a small fraction of the time between frames. The resulting images, played at their natural rate of 50 fields per second, have a surreal, ``jerky'' look (often called the ``fast-forward effect'' because the fast picture-search methods of conventional video recorders lead to the same unnatural repression of motion blur). This effect is, of course, simply spatio-temporal aliasing; that it is noticeable to the human eye at 50 fields per second (albeit only 25 \e{frames} per second) illustrates our visual sensitivity. (For computer-generated displays, for which simulating motion blur may be relatively computationally expensive, increasing the refresh and update rates to above 100 Hz and relying on integration by the CRT phosphor or LCD pixel, and our visual system, may be the simplest solution.) This \e{frame}-rate spatio-temporal aliasing, which is relatively easy to deal with, is not usually a severe problem. 
Our immediate concern, on the other hand, is a much more pronounced phenomenon: the \e{update}-rate spatio-temporal aliasing produced by the sample-and-hold nature of conventional video controllers (the spurious motion described by \eq{SpuriousConstV}). Correcting the video controller's procedures to remove this spurious motion is thus our major task. Again, we recall Zeno's question: how does the arrow know where to move, if it only knows where it is, not where it's going? The answer, supplied first by Galileo (albeit in a somewhat long-winded form, in pre-calculus days), is that we need to know the \e{instantaneous time derivative of the position} (\ie\ instantaneous velocity) of the object at that particular time, in addition to its position. We shall refer to the use of such information (or, in general, any arbitrary number of temporal derivatives of an object's motion) to perform update-rate spatio-temporal antialiasing as \e{\Galn\ \anti ing}. Suggested methods for carrying out this procedure with existing technology are described in the remainder of this \typeofdoc. To carry out this task, we need to \ree xamine the video controller philosophy described in section~\ssect{CurrentRasters}. The most obvious observation that strikes one is that, using that design methodology, \e{velocity information is not provided to the video controller at all!} The reason for this omission is easily understood in historical perspective. \e{Television} applications for CRTs preceded computer graphics applications by decades. At least initially, all television images were generated by simply transmitting the signal from a video camera, or one of a number of available cameras. However, normal video cameras have no facilities for determining the \e{velocity} of the objects they view. (Although this is not, in principle, impossible, it would be technically challenging, and quite possibly of no practical use.) 
Rather, the high frame and field rate of a television picture alone, together with suitable motion blur, were sufficient to convince the viewer of the television image that they were seeing continuous, smooth motion. When CRTs were first used for computer applications, in vector displays, the voltages applied to the deflection magnets were directly controlled by the video hardware; such displays' only relation to television displays was that they both used CRT technology. However, when simple \e{rasterised} computer displays became feasible in the early 1970s, it was only natural that their development was built on the vast experience gathered from television technology---which, as noted, has no notion of storing velocity information. In fact, it is only in very recent years that memory technology has been sufficiently advanced that the \e{physics of the display devices}---rather than the amount of video memory feasible---is now the limiting factor in developing ever more sophisticated displays at a reasonable price. To even contemplate storing the velocity information of a frame---even if it \e{were} possible to determine such information---is something that would have been unthinkable ten years ago. It is, of course, no \coin cidence that the field of \VR\ has also just recently become cost-effective: the immature state of processor and memory technology was the critical factor that limited Sutherland's pioneering efforts twenty-five years ago. It is thus no surprise that the fledgling commercial field of \VR\ requires new approaches to traditional problems. Of course, the very nature of \VR, while putting us in the position of requiring rapid updates to the entire display, conversely provides us with the \e{very} information about displayed objects we need: namely, velocities, accelerations, and so on, rather than just the simple \e{positional} information that a television camera provides. 
Now, it is of course a trivial observation that all virtual-world engines already ``know'' about the laws of Galilean mechanics, or Einsteinian mechanics, or nuclear physics---or indeed any system of mechanics that we wish to program into them, either based on the real universe, or of a completely fictional nature. In that context, our rehash of the notions behind Galilean mechanics may seem trivial and unworthy of the effort spent. What existing virtual-world engines do \e{not} do, however, is \e{share some of this information with the video controller}. On this front, apparent triviality is magnified to enormous importance; our neglect of these same physical laws is, in fact, creating an artificial and unnecessary degradation of performance in many existing \VR\ hardware methodologies. There is no reason for this omission to continue; the physics has been around for over three hundred and fifty years; and, fortunately, the technology is now ripe. The following sections will provide, it is hoped, at least a very crude and simplistic outline of the paths that must be travelled to produce a fully-functional \VR\ system employing \Galn\ \anti ing. \newssect{Galpixels}{Galpixels and $\Gal{n}$ pixmaps} Historically, rasterised computer graphics came into existence as soon as solid-state memories of suitable capacity were able to be fabricated. It is therefore not difficult to guess the number of bits that were initially allocated to each pixel: one. Such technology was, for this reason, also referred to as \e{bitmapped graphics}: the bits in the memory device provided a rectangular ``map'' of the graphics to be displayed---which, however, could therefore only accommodate bi-level displays. As memory---and the processor power necessary to use it---became even more plentiful, rasterised display options ``fanned out'' in a number of ways. 
At one extreme, the additional memory could be used to simply improve the spatial resolution of the display, while maintaining its bitmapped nature. At the other extreme, the additional memory could be used exclusively to generate a multi-level response for each pixel position---for grey-scale, say, or a choice of colours---without increasing the resolution of the display at all; the resulting memory map, now no longer accurately described as a ``bit'' map, is preferentially referred to as a \e{pixmap}. In between these two extremes are a range of flexible alternatives; to this day, hardware devices often still provide a number of different ``video modes'' in which they can run. Increasing the memory availability yet further led, in the 1980s, to the widespread use of \e{$z$-buffers}, both in software and, increasingly, hardware implementations (whereby the ``depth'' of each object displayed on the display is stored along with its intensity or colour). We can see here already an extension to the concept of a pixel: we store not only on--off information (as in bitmaps), or intensity or colour shading information (as in early pixmaps), but also additional, \e{non-displayed} information that assists in the image generation process. (Current display architectures also routinely store several more bits of \e{control} information for each pixel.) We now extend this concept of a ``generalised pixel'' still further, with our goal of \Galn\ \anti ing firmly in our sights. As well as storing the pixel shading, $z$-buffer and control information, we shall also store the \e{apparent velocity} of each pixel in the pixmap. We use the term \e{apparent motion} to describe the motion of objects in terms of display Cartesian \coord s: $x$ horizontal, increasing to the right; $y$ vertical, increasing as we move upwards; and $z$ normal to the display, increasing as we move out from the display towards our face. 
This motion will typically be related to the \e{physical motion} of the object (\ie\ its motion through the 3-space that the system is simulating) by perspective and rotational transformations; however, in section~\sect{Enhancements}, more sophisticated transformations are suggested between the apparent and physical spaces. Thus, for $z$-buffered displays (assumed true for the remainder of this \typeofdoc), three apparent velocity components must be stored for each pixel---one component for each of the $x$, $y$ and $z$ directions. The motional information stored with a pixel, however, need not be limited to simply its apparent velocity. In general, we are free to store as many instantaneous derivatives of the object's motion as we desire. The rate of change of velocity, the \e{acceleration} vector $\va$, is an obvious candidate. The \e{rate of change of acceleration}, which the physicist Rich\-ard~P.\ Feynman christened the \e{jerk}, may likewise be stored; as can the rate of change of jerk (for which the author knows no proposed name); the rate of change of the rate of change of jerk; and so on. We shall defer to the next section the process of deciding just how many such motional derivatives we should store with each pixel. For the moment, we shall simply refer to any pixmap containing motional information about its individual pixels as a \e{\Galn\ pixmap}, or \e{galpixmap}. The individual pixels within a galpixmap will be referred to as \e{\Galn\ pixels}, or \e{galpixels}. Of course, in situations where distinctions need \e{not} be made between these objects and their traditional counterparts, the additional prefix \e{gal-} may simply be omitted. Finally, it will be useful to have some shorthand way of denoting the highest order derivative of (apparent) motion that is stored within a particular galpixmap. 
To this end, we (tentatively) use the notation \e{$\Gal{n}\!$ pixmap}, or, more verbosely, \e{Galilean pixmap of order $n$}, where $n$ is the order of the highest time-derivative of the apparent position of each galpixel that is stored in the galpixmap. Thus, a conventional pixmap, which only records the position of each pixel (encoded by its position in the pixmap, together with its $z$-buffer information), and \e{no} higher time derivatives, may be described as a $\Gal{0}$ pixmap. Galpixmaps that store velocity information as well are $\Gal{1}$ pixmaps; those that additionally store acceleration information are $\Gal{2}$ pixmaps; and so on. As will be seen shortly, additional pieces of information, over and above mere motional derivatives, will also be required in order to effectively carry out \Galn\ \anti ing in practical situations. Although the amount of information thus encoded may vary from implementation to implementation, we shall not at this stage propose any notation to describe it; if indeed necessary, such notation will evolve naturally in the most appropriate way. \newssect{GalpixmapStructure}{Selecting a Suitable Galpixmap Structure} We now turn to the question of determining \e{how much} additional information should be stored with a galpixmap, in order to maximise the overall improvement in visual capabilities of the system that are perceived by the viewer. Such questions are only satisfactorily answered by considering \e{psychological} and \e{technological} factors in equal proportions. That a purely technological approach fails dismally is simply shown: consider the sample-and-hold video controller philosophy described in section~\ssect{CurrentRasters}, as (successfully) applied to static objects on the display. We noted there that the video controller effectively boosted the perceived information rate of the display from the \e{update} rate up to the \e{refresh} rate, simply by repeatedly showing the same image. 
Shannon's information theory, however, tells us that this procedure \e{does not}, in fact, increase the information rate one bit: the repeated frames contain no new information---as, indeed, can be recognised by noting that the viewer could, if she wanted to, reconstruct these replicated frames ``by hand'' even if they were not shown. Thus, even though we \e{know} that frame-buffered rasterised displays ``look better'' than display systems without such buffers (\eg\ vector displays), information theory tells us that, in a raw mathematical sense, the frame buffer itself doesn't do anything at all---a fact that must be somewhat ironically amusing to at least one of Shannon's former PhD students. Raw mathematics, therefore, does not seem to be answering the questions we are asking. A better way to view this \e{apparent} increase in information rate is to examine the viewer's subconscious prejudices about what her eyes see. She may not, in fact, even realise that the display processor \e{is} only generating one update every so often: to her, each frame looks just as fair dinkum as any other. All of this visual information---a static image---is simply preprocessed by her visual system, and compared against both ``hard-wired'' and ``learnt'' consistency checks. Is a static image a reasonable thing to see? Did I really see that? Was I perhaps blinking at the time? Am I moving or am I stationary? What do I \e{expect} to see? It is the lightning-fast evaluation of these types of question that ultimately determines the ``information'' that is abstracted from the scene and passed along for further cogitation. In the case described, assuming (say) a stationary viewer sitting in front of a fixed monitor, all of the consistency checks balance: there appears to be a fair-dinkum object sitting in front of her. 
In other words, the display is providing sufficient information for her brain to conclude that the images seen are consistent with what would be seen if a real object were sitting there and reflecting photons through a transparent medium in the normal way; that is all that ultimately registers. We now turn again to our litmus test: an object with a \e{uniform apparent velocity} being depicted on the display. Using a $\Gal{0}$ display, such as described in section~\ssect{CurrentRasters}, results in the motion depicted in equation \eq{SampleAndHoldConstV}. What does the viewer's visual system say now? Is that an object that keeps disappearing and popping up somewhere else? Is it really something moving so jerkily? Why doesn't it move like any animal I've ever seen before? The answers to these questions depend on just how slow the update rate is, the context that the images are presented in, and, most likely, the past experiences of the viewer. Let us assume, however, that her brain \e{does} decide that the scene is, in fact, depicting a single object in motion, rather than the spontaneous and repetitive destruction and creation of similar-looking objects. Immediately after this decision is reached her visual processing system performs a hard-wired \Galn\ transformation to that part of the scene in which the object moves, such as described in section~\ssect{GalAnti}. Why? Because as animals we learnt the hard way that it isn't enough to simply know whether something is moving as a whole---one also needs to know what the motion of the \e{parts} of the object are. Is that human walking towards us with arms swinging by its sides, or with arms outstretched ready to throttle us? The \Galn\ transformation \eq{GalXfn} removes the (already-decided-on) uniform motion of the object, to let the viewer then determine the relative motion of its constituent parts. 
The result, as we have already shown in section~\ssect{GalAnti}, is a weird-looking saw-tooth motion, involving spontaneous teleportations every time the frame buffer is updated. Does this look like any animal we have ever seen? No, not really. OK, then, maybe we didn't see it too well? Yes---that must be it---I probably didn't see it properly. How well this rationalisation can be tolerated depends on how low the update rate is: seeing \e{is} believing---but only if you see it for at least 100 milliseconds. Let us now assume that the display system is not a $\Gal{0}$ device at all, but is rather one of the freshly-unpacked $\Gal{1}$ models. At some initial time, the object appears on the display; the frame buffer has been updated to show that it is there. (Ignore, for the moment, this instantaneous birth.) One frame later, the display processor is still busy redrawing things; the video controller must decide itself what to do with the image. Firstly, it clears a \e{third} frame buffer (in addition to the one that it has just finished scanning from, and the one that the display processor is talking to), into which it is going to generate a new image. Secondly, it goes through the entire frame buffer that it has just displayed, galpixel by galpixel. At each pixel, it retrieves the velocity information for that galpixel. It then adds this velocity (measured in pixels per frame) to the current position, to find out where that galpixel would be one frame later. It then writes this information into the new frame buffer at the appropriate position; and repeats the process for all the galpixels in the original frame buffer. Thirdly, it worries a bit about those galpixels in the new frame buffer that didn't get written to; let us ignore these worries for the moment, and just leave the ``background'' colour in those pixels. Finally, it scans the new frame buffer onto the display device. 
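The per-galpixel step of this procedure can be summarised compactly. As a minimal sketch (the frame index $k$, the symbols $\vx_k$ and $\vv_k$, and the operator $\txt{round}$ are notation introduced here for illustration, and do not describe any existing hardware), a galpixel displayed at pixel position $\vx_k$ in frame $k$, with stored apparent velocity $\vv_k$ in pixels per frame, would be written into the new frame buffer at \[ \vx_{k+1}=\txt{round}(\vx_k+\vv_k), \] where $\txt{round}$ is assumed to take each component to the nearest integer pixel position. For example, a galpixel at $(10,20)$ with stored velocity $(2.4,-1)$ would be written to $\txt{round}(12.4,19)=(12,19)$, while any position in the new frame buffer that receives no galpixel simply keeps the background colour. 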
Ignore, for the moment, that the procedure described seems to double the amount of time for the video controller to do its work. (A moment's reflection reveals that, in any case, there is no fundamental reason why the new-frame-drawing procedure cannot occur at the same time that the \e{previous} frame is being scanned to the display device.) What will the viewer think that she is seeing? Well, the object will clearly jump a small distance each frame---with each jump exactly the same size as the last (at least, to the nearest pixel), until the new update is available. If the object really \e{is} travelling with constant apparent velocity (our assumption so far), then upon receipt of the new image update, the object will jump \e{the same} small distance from the last (video-controller-generated) frame as it has been jumping in the mean time (assuming focus-switching is appropriately synchronised, of course). Now, the \e{refresh} rate of the system is assumed to be significantly faster than the visual system's temporal resolution; therefore, the motion will look convincingly like uniform motion. Uniform motion has been Galilean antialiased! Let us examine, now, what ``residual'' motion we are left with when this uniform motion is ``subtracted off'', via a \Galn\ transformation, from the perceived motion. We now---thankfully---do not end up with the horrific expression \eq{SpuriousConstV}, but rather with an expression that is \e{almost} zero. In the setup described the error is not \e{precisely} zero---if the apparent velocity of the object does not happen to be some integral number of pixels per frame, then the best we can do is move the pixel to the ``closest'' computed position---leading to a small pseudo-random saw-tooth-like error function in space-time, \ie\ we are hitting the fundamental physical limits of our display system. 
However, the fact that the \e{amplitude} of the error is at most one pixel in the spatial direction, and one frame in the temporal direction, means that it is a vastly less obtrusive form of aliasing than the gross behaviour described by \eq{SpuriousConstV}. (If so desired, however, even this small amount of spatio-temporal aliasing can be removed with suitable trickery in the video controller; but we shall not worry about such enhancements in this \typeofdoc.) Having successfully convinced the viewer of near-perfect constant motion, let us now worry about what happens if the object in question is, in fact, being \e{accelerated} (in terms of display \coord s), rather than moving with constant velocity. For simplicity, let us assume that the object is undergoing \e{uniform acceleration}. Fortunately, such a situation is familiar to us all: excluding air resistance, all objects near the surface of the earth ``fall'' by accelerating (``by the force of gravity'', in 19th century terminology) at the same constant rate. How do our various display systems cope with this situation? Let us assume that the object in question is initially stationary, positioned near the ``top'' of the display. Let us further assume that the acceleration has the value 2~pixels per frame per frame. Firstly, let us consider the optimal situation: the display processor is sufficiently fast to update the object each frame. Clearly, if we shift our axes in such a way that $y=0$ corresponds to the initial position of the object, its vertical position in successive frames will be given by \beqn{AccelBest} -y=0,1,4,9,16,25,36,49,64,81,\ldots, \eeqn as is verified from the formula $y=y_0+v_0t+\half at^2$, where in this case $y_0=0$, $v_0=0$ and $a=-2$, and $t$ is measured in frame periods. 
Since this motion is depicted at the \e{refresh} rate---which, by our assumptions, is sufficiently fast that our visual system does not perceive any inherent temporal aliasing---the object should look like a real object falling, without air resistance, to the ground. (We defer a more complete discussion of \e{motion blur} to another place.) Let us now examine how this object is depicted on a $\Gal{0}$ display device, such as described in section~\ssect{CurrentRasters}. For simplicity, assume that the display processor takes precisely \e{two} frame periods to render each completed image. Clearly, the sample-and-hold nature of the video controller will simply yield the positions \beqn{AccelG0Display} -y=0,0,4,4,16,16,36,36,64,64,\ldots. \eeqn The error in each frame can be obtained by simply subtracting from the values in \eq{AccelG0Display} their counterparts in \eq{AccelBest}, yielding \[ \D y_\txt{error}=0,1,0,5,0,9,0,13,0,17,\dots. \] It is apparent that the error gets worse as the object accelerates: it does not simply ``ramp'' between two bounds as was the case for uniform velocity. Whether this can be recognised by our viewer as smooth motion depends on the resolution of the device. However, it should be noted that the \e{psychological} mismatches that are accumulating here are of a worse nature than for simply uniform motion. The reason for this is that, firstly, the viewer's visual preprocessing system must decide, at each point in time, if the object on the display really is moving at all---\ie, whether it has a \e{velocity}, such as described above. Secondly, her visual processing system must then subconsciously determine whether the object is in fact \e{accelerating}. How do we know that she cares about acceleration? \e{Toss a tennis ball her way.} The fact that humans can catch projectiles moving under the acceleration of gravity shows that we are capable (by some mechanism) of mentally computing the effects of acceleration. 
This is not surprising, given that everything on the surface of the earth not held up by something else accelerates downwards at a constant rate. Whether \e{space}-born and -bred humans would develop their visual systems in the same way, or whether they would ``evolve'' to reduce the psychological importance of acceleration in their thinking, is a question that is beyond the author's reckoning; but it is nevertheless a hypothetical (or, perhaps in time, a not-so-hypothetical) question that emphasises the all-important fact that \e{the way we perceive the world depends greatly on how we are used to seeing it behave}.

Let us now return to the case of the falling object, and determine how its motion will be depicted on the $\Gal{1}$ display device that served our purposes so admirably in our uniform-velocity thought experiment above. Again, assume that the display processor updates the image only once every two frames. On each update, the \e{velocity} of each galpixel must also be computed; for the object in question, the formula $v=v_0+at$, with $v_0=0$, yields the velocity for each successive frame:
\beqn{AccelG1Vel}
-v_\txt{stored}=0,0,4,4,8,8,12,12,16,16,\ldots.
\eeqn
We have here copied the velocity from every even-numbered frame (the ones being updated) to the odd-numbered frames: the video controller, having no better information, simply continues to assume that the velocity of the galpixel is constant until the next update arrives, and thus copies this information from frame to frame as it copies the galpixel. In the current example, this ``propagated'' velocity (\ie\ the velocity that is assumed constant from frame to frame) is not actually used to compute anything (as the video controller only fills in \e{one} frame itself after each update), but we shall shortly examine a case in which it \e{is} used (namely, when the display processing takes longer than two frames to generate each update).
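
As a minimal sketch of this propagation rule (the symbols $y_{\txt{disp}}(t)$, for the position displayed in frame~$t$, and $v_{\txt{stored}}(t)$, for the velocity stored with the galpixel in that frame, are notation assumed here for illustration only, and are not a specification of any actual hardware), one may suppose that on every frame $t+1$ for which no update arrives the video controller applies
\[
y_{\txt{disp}}(t+1)=y_{\txt{disp}}(t)+v_{\txt{stored}}(t), \qquad
v_{\txt{stored}}(t+1)=v_{\txt{stored}}(t),
\]
with velocities measured in pixels per frame. For example, the update at $t=2$ supplies $-y_{\txt{disp}}(2)=4$ and $-v_{\txt{stored}}(2)=4$, so that under this rule the object would be placed at $-y_{\txt{disp}}(3)=4+4=8$ in the following frame.
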
It is straightforward to compute how the $\Gal{1}$ display system will depict our uniformly-accelerated object's motion: using the velocities in \eq{AccelG1Vel} to extrapolate the object's \e{position} for every odd frame, we obtain
\[
-y=0,0,4,8,16,24,36,48,64,80,\ldots.
\]
Subtracting from this the sequence of \e{correct} positions, \eq{AccelBest}, we find that
\[
\D y_\txt{error}=0,1,0,1,0,1,0,1,0,1,\dots.
\]
Obviously, we have here a vast improvement over the $\Gal{0}$ display: the errors, while not zero, are at least \e{bounded}.

Let us now consider making these last two examples just a tad more realistic. Keeping the other parameters constant, let us assume that the display processor is now only fast enough to generate one update every \e{three} frames. Our dust-gathering $\Gal{0}$ display system will successively show the object to be at the positions
\[
-y=0,0,0,9,9,9,36,36,36,81,\ldots.
\]
The errors represented by these positions are, respectively,
\[
\D y_\txt{error}=0,1,4,0,7,16,0,13,28,0,\dots.
\]
Obviously, the errors are worse than even the horrible performance with two frames per update. Let us, therefore, turn our attention immediately away from these horrible figures (in the best spirit of politicians), and consider instead our now-worn-in $\Gal{1}$ video controller. The velocity values stored with the galpixmap, as copied across by the video controller where necessary, will now be given by
\[
-v_\txt{stored}=0,0,0,6,6,6,12,12,12,18,\ldots.
\]
The position values used to display the object, as determined by the video controller, can then be computed to be
\beqn{AccelG1Pos3}
-y=0,0,0,9,15,21,36,48,60,81,\ldots,
\eeqn
which represent errors of
\[
\D y_\txt{error}=0,1,4,0,1,4,0,1,4,0,\dots.
\]

We now see the precise capabilities of a $\Gal{1}$ system when dealing with a uniformly-accelerated object. In between updates, the positional error increases \e{quadratically}---\ie\ in a parabolic shape.
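To see where these numbers come from (a sketch only, in notation assumed here rather than taken from the discussion above: $t_0$ denotes the frame of the most recent update, and $k$ the number of frames elapsed since it, with $k$ less than the update period), note that the true position is $y(t_0+k)=y(t_0)+v(t_0)k+\half ak^2$, whereas a $\Gal{1}$ controller displays $y(t_0)+v(t_0)k$ and a $\Gal{0}$ controller displays simply $y(t_0)$. The magnitudes of the positional errors are therefore
\[
\left|\D y_{\txt{error}}\right|=\half|a|k^2 \quad\mbox{for $\Gal{1}$}, \qquad
\left|\D y_{\txt{error}}\right|=\left|v(t_0)k+\half ak^2\right| \quad\mbox{for $\Gal{0}$}.
\]
With $|a|=2$, the first expression gives $1$ and $4$ for $k=1$ and $k=2$, independently of $t_0$; the second, for the update at $t_0=3$ (where $|v(t_0)|=6$), gives $7$ and $16$, and grows with $|v(t_0)|$ at each later update.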
Once the display processor provides an update, the positional error returns (as it always does) to zero. It then proceeds to increase quadratically to the \e{same} maximum error as before (not to ever-worse values, as is the case for a $\Gal{0}$ system); and repeats the cycle. This bounded-error feature of a $\Gal{1}$ display system with uniformly \e{accelerated} motion resembles, to some extent, the (bounded) positional errors associated with a $\Gal{0}$ system for an object with uniform \e{velocity} (apart from the change of shape from ramp to parabola, of course). This is not very surprising when one considers that, in both of these cases, the video controller ``knows'' about all of the non-zero temporal derivatives of the object's motion save for the highest order one.

On the other hand, this raises a worrying problem: in general, the apparent motion of an object on the display will be an arbitrary analytical function of time (excluding, for the moment, object ``teleporting'', and any non-physical exotic mathematical functions introduced by a devious programmer). The problem is that an arbitrary analytical function has, in general, an \e{infinite number} of non-zero positive-exponent Taylor series coefficients, \ie\ one needs to know \e{all} of its derivatives to extrapolate its motion indefinitely. How can we possibly justify going on to $\Gal{2}$ systems, then $\Gal{3}$ systems, then $\Gal{4}$ systems, \e{et cetera ad infinitum}? Are we doomed?

Clearly, this \typeofdoc\ would not be in public circulation were this problem a real one, rather than a case of mathematics-gone-wild. Already, we have seen that using a $\Gal{1}$ display system vastly improves upon the performance of a $\Gal{0}$ system, even if it cannot \e{fully} extrapolate accelerated motion. If all we are interested in is \e{some} low-cost performance improvement, rather than full mathematical rigour in extrapolation, then we already know how to obtain it.
However, we shall shortly see that we can do better than this: there is clearly a point of diminishing returns beyond which it is not worthwhile going to the next $\Gal{n+1}$ system. To justify this statement, however, we must first remove our attention from the abstract world of mathematics, and focus again on the most important component in our physical system: \e{the participant}.

We first return to the above thought experiment of using a $\Gal{1}$ display system for uniformly accelerated motion. We earlier obtained results for the \e{positional} error of the display (the repeated-parabola): what about the \e{velocity} error? To assess this, we must first invent some crude model for the way in which the viewer's brain computes velocities in the first place. The simplest suggestion for such a model, based (perhaps very erroneously) on traditional mathematical methods, might be that the visual system simply takes \e{differences in positions} between ``brain ticks''---or, here, between frames---to compute an average velocity for each ``tick''. Admitting, for the moment, that this model is not too outrageous, we can proceed to compute the values that the viewer's brain would compute, using such an algorithm, when viewing the $\Gal{1}$ display results in \eq{AccelG1Pos3}. Taking these differences between positions at adjacent times, we obtain
\beqn{ComputeVel}
-v=0,0,9,6,6,15,12,12,21,\ldots.
\eeqn
The problem is, however, the following: how do we ascribe a particular \e{time} to each of these values, being as they are differences between \e{two} adjacent times? For example, if we take the difference $y(5)-y(4)$, does this velocity ``belong'' to $t=4$ or $t=5$? The least complicated choice is to simply decide that this velocity evaluation ``belongs'' to $t=4.5$---``split the difference'', as it were. With this ansatz, the sequence of values \eq{ComputeVel} corresponds to the perceived velocity computed at the times $t=0.5,1.5,2.5,3.5,\ldots$.
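In symbols (the subscripted names being introduced here purely for illustration, as a sketch of this crude model rather than a claim about how the visual system actually works), the rule amounts to
\[
v_{\txt{perceived}}(t+\half)=y_{\txt{disp}}(t+1)-y_{\txt{disp}}(t),
\]
where $y_{\txt{disp}}(t)$ is the position displayed in frame~$t$. For instance, the positions in \eq{AccelG1Pos3} for frames $3$ to $6$ give $15-9=6$, $21-15=6$ and $36-21=15$ for $-v_{\txt{perceived}}$ at $t=3.5$, $4.5$ and $5.5$; these three values sum to $27=36-9$, which is the magnitude of the displacement between the updates at $t=3$ and $t=6$.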
The \e{correct} velocity values, on the other hand, evaluated at these particular times, are simply computed via $v=v_0+at$ as
\beqn{ActualVel}
-v=1,3,5,7,9,11,13,15,17,\ldots.
\eeqn
Subtracting the values in \eq{ActualVel} from their counterparts in \eq{ComputeVel} thus yields the errors in the computed velocities:
\[
\D v_\txt{error}=-1,-3,+4,-1,-3,+4,-1,-3,+4,\ldots.
\]
Now, this is the first error sequence that we have encountered that has had \e{both positive and negative values}. In fact, it is clear that the \e{time-average} of this sequence is zero: $(-1)+(-3)+(+4)=0$. In other words (as would have already been blindingly obvious to anyone who has ever designed a feedback-control system), using $\Gal{1}$ antialiasing results in the correct \e{average} velocity being depicted for an object (where we average over a complete update period), \e{even if the motion itself contains higher derivatives}. It is this observation that will tell us when to stop increasing the $n$ in $\Gal{n}$.

Let us now entertain the notion that we could, at this stage, be forever more happy with our $\Gal{1}$ display technology. Excluding the fact, for the moment, that modern computer equipment already has a half-life shorter than most pairs of socks, would the general \e{principles} of $\Gal{1}$ antialiasing be such that we would never desire to venture to any $\Gal{n}$ for $n>1$? That this is not an unreasonable notion is supported by the simple fact that $\Gal{0}$ technology has been widespread for about twenty years, and \noone\ seems to be complaining about \e{it}. (Yet.) On the other hand, since this section does not end at the end of this sentence, the reader is no doubt anticipating that the story of our hypothetical viewer is not yet complete. The main problem with our simplistic examples above, which will show why they are not completely representative of the real world, is that they are all \e{one-dimensional}.
Of course, we can transform the ``falling'' example to a ``projectile'' situation by simply superimposing an initial horizontal velocity on the vertical motion; this would simply require applying the uniform motion and uniform acceleration cases conjointly. However, we have also simplified the world by only talking about some unspecified, featureless, Newtonian-billiard-ball-like ``object'', without worrying about the extended three-dimensional structure of the object.

A 2-D world, of course, is infinitely more interesting than a 1-D world, not least because it becomes possible to step \e{around} other people, instead of simply bouncing into them. In a ``$2\half$-D'' system---such as in many video games, or Microsoft Windows---\e{three}-dimensional structure is represented by painting the object with depth-emulating visual clues; but the objects still only move, effectively, in two dimensions. On the other hand, in a manifestly three-dimensional environment such as \VR, the two-dimensional display is a subtle \e{perspective projection} of the virtual three-space. Objects in such virtual worlds, unlike their 2-D or $2\half$-D counterparts, are free to rotate about arbitrary axes, as well as to move closer to or further from the participant. The \e{apparent} motion of the objects on the display device is a complicated interplay between the physical motion of the objects, the physical motion of the participant, and the mathematical transformations yielding the perspective view. It is important that we consider more general forms of motion, such as this, before drawing any general conclusions about whether $\Gal{1}$ antialiasing is good enough for practical purposes.
%
% ...File 2 of 4 should be concatenated here ...