From: burns@latcs1.lat.OZ.AU (Jonathan Burns)
Subject: TECH: REND386: adding motion
Date: Sun, 9 Feb 1992 05:18:34 GMT
Organization: Comp Sci, La Trobe Uni, Australia


In article <1992Feb4.000105.29555@watserv1.waterloo.edu>
dstamp@watserv1.waterloo.edu (Dave Stampe-Psy+Eng) writes:

> I thought I'd put before you the next step in the REND386 saga before it's
> begun, in the hopes of sparking some feedback and discussion.  Basically,
> it involves moving and animating objects, either autonomously or under
> direct user input.

[ Dave's post dissected below.. ]

Whatever the response to this, it certainly helps _me_.

I'm in MSc in computer science here, going on to PhD. The bee in my
bonnet for the last year or so, has been a language in which coordinate
systems are first-class objects. This leads directly to tree-structured
articulated systems like yours.

Precis:

X is a numeric array (of floats).
T is a transform, X->X, or a parametrized transform (e.g translate,
rotate), X X->X. Preferably invertible.
F is a frame or coordinate system; either primitive, or related to
a prior frame by some transform.
P is a point-set, given as a pair (F X).

Expression syntax is:

X := T X or F P
T := Monadic_T  or  X Dyadic_T or
	T inverse or T T
F := T F
P := F X

Reduction of expressions is based on an identity, T F T X = F X,
which expresses coordinate transformation between frames.

Assuming a given frame can only be defined once, the frames will
be vertices of a collection of trees, whose edges are transforms.

It then turns out that you need just two operational reductions
to evaluate expressions: TX evaluation applies a transform, FF
evaluation traverses the frame tree to give a chain of transforms. 
There is also TT evaluation, compounding transforms before application;
but you can of course just apply the T's one after the other.


This is overly theoretical and general for applications, I know;
but I'm keeping it on a functional-programming level, so as to
work through the issues of functional vs. assignment languages,
variable scope, data structure etc.

I've implemented this within the J language (succesor to APL);
it gives me array operations that I want. Platform, a 386 AT
without coprocessor. Predictably it runs like a slug, but I
can say:

> o = 0 0 0
> i = 100 0 0
> j = 0 100 0
> k = 0 0 100

> v_p = 200 200 0     (vanishing point)
> focus = 200

And then with transforms
tr (translate)
qx (90 degree turn around the first axis)
cp (central projection)

and base frame
screen     (x = horizontal left-to-right, y = vertical top-to-bottom,
	z = into screen)

> world = v_p tr inv focus cp v_p tr qx _400 0 0 tr screen
  -----   ---------- -------- ------ -- ----------- ------
  F       T          T        T      T  T            F

> cube = world  k tr * j tr * i tr ! o
  ----   -----  ----   ----   ----   -
  P      F      T      T      T      X

> draw cube

And there's my cube, in 9 lines of code (eliding some residual J
punctuation).


OK. A few words on where this has got me, and where it hasn't.

(1) Evidently the frame tree is a powerful data structure. If
you have the parameters late-evaluated, you get the kind of
articulated structure you've been talking about. I'm building
a spreadsheet interface around it all, in which a formula such
as that for frame "world" spawns cells for its parameters (v_p,
focus), which will be available for text editing or mouse input.

(2) But this is necessarily slow, because of the late evaluations,
and because of the successive application of transforms whenever
there is new parameter input. It can be made faster by caching
recent calculations, and by pre-compounding transforms; you have
one strategy for this.

(3) Frames are charmingly multi-purpose. A frame is a turtle; also
a camera; also an extended geometric object (e.g a set of orthonormal
planes). Frame _trees_ are even better; for instance one could set
up a binary-space-partition tree as a frame tree.

(4) There is a lot of scope for tree- and graph-processinng operations.
To get these, I have to work through indexing operators, traversal
operators, etc. I don't have these defined to my satisfaction, and
J is proving something of a disappointment when it comes to easy
prototyping.

(5) Pixar's Renderman Toolkit has the basics built in; but ties them
to an assignment-language C interface. I can imagine bolting on
some array operation classes, for instance Dyad's M++ library; and
I may well go this way for my next implementation.

(6) I've had some exposure to NeXT's lately. On the side, I've been
fooling around with the idea of embedding the XTFP types in PostScript
for portability. But it's hard to resist a sense that Renderman is
shaping up as a 3D standard, and may be a superior platform. 
 

Some comments on your post...

> REND386 uses a intermediate world representation of objects: polygon vertices
> are converted into world coordinates when objects are placed or moved, then
> the renderer uses these positions to compute visibility etc.  This is more
> efficient when most objects are static, as compared to each object having
> a unique transform matrix which must be recomputed (or multiplied by the 
> viewpoint matrix) whenever the screen is redrawn, even if just because of
> a viewpoint shift. 

And thus, however complicated the articulation, you only need recompute
world-to-screen for each object. Good.

> However, this relies on having a fast method for moving an object (translating
> from object to world coordinates).  This must be assemblyized for speed.
> The basic operations to be performed are a rotate around the object's
> origin (x=0,y=0,z=0) and a translation to its position in world coordinates.
> This can be done in about 20-23 uS/vertex.

The tree-structured articulation helps partition the work here. Only
those subtrees in which given degrees of freedom participate have to
be re-calculated to world coordinates.

> Now all this is OK, but Bernie and I would like to add some structure
> to motion in the world, especially in terms of objects that can change
> shape (walk, grasp, crawl etc)-- animated "actors", if you will.  The
> method we are looking at could be called "armature animation".  Each
> figure has a virtual framework with joints that can be set to any
> angle-- by specifying joint angles, you can specify gestures across
> a wide range of figure sizes and "topologies".  Joints can also slide
> along the armature for other kinds of effects.

I look at it this way. A compound object, like my cube, has input
parameters. If I index the vertices, then any vertex, matched against
any frame (say, a wall represented by an xy-plane) yields coordinate
outputs. I could manipulate the cube directly, say by changing the
i or j parameter. But I could also recompute i or j as functions of
the relative-to-wall coordinates, then recompute the latter, and so
on iteratively until some condition is met, say that a given vertex
is in contact with the wall.

_In principle_, a complex of actions like walking could be thought
of as a nerve ganglion which selects functions to recalculate the
degrees of freedom until it is satisfied (critter heading the right
way, centre of gravity over the base, etc).

This is a lot of work. But the advantage is that once it is done
for an armature tree, that can be grafted onto a bigger tree.
Essentially I see the armature (or frame tree) as an object, with
input and output methods, and internal methods for iterating between
input and output instance variables.

A naive instance has inputs only. But graft it onto a parent frame,
and outputs become available. A feedback "ganglion" can then be
built. Define a subclass of the tree class which has a slot
for a wall, and the ganglion can be installed in the subclass as
a method.

> Of course, the "armature" doesn't actually exist, but is represented in
> a tree data structure.  Say the root is the "body" object, next is the
> upper torso object, the upper arm object, lower arm object, hand object,
> and 3 levels of finger segment objects.  Moving a finger joint does not
> affect the upper joints, while a gesture can be performed independantly of
> the position of the rest of the arm.  This is a very compact and useful 
> way to represent object motion relationships.

I find this suggestive of locomotor neural structure, in which
an animal has low-level routines for "shift weight to left foot",
"drive right thigh as pendulum", etc, and these are triggered by
high-level routines like "turn left, prepare for new line of march".


> The proposed method for computing a new object position is to first
> change joint angles as needed and mark as moved.  Then traverse the
> tree of object segments, first positioning and moving the body from
> object space to its proper place in the world.  As a result of this,
> joint points that connect to other segments will be moved also.  And the
> matrix that expresses the rotational "attitude" of the body can be carried
> through to the rotation of the subsegments too.

> For each subsegment, then, a rotation matrix is computed that is the product
> of the joint's rotation angle matrix and the "parent" segment's rotation 
> matrix.  The vertices of the object are then rotated by this matrix and 
> translated to the proper position on the figure by adding to them the
> world coordinates of the joint on the parent segment.  This minimizes the
> total calculations  done, as each vertex is translated and rotated once.

In other words, propagate rotation matrices up the tree. Typically,
won't these be homogeneous transform matrices though? My typical
robot tends to go Constant-translation-up-link, Variable-rotation-
about-joint-axis.

> The "armature" of the segment then consists of a line joining the segment
> object's origin to the next segment's joint position (which is part of
> the segment object's definition, and is handled just like a polygon
> vertex).  This puts the onus of armature design onto the object creator,
> where it belongs.  A "mannequin" world be designed as a set of segment
> objects with each in its rest (zero angle)  position and the object's
> origin at the joint where it connects to the parent object.

Same here; except that I want to play with a broader set of primitive
transforms for some purposes. Like drop-a-normal-to-a-Bezier-patch.
(Don't ask me how I'll deal with inverses, not yet :-)

> I realize this may seem a bit unclear, but it's in "middle" planning
> stages.  Not that difficult to program, but the demo must be done
> from scratch.  However, the potential advantages are obvious, I think.
> The problem of animated figures, bodies, Gloves, etc. become much
> simpler once there's a standard to follow in terms of how to do it.

Clear as crystal. It pleased me immensely when I realized that the
same frames I was using as nodes in chains of transforms could also
return booleans, sort on coordinates, and probably more.

If I can get my trees to behave, I'll send you any useful articulations
I produce.

On standards: I once asked a fellow in the know (Mike Kaplan) whether
there was not by now a standard constructive solid geometry description,
Class CSG so to speak. He said it had been tried, but the number of
possible descriptions was combinatorially huge. There would appear to
be a big gulf between articulated descriptions and naturalistic bodies.

> Eventually, this could lead to a complete set of interface guidelines
> to network distant systems together in a shared VR world (This is
> Bernie's hobbyhorse, and I've taken many a ride on it.)

I like it. Tally ho!


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jonathan Burns        |  They finally discovered that the ONE thing
burns@latcs1.lat.oz.au|  they just could not _STAND_ was a smartass
Computer Science Dept |  commenting decompiler ...
La Trobe University   |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~