.\"
.\" troff -ms % | lpr
.\"
.\" revision date - change whenever this file is edited
.ds RD 14 July 1993
.nr PO 1.2i	\" page offset 1.2 inches
.nr PD .7v	\" inter-paragraph distance
.\"
.EH 'RTF Miscellany'- % -''
.OH ''- % -'RTF Miscellany'
.OF 'Revision date:\0\0\*(RD''Printed:\0\0\n(dy \*(MO 19\n(yr'
.EF 'Revision date:\0\0\*(RD''Printed:\0\0\n(dy \*(MO 19\n(yr'
.\"
.\" subscript strings
.ds < \s-2\v'.4m'
.ds > \v'-.4m'\s+2
.\"
.\" I - italic font (taken from -ms and changed)
.de I
.nr PQ \\n(.f
.if t \&\\$3\\f2\\$1\\fP\&\\$2
.if n .if \\n(.$=1 \&\\$1
.if n .if \\n(.$>1 \&\\$1\c
.if n .if \\n(.$>1 \&\\$2
..
.TL
RTF Miscellany
.AU
Paul DuBois
dubois@primate.wisc.edu
.AI
Wisconsin Regional Primate Research Center
Revision date:\0\0\*(RD
.SH
Introduction
.LP
This document contains a few scribblings about things which don't seem to be
covered in the RTF specification.
Any or all conjectures here may be false; if so, I'd like to know about it.
Counterexamples or references to corrections would be appreciated, as my
conclusions are based on observation.
.LP
Nomenclature:
.DS
WfM	Word for Macintosh
WfW	Word for Windows
.DE
.SH
Page Orientation
.LP
WfM and WfW write the paper width and height correctly,
but they don't write ``\elandscape'' into landscape documents.
I consider this a bug.
Another bug is that if you read such an RTF document back into WfM,
it sets the page orientation back to portrait.
The same might be true of WfW.
.SH
Tab Handling
.LP
When a new group begins (with ``{''), it inherits the tab stops of the group
within which it occurs.
However, if any tabs are explicitly set within the group, they
override the inherited set.
.LP
The same is true of tab stops that a group may get as a
result of a style setting.
If tabs are set within the style, they override inherited tabs.
Again, however, if tabs are explicitly set within the group, they
override not only any inherited tabs, but any tabs that may have been
set within the style.
.LP
For translator purposes, the upshot is that it's necessary to know
when a style is being expanded (so you know to override inherited
tabs), and when the expansion is done (so you know when to override
style tabs).
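.LP
The precedence described above can be sketched in Python.
This is only an illustration, not the behavior of any particular translator:
the class name TabState, its methods, and the in_style_expansion flag are
hypothetical, and ignoring style tabs that arrive after explicit tabs is
an assumption rather than something observed.
.DS L
```python
# Hypothetical sketch: track where a group's tab stops came from.
# Precedence assumed here: explicit > style > inherited.
RANK = {"inherited": 0, "style": 1, "explicit": 2}


class TabState:
    def __init__(self, inherited=()):
        # A new group starts with a copy of the enclosing group's stops.
        self.stops = list(inherited)
        self.source = "inherited"

    def child(self):
        """State for a group opened inside this one."""
        return TabState(self.stops)

    def add_stop(self, position, in_style_expansion):
        """Record a tab stop; the caller says whether a style is being expanded."""
        wanted = "style" if in_style_expansion else "explicit"
        if RANK[wanted] < RANK[self.source]:
            # Assumption: style tabs never displace explicit tabs.
            return
        if RANK[wanted] > RANK[self.source]:
            # First stop from a stronger source discards the weaker set.
            self.stops = []
            self.source = wanted
        self.stops.append(position)


outer = TabState([720, 1440])
inner = outer.child()
inner.add_stop(1000, in_style_expansion=True)   # style overrides inherited
inner.add_stop(2000, in_style_expansion=True)
style_result = list(inner.stops)
inner.add_stop(500, in_style_expansion=False)   # explicit overrides style
inner.add_stop(900, in_style_expansion=False)
```
.DE
.LP
The translator still has to supply the in_style_expansion flag itself,
which is why it needs to know when style expansion starts and stops.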
.LP
Tabs may be associated with a leader character, and may have a
justification attribute.
I have seen an RTF specification (non-Microsoft) which claimed that
justifications apply to the most recently specified (preceding) tab position.
My experience is that the opposite is true.
(The Microsoft spec seems to be silent on this point.)
The leader character does indeed apply to following tab positions,
but the justification attribute applies to the next tab position
specified.
If no justification is given, the tab defaults to left-justified.
.LP
For translators, this means that if a justification indicator occurs, you
apply it when the next tabstop position is given, otherwise the tabstop
is left-justified.
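.LP
A minimal Python sketch of that rule follows.
It assumes the control words have already been split out of the input and
are passed without the leading backslash, and it handles only the
justification words ``\etqr'', ``\etqc'', and ``\etqdec'' and the position
word ``\etx''; leader characters are left out.
The function name collect_tabs is hypothetical.
.DS L
```python
# Hypothetical sketch: attach a pending justification to the NEXT tab position.
JUSTIFY = {"tqr": "right", "tqc": "center", "tqdec": "decimal"}


def collect_tabs(words):
    """Return (position, justification) pairs from a list of control words."""
    tabs = []
    pending = "left"                     # default when no indicator was seen
    for word in words:
        if word in JUSTIFY:
            pending = JUSTIFY[word]      # remember it for the next position
        elif word.startswith("tx") and word[2:].isdigit():
            tabs.append((int(word[2:]), pending))
            pending = "left"             # indicator is used up
    return tabs


example = collect_tabs(["tx720", "tqr", "tx1440", "tx2160"])
```
.DE
.LP
In this example only the stop at 1440 is right-justified; the stops on
either side of it fall back to left justification.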
.SH
Styles
.LP
The ``Normal'' style definition never seems to include a style
number, unlike all the others.
The Microsoft specification indicates that the Normal style is numbered
as style 222, although why the value 222 was chosen is beyond me.
.LP
Style 0 seems to have a special use.
``\esbasedon0'' appears to mean a style is based on the Normal style.
``\esnext0'' appears to mean the style is its own next style.
Oddly enough, Normal style definitions I have seen contain ``\esbasedon222''
rather than ``\esbasedon0''.
.LP
Stylesheet entries may leave out the ``\esbasedon'' control word
(WfW does this).
In this case, the default should be 222.
I've only seen this in Normal style entries.
I have not seen any documents where the ``\esnext'' control word is
left out, but if it is, the style is its own next style.
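.LP
The defaults conjectured in this section can be collected into one small
Python sketch.
The function name resolve_entry and its argument names are hypothetical;
each argument is the number that followed the corresponding control word,
or None if the control word was absent from the stylesheet entry.
.DS L
```python
# Hypothetical sketch of the stylesheet defaults conjectured above.
NORMAL = 222   # number the Microsoft specification gives the Normal style


def resolve_entry(number=None, basedon=None, snext=None):
    """Fill in missing or zero values for one stylesheet entry."""
    if number is None:
        number = NORMAL          # Normal entries carry no style number
    if basedon is None or basedon == 0:
        basedon = NORMAL         # missing or 0: based on Normal
    if snext is None or snext == 0:
        snext = number           # missing or 0: style is its own next style
    return {"number": number, "basedon": basedon, "next": snext}


normal_entry = resolve_entry()
heading_entry = resolve_entry(number=3, basedon=0, snext=0)
```
.DE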
.SH
Restoring Defaults
.LP
Apparently, when ``\epard'' is encountered, the way to restore paragraph
defaults is to restore not only all the static initial paragraph formatting
values, but also to apply the Normal style.
.LP
The control word ``\eplain'' is much like ``\epard'' but for characters.
Why it's not ``\echard'' I don't know, but the effect is not only to restore
the character style to plain, but also to restore the default font size,
expansion value, etc.
.SH
Special Characters
.LP
Different character sets may be specified in different
RTF documents (e.g., ``\eansi'',
``\emac''), and characters within one set may have no representation in
another set.
For instance, the Apple character in the Macintosh character set has
no counterpart in the ANSI character set.
This constitutes a failure of RTF to provide machine independence.
.SH
Fonts
.LP
The default font (specified with ``\edeff'') need not be included in
the font table.
Don't assume it will be there.
.LP
The font name for a given font may vary among systems.
``Times Roman'' in WfM is ``Tms Rmn'' in WfW.
The font family may also differ.
WfM associates ``\eftech'' with the Symbol font, whereas WfW associates
it with ``\efdecor''.
Another failure of machine independence.
.LP
Presumably the font name differences are related to the method (or
lack of it) provided by underlying system software for referring to
fonts from within programs.
Translators, which may be writing output for target systems having an
entirely different set of conventions, should provide their own
mechanism for mapping RTF font names onto the fonts that are
available.
.LP
It is instructive to observe that neither WfM nor WfW does particularly
well at figuring out the fonts used in an RTF document created by the other.
.SH
Tables
.LP
The Microsoft RTF specification says very little about the constraints on
the order in which table formatting control words may appear.
The inferences below are based on inspection of several RTF tables, but as
this is an inductive process, it's hard to say whether they are generally
true, or only true of those files at which I've looked.
.LP
Cells are like tabs in that a cell may have a position and other 
attributes.
The position applies to the right edge.
The other attributes, if specified, occur
.I before
the position specifier.
.LP
In the table layout information at the beginning of the table,
cell border control words are
.I followed
immediately by a border type control word.
This is one place where a translator may want to read ahead in the input
stream to get the border type.
It appears that more than one border type word may follow the cell border
control word, for example ``\eclbrdrb\ebrdrsh\ebrdrs''.
.LP
It appears that (i) ``\etrowd'' occurs as the first control word of a table;
(ii) the row ends with ``\erow'';
(iii) everything between specifies the format and content of the row.
It further appears that the table ``state'' is completely independent of
the grouping level.
.LP
Cells begin with the
.I first
``\eintbl'' and after
.I each
``\ecell'' control word.
Cells end after each ``\ecell'' and ``\erow'' control word.
Cells can
.I not
be assumed to begin with every ``\eintbl'' since every paragraph within
a table cell begins with that control word.
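.LP
These cell boundary rules can be sketched in Python.
The sketch assumes the input has already been tokenized into
(kind, value) pairs, where kind is "ctrl" or "text" and control words
are given without the leading backslash; that token format and the
function name split_cells are hypothetical.
Like the rules themselves, it is based only on the files I have inspected.
.DS L
```python
# Hypothetical sketch: split one table row into cells.
def split_cells(tokens):
    """Return a list of cells, each a list of the text tokens it contains."""
    cells = []
    current = None                       # None means "not inside a cell"
    for kind, value in tokens:
        if kind == "text":
            if current is not None:
                current.append(value)
        elif value == "intbl":
            if current is None:          # only the first one opens a cell;
                current = []             # later ones just start paragraphs
        elif value == "cell":
            cells.append(current if current is not None else [])
            current = []                 # a new cell begins after each one
        elif value == "row":
            if current is not None:
                cells.append(current)    # the row word closes the open cell
            current = None
    return cells


row = [("ctrl", "trowd"), ("ctrl", "cellx1440"), ("ctrl", "cellx2880"),
       ("ctrl", "pard"), ("ctrl", "intbl"), ("text", "a"), ("ctrl", "cell"),
       ("ctrl", "intbl"), ("text", "b1"), ("ctrl", "par"),
       ("ctrl", "intbl"), ("text", "b2"), ("ctrl", "cell"),
       ("ctrl", "pard"), ("ctrl", "intbl"), ("ctrl", "row")]
cells = split_cells(row)
```
.DE
.LP
Applied literally, the rules yield a third, empty cell for this two-cell
row, because a cell begins after the last ``\ecell'' and ends at ``\erow''.
A translator may want to discard it.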
.LP
Tabs specified within a cell are relative to the left edge of the cell,
not the left margin of the page.
.LP
WfM seems to write an empty cell at the end of each row.
For instance, if there are three cells, there will be three token sequences,
each ending with ``\ecell'', then another sequence (typically ``\epard
\eintbl'') ending with ``\erow''.
This probably corresponds to the ghost column you see at the end of tables
when you select them or show formatting codes.
