======================================================================
Unicode 1.0.1 Addendum                                      92.11.03   8:52


                         UNICODE 1.0.1

The following document is an ASCII version of the Unicode 1.0.1
addendum, which has been added to Volumes 1 and 2 of The Unicode Standard.
Because the original formatting has been lost and the original text
contains non-ASCII characters, a dollar sign ($) is used as a placeholder
for each character image, and the text has been modified slightly for
readability.

Printed copies of the addendum will be sent to Unicode corporate,
associate and individual members. Others may get a printed copy by
sending a stamped, self-addressed envelope to the Unicode Consortium
at the address below, or may get a fax copy on request. Copies of the
ASCII version of this document can also be obtained by anonymous FTP
from Unicode.Org.

________________________________________________________________________

Recipient is granted the right to make copies in any form for internal 
distribution and to freely use the information supplied for the purposes of 
creating and implementing products that comply with the Unicode Standard.

The authors and publishers have taken care in preparation of this work, but 
make no expressed or implied warranty of any kind and assume no responsibility 
for errors or omissions. No liability is assumed for incidental or 
consequential damages in connection with or arising out of the use of the 
information or programs contained herein.

Copyright (c) 1991-1992, Unicode, Inc. All rights reserved. Unicode (tm) is a 
registered trademark of Unicode, Inc.

________________________________________________________________________

1. Introduction

As discussed in Volumes 1 and 2, small changes have been made to Unicode 
1.0 in order to incorporate it into the international character encoding 
standard, ISO 10646, which was approved by ISO as an International 
Standard in June, 1992. The Unicode Consortium plans to issue Unicode 
1.1 in early 1993. The character content and encoding will be identical 
to that of ISO 10646. To that end, Unicode 1.1 will include 
approximately 5,400 additional characters from ISO 10646 that are not 
already in Unicode 1.0.

In order to expedite use of Unicode in the interim, the Unicode 
Consortium is issuing an intermediate version, Unicode 1.0.1, which 
consists of Unicode 1.0 modified by the changes necessary to make the 
character codes a proper subset of ISO 10646. 

This document describes the differences between Unicode 1.0.1 and Unicode 
1.0 (for more information, see Volume 1, pp. xix-xx, and Volume 2, pp.
4-9 and 427-431). Implementations that use Unicode 1.0.1 as defined here 
will be fully compatible with Unicode 1.1, and therefore with 
ISO 10646.

Mappings between Unicode characters and national and industry standards 
will be finalized in Unicode 1.1 to reflect reviewers' comments and 
alignment with ISO 10646. In early 1993 a technical report will be 
issued that defines the content of Unicode 1.1, including the complete 
revised mapping tables. The mapping tables will also be available in 
electronic form by anonymous FTP. The technical report will be sent to 
members of the Unicode Consortium (including associate and individual 
members); others may obtain copies, or information about FTP access, by 
contacting:

    The Unicode Consortium
    1965 Charleston Road
    Mountain View, California 94043 USA

    E-mail: unicode-inc@hq.metaphor.com
    Phone: (415) 961-4189
    Fax:   (415) 966-1637


2. Final Zone Allocations

The following zone reallocations do not affect any allocated Unicode 1.0 
characters.

A. Unicode Allocation
Range               Cells   Name/Contents
U+0000 => U+4DFF    19,968  A-ZONE Alphabets, syllabaries, symbols
                            (the 65 control codes are excluded)
U+4E00 => U+9FFF    20,992  I-ZONE Ideographs
U+A000 => U+DFFF    16,384  O-ZONE Reserved for future assignment
U+E000 => U+FFFF     8,192  R-ZONE Restricted use
                            (FFFE & FFFF are excluded)
B. R-ZONE Allocation
Range               Cells   Name/Contents
U+E000 => U+F8FF     6,400  Private Use Area
                            (Corporate Use starts at F8FF)
U+F900 => U+FFEF     1,776  Compatibility Zone
                            (including presentation forms)
U+FFF0 => U+FFFF        16  Specials
                            (FFFE & FFFF are not character codes,
                            and are excluded)
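The two allocation tables above amount to a lookup from a 16-bit code value to its zone. The following Python sketch is illustrative only: the helper names `zone_of` and `is_control` and the label strings are hypothetical, and it assumes the 65 excluded control codes are U+0000..U+001F and U+007F..U+009F (32 + 33 = 65).

```python
# Zone boundaries from the Unicode 1.0.1 allocation tables (inclusive ranges).
ZONES = [
    (0x0000, 0x4DFF, "A-ZONE"),
    (0x4E00, 0x9FFF, "I-ZONE"),
    (0xA000, 0xDFFF, "O-ZONE"),
    (0xE000, 0xF8FF, "R-ZONE/Private Use Area"),
    (0xF900, 0xFFEF, "R-ZONE/Compatibility Zone"),
    (0xFFF0, 0xFFFF, "R-ZONE/Specials"),
]


def is_control(cp: int) -> bool:
    """Assumed set of the 65 control codes: C0, DEL and C1."""
    return cp <= 0x001F or 0x007F <= cp <= 0x009F


def zone_of(cp: int) -> str:
    """Return the zone label for a 16-bit code value (hypothetical helper)."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("outside the 16-bit code space")
    if cp in (0xFFFE, 0xFFFF):
        raise ValueError("U+FFFE and U+FFFF are not character codes")
    if is_control(cp):
        return "control"
    for first, last, name in ZONES:
        if first <= cp <= last:
            return name
    raise AssertionError("ZONES should cover the whole code space")


# The cell counts in the tables add up to the full 16-bit space.
assert sum(last - first + 1 for first, last, _ in ZONES) == 0x10000
```

For example, `zone_of(0x4E00)` returns `"I-ZONE"` and `zone_of(0xF8FF)` returns `"R-ZONE/Private Use Area"`.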

3. Characters deleted or withdrawn for further study

A. Groups of characters deleted
Range               Group Name
U+0E70 => U+0E74    Thai Phonetic Order Vowel signs
U+0EF0 => U+0EF4    Lao Phonetic Order Vowel signs
U+1000 => U+104C    Tibetan script

B. Individual characters deleted
U+03DB          $   GREEK SMALL LETTER STIGMA
U+03DD          $   GREEK SMALL LETTER DIGAMMA
U+03DF          $   GREEK SMALL LETTER KOPPA
U+03E1          $   GREEK SMALL LETTER SAMPI
U+2300          $   APL COMPOSE
U+2301          $   APL OUT
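An implementation converting existing Unicode 1.0 data may want to flag code values that no longer exist in Unicode 1.0.1. The Python sketch below encodes the two lists above; the names `is_deleted` and `deleted_positions` are hypothetical. The check is only meaningful for text known to be Unicode 1.0, since later versions of the standard assign other characters to some of these positions.

```python
# Section 3A: groups of characters deleted (inclusive ranges).
DELETED_RANGES = [
    (0x0E70, 0x0E74),  # Thai Phonetic Order Vowel signs
    (0x0EF0, 0x0EF4),  # Lao Phonetic Order Vowel signs
    (0x1000, 0x104C),  # Tibetan script
]

# Section 3B: individual characters deleted.
DELETED_SINGLES = {0x03DB, 0x03DD, 0x03DF, 0x03E1, 0x2300, 0x2301}


def is_deleted(cp: int) -> bool:
    """True if a Unicode 1.0 code value was deleted or withdrawn in 1.0.1."""
    return cp in DELETED_SINGLES or any(lo <= cp <= hi for lo, hi in DELETED_RANGES)


def deleted_positions(text: str) -> list:
    """Indices of characters in Unicode 1.0 text that 1.0.1 no longer defines."""
    return [i for i, ch in enumerate(text) if is_deleted(ord(ch))]
```

For instance, `deleted_positions("a\u0e70b\u2301")` returns `[1, 3]`.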

4. Characters unified

From    With    Image   Old Name
U+0371  U+0314  $   GREEK NON-SPACING DASIA PNEUMATA
U+0372  U+0313  $   GREEK NON-SPACING PSILI PNEUMATA
U+0384  U+030D  $   GREEK NON-SPACING TONOS
U+04C5  U+049A  $   CYRILLIC CAPITAL LETTER KA OGONEK
U+04C6  U+049B  $   CYRILLIC SMALL LETTER KA OGONEK
U+04C9  U+04B2  $   CYRILLIC CAPITAL LETTER KHA OGONEK
U+04CA  U+04B3  $   CYRILLIC SMALL LETTER KHA OGONEK
U+3004  U+4EDD  $   IDEOGRAPHIC DITTO MARK

5. Characters moved

From    To      Image   Old Name
U+0370  U+0345  $   GREEK NON-SPACING IOTA BELOW
U+0385  U+0344  $   GREEK NON-SPACING DIAERESIS TONOS
U+03D7  U+037E  $   GREEK QUESTION MARK
U+03D8  U+0374  $   GREEK UPPER NUMERAL SIGN
U+03D9  U+0375  $   GREEK LOWER NUMERAL SIGN
U+03F3  U+0384  $   GREEK SPACING TONOS
U+03F4  U+0385  $   GREEK SPACING DIAERESIS TONOS
U+03F5  U+037A  $   GREEK SPACING IOTA BELOW
U+05F5  U+FB1E  $   HEBREW POINT VARIKA 
U+32FF  U+3004  $   JAPANESE INDUSTRIAL STANDARD SYMBOL
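Sections 4 and 5 together say how a Unicode 1.0 code value becomes a Unicode 1.0.1 code value. Several targets are themselves old source values (U+0384, U+0385 and U+3004), so the two tables must be applied as one simultaneous mapping, not one after the other. A minimal Python sketch, with hypothetical names; it does not cover the block rearrangements of section 6, whose explicit lists are deferred to Unicode 1.1, or the deletions of section 3.

```python
# Section 4: old code value -> code value it was unified with.
UNIFIED = {
    0x0371: 0x0314, 0x0372: 0x0313, 0x0384: 0x030D,
    0x04C5: 0x049A, 0x04C6: 0x049B, 0x04C9: 0x04B2, 0x04CA: 0x04B3,
    0x3004: 0x4EDD,
}

# Section 5: old code value -> new code value.
MOVED = {
    0x0370: 0x0345, 0x0385: 0x0344, 0x03D7: 0x037E, 0x03D8: 0x0374,
    0x03D9: 0x0375, 0x03F3: 0x0384, 0x03F4: 0x0385, 0x03F5: 0x037A,
    0x05F5: 0xFB1E, 0x32FF: 0x3004,
}

# The key sets are disjoint, so merging them loses nothing.
MIGRATION = {**UNIFIED, **MOVED}


def migrate_to_1_0_1(text: str) -> str:
    """Apply sections 4 and 5 in a single pass.

    str.translate looks up each original character exactly once, so a
    character moved *into* U+0384 is not re-mapped by the U+0384 unification.
    """
    return text.translate(MIGRATION)
```

Running the section 5 table and then the section 4 table as two separate passes would, for example, turn an old U+03F3 into U+0384 and then wrongly into U+030D; the single lookup leaves it at U+0384.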

6. Character blocks rearranged

The explicit list will be in Unicode 1.1.
Range               Group Name
U+32D0 => U+32FE    Circled Katakana: The 1.1 characters will be
                    arranged in modern order:
                    e.g., A, I, U, E, O, KA, KI, ...
U+FE80 => U+FEFC    Basic glyphs for Arabic language: The 1.1
                    character shapes will be arranged in different
                    order: Isolate, Final, Initial, Medial

7. Character semantics changed

A. Zero Width Joining
U+200C          $J  ZERO WIDTH NON-JOINER
U+200D          $J  ZERO WIDTH JOINER

In the merger with ISO 10646, the semantics of these two characters have 
been given a narrow interpretation. This brings added precision to the 
explanation given in Volume 1, page 77.

The intent of these characters is to address cursive graphical 
connection between the glyphs of a script, e.g. in scripts like Arabic 
whose printed form emulates handwriting. NON-JOINER and JOINER are best 
thought of as behaving like tiny letters that neighboring glyphs may 
connect to (JOINER) or avoid connecting to (NON-JOINER). They are thus 
processed as ordinary cursive letters rather than as control characters.
NON-JOINER and JOINER affect how the two neighboring glyphs connect to 
them, not to each other. As such, they have no direct relationship with 
ligature formation; in particular, JOINER does not in any way request 
that its two neighbors be ligatured to each other. Indeed, both NON-
JOINER and JOINER may break up ligatures by interrupting the character 
sequence required to form the ligature.

The precise relationship between cursive appearance and ligatured
appearance may differ from script to script, and therefore the precise
usage of these characters is script-dependent. In the case of Latin
typography, cursiveness (handwriting emulation) and ligaturing are
independent. Thus the text on Volume 1, page 77, may be clarified as
follows:

f + JOINER + i will not form the ligature fi. Instead, if cursive
versions of the f and i are available in the font, each will
independently connect to the JOINER on the appropriate side (having the
same appearance as f + i).

The use of optional ligatures such as fi is not controlled by any code
within the Unicode Standard; it is determined by protocols or resources
external to the text sequence.

As further illustration, let a hyphen stand for a cursive connection to
a preceding or following letter. Then in a cursive Latin font we would
get the following results (with N standing for NON-JOINER and J for
JOINER).

Unicodes        Rendering
f i s h         f-  -i-  -s-  -h    (optionally using an fi- ligature)
f J i s h       f-  -i-  -s-  -h
f N i s h       f    i-  -s-  -h
f J N i s h     f-   i-  -s-  -h
f N J i s h     f   -i-  -s-  -h
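The table above follows from treating JOINER and NON-JOINER as invisible "tiny letters": a letter connects on a side exactly when its neighbour on that side is a letter or a JOINER. The Python sketch below reproduces the table under that model. It assumes every character other than the two joiners is a dual-joining cursive letter and does not model ligatures; `render_cursive` is a hypothetical name.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER (N in the table)
ZWJ = "\u200D"   # ZERO WIDTH JOINER (J in the table)


def accepts_connection(ch):
    """Neighbours connect to letters and to JOINER, never to NON-JOINER or nothing."""
    return ch is not None and ch != ZWNJ


def render_cursive(text):
    """Return one glyph string per letter, with '-' marking a cursive connection."""
    glyphs = []
    for i, ch in enumerate(text):
        if ch in (ZWJ, ZWNJ):
            continue  # zero width: they only affect how their neighbours connect
        prev = text[i - 1] if i > 0 else None
        nxt = text[i + 1] if i + 1 < len(text) else None
        left = "-" if accepts_connection(prev) else ""
        right = "-" if accepts_connection(nxt) else ""
        glyphs.append(left + ch + right)
    return glyphs
```

For example, `render_cursive("f" + ZWNJ + "ish")` gives `["f", "i-", "-s-", "-h"]`, matching the third row. Note that `"f" + ZWJ + "ish"` renders the same as plain `"fish"`: the JOINER never asks f and i to ligate.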

With regard to the Arabic script, the statements in Volume 1, page 77,
remain correct. In Volume 2, page 390, Arabic rules L2 and L3, the
JOINER can be used to obtain the appearance shown in parentheses.

With regard to conjuncts in Indic scripts, the statements in Volume 1,
pp. 53-56, and Volume 2, pp. 399-414, remain correct. However, for
clarity, in pp. 399-414 the term ligature should be replaced by the term
conjunct.

B. Byte Order Mark
U+FEFF          $J  ZERO WIDTH NO-BREAK SPACE

In addition to the meaning of BYTE ORDER MARK, as defined in Volume 1 of
the Unicode standard, the code value U+FEFF may now also be used as ZERO
WIDTH NO-BREAK SPACE (ZWNBSP). For convenience in discussion, it can
also be referred to by this name (which is the ISO 10646/Unicode 1.1
name for U+FEFF).
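In its BYTE ORDER MARK role, a leading U+FEFF lets a reader detect the byte order of 16-bit text: because U+FFFE is not a character code, the bytes FF FE at the start can only be a byte-swapped U+FEFF. A minimal Python sketch, with a hypothetical `decode_ucs2` helper; treating unmarked data as big-endian is an assumption, and Python's UTF-16 codecs are used as a stand-in for plain 16-bit decoding.

```python
def decode_ucs2(data: bytes) -> str:
    """Decode 16-bit Unicode text, using a leading U+FEFF to detect byte order."""
    if data[:2] == b"\xfe\xff":
        return data[2:].decode("utf-16-be")  # U+FEFF in big-endian order
    if data[:2] == b"\xff\xfe":
        return data[2:].decode("utf-16-le")  # byte-swapped U+FEFF: little-endian
    # Assumption: unmarked data is treated as big-endian.
    return data.decode("utf-16-be")
```

Only the leading mark is consumed; a U+FEFF later in the data is kept as a character, where it reads as ZWNBSP.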

ZWNBSP behaves like a U+00A0 NO-BREAK SPACE in that it indicates the
absence of word boundaries; however, ZWNBSP has no width. For example,
this character can be inserted after the fourth character in the text
"base+delta" to indicate that there should be no line break between the
"e" and the "+" (for more information, see Volume 2, pp. 6-7).
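The "base+delta" example can be sketched with a toy line breaker. The rule used here is an assumption for illustration only: a break is allowed on either side of "+", except next to a ZWNBSP. The helper names are hypothetical.

```python
from typing import List

ZWNBSP = "\uFEFF"  # ZERO WIDTH NO-BREAK SPACE


def insert_zwnbsp(text: str, index: int) -> str:
    """Insert ZWNBSP before text[index]."""
    return text[:index] + ZWNBSP + text[index:]


def break_opportunities(text: str) -> List[int]:
    """Indices i where a line may break before text[i] (toy rule, see above)."""
    result = []
    for i in range(1, len(text)):
        before, after = text[i - 1], text[i]
        if ZWNBSP in (before, after):
            continue  # ZWNBSP suppresses a break on either side of it
        if before == "+" or after == "+":
            result.append(i)
    return result
```

Under this rule `"base+delta"` may break before or after the "+" (indices 4 and 5); after `insert_zwnbsp("base+delta", 4)` the only opportunity left is after the "+", so "e" and "+" stay on the same line, while removing the zero-width character gives back the original visible text.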

8. Characters added

A large number of characters will be added in Unicode 1.1 and listed in
the technical report described in the Introduction. They will include
the following characters, which were omitted from Unicode 1.0.

U+0A4D          $   GURMUKHI SIGN VIRAMA
U+0A8D          $   GUJARATI VOWEL CANDRA E
U+0A91          $   GUJARATI VOWEL CANDRA O
U+0AC9          $   GUJARATI VOWEL SIGN CANDRA O
U+0B56          $   ORIYA AI LENGTH MARK
U+25EF          $   LARGE CIRCLE
U+FFE8          $   HALFWIDTH FORMS LIGHT VERTICAL
U+FFE9          $   HALFWIDTH LEFTWARDS ARROW
U+FFEA          $   HALFWIDTH UPWARDS ARROW
U+FFEB          $   HALFWIDTH RIGHTWARDS ARROW
U+FFEC          $   HALFWIDTH DOWNWARDS ARROW
U+FFED          $   HALFWIDTH BLACK SQUARE
U+FFEE          $   HALFWIDTH WHITE CIRCLE

9. Character mapping changed

From    To      Image   XJIS    Name
U+00AD  U+2010  $   815D    JIS HYPHEN
U+20DD  U+25EF  $   81FC    JIS COMPOSITION CIRCLE
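A conversion table built for Unicode 1.0 needs these two entries patched. The Python sketch below applies the changes to a hypothetical XJIS-to-Unicode dictionary and refuses to overwrite an entry that matches neither the old nor the new value, which would suggest the table was not a Unicode 1.0 table; `update_xjis_table` is a hypothetical name.

```python
# Section 9: XJIS code -> (Unicode 1.0 value, Unicode 1.0.1 value).
XJIS_CHANGES = {
    0x815D: (0x00AD, 0x2010),  # JIS HYPHEN
    0x81FC: (0x20DD, 0x25EF),  # JIS COMPOSITION CIRCLE
}


def update_xjis_table(table: dict) -> dict:
    """Return a copy of an XJIS -> Unicode table with the 1.0.1 changes applied."""
    updated = dict(table)
    for xjis, (old, new) in XJIS_CHANGES.items():
        current = updated.get(xjis)
        if current not in (None, old, new):
            raise ValueError(f"unexpected mapping for XJIS {xjis:04X}: U+{current:04X}")
        updated[xjis] = new
    return updated
```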




                       Volume 2 Errata

1. Page 6
Change lines 26-27 to read: ... ZERO WIDTH SPACE can be used to indicate
word boundaries in scripts like Thai...

2. Page 19
The glyphs in Figures 2-14 and 2-15 were printed incorrectly.  The 4
correct glyphs are:
Figure      Image on Left   Image on Right
2-14        $               $
2-15        $               $

3. Pages 60,66,75,79,91,131,135,140,143,150,264,277,301,311,343
There are a number of glyphs that were printed incorrectly in various
places in Volume 2.  The most serious are:
Code        Image   Pages
U+71F7      $       60, 131, 264
U+773E      $       66, 135, 277
U+809C      $       75, 140, 301
U+8480      $       79, 143, 311
U+908E      $       91, 150, 343

4. Page 401
Change wording and rule in C3: ...The dead consonant RAd changes to a
non-spacing mark RAx when followed by a consonant cluster. The...
    RAn +   VIRAMAn =>  RAx

5. Page 403
Add L1a: The ZERO WIDTH JOINER can be used to produce the so-called
eyelash-RA (RAh) used in Marathi. RAh is a spacing half-consonant which
is not subject to the special ordering of RAx (O2).
    RAn +   ZWJ +   VIRAMAn =>  RAh

6. Page 404
Change O2 to:
    RAx  +  Cluster =>  Cluster  +  RAx
In processing a line of glyphs, this rule is not applied twice to the
same RAx.
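The "not applied twice" condition matters because after one swap the RAx sits in front of the next cluster, and a naive rewrite-until-nothing-changes loop would keep moving it to the end of the line. The Python sketch below applies O2 in a single left-to-right pass. It assumes a hypothetical representation in which the line is already segmented so that each consonant cluster is one list item; `apply_o2` is a hypothetical name.

```python
from typing import List

RAX = "RAx"  # non-spacing RA mark produced by rule C3


def apply_o2(glyphs: List[str]) -> List[str]:
    """Apply O2 (RAx + Cluster => Cluster + RAx) once per RAx."""
    out = []
    i = 0
    while i < len(glyphs):
        if glyphs[i] == RAX and i + 1 < len(glyphs) and glyphs[i + 1] != RAX:
            out.extend([glyphs[i + 1], RAX])  # swap RAx with the following cluster
            i += 2  # continue after the moved RAx, so it is never moved again
        else:
            out.append(glyphs[i])
            i += 1
    return out
```

With this pass, `["RAx", "C1", "C2"]` becomes `["C1", "RAx", "C2"]`, whereas repeated application would have produced `["C1", "C2", "RAx"]`.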

7. Page 429
Line 7 has the period misplaced, and should read:
Visual: .KO ,bmw 500 A SI TI
