                                                        22. Feb. 89 -Nl

                                     Karl-L. Noell <NOELL@DWIFH1.BITNET>


       Short Description of the UUE- and BOO-Encoding Scheme
      =======================================================


 Executable modules (*.EXE, *.COM) and archives (*.ARC) are always
 binary files.  Their smallest unit is a group of eight consecutive
 bits (one byte) with a value range of (00h .. FFh).

 To transfer such binary formatted files, a one-to-one correspondence
 between each byte (transmitted -> received) is necessary;  this is
 usually called "transparent transmission".

 Today's computer systems have heterogeneous architectures:  they use
 various bit groupings (words), e.g. 16 bits, 32 bits or 48 bits, and
 different byte orders (low byte first versus high byte first).

 Sometimes such binary files are shipped over networks which are again
 of various architectures, linked by gateways, bridges and mail transfer
 agents.  Gateways and native mailing systems are designed only for
 text information, i.e. files with only printable characters and a few
 control characters.  Therefore it is often not possible to transfer a
 binary file "as-is".

 A very simple solution is a mapping into hexadecimal codes:  each
 byte, regardless of its particular meaning, can be represented in a
 unique way by two hexadecimal digits.  This way, all bytes are mapped
 to the always printable character set  0,1, ... ,9,A,B,C,D,E,F .
 The obvious drawback is that converting a binary file into its
 hexadecimal representation blows up the file size by 100 percent.
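
 The hexadecimal mapping can be sketched in a few lines of Python.
 This is only an illustration;  the function name to_hex and the three
 sample bytes are chosen here and are not part of any existing tool.

```python
def to_hex(data: bytes) -> str:
    """Represent every byte by two hexadecimal digits (0-9, A-F)."""
    return "".join(f"{byte:02X}" for byte in data)


sample = bytes([0xE7, 0xD6, 0x52])     # three arbitrary sample bytes
encoded = to_hex(sample)
print(encoded)                         # prints: E7D652
print(len(sample), "bytes ->", len(encoded), "characters")
```

 Three bytes become six characters, which is the 100 percent growth
 mentioned above.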

 To avoid doubling the file size, more sophisticated encoding schemes
 have been developed.  Frequently used are BOO-encoding (bootstrap-able
 files for Kermit distribution) and UUE-encoding (Unix-to-Unix).  These
 schemes map three bytes into four printable characters, so the file
 grows by a factor of at least 4:3 (about 33 percent) instead of 2:1.

 An example of this algorithm is given by the following sketches:

 Three consecutive bytes in a binary file, e.g.  E7  D6  52,  are
 grouped together and subdivided into four groups of six bits each:

                   I      E7       I      D6       I      52       I
                ---I---------------I---------------I---------------I---
 (3x 8 bits)   ... I1 1 1 0 0 1 1 1 1 1 0 1 0 1 1 0 0 1 0 1 0 0 1 0I ..
 (00 .. FFh)    ---I---------------I---------------I---------------I---
                   I                                               I
                   I    39     I    3D     I    19     I    12     I
                ---I-----------I-----------I-----------I-----------I---
 (4x 6 bits)   ... I1 1 1 0 0 1 1 1 1 1 0 1 0 1 1 0 0 1 0 1 0 0 1 0I ..
 (00 .. 3Fh)    ---I-----------I-----------I-----------I-----------I---
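
 The regrouping of three bytes into four six-bit values can be sketched
 as follows.  The function name split_sextets is hypothetical and only
 serves this illustration;  it treats the three bytes as one 24-bit
 number and cuts it into four pieces, highest bits first.

```python
def split_sextets(triple: bytes) -> list:
    """Split exactly three bytes into four 6-bit values (00h .. 3Fh)."""
    if len(triple) != 3:
        raise ValueError("exactly three bytes are required")
    word = int.from_bytes(triple, "big")          # one 24-bit number
    return [(word >> shift) & 0x3F for shift in (18, 12, 6, 0)]


groups = split_sextets(bytes([0xE7, 0xD6, 0x52]))
print(" ".join(f"{value:02X}" for value in groups))
```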


 The six-bit pieces (sometimes called nibbles or chunks) cover a
 value range of (00 .. 3Fh), which again does not guarantee codes
 corresponding to printable characters only.  Therefore a conversion
 is necessary which maps these 6-bit codes to a table of 64 printable
 characters.

 Let us see how this conversion is performed by three different
 encoding schemes (UUE, XXE and BOO).



 UUE:
 ======

 A 20h offset is added to each six-bit nibble, shifting it to the
 range (20h .. 5Fh):


    I      59       I      5D       I      39       I      32       I
 ---I---------------I---------------I---------------I---------------I---
 .. I0 1 0 1 1 0 0 1 0 1 0 1 1 1 0 1 0 0 1 1 1 0 0 1 0 0 1 1 0 0 1 0I ..
 ---I---------------I---------------I---------------I---------------I---


 which yields 8-bit character codes that are always printable under
 an EBCDIC as well as under an ASCII alphabet.  The ASCII characters
 in a UUE-file can be:

  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_

 *This* character set should be e-mailable without problems.  But that's
 theory.  Some characters are redefined in national alphabets;
 fortunately this can easily be adjusted by Kermit translation tables.
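
 The UUE offset mapping itself is tiny.  The following Python sketch
 (the names OFFSET and uue_char are chosen for this illustration only)
 adds 20h to a six-bit value and rebuilds the complete UUE character
 set, which starts with the blank and ends with the underscore.

```python
OFFSET = 0x20      # the blank character; six-bit value 0 maps onto it


def uue_char(sextet: int) -> str:
    """Map one 6-bit value (00h .. 3Fh) to its UUE character."""
    if not 0 <= sextet <= 0x3F:
        raise ValueError("a six-bit value is required")
    return chr(sextet + OFFSET)


alphabet = "".join(uue_char(value) for value in range(64))
print(repr(alphabet))
```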

 As the blank character is valid within a UUE-file, there might be a
 particular problem.  UUE-files have records of fixed length, but blanks
 may appear at the end of UUE-lines as well:

    begin 644 uudecode.com
    MZ7DLD)#-JT-O<'ER:6=H=" H0RD@,3DX-2!"3U),04Y$($EN8P($ +%7 #PS
    M                                                     !-#;VQO
    M<B!D:7-P;&%Y(#@P>#(U95 9 0/_#P<'< \'!W .!P=/+HHG"N3Y= Y#+HH'
    M4.C8"%C^S'7S^,/> /^E_@#PQP82 &X +L8&E $ OB  )HL$+J.5 2:+1 (N
    M$@##B\.+R.,%Z , XOO#48L.$@#B_EG#5;0/S1!=.@8& '0&H 8 Z:$ 5;@
    M!HH^" "+#@0 +HL6:@'^SO[*S1"T HL6!  R_\T07<-345)5Z$$ M :P 8H^
    M" "*[HH.!  NBQ9J ?[._LHZ[G4",L#-$%U:65O#4U%25>@6 +0'Z]-0H $
    MB3Z* 8D^(@",!HP!C 8D #/ Q#XB +D$ /SSJ__C@#Z2 0"P_W4*M '-%K
    M= +^R"4! ,(! *"2 <8&D@$ "L!U(C+DS18*P'4,B":2 ; ;"N1U$+ #@#Z4
    M#<-3"&P(GP@]"44)30F?"&P(___! /__@@#__T, ___$ /__Q0#__\$
    M      #__\$           #/4U%25U8RY%#_%CH!7E]:65O#4U%25U9,_Q8X
    M >ON58OLAUX"+HH'0PK = 7HT?_K\X=> EW#Z.7_#0H PSQA<@8\>G<"+"##
    ._ND  #/ Z +5     !H

    end
 -------------------------------------------------

 Several mailing systems and editors have some built-in intelligence
 which says that trailing blanks at the end of text lines are useless.
 So it might happen that a line gets truncated after its last non-blank
 character.  Similar things can happen with consecutive blanks within
 a line;  sometimes they are replaced by tabs.

 To prevent such severe problems, most UUE-encoding procedures replace
 each blank by a grave accent character (`), which preserves the meaning
 of blanks as valid UUE characters.  In other words:  the accent acts
 as a placeholder to prevent blanks from being swallowed.

    begin 644 uudecode.com
    MZ7DLD)#-JT-O<'ER:6=H="`H0RD@,3DX-2!"3U),04Y$($EN8P($`+%7`#PS
    M`````````````````````````````````````````````````````!-#;VQO
    M<B!D:7-P;&%Y(#@P>#(U95`9`0/_#P<'<`\'!W`.!P=/+HHG"N3Y=`Y#+HH'
    M4.C8"%C^S'7S^,/>`/^E_@#PQP82`&X`+L8&E`$`OB``)HL$+J.5`2:+1`(N
    M$@##B\.+R.,%Z`,`XOO#48L.$@#B_EG#5;0/S1!=.@8&`'0&H`8`Z:$`5;@`
    M!HH^"`"+#@0`+HL6:@'^SO[*S1"T`HL6!``R_\T07<-345)5Z$$`M`:P`8H^
    M"`"*[HH.!``NBQ9J`?[._LHZ[G4",L#-$%U:65O#4U%25>@6`+0'Z]-0H`$`
    MB3Z*`8D^(@",!HP!C`8D`#/`Q#XB`+D$`/SSJ__C@#Z2`0"P_W4*M`'-%K``
    M=`+^R"4!`,(!`*"2`<8&D@$`"L!U(C+DS18*P'4,B":2`;`;"N1U$+`#@#Z4
    M#<-3"&P(GP@]"44)30F?"&P(___!`/__@@#__T,`___$`/__Q0#__\$`````
    M``````#__\$```````````#/4U%25U8RY%#_%CH!7E]:65O#4U%25U9,_Q8X
    M`>ON58OLAUX"+HH'0PK`=`7HT?_K\X=>`EW#Z.7_#0H`PSQA<@8\>G<"+"##
    ._ND``#/`Z`+5`````!H`
    `
    end
 -------------------------------------------------
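
 The blank substitution can be sketched in Python as shown below.  The
 function names are hypothetical.  The decoding side needs no special
 case, because the accent (60h) minus the offset gives 40h, and masking
 with 3Fh turns that into zero, the same value a blank would give.

```python
def uue_char(sextet: int, use_accent: bool = True) -> str:
    """Map a 6-bit value to a UUE character, avoiding the blank."""
    if sextet == 0 and use_accent:
        return "`"                     # placeholder for the blank
    return chr(sextet + 0x20)


def uue_value(char: str) -> int:
    """Map a UUE character (blank and accent alike) back to 6 bits."""
    return (ord(char) - 0x20) & 0x3F


print(uue_value(" "), uue_value("`"))  # prints: 0 0
```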

 Each UUE-line starts with a counter stating the actual number of bytes
 encoded in this line:

 'M' is ASCII 4Dh;  minus the offset 20h this yields 2Dh = 45 bytes per
 UUE-line, coded into 60 UUE-characters.

 The last UUE-line in the example given above starts with  '.' (2Eh),
 which tells us that only 14 bytes (2Eh - 20h = 0Eh) remain, although
 the 20 UUE characters in this line represent 15 bytes.

 These byte counters are essential because the size of a binary file
 is not necessarily a multiple of 3;  without them the decoding process
 would append one or two bogus bytes.
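
 How the byte counter works can be shown with a minimal Python sketch
 of one UUE-line.  The names encode_line and decode_line are chosen
 for this illustration;  the sketch assumes zero bytes as padding and
 the accent instead of the blank, and it does not handle the "begin"
 and "end" lines.

```python
def encode_line(chunk: bytes) -> str:
    """Encode 1 to 45 bytes as one UUE-line (counter + characters)."""
    if not 0 < len(chunk) <= 45:
        raise ValueError("a UUE-line carries 1 to 45 bytes")
    padded = chunk + b"\0" * (-len(chunk) % 3)    # fill the last triple
    out = [chr(0x20 + len(chunk))]                # byte counter
    for i in range(0, len(padded), 3):
        word = int.from_bytes(padded[i:i + 3], "big")
        for shift in (18, 12, 6, 0):
            value = (word >> shift) & 0x3F
            out.append("`" if value == 0 else chr(0x20 + value))
    return "".join(out)


def decode_line(line: str) -> bytes:
    """Decode one UUE-line; the counter cuts off the padding bytes."""
    count = (ord(line[0]) - 0x20) & 0x3F
    body = line[1:]
    if len(body) % 4:
        raise ValueError("incomplete group of four characters")
    out = bytearray()
    for i in range(0, len(body), 4):
        word = 0
        for char in body[i:i + 4]:
            word = (word << 6) | ((ord(char) - 0x20) & 0x3F)
        out += word.to_bytes(3, "big")
    return bytes(out[:count])


tail = "._ND``#/`Z`+5`````!H`"       # last line of the example above
print(len(tail) - 1, "characters ->", len(decode_line(tail)), "bytes")
# prints: 20 characters -> 14 bytes
```

 Without the slice out[:count] the decoder would return 15 bytes for
 this line, i.e. one bogus byte too many.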



 XXE:
 ======

 Instead of adding an offset, a table-look-up algorithm is used to get
 these printable characters:

 +-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

 Here there are only TWO special characters,  '+' and  '-',  which
 avoids inconsistencies with national character sets.
 In contrast:  UUE has 28 special characters and BOO has 14.
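
 The table look-up and the counts of special characters can be checked
 with a short Python sketch.  The names XXE, UUE, BOO, xxe_char and
 specials are chosen for this illustration.  "Special" is taken here to
 mean every character that is neither a letter nor a digit, and the BOO
 set is taken as the 30h offset range plus the tilde, as described in
 the BOO section.

```python
import string

XXE = "+-" + string.digits + string.ascii_uppercase + string.ascii_lowercase
UUE = "".join(chr(0x20 + value) for value in range(64))
BOO = "".join(chr(0x30 + value) for value in range(64)) + "~"


def xxe_char(sextet: int) -> str:
    """Look up the XXE character for one 6-bit value (00h .. 3Fh)."""
    if not 0 <= sextet <= 0x3F:
        raise ValueError("a six-bit value is required")
    return XXE[sextet]


def specials(alphabet: str) -> int:
    """Count the characters that are neither letters nor digits."""
    return sum(not char.isalnum() for char in alphabet)


print(specials(XXE), specials(UUE), specials(BOO))   # prints: 2 28 14
```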

 The XXE scheme was developed by Phil Howard and modified by
 David J. Camp in July 1988.

 BOO:
 ======

 This scheme was developed in July 1984 by Bill Catchings, Columbia
 University, mainly for Kermit distribution.

 A 30h offset is added to each six-bit nibble, shifting it to the
 range (30h .. 6Fh), which leads to this printable character set:

 0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmno~

 The tilde (~) character has a special meaning in the BOO scheme:
 it prefixes a repetition factor to compress multiple successive
 zeros (00h).  Executable modules often contain such sequences because
 of initialized data arrays.
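
 The 30h offset mapping can be sketched in Python as follows.  The
 function name boo_chars is hypothetical.  The tilde compression of
 zero runs is deliberately left out, because the encoding of the
 repetition factor is not specified in this description.

```python
def boo_chars(triple: bytes) -> str:
    """Map three bytes to four BOO characters (30h offset, no tilde)."""
    if len(triple) != 3:
        raise ValueError("exactly three bytes are required")
    word = int.from_bytes(triple, "big")          # one 24-bit number
    return "".join(chr(0x30 + ((word >> shift) & 0x3F))
                   for shift in (18, 12, 6, 0))


print(boo_chars(bytes([0xE7, 0xD6, 0x52])))
print(boo_chars(bytes(3)))                        # prints: 0000
```

 Three zero bytes give the four characters "0000";  runs like this are
 what the tilde prefix is meant to shorten.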

 Let us close with an example showing how the different schemes affect
 the file sizes:



                   MSVIBM.EXE     MSVIBM.UUE     MSVIBM.BOO
   size (bytes):     102130         143016         117308


 and after archiving the MSVIBM.EXE into a (binary) MSKERM.ARC:

                   MSKERM.ARC     MSKERM.UUE     MSKERM.BOO
   size (bytes):      70007          98042          95800
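
 The UUE sizes are larger than the pure 4:3 ratio suggests (roughly
 1.40 instead of 1.33), because every line also carries a counter and
 a line end.  The following Python sketch estimates the size of a
 UUE-file.  Its assumptions are not stated in this description:  45
 bytes per line, CR/LF line ends, a "begin 644 <name>" line, a
 zero-count line before "end", and the original file name in the
 header.  The function name uue_file_size is chosen for this example.

```python
def uue_file_size(n_bytes: int, filename: str, mode: str = "644",
                  eol: int = 2, per_line: int = 45) -> int:
    """Estimate the size of a UUE-file under the stated assumptions."""

    def line(count: int) -> int:
        groups = -(-count // 3)            # triples, rounded up
        return 1 + 4 * groups + eol        # counter + characters + CR/LF

    full, rest = divmod(n_bytes, per_line)
    size = len(f"begin {mode} {filename}") + eol
    size += full * line(per_line)          # complete 45-byte lines
    if rest:
        size += line(rest)                 # shorter last data line
    size += line(0)                        # zero-count line (assumed)
    size += len("end") + eol
    return size


print(uue_file_size(102130, "MSVIBM.EXE"))   # prints: 143016
print(uue_file_size(70007, "MSKERM.ARC"))    # prints: 98042
```

 Under these assumptions the sketch reproduces both UUE sizes of the
 tables.  The BOO sizes cannot be predicted this way, because they
 depend on how many zero runs the input contains.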
