\documentstyle[11pt,a4]{article}
\begin{document}

\section {Report on the Character set converter obtained from DUKNET}
\subsection {What is available}

\bigskip\noindent
\subsubsection {Supported character sets}
\medskip\noindent

\begin {tabular} {l l l l}
ANSI\_X3.4-1968 & BS\_4730 & cp437 & cp850 \\
cp860 & cp863 & cp865 & DEC-MCS \\
DIN\_66003 & dk-us & DS\_2089 & EBCDIC-AT-DL \\
EBCDIC-AT-DL-A & EBCDIC-BE & EBCDIC-BR & EBCDIC-CA-FR \\
EBCDIC-DK-NO & EBCDIC-DK-NO-A & EBCDIC-ES & EBCDIC-ES-A \\
EBCDIC-ES-S & EBCDIC-FI-SE & EBCDIC-FI-SE-A & EBCDIC-FR \\
EBCDIC-INT & EBCDIC-IT & EBCDIC-JP-E & EBCDIC-PT \\
EBCDIC-UK & EBCDIC-US & ES & ES2 \\
FI & GB\_1988-80 & ISO\_646.basic & ISO\_646.irv \\
ISO\_6937-2 & ISO\_8859-1 & ISO\_8859-2 & ISO\_8859-3 \\
ISO\_8859-4 & ISO\_8859-5 & ISO\_8859-6 & ISO\_8859-7 \\
ISO\_8859-8 & ISO\_8859-9 & ISO\_8859-supp & IT \\
JIS\_C\_6220 & JIS\_C\_6229 & JUS\_I.B1.002 & latin-lap \\
latin6 & macintosh & MSZ\_7795.3 & NF\_Z\_62-010 \\
NS\_4551-1 & NS\_4551-2 & PT & PT2 \\
roman8 & SEN\_850200\_B & SEN\_850200\_C & us-dk \\
\end{tabular}


\bigskip\bigskip\bigskip
\noindent\subsubsection {Information Files that are available} 

\begin {itemize}
\item CHARSETS - (39K) specifies the character sets listed above in terms
of character mnemonics. Every mnemonic used in these sets must be defined
in the {\em superset} formed by the three files described below.

\item ISO\_10646 - (32K) specifies the character mnemonics according to 
ISO 10646. However, only the alphabetic scripts, such as Latin, Greek, 
Cyrillic, Hebrew and Arabic, are included. 

\item CONTROL  - (2K) specifies the control character mnemonics as defined in 
ISO 2047 and ISO 6429-1988.

\item OTHER    - (1K) specifies the character mnemonics used privately.
The mnemonics held here are those:

\begin {description}
\item Specified in ISO\_6937-2 (Annex B).
\item Needed to cover IBM symbols (e.g.\ the Dutch guilder sign).
\item Needed to cover HP ROMAN8 symbols (e.g.\ the Italian lira sign).
\item Needed to cover the Macintosh symbols.
\end {description}
\end {itemize}

\clearpage
\noindent\subsubsection {Programs that are available} 

\medskip\noindent Program {\em GC}

\begin {itemize} 

\item Reads ISO\_10646, CONTROL and OTHER to produce a superset 
file called CHARMNEM containing all the valid mnemonics.

\item Reads CHARSETS and creates, for each character set specified,
a separate file which contains:

\begin {description}
\item A super matrix where each mnemonic held is placed as a numeric 
2-byte value obtained from the superset.

\item A super matrix where each mnemonic held is placed into its rightful
position.
\end {description}

\item Produces a file CHARMAP.10646 to be used by POSIX.2.
\end {itemize}

\bigskip\noindent Program {\em CONV}

\begin {description}
\item {\tt CONV character-set-in character-set-out [ < file-in > file-out ]}
\end {description}

\begin {itemize}
\item Converts text from one character set to another, reading from stdin
and writing to stdout. If a character cannot be mapped, it is output as
Esc followed by its 2-byte mnemonic. An Esc followed by an Esc is taken
to mean one Esc.
\end {itemize} 
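
\medskip\noindent
The escape rule above can be sketched in C as follows. This is only an
illustration under stated assumptions, not the code of {\em CONV}: the
{\tt map\_entry} layout, the {\tt UNMAPPED} marker and the function
{\tt conv\_byte} are hypothetical, and the Esc--Esc rule is read here as
``a literal Esc in the data is written as two Escs''.

\begin{verbatim}
#include <stddef.h>

#define ESC      0x1B   /* ASCII escape */
#define UNMAPPED (-1)   /* hypothetical marker: no target code */

/* Hypothetical table entry for one input code. */
typedef struct {
    int  out;      /* code in the output set, or UNMAPPED */
    char mnem[2];  /* the character's 2-byte mnemonic     */
} map_entry;

/*
 * Convert one input code c, writing 1 to 3 bytes into buf.
 * Returns the number of bytes written.
 */
static size_t conv_byte(const map_entry table[256],
                        unsigned char c, unsigned char *buf)
{
    const map_entry *e = &table[c];

    if (e->out == UNMAPPED) {          /* Esc + mnemonic */
        buf[0] = ESC;
        buf[1] = (unsigned char) e->mnem[0];
        buf[2] = (unsigned char) e->mnem[1];
        return 3;
    }
    if (e->out == ESC) {               /* literal Esc is doubled */
        buf[0] = ESC;
        buf[1] = ESC;
        return 2;
    }
    buf[0] = (unsigned char) e->out;
    return 1;
}
\end{verbatim}

\noindent
A caller would read stdin one byte at a time, pass each byte to
{\tt conv\_byte} and write the returned bytes to stdout.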


\bigskip\bigskip
\subsection {Future developments}

\begin {itemize}
\item The code will be PC compatible. As DOS is unable to create 
links, all the files generated by program {\em GC} will be concatenated 
into one large file.

\item All file names will be at most 8 characters long, again for
compatibility with PCs.

\item A quick table lookup covering the superset described above, plus 
a longer table lookup for variable-length names. The variable-length
mnemonics will be prefixed with Esc followed by \_ and suffixed with \_.

\item Hexadecimal representations for the ISO 10646 codes. 

\item The ability to replace two symbols with one abstract symbol, by the
use of POSIX locales. This should cope with the T.61 non-spacing accents, 
and with replacing, say, an IA5 FF by T.61 FF,CR.

\end {itemize}
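
\medskip\noindent
The proposed framing of variable-length mnemonics (Esc, then \_, then the
name, then \_) might look as follows in C. This is a sketch only: the
function name {\tt frame\_mnemonic} is hypothetical, and rejecting names
that themselves contain \_ is an assumption made here so that the closing
\_ stays unambiguous.

\begin{verbatim}
#include <stddef.h>
#include <string.h>

#define ESC 0x1B   /* ASCII escape */

/*
 * Write ESC '_' name '_' into buf (capacity buflen).
 * Returns the number of bytes written, or 0 if the name is
 * empty, contains '_', or does not fit.
 */
static size_t frame_mnemonic(const char *name,
                             char *buf, size_t buflen)
{
    size_t n = strlen(name);

    if (n == 0 || strchr(name, '_') != NULL || n + 3 > buflen)
        return 0;
    buf[0] = ESC;
    buf[1] = '_';
    memcpy(buf + 2, name, n);
    buf[n + 2] = '_';
    return n + 3;
}
\end{verbatim}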


\clearpage 
\subsection {Current Modifications for PP compatibility}

\begin {itemize}
\item More stringent checking on the opening, reading and writing of files. 
GC currently dumps core if it cannot open CHARDEFS.
\item Separation into code, manual CHARSETS, and generated CHARSETS 
should be configurable.
\item Add PP RCSing.
\item Replace constants with \#define.
\item Replace the ``nickname'' keyword with ``alias''; this 
conforms with PP's keywords.
\item Replace ``unsigned char'' and ``short int'' with typedef statements,
so that the code is more machine independent and the types can be set
correctly at the discretion of the administrator.
\end {itemize}
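
\medskip\noindent
Three of the modifications above (named constants, typedefs and checked
file opening) are sketched below in C. The names {\tt ESC\_CHAR},
{\tt cs\_byte}, {\tt cs\_mnem} and {\tt open\_checked} are hypothetical
and are not taken from the converter's source.

\begin{verbatim}
#include <stdio.h>

#define ESC_CHAR 0x1B   /* named constant, not a bare 27 */

/* Hypothetical type names; change them here for a given machine. */
typedef unsigned char cs_byte;   /* one code in an 8-bit set  */
typedef short int     cs_mnem;   /* one 2-byte mnemonic value */

/*
 * Open path, reporting a failure on stderr.  Returns NULL on
 * failure so that the caller can stop cleanly.
 */
static FILE *open_checked(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);

    if (fp == NULL)
        perror(path);
    return fp;
}
\end{verbatim}

\noindent
A program such as {\em GC} could then test the result of
{\tt open\_checked} and exit with an error status instead of
dereferencing a null file pointer.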


\clearpage
\subsection {Future PP Requirements}

\begin {itemize}
\item A general mapping program that can map any character set into any
other. (This requirement is satisfied).
\item A decision on whether all the different character set mappings
should be put into one huge file, and on how to break down the mnemonic
superset into smaller ones so that only certain character sets can be
mapped by each part of the breakdown.
\item A decision on whether the fixed and variable character mnemonic 
table lookups should be intermingled. (I do not think so.)
\item A decision on whether to put the core of the code into the 
library so that other PP routines can perform conversions.
\item Separation of the master information and generated mapping files 
from the source code, and a decision on the directory in which to place
them.
\item Generation of a warning message about the source code 
being machine specific so that it is in keeping with the ISO 16-bit matrix.
\item A decision on what is to be done about one symbol mapping into two. 
E.g.\ in CCITT X.408, FF (Form Feed) is mapped into ``CR,FF'' in the 
T.61 string. (I do not think Keld's code is able to cope with this.)
\end {itemize}
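
\medskip\noindent
One possible way to handle a symbol that maps into two, such as the
FF to ``CR,FF'' example above, is to let each table entry hold a short
sequence of output codes rather than a single code. The C sketch below
illustrates the idea only; the {\tt expansion} type, the limit
{\tt MAX\_EXPANSION} and the function {\tt map\_expand} are hypothetical
and are not part of the existing code.

\begin{verbatim}
#include <stddef.h>

#define MAX_EXPANSION 2   /* hypothetical limit per input code */

typedef struct {
    unsigned char len;                 /* 1..MAX_EXPANSION */
    unsigned char out[MAX_EXPANSION];  /* output codes     */
} expansion;

/*
 * Map inlen input codes through table into out (capacity
 * outcap).  Returns the number of bytes written, or
 * (size_t)-1 if out is too small.
 */
static size_t map_expand(const expansion table[256],
                         const unsigned char *in, size_t inlen,
                         unsigned char *out, size_t outcap)
{
    size_t i, j, n = 0;

    for (i = 0; i < inlen; i++) {
        const expansion *e = &table[in[i]];

        if (e->len > outcap - n)
            return (size_t) -1;
        for (j = 0; j < e->len; j++)
            out[n++] = e->out[j];
    }
    return n;
}
\end{verbatim}

\noindent
With an identity table in which only the entry for FF (0x0C) is set to
the pair CR (0x0D), FF (0x0C), the three input codes A, FF, B would
produce the four output codes A, CR, FF, B.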

\end {document}
