Filtering Data

Translate bad characters into good characters with Xlate.EXE

By Ken Getz and Chuck Litzell

Xlate.EXE belongs to a class of programs called filters. Filter
programs have an input and an output. The input data is fed into
the program, which filters or translates it. The result is the
output data.

Xlate is a character translator. It produces a transformed file
by replacing every character in the input file with a character
from a translation table. To use Xlate, you will need to be
familiar with the ASCII standard code for representing characters
in a computer and with IBM's extensions to that chart.

The purpose of Xlate is to remap the bytes in a file. There are
many reasons for wanting to do this. The original purpose of the
program was to replace characters in the data area of a damaged
dBASE database file with characters that would permit dBASE to
regain access to the data.

You can use Xlate to produce a file containing only printable
characters from any file, including, if you want, .COM and .EXE
files. The more practical uses for Xlate will involve repairing
or preparing files produced by word-processing programs,
languages such as BASIC or COBOL, and other programs that store
primarily text data in files.

Using Xlate

You must provide Xlate with the name of the input file, the name
of the file to be created, and a translation table to map
characters between the two files. There are two input screens. On
the first, you will enter the two filenames and begin the
preparation of the translation table.

The second screen displays the translation table, and you can
alter any of its 256 cells, which are numbered 0 through 255.
Each cell holds a value between -1 and 255. For example, if Xlate
finds the letter A (decimal 65) in the input file, it looks in
cell 65 of the translation table and writes the value it finds
there to the output file. To change all occurrences of uppercase
A to lowercase a, you would enter the ASCII code for lowercase a
(97) in cell 65. Keep an ASCII chart at hand when you use Xlate.

When a position in the translation table contains a -1, Xlate
will not place a substitute character in the output file. This
makes it possible to create an output file with the specified
characters removed.
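
The lookup described above can be sketched in a few lines of
Python. This is an illustration only, not Xlate's actual C
source; the function name translate and the list-based table are
assumptions made for the example.

```python
def translate(data: bytes, table) -> bytes:
    """Remap each input byte through a 256-entry table.

    A table value of -1 means the byte is dropped from the output.
    """
    if len(table) != 256:
        raise ValueError("translation table must have 256 cells")
    out = bytearray()
    for byte in data:
        replacement = table[byte]
        if replacement != -1:
            out.append(replacement)
    return bytes(out)


# Start from an identity table, where every byte maps to itself.
table = list(range(256))
table[65] = 97   # cell 65 ('A') now holds 97 ('a')
table[7] = -1    # cell 7 (bell) is removed from the output

result = translate(b"A BAD\x07 DAY", table)
```

Here result is b"a BaD DaY": every A has become a, and the bell
character has been removed, so the output is one byte shorter
than the input.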

Xlate provides several options that simplify the process of
setting up the translation table for common file formats. By
selecting options, you can have Xlate set appropriate values in
the translation table. You can then customize the resulting table
on the second screen.

Syntax options

Xlate is run from the DOS prompt. The command syntax is:

Xlate [-m] [<inputfile> [<outputfile>]]

The square brackets indicate optional parameters that can be
passed when you run Xlate from the DOS prompt.

The first option, [-m], tells Xlate to use only colors available
in monochrome mode. Xlate senses the presence of a monochrome
adaptor, color graphics adaptor, or enhanced graphics adaptor. If
the CGA or EGA is present, the program will use color. If your
computer has a color graphics adaptor and composite monochrome
monitor, you can improve the display contrast by using the [-m]
option.

The second and third options are the names of the input and
output files. Entering the filenames on the command line makes it
possible to run Xlate from a batch file or another program with
filenames supplied by the program. Xlate is interactive, however,
and still requires user input after it starts.

Once you have entered the command at the DOS prompt, Xlate will
identify itself and request that you press a key to begin.

When the program begins, you will first see the menu screen
(Figure 1). The cursor is positioned at the input file: field,
and you can enter or edit the input filename. If you enter the
name of a file that does not exist, a message is displayed on the
status line and you must reenter the filename.

Next, you will enter the name of the file to be created. If you
enter the name of an existing file, Xlate asks whether it is OK
to overwrite the file. Answer YES by pressing Y, or NO by
pressing N.

Setting the Options

After the filenames have been entered, you can alter any of the
options on the lower half of the screen. The first five options
are YES/NO options. Select the option you want to change with
Uparrow and Downarrow, then select YES or NO by typing Y or N or
highlighting your choice with Rightarrow or Leftarrow.

The last three options have multiple values. Select the line you
want to change, and press Rightarrow. This will open up a box
that shows the possible settings for that option. Select the one
you want with Uparrow and Downarrow and press Return. Two of the
suboptions also require a number, which Xlate places in the
translation table for a range of characters.

Most of the options set ranges of characters in the translation
table to the values required for repairing dBASE III PLUS .DBF
files, or filtering non-printable characters from text files.

Preserve Carriage Returns? / Preserve Linefeeds?

Set these options to YES if you don't want Xlate to translate
carriage returns and linefeeds to another character. These
options are provided because carriage returns and linefeeds are,
apart from tabs, the only control characters normally found in
plain text files. To
generate a plain text file, set these options to YES, and set
Change characters below 32 to Delete.

Convert EOF markers to spaces?

End-of-file characters in the data area of a dBASE database file
or a text file prevent access to the characters that follow them.
Set this option to YES to have Xlate translate internal
end-of-file characters to spaces.

Convert NULL characters to spaces?

NULL characters (ASCII 0) are especially disruptive if they get
into the data area of a dBASE database file. When this option is
set to YES, Xlate will replace NULLs with spaces.

Preserve dBASE file header?

If Xlate alters bytes in the file header of a dBASE .DBF file,
the file will not be accessible by dBASE. Set this option to YES
if you are translating a database file. Xlate will pass the
header through unaltered and will translate bytes in the data
area of the file only.

Note: When processing a dBASE file, you should never select an
option that deletes any characters. If you do, the data in the
resulting file will be skewed.
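
As a rough sketch of header preservation, the Python below copies
the header unchanged and remaps only the data area. It assumes
the dBASE III layout, in which bytes 8 and 9 of the file hold the
header length as a little-endian 16-bit number. How Xlate itself
locates the end of the header is not documented here, and the
name translate_dbf and the toy file are hypothetical.

```python
import struct


def translate_dbf(data: bytes, table) -> bytes:
    """Copy the .DBF header unchanged and remap only the data area."""
    # Assumption: bytes 8-9 hold the header length (little-endian).
    (header_len,) = struct.unpack_from("<H", data, 8)
    header, body = data[:header_len], data[header_len:]
    remapped = bytes(table[b] for b in body if table[b] != -1)
    return header + remapped


# Toy file: a 32-byte "header" followed by two 4-byte records.
header = bytearray(32)
struct.pack_into("<H", header, 8, 32)   # header length = 32
header[0] = 0x03                        # version byte, left untouched
damaged = bytes(header) + b"AB\x00D" + b"EF\x1aH"

table = list(range(256))
table[0] = 32      # NULL -> space
table[26] = 32     # EOF marker -> space

repaired = translate_dbf(damaged, table)
```

The zero bytes inside the header survive even though cell 0 maps
to a space, and because nothing is deleted the repaired file is
exactly as long as the damaged one, so the records stay aligned.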

Change characters below 32?

Characters with ASCII codes below 32 are called control codes.
With the exception of the carriage return (13), linefeed (10),
and tab (9), they are generally not found in text files and
should never appear in the data area of a database. This option
has three settings. You can leave control codes unaltered,
replace them all with a character you specify, or delete them.
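
A minimal sketch of how the three settings might fill cells 0
through 31. The mode names, the function name, and the rule that
the two Preserve options take priority over this one are
assumptions made for the example, not Xlate's documented
behavior.

```python
CR, LF = 13, 10


def set_control_codes(table, mode, replacement=32,
                      keep_cr=True, keep_lf=True):
    """Fill cells 0-31 according to one of three settings.

    mode is "leave", "replace", or "delete" (illustrative names).
    """
    for code in range(32):
        if (code == CR and keep_cr) or (code == LF and keep_lf):
            table[code] = code          # preserved as-is
        elif mode == "replace":
            table[code] = replacement
        elif mode == "delete":
            table[code] = -1            # -1 marks a deleted byte
        elif mode == "leave":
            table[code] = code
        else:
            raise ValueError("unknown mode: " + mode)
    return table


table = set_control_codes(list(range(256)), "delete")
```

Under these assumptions the tab (9) is deleted along with the
other control codes. To keep tabs, you would reset cell 9 to 9 on
the second screen.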

Change characters above 127?

Characters above 127 are outside the standard ASCII range; on
the PC they belong to IBM's extended character set. If you are
creating a text file, you will want to handle these characters in
some way. This option provides four ways of handling characters
above 127. You can leave them as they are, delete them, replace
them with a character you specify, or strip the high bit. The
last of these methods brings the character into the standard
ASCII range by subtracting 128 from it. This option is
helpful when converting files created by word processors such as
WordStar.
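
The four settings can be sketched as follows. The mode names and
the function name are illustrative only. The sample text imitates
WordStar's document mode, which sets the high bit on the last
byte of each word.

```python
def set_high_codes(table, mode, replacement=32):
    """Fill cells 128-255 according to one of four settings."""
    for code in range(128, 256):
        if mode == "leave":
            table[code] = code
        elif mode == "delete":
            table[code] = -1
        elif mode == "replace":
            table[code] = replacement
        elif mode == "strip":
            table[code] = code - 128    # same as code & 0x7F
        else:
            raise ValueError("unknown mode: " + mode)
    return table


table = set_high_codes(list(range(256)), "strip")

# WordStar-style text: the last letter of each word has bit 7 set.
raw = bytes([ord("U"), ord("S"), ord("E") | 0x80, ord(" "),
             ord("d"), ord("b") | 0x80])
clean = bytes(table[b] for b in raw)
```

After the translation, clean is b"USE db". Stripping the high bit
recovers the original letters, whereas deleting or replacing the
marked bytes would lose the last letter of every word.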

Handle EOF markers at EOF?

This option tells Xlate how to end the output file. Text files
and database files normally end with an end-of-file marker. You
can set this option to add an EOF character if one does not
exist, to remove one if it does exist, or to take no special
actions at end-of-file.
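
A short sketch of the three settings, using the DOS end-of-file
character, Ctrl-Z (decimal 26). The mode names and the function
name finish_output are assumptions made for the example.

```python
EOF_MARK = 0x1A   # Ctrl-Z, decimal 26, the DOS end-of-file character


def finish_output(data: bytes, mode: str) -> bytes:
    """Apply one of three end-of-file settings to translated data."""
    has_eof = data.endswith(bytes([EOF_MARK]))
    if mode == "add":
        return data if has_eof else data + bytes([EOF_MARK])
    if mode == "remove":
        return data[:-1] if has_eof else data
    if mode == "none":
        return data
    raise ValueError("unknown mode: " + mode)


with_mark = finish_output(b"USE customers\r\n", "add")
without_mark = finish_output(b"USE customers\r\n\x1a", "remove")
```

Adding is written so that a file that already ends with the
marker is left alone rather than given a second one.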

Customizing the translation table

When you are finished with the first screen, press F10 to move to
the second screen. The translation table (Figure 2) is displayed
in a 16-by-16 matrix of cells. The lowest cell, 0, is in the
upper left corner of the matrix and 255, the highest cell, is in
the lower right corner. Each cell holds the ASCII value that
will replace every occurrence, in the input file, of the
character whose code matches the cell number.

The row and column indexes surrounding the matrix will help you
to find the cell you want to update. For example, to change all
occurrences of F to P:

1. Using your ASCII chart, find the decimal value for F. The
ASCII code for F is 70.

2. Find the highest row in the matrix that has a row index less
than or equal to 70. That's row 64.

3. Calculate the difference between the target cell (70) and the
row index (64). The difference is 6.

4. Staying in row 64, move to the right until you are in column
6.

5. Enter the code for P (80) from your ASCII chart.
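
Steps 2 through 4 amount to dividing the character code by 16.
The sketch below assumes, as the example implies, that the row
indexes are the multiples of 16 and the column indexes run from 0
to 15. The function name locate_cell is hypothetical.

```python
def locate_cell(code: int):
    """Return (row_index, column) for a cell in the 16-by-16 table."""
    if not 0 <= code <= 255:
        raise ValueError("cell number must be between 0 and 255")
    row, column = divmod(code, 16)
    return row * 16, column


row_index, column = locate_cell(ord("F"))   # ord("F") is 70
```

For F this gives row index 64 and column 6, matching the steps
above.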

The table wraps at the edges; that is, if you press Uparrow in
the first row, the cell in the same column of the bottom row is
selected.

Press F10 when you are ready to execute the translation. Press
Esc to return to the first screen. Note that Xlate always
rebuilds the translation table according to the option settings
when you move from the first screen to the second. If you change
individual cells in the table, back up to the first screen, and
then return to the translation table, your changes will have
been undone.

Executing the translation

When you press F10, Xlate begins the translation. As the files
are processed, a progress report is displayed on the status line.
The number to the left of the slash is the number of kilobytes
(1,024 characters per kilobyte) that have been read and
translated. The number to the right of the slash is the total
number of kilobytes in the input file, rounded up to the nearest
kilobyte. Note that if you entered -1 in any cell of the
translation table, the output file may be smaller than the input
file because the matching bytes are dropped rather than written.
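
The arithmetic behind the progress report can be sketched like
this. Rounding the total up follows the description above;
rounding the running count down to whole kilobytes, and the exact
display format, are assumptions.

```python
def progress(bytes_read: int, file_size: int) -> str:
    """Format a 'read/total' kilobyte report."""
    total_kb = (file_size + 1023) // 1024   # round up
    read_kb = bytes_read // 1024            # assumption: whole KB only
    return f"{read_kb}/{total_kb}"


report = progress(5120, 10241)
```

A 10,241-byte file is one byte over 10 kilobytes, so the total is
reported as 11, and report is "5/11" once 5,120 bytes have been
read.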

You can terminate the translation early by pressing the Esc key.
Xlate will display a message and return you to the translation
table where you can make further changes and reexecute the
translation. To quit without completing the translation, press
Esc at the translation table to back up to the first screen, then
press Esc again.

When Xlate has completed a translation, you are returned to the
first screen, where you can begin a new translation, or press Esc
to quit.

Error Messages

If any errors occur during a translation, Xlate will display a
message and return you to the translation table screen. Since the
filenames are verified when you enter them, file access errors
are unlikely. Figure 3 shows Xlate error messages.

Cleaning up damaged .DBF files

One of the most common forms of database corruption occurs when
an end-of-file character appears in the data area of a .DBF file.
This is usually the result of a circumstance that prevented dBASE
(or DOS) from writing newly appended records to the end of the
database. The most common causes are:

o A power failure occurred, or the power was shut off, before
new records could be written to the disk.

o A diskette was removed before the database was closed.

o A hardware disk error occurred, or DOS's file allocation
tables were damaged.

There are many other possible causes, almost all of which result
in an ungraceful exit from dBASE. When an end-of-file character
appears in the middle of a .DBF file, you are likely to see the
message, "End of File Encountered Unexpectedly."

Any character with an ASCII value lower than 32 (space character)
is inappropriate in the data area of a dBASE database file. NULL
characters (ASCII 0), for example, cause portions of records to
appear to be missing on the screen. Carriage returns and
linefeeds will cause display distortion. Other characters may
cause the bell to ring or may corrupt index files.

The following option settings will produce a translation table
suitable for recovering a damaged .DBF file:

Option                             Setting

Preserve Carriage Returns?         NO

Preserve Linefeeds?                NO

Convert EOF markers to spaces?     YES

Convert NULL characters to spaces? YES

Preserve dBASE file header?        YES

Change characters below 32?        Replace with 32

Change characters above 127?       Replace with 32

Handle EOF markers at EOF?         Add EOF marker

Removing control characters from .PRG files

dBASE .PRG files are plain ASCII text files. They are expected to
contain characters in the range 32 to 126, carriage returns,
linefeeds, or tab characters. The file should be terminated with
an EOF character.

.PRG files can be created with dBASE's built-in editor, or you
can use an external text editor, such as EDLIN. Most
word-processing programs can read and write plain text files, so
they can be used to write dBASE programs. Sometimes, however,
they will leave behind non-ASCII characters.

For example, WordStar can be used in non-document mode to edit
plain text files. But in document mode, WordStar reformats lines,
inserting special control characters, and setting the high bits
on the last byte of each word in the text. Even when you're
editing a program in non-document mode, certain combinations of
keys can cause WordStar to leave its special control characters
behind. When dBASE attempts to interpret a program that has
nonprintable characters, you will see messages such as "Syntax
Error," "Unrecognized Phrase/Keyword in Command," "Variable Not
Found," or just about any other message.

To build a translation table that will take care of problems like
these, set Xlate options like this:

Option                             Setting

Preserve Carriage Returns?         YES

Preserve Linefeeds?                YES

Convert EOF markers to spaces?     YES

Convert NULL characters to spaces? YES

Preserve dBASE file header?        NO

Change characters below 32?        Delete

Change characters above 127?       Strip High Bit

Handle EOF markers at EOF?         Add EOF marker

Making a text file out of any file

Xlate can create a text file from any file, including files such
as .EXE or .COM files that contain little or no text. All you
need is a translation table that filters out all non-printable
characters. The following option settings produce one:

Option                             Setting

Preserve Carriage Returns?         YES

Preserve Linefeeds?                YES

Convert EOF markers to spaces?     YES

Convert NULL characters to spaces? YES

Preserve dBASE file header?        NO

Change characters below 32?        Delete

Change characters above 127?       Delete

Handle EOF markers at EOF?         Add EOF marker

Credits: Xlate is based on Strip.COM by David Howlett. Xlate is
written in C and compiled with the Borland International Turbo C
compiler. The Blaise C Tools Library is used extensively.
