The Truth About Data Compression
by Pablo Biannuci

Most of the people using personal computers have been amazed by the
prodigy of data compression.  It's very difficult to imagine how that
long, long file filled with nonsense numbers could get squeezed into
a small, small file filled with nonsense numbers.  In this article
I'm going to explain all the secrets about the compression of
computer data, even those that the software companies do not want to
see published.  There are several data compression algorithms.  An
explanation of each follows, along with a brief summary of its
applications. 

Non-Repeat Packing
---------- -------
This is the simplest (and worst) compression method.  It is based on
the fact that if you write one 'A' instead of twenty, you'll save a
lot of space.  Actually, its savings aren't too impressive, but it is
known to have saved a life when a message saying "I'll kick yer butt"
was compressed to "I'l kick yer, but" (There is a committee
investigating the sudden appearance of the comma, but they don't
agree that a comma is there yet.)

Huffman Tree        
------- ----
This is one of the first algorithms that yielded a relatively
acceptable compression ratio. (relatively means E=mc^2, where E is 
the compressed file size, m is the original file size and c is the 
speed of light).  

It is quite simple; all you have to do is to count the appearances
of each character (A, B, C, D, and so on) and then build a binary 
tree with them so that the characters will be its leaves.  Once the
tree is built, just get a sharp axe and cut it down.  Chop it in 
small fragments, and pile them neatly.  It will occupy just a
fraction of the space it occupied before, so it will be compressed.
Currently the Huffman algorithm is not so frequently used because 
there must be some kind of identification in each piece of tree, and
that takes so much space that it's almost bigger than the tree.  MNP
modems use a modified version of this method, but instead of chopping
the tree with an axe they chop it with a MicroCom Chop-O-Matic
machine, which is much faster. 

LZW
---
LZW is a more modern algorithm.  It is widely used, sometimes 
together with Huffman.  Its name (kinda cryptic, isn't it?) derives 
from its authors initials: Lempel, Ziv and Welch.

The idea behind this method is revolutionary: To reference part of 
the contents to things before it. (It is revolutionary in data 
compression, it has been widely used in most other life aspects.)
To be more clear to those of you not accustomed to technical jargon,
what this algorithm does is to insert footnotes instead of the actual
data. Let's see an example.  The original text is : "As I was looking
at my reflection in the mirror, I was playing reflections with my
look, and I broke the mirror."  (Just a selected text sample.  It's
not a reflection of the author's mental state.  In fact, mine is a
bit worse.)  The compressed text would be: "As *1* *3* my *2* in the
*4*, *1* playing *2* with my *3* and *1* broke the *4*."

Footnotes: "1:I (was) / 2:reflection(s) / 3:look(ing) / 4:mirror.
As everybody can see, it is tightly compressed, and now it fits into
a pocket book's page.

Shannon-Fano Trees
------------ -----
This is another vegetal algorithm.  The difference is that this
method uses the so-called "Sliding Dictionaries," which make it
better.  To compress data this way, you have to take it, dig some
holes in fertile ground and scatter parts of it (the data) in the
holes.  After some nice trees have grown up (the Shannon-Fano trees,
and they grow quite fast), you tie some ropes to their branches and
place a dictionary so that it is able to slide up and down the rope.
(Be sure the rope is strong enough, or don't use an unabridged
Webster's dictionary.)  I cannot figure out why this compresses the
data yet, but it works, so I'll leave it alone for now.
                
There are more compression methods out in the computer world, but I 
didn't have access to the confidential information about them.
(Lunch time in the software company that uses it is an example.) 
Also, some methods aren't single algorithms, but a mixture of two or 
more of them.  For example, the world-wide known Imploding is a LZW
algorithm with a Shannon-Fano Tree performed after, which leads to 
a data collapse and posterior implosion, but I won't be going in much
farther detail. 

So here ends our lesson about data compression.  I *1* this *6*
*7* *2* useful *8* you *2* it *6* *7* *8* me, *2* a way *5* spend 
my time at *3* *4* hospital.  Hey!  Who *6* turned on *3* LZW?

1:hope / 2:as / 3:the / 4:psychiatric / 5:to / 6:has / 7:been / 8:for
--------------
Pablo Biannuci is the sysop of Atomic World BBS in Avellaneda, Buenos
Aires, Argentina.  FidoNet> 4:901/225
