The major problem, of course, is that if you have used the same name for different things in your source code, or you decide to vahunz a name you should not, it is easy to end up with vahunzed files that do not even compile. This program is not idiot-proof, because that would require implementing large parts of what a full compiler does.
When removing the indentation, some blanks or empty lines are still left over. This is mostly because comments are replaced by one blank, as the ANSI specification requires, and the parser does not remember whether there already was a blank before the comment started. But the whole feature is of minor importance anyway, as there are numerous tools around to restore indentation. It is mostly useful as a compression algorithm.
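The leftover blanks could be avoided by remembering the last character written. The following is a minimal sketch in C of such a squeezer, not the code Vahunz uses. The function name `squeeze_c` is hypothetical, it knows only `/* */` comments, and it copies string and character literals without further checks.

```c
#include <string.h>

/* Hypothetical helper: squeeze C source held in 'src' into 'dst' ('dst'
 * must be at least as large as 'src'). Indentation and trailing blanks are
 * dropped, runs of blanks and tabs become one blank, and every block
 * comment counts as one blank, as the C standard requires. Because 'last'
 * remembers the previously written character, a blank in front of a
 * comment and the blank standing for the comment are not both written.
 * Simplifications: no '//' comments, no preprocessor awareness, and
 * newlines (and therefore empty lines) are kept. */
void squeeze_c(const char *src, char *dst)
{
    char last = '\n';   /* start as if at the beginning of a line */

    while (*src != '\0') {
        if (strncmp(src, "/*", 2) == 0) {
            /* skip the comment, then treat it like a single blank */
            src += 2;
            while (*src != '\0' && strncmp(src, "*/", 2) != 0)
                src++;
            if (*src != '\0')
                src += 2;
            if (last != ' ' && last != '\n')
                *dst++ = last = ' ';
        } else if (*src == ' ' || *src == '\t') {
            if (last != ' ' && last != '\n')
                *dst++ = last = ' ';
            src++;
        } else if (*src == '\n') {
            if (last == ' ')
                dst--;   /* drop a blank at the end of the line */
            *dst++ = last = *src++;
        } else if (*src == '"' || *src == '\'') {
            /* copy string and character literals unchanged */
            char quote = *src;
            *dst++ = *src++;
            while (*src != '\0' && *src != quote) {
                if (*src == '\\' && src[1] != '\0')
                    *dst++ = *src++;   /* keep the escaped character */
                *dst++ = *src++;
            }
            if (*src != '\0')
                *dst++ = *src++;
            last = quote;
        } else {
            *dst++ = last = *src++;
        }
    }
    *dst = '\0';
}
```

A line that contains nothing but a comment still ends up as an empty line, so even this version leaves some empty lines behind.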
When you are doing very funky stuff with the C preprocessor, it might not work any more after vahunzation. This especially refers to the token-pasting operator ## and similar things. In my opinion, this is a bug in the design of the preprocessor, and a lack of brains if a programmer really uses such things.
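To illustrate why ## is a problem: the pasted identifier never appears literally in the source, so a tool that renames names without running the preprocessor cannot keep both sides consistent. The macro `DEFINE_GETTER` and all names below are made up for this example.

```c
/* Hypothetical accessor macro: '##' glues "get_" and the argument
 * together, so the identifier get_width never appears literally in the
 * source; it only exists after preprocessing. */
#define DEFINE_GETTER(field) int get_##field(void) { return field; }

static int width = 42;

DEFINE_GETTER(width)   /* expands to: int get_width(void) { return width; } */

int area(void)
{
    /* Whatever a renaming tool does with "get_width" here, the macro above
     * still builds the function name from its (possibly renamed) argument,
     * so the two names can drift apart and the program no longer builds. */
    return get_width() * 2;
}
```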
For vahunzing Java and C++ sources, many keywords like implements or operator are not recognized yet. You will have to specify them to be ignored instead, as the internal keyword lists are currently incomplete: I can't (Java) or don't want to (C++) use these languages myself. However, if you send me a list of these keywords, I will include them in the next release.
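The reason is that every word not found in the keyword list is treated as a name and garbled. A minimal sketch of such a lookup, assuming a sorted table searched with the standard `bsearch()`; the table contents and `is_keyword()` are hypothetical and not Vahunz's internal lists.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, deliberately incomplete keyword table. It must stay
 * sorted, because bsearch() relies on that. */
static const char *const keywords[] = {
    "break", "case", "char", "class", "int",
    "private", "public", "return", "while"
};

/* bsearch() passes the key (a word) and a pointer to a table element. */
static int compare_keyword(const void *key, const void *element)
{
    return strcmp((const char *) key, *(const char *const *) element);
}

/* Returns nonzero if 'word' is a keyword and therefore must not be renamed. */
int is_keyword(const char *word)
{
    return bsearch(word, keywords, sizeof keywords / sizeof keywords[0],
                   sizeof keywords[0], compare_keyword) != NULL;
}
```

With this table, "implements" and "operator" are not found and would be garbled like any other name, which is why they currently have to be put on the ignore list by hand.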
You must not modify files during vahunzation. For example, you could change a source after the first pass, when all names have already been retrieved. If a name that was unknown in the first pass then shows up in the second pass, Vahunz can no longer ensure that none of the garbled names is the same as the new word, and aborts with an error.
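The garbled names are chosen so that they differ from every name collected in the first pass; a word seen for the first time in the second pass might equal one of them. The check behind the error can be sketched as follows, assuming names are collected into a simple table. `collect()`, `check()` and the fixed table size are hypothetical simplifications, not Vahunz's actual data structures.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 64   /* hypothetical fixed limit, only for this sketch */

/* Names collected during the first pass. The strings are not copied,
 * so they must outlive the table (e.g. string literals). */
static const char *known[MAX_NAMES];
static size_t known_count = 0;

/* First pass: remember every distinct name. */
void collect(const char *name)
{
    for (size_t i = 0; i < known_count; i++)
        if (strcmp(known[i], name) == 0)
            return;
    if (known_count < MAX_NAMES)
        known[known_count++] = name;
}

/* Second pass: every name must already be known. A new one could collide
 * with a garbled name chosen after the first pass, so give up instead. */
int check(const char *name)
{
    for (size_t i = 0; i < known_count; i++)
        if (strcmp(known[i], name) == 0)
            return 0;
    fprintf(stderr, "unknown name in second pass: %s\n", name);
    return -1;
}
```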
If carriage returns (\r) show up in the input, most of them will be gone in the output. If you use this tool on source files coming from the Macintosh (with \r as line separator) or something worse (with \r\n), better run them through a CR/LF converter before vahunzing them.
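Such a converter is easy to write. A minimal sketch, assuming the files are opened in binary mode ("rb"/"wb") so the C library does not translate line ends on its own; `convert_newlines()` is a hypothetical name.

```c
#include <stdio.h>

/* Copy 'in' to 'out', turning every line separator into '\n':
 * "\r\n" (MS-DOS style) and a lone '\r' (Macintosh style) both become '\n'. */
void convert_newlines(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF) {
        if (c == '\r') {
            int next = getc(in);
            if (next != '\n' && next != EOF)
                ungetc(next, in);   /* lone CR: keep the following character */
            putc('\n', out);
        } else {
            putc(c, out);
        }
    }
}
```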
Lines in the lists of names and files are limited in length, which also limits the length of names and file names. Currently this limit is 1023 characters, which should be sufficient for most identifiers and file names. If a line is too long, the program aborts. C strings suck.
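A limit like this typically comes from reading each list entry with `fgets()` into a fixed buffer. The sketch below shows one way to detect an over-long line instead of silently splitting it; `read_entry()` and `MAX_LINE` are hypothetical names, only the value 1023 comes from the text above.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 1023   /* longest accepted name or file name */

/* Read one entry of a name or file list into 'buf', which must hold
 * MAX_LINE + 2 bytes (room for the newline and the terminating '\0').
 * Returns 1 on success, 0 at end of file and -1 if the line is longer
 * than MAX_LINE characters. */
int read_entry(FILE *list, char buf[MAX_LINE + 2])
{
    size_t len;

    if (fgets(buf, MAX_LINE + 2, list) == NULL)
        return 0;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        buf[--len] = '\0';   /* strip the line separator */
    else if (len == MAX_LINE + 1)
        return -1;           /* buffer filled without a newline: too long */
    return 1;
}
```

A caller would then abort on -1, as Vahunz does.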
The number of names and files is only limited by the available memory. The source files have no limitations concerning line length or total file size, as they are read character by character.
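Reading character by character allows names of any length, as long as the buffer grows on demand. A sketch of that idea, assuming a hypothetical `read_identifier()` that skips everything except identifiers (it does not know about numbers, strings or comments):

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Read the next C identifier from 'in' character by character and return
 * it in a malloc()ed string that grows as needed, so only the available
 * memory limits its length. Returns NULL at end of file or if memory runs
 * out; the caller must free() the result. */
char *read_identifier(FILE *in)
{
    int c;
    size_t len = 0, size = 16;
    char *name;

    /* skip to the first character that can start an identifier */
    do {
        c = getc(in);
    } while (c != EOF && !(isalpha(c) || c == '_'));
    if (c == EOF)
        return NULL;

    name = malloc(size);
    if (name == NULL)
        return NULL;
    while (c != EOF && (isalnum(c) || c == '_')) {
        if (len + 1 == size) {   /* keep room for the terminating '\0' */
            char *bigger = realloc(name, size * 2);
            if (bigger == NULL) {
                free(name);
                return NULL;
            }
            name = bigger;
            size *= 2;
        }
        name[len++] = (char) c;
        c = getc(in);
    }
    if (c != EOF)
        ungetc(c, in);   /* the character after the name is not consumed */
    name[len] = '\0';
    return name;
}
```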
Stack usage should be very small, as most buffers are allocated dynamically and the tree functions are non-recursive. Even the standard 4K stack is probably sufficient. Therefore, the program is compiled without stack checking.
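A tree can be searched and extended without recursion by walking down from the root in a loop. The sketch below is a plain binary search tree of names with hypothetical `tree_insert()` and `tree_find()`; it is not Vahunz's actual tree code, but it shows why stack usage does not depend on the depth of the tree.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical binary search tree of names. */
struct node {
    const char *name;          /* not copied; must outlive the tree */
    struct node *left, *right;
};

/* Insert 'name' without recursion: walk down from the root, keeping a
 * pointer to the link that has to be changed. Returns the node holding
 * 'name' (new or already present), or NULL if memory runs out. */
struct node *tree_insert(struct node **root, const char *name)
{
    struct node **link = root;

    while (*link != NULL) {
        int cmp = strcmp(name, (*link)->name);
        if (cmp == 0)
            return *link;                      /* already present */
        link = (cmp < 0) ? &(*link)->left : &(*link)->right;
    }
    *link = malloc(sizeof **link);
    if (*link != NULL) {
        (*link)->name = name;
        (*link)->left = (*link)->right = NULL;
    }
    return *link;
}

/* Look up 'name' iteratively; returns NULL if it is not in the tree. */
struct node *tree_find(struct node *tree, const char *name)
{
    while (tree != NULL) {
        int cmp = strcmp(name, tree->name);
        if (cmp == 0)
            return tree;
        tree = (cmp < 0) ? tree->left : tree->right;
    }
    return NULL;
}
```

If names arrive in sorted order, such an unbalanced tree degenerates into a list. The loops then take longer, but unlike a recursive version they still need no extra stack.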