Known Problems

The major problem, of course, is that if you have used the same name for different things in your source code, or you decide to vahunz a name you should not, it is easy to end up with vahunzed files that do not even compile. This program is not idiot-proof, because that would require implementing huge parts of what full compilers do.

When removing the indentation, some blanks or empty lines are still left. This mostly has to do with replacing comments by one blank, as the ANSI specification requires, and the parser does not remember whether there already was a blank before the comment started. But the whole feature is of minor importance anyway, as there are numerous tools around to restore indentation. It is mostly useful as a compression algorithm.
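The leftover blanks are easy to reproduce with a small sketch. The function below is a hypothetical helper, not code taken from Vahunz; it only assumes the behaviour described above (each comment becomes exactly one blank, with no memory of what came before) and ignores string and character literals to stay short.

```c
/* Hypothetical helper, not the Vahunz parser: copies src to dst and
 * replaces every C comment by exactly one blank.
 * dst must hold at least strlen(src) + 1 bytes.
 * String and character literals are NOT handled in this sketch. */
static void strip_comments(const char *src, char *dst)
{
    while (*src != '\0') {
        if (src[0] == '/' && src[1] == '*') {
            src += 2;                     /* skip the opening marker */
            while (*src != '\0' && !(src[0] == '*' && src[1] == '/'))
                src++;
            if (*src != '\0')
                src += 2;                 /* skip the closing marker */
            *dst++ = ' ';                 /* one blank per comment */
        } else {
            *dst++ = *src++;              /* ordinary character */
        }
    }
    *dst = '\0';
}
```

With the input "a /* x */ b" this yields "a", three blanks, "b": the blank before the comment, the one that replaces it, and the one after it. Remembering whether the previous output character was a blank would avoid that, at the cost of one more state variable.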

When you are doing very funky stuff with the C preprocessor, it might not work any more after vahunzation. This especially applies to ## and similar things. In my opinion, this is a bug in the design of the preprocessor, and a lack of brains if a programmer really uses such things.
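A small illustration of why ## is troublesome for any tool that renames whole identifiers. The macro and function names here are made up for the example, and the failure described in the comment is an assumption about how such a renamer would see the source, not a traced Vahunz run.

```c
/* Token pasting builds an identifier at preprocessing time. */
#define HANDLER(name) handle_##name

static int handle_open(void)  { return 1; }
static int handle_close(void) { return 2; }

static int dispatch_demo(void)
{
    /* HANDLER(open) expands to handle_open, but the whole name
     * "handle_open" never appears at this call site; only the
     * fragments "handle_" and "open" do. A tool that garbles the
     * definition of handle_open without understanding the macro
     * leaves the pasted name pointing at nothing, and the build
     * fails. */
    return HANDLER(open)() + HANDLER(close)();
}
```

One way around this is to put such pasted names (here handle_open and handle_close) on the list of names to be left alone.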

For vahunzing Java and C++ sources, many keywords like implements or operator are not recognized yet. You will have to specify them as names to be ignored, because the internal keyword lists are incomplete at the moment: I currently can't (Java) or don't want to (C++) use these languages myself. However, if you send me a list of these keywords, I will include them in the next release.

You must not modify files during vahunzation. For example, you could change a source file after the first pass, when all names have already been collected. If an unknown name then shows up in the second pass, Vahunz can no longer ensure that no garbled name is the same as the new unknown word, and will abort with an error.

If carriage returns (\r) show up in the input, most of them will be gone in the output. If you use this tool on source files coming from Macintosh (with \r as line separator) or something worse (with \r\n), you had better run a CR/LF converter before vahunzing them.
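If no converter is at hand, the conversion itself is short. The following is a minimal sketch of such a converter working on a buffer in memory; the function name is hypothetical and it is not part of Vahunz.

```c
#include <stddef.h>

/* Hypothetical converter, not part of Vahunz: rewrites buf in place so
 * that "\r\n" and a lone "\r" both become "\n".
 * Returns the new length, which is never larger than len. */
static size_t normalize_newlines(char *buf, size_t len)
{
    size_t r = 0;   /* read position */
    size_t w = 0;   /* write position, always <= r */

    while (r < len) {
        char c = buf[r++];
        if (c == '\r') {
            if (r < len && buf[r] == '\n')
                r++;            /* "\r\n": swallow the "\n" as well */
            c = '\n';
        }
        buf[w++] = c;
    }
    return w;
}
```

Read the file in binary mode before calling this, so that the C library does not translate line endings on its own.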

The line and name length is limited when reading the lists of names and files. Currently the limit is 1023, which should be sufficient for most identifiers and filenames. If a line is too long, the program aborts. C-strings suck.
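For illustration, a fixed-size line reader that detects an overlong line could look roughly like this. It is a sketch, not the actual Vahunz source: the names MAX_LINE and read_list_line are made up, and it assumes the limit of 1023 counts characters without the line terminator.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 1023   /* assumed to exclude newline and terminator */

/* Hypothetical reader, not the Vahunz source. Reads one line into buf,
 * which must hold MAX_LINE + 2 bytes (line, newline, terminator).
 * Returns 1 on success, 0 at end of file, -1 if the line is too long. */
static int read_list_line(FILE *f, char *buf)
{
    size_t len;

    if (fgets(buf, MAX_LINE + 2, f) == NULL)
        return 0;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';      /* strip the newline */
        return 1;
    }
    /* No newline seen: either the last line of the file, or a line
     * that did not fit into the buffer. */
    return (len > MAX_LINE) ? -1 : 1;
}
```

A caller would treat -1 the way the program does, by printing an error and aborting.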

The number of names and files is only limited by the available memory. The source files do not have any limitations concerning the line length or whole file size, as they are read character by character.

Stack usage should be very small, as most buffers are allocated dynamically, and the tree functions are non-recursive. Probably even the standard 4K is sufficient. Therefore, the program is compiled without stack check.
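To show what non-recursive tree functions look like, here is a sketch of insertion and lookup in a plain binary search tree using loops only. The node layout and function names are assumptions made for the example; the tree actually used by Vahunz may be organized differently.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout; the real Vahunz tree may differ. */
struct name_node {
    char *name;
    struct name_node *left, *right;
};

/* Inserts name unless it is already present. Returns the node holding
 * the name, or NULL if memory runs out. The loop follows a pointer to
 * the link it would have to update, so no recursion is needed. */
static struct name_node *name_insert(struct name_node **root,
                                     const char *name)
{
    struct name_node **link = root;
    struct name_node *node;

    while (*link != NULL) {
        int cmp = strcmp(name, (*link)->name);
        if (cmp == 0)
            return *link;                     /* already known */
        link = (cmp < 0) ? &(*link)->left : &(*link)->right;
    }
    node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->name = malloc(strlen(name) + 1);
    if (node->name == NULL) {
        free(node);
        return NULL;
    }
    strcpy(node->name, name);
    node->left = node->right = NULL;
    *link = node;                             /* hook into the tree */
    return node;
}

/* Iterative lookup; returns NULL if name is not in the tree. */
static const struct name_node *name_find(const struct name_node *node,
                                         const char *name)
{
    while (node != NULL) {
        int cmp = strcmp(name, node->name);
        if (cmp == 0)
            return node;
        node = (cmp < 0) ? node->left : node->right;
    }
    return NULL;
}
```

Both functions use a constant amount of stack no matter how deep the tree gets, which is what makes a small fixed stack plausible.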