As already stated, the way Vahunz works is not perfect, but a trade-off between efficiency, flexibility and programming effort. Here are some comments on other applications that perform related tasks.
Strip tools are used to remove comments and unused white space from source code. Especially for idiotic languages like C and C++, they were quite popular some time ago, as stripped include files could speed up compilation and reduce the amount of disk space required to store them.
Nowadays, disk space does not matter any more, and most compilers for idiotic languages support an even more idiotic feature commonly called something like "precompiled header files". Therefore, such utilities are not so widely used anymore.
One such tool can be found in aminet:dev/c/stripc.lha.
It might make sense to use them together with Vahunz, as they
usually know more details about a certain language. Take a look at the
following excerpt of C source code (a period (.) represents a
blank):

struct.person
{
..unsigned.char.name.[20];
..int...........age;
};

When Vahunz removes white space, this turns into

struct.person
{
unsigned.char.name.[20];
int.age;
};

But a good strip tool knows that it can also remove every kind of
white space before and after a bracket or semicolon (;), and therefore
would generate:
struct.person{unsigned.char.name[20];int.age;};
So basically, every source code can be stored in one single line.
Of course, that is not totally true for idiotic languages like C and C++, because they have an inferior concept called "preprocessor", which detects commands like
#include "hugo/sepp.h"
only at the beginning of a line.
At least you should now understand why there is some room for improvement left in Vahunz-stripped output. It should also be clear why such improvements will not happen: the whole issue is sick when it comes to C and C++.
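The difference between the two outputs above can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not how Vahunz or any particular strip tool works: the function name `strip_c_source` is made up, real blanks are used instead of periods, and the input must not contain comments, string literals, character literals or preprocessor lines continued with a backslash. It glues ordinary code into one line, removes blanks around brackets and semicolons, and keeps every preprocessor command on a line of its own.

```python
import re

# Characters that never need surrounding blanks in C-like code.
# Assumption: they do not occur inside string literals or comments.
PUNCTUATION = r"[\[\]{}();,]"


def strip_c_source(source):
    """Squeeze C-like source text into as few lines as possible."""
    output_lines = []  # finished lines of the result
    pending = []       # ordinary code lines waiting to be glued together

    def flush_pending():
        # Glue the collected lines into one line and squeeze the blanks.
        if pending:
            text = re.sub(r"\s+", " ", " ".join(pending))
            text = re.sub(r"\s*(" + PUNCTUATION + r")\s*", r"\1", text)
            output_lines.append(text.strip())
            pending.clear()

    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # drop empty lines
        if stripped.startswith("#"):
            # A preprocessor command has to stay on a line of its own.
            flush_pending()
            output_lines.append(stripped)
        else:
            pending.append(stripped)
    flush_pending()
    return "\n".join(output_lines)


if __name__ == "__main__":
    example = (
        '#include "hugo/sepp.h"\n'
        "struct person\n"
        "{\n"
        "  unsigned char name [20];\n"
        "  int           age;\n"
        "};\n"
    )
    print(strip_c_source(example))
    # prints:
    # #include "hugo/sepp.h"
    # struct person{unsigned char name[20];int age;};
```

Even this small sketch needs a special case for the preprocessor, which is exactly the kind of language-specific knowledge a general tool does not have.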
Indent tools are popular for keeping a consistent indentation style within your source code. A well-known example of such a program is GNU indent, available from aminet:dev/c/indent191.lha. Usually, they simply add white space and linefeeds to the source code to make it more legible.
In the context of Vahunz, they are mostly useful for unvahunzing. However, many of them are quite flexible, so you can configure them to glue lines together, something clumsy strip tools often do not do. See the example code excerpt from strip tools.
Ideally, only one application should be responsible for vahunzation: the compiler itself. The reasons are obvious:
Furthermore, the project management environment already knows about all files that should be vahunzed, and can easily tell the compiler about that.
(There is the problem that many people still believe that compiler and project management are two completely different beasts and that crappy Makefiles are a great thing. But those people have never heard of Modula-2 or Oberon, so don't listen to them.)
A more integrated approach could also address the problem that possibly meaningful filenames are preserved.
All these things would simplify the whole process a lot.
In addition, not only the source code can be vahunzed, but also the binary. Especially with "semi-interpreters" like Java, the compiled binary still contains a lot of information about the original source code because names of methods and attributes have to be stored to allow (possibly distributed) dynamic binding and other stuff during run-time. Therefore, it is a pretty simple task to translate a *.class into a quite legible *.java.
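The same effect can be observed in other bytecode-based languages. As a rough analogy only (this is Python, not Java, and the function `compute_salary` and the helper `surviving_names` are made up for illustration), the following sketch shows which names are still stored in a compiled Python function:

```python
def compute_salary(hours_worked, hourly_rate):
    """Hypothetical internal function with telling names."""
    overtime_bonus = max(hours_worked - 40, 0) * hourly_rate * 0.5
    return hours_worked * hourly_rate + overtime_bonus


def surviving_names(function):
    """Collect the names that are still stored in the compiled code."""
    code = function.__code__
    return {
        "function": code.co_name,    # name of the function itself
        "locals": code.co_varnames,  # parameters and local variables
        "globals": code.co_names,    # global names the code refers to
    }


if __name__ == "__main__":
    for kind, names in surviving_names(compute_salary).items():
        print(kind, names)
```

Everything a reader needs to guess the purpose of the function is still there, although no source code is involved any more.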
Of course for public modules, there is no way around storing heaps
of information in the binary. But for modules which only contain
internal functions, all names can be vahunzed. An ad-hoc
implementation would be a modifier unvahunzable in the
language definition to protect certain classes, methods and attributes
from vahunzation.
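How such a protection could behave is sketched below in Python. Nothing here is taken from Vahunz or from any existing compiler: the function `vahunz_names`, the replacement names `v0`, `v1`, ... and the idea of passing the protected names as a set (instead of reading a modifier from the language) are assumptions made purely for illustration. The sketch also assumes that the replacement names do not clash with names already in use.

```python
import re

# Matches identifiers of C-like and Java-like languages.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def vahunz_names(source, internal_names, protected_names):
    """Rename all internal names except the protected ones.

    Returns the rewritten source text and the mapping that was used.
    """
    mapping = {}
    for name in sorted(internal_names):
        if name not in protected_names:
            # Hypothetical naming scheme: v0, v1, v2, ...
            mapping[name] = "v%d" % len(mapping)

    def replace(match):
        word = match.group(0)
        # Keywords and protected names are not in the mapping and stay as is.
        return mapping.get(word, word)

    return IDENTIFIER.sub(replace, source), mapping


if __name__ == "__main__":
    text = "int add_tax(int net) { return helper(net) + net; }"
    result, used = vahunz_names(
        text,
        internal_names={"add_tax", "helper", "net"},
        protected_names={"add_tax"},  # plays the role of "unvahunzable"
    )
    print(result)
    # prints: int add_tax(int v1) { return v0(v1) + v1; }
```

The public name add_tax survives, while everything that only matters internally loses its meaning.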