
This is an ADDENDUM to the BYTEmark documentation. Specifically, this addendum addresses the changes made to the BYTEmark benchmarks in order to port those tests to Java.

General

* The MINSECONDS parameter was dropped. This parameter allows you to select the minimum number of seconds a test would run. We deemed this parameter redundant with the confidence-interval mechanism (because the latter verifies that 5 test scores are received that are sufficiently "clustered".)

* Most of the default array sizes are changed with the Java version. We did this mainly to keep the runtime of the entire benchmark down to something reasonable (Java runs slower than "equivalent" compiled C).
  The defaults are:

Size of array in numeric sort: 8111
Size of array in string sort: 4111
Size of bitfield array: 32768
Number of bitfield operations per iteration: 30
Size of arrays in emulated floating point: 250
Size of matrix in assignment algorithm: 51 x 51
Size of text block in IDEA test: 4000
Size of text block (uncompressed) in Huffman test: 5000
Number of patterns in Neural Net test: 5
Size of matrix in LU Decomposition: 101 x 101

* Note that in the original test, it was not possible to use parameters in the command file to alter the size of 2D matrices (such as used in the Assignment algorithm test and LU Decomposition test). This is possible with Java, and parameters are in place that allow you to change the size of the arrays.
  However, if you do this, your results will not be comparable with BYTE default values. (The system will produce indexes for you, but they will be meaningless.)

* The system now selects what it considers to be an adequate value for the GLOBALMINTICKS value. This is done automatically (unless you overrride that value).
  The default is calculated by running an empty loop several times, and finding the minimum resolution of the clock. This resolution is multiplied by 100, which becomes the minimum ticks value. (The reason for this was this it was discovered that, on many NT systems -- and we suspect on others we haven't tried -- the clock "ticked" in 55-unit steps. That is, though the value returned by the "Java" clock was in milliseconds, its tick value was limited by the resolution of the 55- millisecond clock on the PC. Each tick therefore advanced the Java clock by a value of 55.)

=====

Parameters available with the application version of jBYTEmark:

GLOBALMINTICKS=<value>
  Sets the minimum ticks allowable per test. ALLSTATS=T/F
  Displays statistics. (NOT IMPLEMENTED IN 0.9 VERSION)

OUTFILE=<path>
  Records output in file <path>. (NOT IMPLEMENTED IN 0.9 VERSION)

CUSTOMRUN=T/F
  Set to T indicates custom run.

DONUMSORT=T/F
  Enable/disable execution of numeric sort test.

NUMNUMARRAYS=<value>
  Number of arrays used in numeric sort test.

NUMARRAYSIZE=<value>
  Size of array used in numeric sort test.

DOSTRINGSORT=T/F
  Enable/disable execution of string sort test.

STRARRAYSIZE=<value>
  Set size of string array.

NUMSTRARRAYS=<value>
  Set number of string arrays sorted per iteration of test.

DOBITFIELD=T/F
  Enable/disable bitfield test.

NUMBITOPS=<value>
  Number of bitfield operations per iteration of test.

BITFIELDSIZE=<valUe>
  Size of bitfield in bitfield test.

DOEMF=T/F
  Enables/disables the emulated floating-point test.

EMFARRAYSIZE=<value>
  Sets the size of the arrays used in the emulated floating-point test.

EMFLOOPS=<value>
  Sets the number of loops per iteration in the emulated floating- point test.

DOFOUR=T/F
  Enables/disables the fourier test.

FOURASIZE=<value>
  Sets the size (# of coefficients generated) for the fourier test.

DOASSIGN=T/F
  Enables/disables the assignment algorithm test.

ASSIGNARRAYS=<value>
  Sets the size of the arrays used in the assignment algorithm. (The algorithm uses square matrices. This value is used for both the # of rows and # of columns of the generated matrix.)

DOIDEA=T/F
  Enables/disables the IDEA test.

IDEAARRAYSIZE=<value>
  Sets the size of the plaintext & ciphertext arrays of the IDEA test.

IDEALOOPS=<value>
  Sets the number of loops per iteration of the IDEA test.

DOHUFF=T/F
  Enables/disables the Huffman algorithm test.

HUFFARRAYSIZE=<value>
  Sets the size of the uncompressed plaintext array.

HUFFLOOPS=<value>
  Sets the number of iterations per loop of the Huffman test.

DONNET=T/F
  Enables/disables the neural-net test.

NNETLOOPS=<value>
  Sets the number of loops per iteration in the neural net test.

DOLU=T/F
  Enables/disables the LU Decomposition test.

LUNUMARRAYS=<value>
  Sets the number of arrays per iteration processed in the LU Decomposition test.

LUARRAYSIZE=<value>
  Sets the array size of the LU decomposition test.

=====

Alterations to specific tests:

Numeric Sort

  Little change was made to this test from the C version. It continues to use a heap sort as the driving algorithm. Note that the original test sorted 32-bit longs. The Java version also sorts 32-bit values; only with Java an int is 32-bits.
  In the original C version, the multiple arrays were actually allocated as a single large block of memory, and the test ran by moving a pointer variable from the start of one array to the next. Java does not permit pointers, so using a pointer to manage multiple arrays was definitely out. Fortunately, Java allows arrays to be created and destroyed at runtime, so the situation was handled by creating a 2-dimensional array. (An array of arrays in Java-speak; which is actually what we wanted in the first place.)
  So, the first dimension selects the array, and the second dimension selects the element within the array.
  The size of the array sorted remains at 8111 elements, as in the original version.

String Sort

  This test would be more properly named "Byte Array Sort" in its Java incarnation. What C understands as a string is different from what Java understands as a string.
  The C version of the benchmarks built a long one-dimensional array, and filled that up with random strings that were each preceded by a length byte. An auxiliary array -- the "offset pointer" array -- was an array of offsets into the string array such that the start of string zero was stored in the 0th element of the offset pointer array, the start of string 1 was stored in the 1st element of the offset pointer array, etc. This allows the algorithm to rapidly find the nth string (rather than having to locate a string within the array by starting at the beginning and using length bytes to "hop" into the array to the proper location).
  Java defines a String class, complete with its own set of methods. With this version of the benchmarks, we chose to keep the tests closer to the C originals, and therefore a string in this test is more like a C string and not at all a Java string. In particular, a string is an array of bytes. Most of the other parts of the benchmark were translated fairly directly from the C original:
 * The first byte of the string is a length count.
 * The offset pointer array holds offsets into the byte array that point to the start of a corresponding string (see above).

  The array of bytes is a 2D array, as in the numeric sort. The first dimension selects the array, the second dimension selects the byte within the array.

  This test makes heavy use of the System.arraycopy() method for moving strings around within the array.

  There is not equivalent to the strcmp() library routine in Java, so we coded our own version. Admittedly, we had to do this because we chose not to use the String class. (BYTE hopes to release other Java- specific benchmarks in the near future. Those future benchmarks will likely contain a String test.)

The size of the string array was reduced to 4111 from 8111 elements in the original test to account for the reduced performance of Java as compared to compiled C.

Bitfield

  The original bitfield tests used an overly-naive means of setting and clearing the bits in the bitfield. Specifically, the test accessed one bit at a time, which required a fetch - alter - store cycle for each bit that had to be set or cleared. (Bits are always set or cleared in a continguous series; such a series is referred to as a "run".)
  The Java version's performance has been considerably improved. We did this by setting, clearing, or toggling an enter 32-bit int worth of bits when possible. (The routines that set, cleared, and toggled bits were modified from the original.)
  We kept Java versions of the original test in the source code for reference purposes. The names of the new version routines all have the form "<name>Fast".
  Outwardly, the test has not changed. The size of the bitfield is the same, and the number of operations (an operation being setting, clearing, or toggling a run of bits) per iteration remains the same.
  The size of the bitfield array remains at 32768 bytes.

Emulated Floating Point

* The InternalFPF structure was recast as a class. Consequently, when the test begins building arrays of floating-point numbers, it's actually building arrays of InternalFPF objects. This, plus the fact that the test uses some of the emulated floating-point routines to construct the test numbers, means that this is one of the longer- running tests; mainly because of setup time.

* Handling the bit-twiddling in this test was especially tricky. Note that we defined a number of "masks" in the static portion of the EMFloatPnt class. These are fequently-used bit patterns. Java does not let you use 64-bit literals in source code, so if -- for example -- you want to have a long 64-bit number with its high bit set, you have to shift a 1 into place with a bit of code.

* The original test treated the mantissa as an array of 4 16-bit shorts. We deemed it easier to manipulate the mantissa as a pair of 32- bit integers (since int seems to be the data type Java is most comfortable in coping with). This made shifting and mathematics easier. The 32-bit ints could be moved into 64-bit longs for shifting and mathematic operations that involved manipulating the carry (or bits shifted out). Of course, we had to be careful about moving an int into a long; if the high bit of the int is set, it gets treated as a sign and is replicated in the upper 32 bits.

* Another problem was dealing with the fact that Java passes primitive types by value. This caused troubles when we had functions that, say, added two numbers and a carry, and returned the result plus the new carry. It would have been nice to define that function as:
    add(int carry, int numbera, int numberb, int result)
But that won't work, because carry and result are both passed by value (i.e., you can pass them IN to add(), but they won't come out).
  We opted for the following scheme:
    add(int[] carry, int[] numbers)
where carry is defined as int[1] (a one-element array) and numbers is defined as int[3]. Since arrays are passed by reference, that takes care of carry[] (if you don't mind the back-flip we just had to make). The numbers[] array is defined in terms of the first definition of add() such that:
    numbers[0] ==> numbera
    numbers[1] ==> numberb
    numbers[2] ==> result

* For speed reasons, the size of the arrays in the emulated floating- point test were reduced from 5000 to ???.

* As mentioned above, the setup time for the emulated floating point test is longer than we'd like. We are investigating ways to improve that.

Fourier

* There's not much to say here. The Fourier test was as easy a port as you could hope. The only thing we really had to watch out for was verifying that the calls to the math routines were properly preceded with "Math.".

Assignment

* As with the LU Decomposition test (see below), the original Assignment test manipulated the arrays via pointers. This involved some less-than-trivial casts. (The program allocated a large memory block, then maneuvered a pointer through the block as the test moved from one array to the next.)
  With Java, multidimensional dynamic arrays are easily managed. Therefore, we simply created a 3-dimensional array. Because the assignment algorithm works with 2-dimensional matrices, the Java version simply uses the first dimension to select the current matrix.

* Note that this test actually involves dynamic memory creation/destruction during its timed portion. (Most of the other tests allocate all the memory they need BEFORE execution. Memory is allocated in the scope of the test object itself, and is therefore not local to a particular function.)
  The second_assignments() routine creates two temporary arrays, linesrow[] and linescol[], which are used only in the scope of that function. This is not likely to have a huge impact on the performance of the test, because a) the test is not multithreaded and b) the algorithm is not reentrant. Consequently, at a given time there is likely to be only one copy each of linesrow[] and linescol[] allocated.

Huffman

* The routine for creating a "document" to be compressed in slightly different from the original version, largely owing to how Java handles Strings and byte arrays (and conversions betweeen the two).

IDEA

* Here's a weirdness of Java that we had to deal with here as well as in some other tests (Floating Point Emulation in particular): Any binary operation gives an int result even if the operands are byte or short. Consequently, two add two byte operands, you need a cast as with:
    byte_a = (byte)(byte_b + byte_c);
This automatic conversion to int also happens with unary negation, but not with <operator>= constructions. (So, byte_a+=byte_b is fine with no cast needed.

* Casting an integer expression to a different size can have different effects depending on whether you enclose the expression in parentheses or not. For example, (long) (int_a * int _b) doesn't give the same result at (long) int_a * int_b when the outcome of the operation goes to 32 bits and thus produces a "false negative" for int. In the first case, you get the inaccurate negative number evaluated inside the parentheses and the case to long extends the sign bit all the way up through the upper 32 bits. In the second case (without parentheses) the two numbers are multiplied as if longs and you get the appropriate positive number.
  A similar situation occurs if you're trying to chop a larger integer down to a smaller size when trying to work as if unsigned. For example, (int)(long_a>>>16) will give a different result from (int) long_a >>> 16 if there are bits set in the 32nd to 63rd bit range of long_a.

* The original test used multiplication modulus 0x10001. The code is provided in the source, but not used (Java's native % function is faster).

* The encryption and decryption subkeys in the original version were stored in arrays of 16-bit shorts. In the Java implementation, those subkeys are stored in integer arrays so they can be handled as though unsigned.

Neural Net

* The original version "learned" 26 input and output patterns. All the patterns are included in the Java port, but only the first five are actually used. This was done to keep the runtime down to something reasonable.

* In the original version, the input and output patterns were all read in from a file named NNET.DAT. These have been moved into arrays within the NeuralNetTest class. (The logistics of trying to get a file loaded over the net and into a Java applet were...well...forget it.)

LU Decomposition

* As with the Assignment algorithm, we use a 3D array in place of an allocated memory block toured by a pointer. LU Decomposition is built around 2D matrices. We therefore build a 3D array viewed as an array of matrices (the first dimension of the array selects the matrix we're working with).

* The LU Decomposition algorithm uncovered the nefarious "alignment bug". Specifically, under operating systems such as Windows NT, memory was allocated so as to be aligned on a 32-bit boundary; however, the LU Decomposition algorithm used floating-point doubles -- which are 64-bits wide -- so that if the array happened to be allocated starting at a non-64-bit address, all accesses were (on Intel machines) slower. The result was a noticeable degradation in performance.
  We patched the benchmarks to force alignment to 64-bit boundaries.
  Of course, in the Java code, none of this is even relevant...there is, therefore, no patch.

--Rick Grehan
--Dave Rowell
BYTE Magazine
