Subject: [l/m 11/2/92] good conceptual benchmarking.(2/28) c.be FAQ
Date: 2 Mar 1996 13:25:06 GMT

2.  Benchmarking concepts                    <this panel>
3.  PERFECT Club/Suite
4
5.  Performance Metrics
6.  Temporary scaffold of New FAQ material
7.  Music to benchmark by
8.  Benchmark types
9.  Linpack
10. Network Performance
11. NIST source and .orgs
12. Benchmark Environments
13. SLALOM
14
15. 12 Ways to Fool the Masses with Benchmarks
16. SPEC
17. Benchmark invalidation methods
18
19. WPI Benchmark
20. Equivalence
21. TPC
22
23. RFC 1242 terminology (network benchmarking)
24
25. Ridiculously short benchmarks
26. Other miscellaneous benchmarks
27
28. References
1.  Introduction to FAQ chain and netiquette

Benchmarking is a difficult black art that combines several technical
and social problems.  It is a juggling act; as such, any solution must
combine both kinds of components: technical and social.

In particular, the social problems require some degree of consensus,
very much like the problems of international measurement, a la the
metric system.

Benchmarking is usually seen as a linear process:
                            -----------
                            | test    |
        "optional input" -->| program |---> "output [time]"
                            |         |
                            -----------
Sort of like a ruler or a scale.
This view is probably too simplistic; it really is a more detailed process.

A more useful figure:
  -----------  -----------  -----------  -----------
  |pre      |  |pre      |  |         |  |post     |
->|compiled |->|test     |->|test     |->|test     |->
  |condition|  |execution|| |         | ||execution|
  -----------  -----------| ----------- |-----------
                          |             |
                          | ----------- |
                          | |control  | |
                          |-|condition|-|
                            |         |
                            -----------
From this figure one can see some of the more detailed elements and
issues of the basic measurement problem: equivalence, concurrency,
control, intrusive (invasive) measurement, overheads, preparation, etc.
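
The stages in the figure can be sketched in code.  What follows is only
an illustration in Python, not part of any benchmark suite: the names
run_benchmark, setup, test, and teardown are assumptions made up for
this example.  It keeps pre-test and post-test work outside the timed
region, and uses an empty measurement as a crude control condition to
estimate the overhead of the timer itself (one form of intrusive
measurement).

```python
import time


def run_benchmark(test, setup=None, teardown=None, repeats=5):
    """Time `test`, keeping pre-test and post-test work untimed.

    All names here are hypothetical, chosen for illustration only.
    """
    # Control condition: time "nothing" to estimate the cost of the
    # two timer calls that bracket every measurement.
    overheads = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        t1 = time.perf_counter()
        overheads.append(t1 - t0)
    overhead = min(overheads)

    timings = []
    for _ in range(repeats):
        state = setup() if setup else None   # pre-test execution (untimed)
        t0 = time.perf_counter()
        result = test(state)                 # the test itself (timed)
        t1 = time.perf_counter()
        if teardown:
            teardown(state, result)          # post-test execution (untimed)
        # Subtract the estimated timer overhead; never report below zero.
        timings.append(max(0.0, (t1 - t0) - overhead))
    return {"overhead": overhead, "timings": timings}


if __name__ == "__main__":
    report = run_benchmark(
        test=lambda data: sorted(data),
        setup=lambda: list(range(10000, 0, -1)),
    )
    print(report)
```

Even this small sketch raises the issues listed above: whether setup
really restores an equivalent starting state on each repeat, and whether
the control condition captures all of the measurement overhead, are
questions the harness cannot answer by itself.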

Before you ever say, "That's trying to measure apples and oranges,"
you had best realize that the biologists and biochemists did just that
several decades ago.  They discovered that apples and oranges have a
very common base: it's called DNA, and the gene maps of the two differ
very little.

Let's make some clear distinctions:
Performance Evaluation
    The overall process.  (Analysis and measurement.)
Performance Analysis
    Like mathematical analysis.
    The implication should be mathematical or simulation.
    Susceptible to illusion and deception.  Never the last word.
    Ideally: deterministic.
Performance Measurement
    The emphasis should be empirical.  Benchmarks run on simulations
    are "Analysis."  Measurement is a verification of real hardware
    performance.  It's bound by the laws of physics.  It can be spoofed.
    It appears as "the last word."  This is where benchmarking lies.
    Ideally: demonstrable, repeatable, and reproducible.
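
One hedged way to make "repeatable" concrete is to run the same
workload several times and report the spread, not just one number.
The Python sketch below is an illustration only; measure and summarize
are hypothetical helper names, and the choice of statistics (minimum,
median, coefficient of variation) is an assumption, not a standard
prescribed by this FAQ.

```python
import statistics
import time


def measure(workload, runs=10):
    """Return wall-clock times in seconds for `runs` executions."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Summarize the spread so a reader can judge repeatability."""
    mean = statistics.mean(samples)
    # Sample standard deviation needs at least two data points.
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return {
        "runs": len(samples),
        "min": min(samples),
        "median": statistics.median(samples),
        "mean": mean,
        "stdev": stdev,
        # Coefficient of variation: spread relative to the mean.
        "cv": stdev / mean if mean > 0 else 0.0,
    }


if __name__ == "__main__":
    print(summarize(measure(lambda: sum(range(100000)))))
```

A large coefficient of variation suggests the measurement is not yet
repeatable on this machine.  Reproducibility (someone else, somewhere
else, getting a comparable result) is a separate question that no
single-machine script can settle.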

The history of the area is such that many architectures are claimed to
deliver one level of performance and, in reality, usually under-perform.

[Wulf81]:
  We want to learn about the consequences of different designs on the
  useability and performance of multiprocessors.
  Unfortunately, each decision we make precludes us from exploring its
  alternatives.  This is unfortunate, but probably inevitable for hardware.
  Perhaps, however, it is not inevitable for the software....
  and especially for the facilities provided by the operating system.

%A William A. Wulf
%A Roy Levin
%A Samuel P. Harbison
%T HYDRA/C.mmp: An Experimental Computer System
%I McGraw-Hill
%D 1981
%K grecommended91, CMU, C.mmp, HYDRA OS,
multiprocessor architecture and operating systems
%K bsatya, book, text,
%K enm, ag,
%X A detailed description of the philosophy, design, and implementation of
Hydra, similar in a sense to Organick's monograph on Multics. Highly


Quoting Georg von Bekesy:
  . . . As I see it the difference between successful and unsuccessful
  research is basically a problem of asking the right question.  I can
  distinguish the following types of questions:
    1. The unimportant question
    2. The premature question
    3. The strategic question
    4. The stimulating question
    5. The embarrassing question (the kind asked at meetings)
    6. The pseudo-question (often a consequence of a different
       definition or a different approach)
  As a beginner I wanted to find a strategic question, but was unable to
  do so.
Pierce (and Bekesy) like stimulating questions:
they motivate you to do something.

%A Willem A van\ Bergeijk
%A John R. Pierce
%A Edward E. David, Jr.
%T Waves and the Ear
%I Doubleday
%C Garden City, New York
%D 1960

  Every science begins with the observation of striking events like
thunderstorms or fevers, and soon establishes rough connections between
them and other events, such as hot weather or infection.
The next stage is a stage of exact observation and measurement, and it is
often very difficult to know what we should measure in order to best
explain the events we are investigating.
In the case of both thunderstorms and fevers the clue came from measuring
the lengths of mercury columns in glass tubes, but what prophet could
have predicted this?
Then comes a stage of innumerable graphs and tables of figures, the despair
of the student, the laughing-stock of the man in the street.
And out of this intellectual mess there suddenly crystallizes a new and easily
grasped idea, the idea of a cyclone or an electron, a bacillus or an
antitoxin, and everybody wonders why it had not been thought of before.

%A J.B.S. Haldane
%T The Future of Biology
%B On Being the Right Size and Other Essays
%I Oxford Univ. Press
%C Oxford, England
%D 1985
%X Also good for "What 'Hot' means" (terminology) and pseudo science essays.

"Program measurement tools make a good case in point.  For years
programmers have been unaware of how the real costs of computing are
distributed in their programs.  Experience indicates that nearly everybody
has the wrong idea about the real bottlenecks in their programs; it is no
wonder that attempts at efficiency go awry so often, when programmers are
never given a breakdown of costs according to the lines of code they have
written.  Their job is something like that of a newly married couple who
try to plan a balanced budget without knowing how much the individual items
like food, shelter, and clothing cost.  All we have been giving programmers
is an optimizing compiler, which mysteriously does something to programs
it translates but never explains what it does.  Fortunately we are now seeing,
at last, the appearance of systems that give the user credit for some
intelligence; they automatically provide instrumentation of programs and
appropriate feedback about the real costs."

%A Donald E. Knuth
%T Computer Programming as an Art
%J Proceedings ACM Annual Conference
%D November 1974
%K Turing award lecture,
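
The kind of feedback Knuth describes is routinely available today.  As
a small illustration only (the functions slow_part, fast_part, program,
and profile_report are made up for this example), Python's standard
cProfile and pstats modules can report where the time went.  Note the
assumption: this breakdown is per function, not per line of code as in
the quote; line-level tools exist but are not shown here.

```python
import cProfile
import io
import pstats


def slow_part():
    # Deliberately the expensive piece of the toy program.
    return sum(i * i for i in range(200000))


def fast_part():
    return sum(range(1000))


def program():
    return slow_part() + fast_part()


def profile_report(func, top=10):
    """Run `func` under cProfile; return a cumulative-time report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the `top` entries.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(program))
```

The report lists each function with its call count and time, so the
guess about where the bottleneck lies can be checked against a
measurement.  The profiler adds its own overhead, which is another
instance of intrusive measurement.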

                   ^ A  
                s / \ r                
               m /   \ c              
              h /     \ h            
             t /       \ i          
            i /         \ t        
           r /           \ e      
          o /             \ c    
         g /               \ t  
        l /                 \ u
       A /                   \ r
        <_____________________> e   
                Language
