IBM PENTIUM STUDY

Summary:
IBM Research focused on the likelihood of error on the Pentium chip in everyday floating point division. Intel has analyzed the probability of making an error based on the assumption that any possible 64-bit pattern is equally likely to occur in both the numerator and the denominator. If that were the case, then the chances of an error would be 1 in 9 billion. Intel also estimates that an average spreadsheet user will perform 1,000 floating point divisions per day. Based on these assumptions, Intel estimates that an error will be encountered only once in 9 million days (or once in about 27,000 years). Our analysis shows that the chances of an error occurring are significantly greater when we simulate the types of calculations performed by financial spreadsheet users, because in that case not all bit patterns are equally probable.

Probability Tests:
We have analyzed a Pentium chip in order to understand the sources of errors and have found that, for an error to occur, both the numerator and the denominator must have certain "bad" bit patterns, as described below. First, for the denominator to be "at risk", that is, capable of producing an error with certain numerators, it must contain a long string of consecutive 1's in its binary representation. Although such numbers represent only a very small fraction of all possible numbers, they occur much more frequently when denominators are created by adding, subtracting, multiplying or dividing simple numbers. For example, the number 3.0, represented exactly, does not have that pattern, but the result of the computation 4.1 - 1.1 does.

How many denominators produced in this fashion are "at risk", that is, capable of producing an error for certain numerators? When we randomly added or subtracted ten random numbers having a single-digit dollar amount and two digits of cents, for example $4.57, one out of every 300 of the results was "at risk" and hence capable of producing an error. When we repeated the test with numbers having two-digit dollar amounts and two digits of cents, one out of every 2,000 could cause an error. If the denominator was calculated by dividing two numbers having one digit to the left and one to the right of the decimal point, then approximately one in every 200 could cause an error. For simplicity, suppose that one of every 1,000 denominators produced by such calculations is "at risk."

Now, suppose we have created a bad denominator. What is the chance of encountering a bad numerator that will produce an error? It depends on the actual value of the "at risk" denominator, but based on our tests, a conservative estimate is that only one out of every 100,000 numerators causes a problem. Finally, when we combine the chances of a bad numerator with the chances of a bad denominator, the result is that one out of every 100 million divisions will give a bad result. Our conclusion is vastly different from Intel's.

Frequency Tests:
We also questioned Intel's analysis and its assumption that spreadsheet users will perform only 1,000 divides in a day. Independent tests suggest that a spreadsheet user (Lotus 1-2-3) performs about 5,000 divides every second while recalculating a spreadsheet. Even if the user does this for only 15 minutes a day, he will perform about 4.2 million divides per day, and according to our probability findings, on average, a computer could make a mistake every 24 days.
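The arithmetic behind these summary figures is simple enough to sketch. The following minimal Python fragment combines the estimated probabilities with the estimated daily divide count quoted above; all of the inputs are the estimates from this study, not measurements of the chip itself.

    # Back-of-the-envelope arithmetic behind the summary figures above.
    # All inputs are the estimates quoted in the text, not measurements.

    p_bad_denominator = 1 / 1_000    # about 1 in 1,000 computed denominators are "at risk"
    p_bad_numerator = 1 / 100_000    # about 1 in 100,000 numerators then trigger an error
    p_error_per_divide = p_bad_denominator * p_bad_numerator
    print(f"Chance of a bad divide: 1 in {1 / p_error_per_divide:,.0f}")      # 1 in 100,000,000

    divides_per_day = 4.2e6          # roughly 15 minutes of spreadsheet recalculation per day
    days_per_error = 1 / (p_error_per_divide * divides_per_day)
    print(f"Expected days between errors for one user: {days_per_error:.0f}") # about 24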
Hypothetically, if 100,000 Pentium customers were doing 15 minutes of calculations every day, we could expect 4,000 mistakes to occur each day.

Conclusion:
The Pentium processor does make errors on floating point divisions when both the numerator and the denominator of the division have certain characteristics. Our study is an analysis based on probabilities; in reality, a user could face either significantly more errors or no errors at all. If an error occurs, it will first appear in the fifth or a higher significant digit, and it may have no effect or it may have catastrophic effects.

Additional Technical Detail: Some Experiments on Pentium Using Decimal Numbers

According to an Intel white paper, if you were to choose a random binary bit pattern for the numerator and the denominator, the probability of an error in divide is about 1 in 9 billion. The error occurs when certain divisors (termed "at risk" or bad) are divided into certain numerators. In order for the error to occur, our belief is that the divisor must lie in certain ranges. For each such denominator, there is a range of numerator values which produce an incorrect result.

An example of affected numbers is the decimal constants we hard-wire in our programs. For example, if we are converting from months to years and are interested in 7-8 decimal digits of accuracy, we can hard-wire a constant to convert from months to years:

alpha = 1/12 = .083333333

Let us construct a hypothetical example. We have contracted a job which is expected to last 22.5 months. The total value of the contract is $96000. From this, tax at a rate of 14 and 2/3 percent has to be deducted. The taxing authority has defined 14 and 2/3 percent to be 14.66667. We want to calculate the net take-home amount on a per annum basis. We do the following calculations:

Tax = 96000 * .1466667 = 14080.0032
Net take-home money = 96000 - 14080.0032 = 81919.9968
Number of years in 22.5 months = 22.5 * .083333333 = 1.8749999925 years
Net take-home money per annum = 81919.9968 / 1.8749999925 = $43690.6667

Most machines give the above answer, which satisfies the desired 7-8 digit accuracy criterion. On Pentium, the answer is $43690.53, which has only 5 correct digits. In this example, both the numerator and the denominator are bad numbers: they are both near a simple integer boundary in their binary representation, and such numbers occur in the real world at a much higher frequency than the totally random bit pattern hypothesis would suggest.

Probabilistic Analysis
We are addressing the question of how likely it is to have a bad divisor. On Pentium, a bad divisor belongs to one of the five bad table entries characterized by 1.0001, 1.0100, 1.0111, 1.1010, and 1.1101, followed by a string of 1's in the mantissa. We have found that if the string of 1's is of length 20 or so, then it is a bad divisor. Given a bad divisor, the probability of making an error in the division increases dramatically compared to the 1 in 9 billion figure quoted by Intel. We did some simple experiments using decimal numbers, and the findings are reported below. We counted only those bad divisors which belong to one of the above five table values, followed by a string of 32 1's. Intel argues that all binary patterns are equally likely. If that were really the case, the probability of finding a bad divisor, as defined above, would be 5/(2**36), or about one in 13 billion random divisors. However, we find the probabilities to be much higher.
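To make this pattern concrete, here is a minimal Python sketch of how such a check might be performed on IEEE-754 double precision values. It extracts the 52 explicit fraction bits of the significand and looks for one of the five bad leading patterns followed by a long run of 1's; the function names, the run-length parameter and the use of 64-bit doubles (rather than the FPU's internal precision) are our own simplifications, not a description of the chip's divide hardware.

    import struct

    # The five "bad" leading significand patterns quoted above (1.0001, 1.0100,
    # 1.0111, 1.1010, 1.1101). The leading 1 is implicit in IEEE-754, so we
    # compare only the first four explicit fraction bits.
    BAD_PREFIXES = {"0001", "0100", "0111", "1010", "1101"}

    def fraction_bits(x: float) -> str:
        """Return the 52 explicit fraction bits of a double as a bit string."""
        (raw,) = struct.unpack(">Q", struct.pack(">d", x))
        return format(raw & ((1 << 52) - 1), "052b")

    def is_at_risk(x: float, run_length: int = 20) -> bool:
        """Heuristic 'bad divisor' test: a bad four-bit prefix followed by a
        long run of 1's (run_length is an assumption; the text uses 20 and 32)."""
        if x == 0.0:
            return False
        bits = fraction_bits(x)
        return bits[:4] in BAD_PREFIXES and bits[4:4 + run_length] == "1" * run_length

    # The example from the summary: 3.0 is exact and harmless, while 4.1 - 1.1
    # comes out just below 3 and carries a long run of 1's after a bad prefix.
    print(fraction_bits(3.0)[:12], is_at_risk(3.0))              # 100000000000 False
    print(fraction_bits(4.1 - 1.1)[:12], is_at_risk(4.1 - 1.1))  # 011111111111 True

    # Under the random-bit-pattern hypothesis, a bad prefix (5 of 16 patterns)
    # followed by 32 ones has probability 5 / 2**36, about 1 in 13 billion.
    print(f"1 in {2**36 / 5:,.0f}")

Note that on a correctly functioning FPU this only flags the bit pattern of a value; it does not, and cannot, reproduce the erroneous quotient itself.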
Addition/Subtraction of Decimal Numbers
In this experiment, we randomly added or subtracted 10 uniformly distributed random numbers in a dollars-and-cents format and then examined the result for the above binary patterns. Here are the results for three cases. In the first case, we chose one digit to the left of the decimal point and two to the right (as in $3.47); in the second case, two digits to the left and two to the right (as in $29.56); in the third case, two digits to the left and one to the right. All the digits were chosen randomly with uniform probability. The results below give the number of times the result of this experiment had the bit pattern corresponding to a bad divisor.

Case 1 (one digit to the left, two to the right) --- 188 out of 100,000
Case 2 (two digits to the left, two to the right) --- 45 out of 100,000
Case 3 (two digits to the left, one to the right) --- 356 out of 100,000

Clearly, these probabilities are much higher than those obtained under the random bit pattern hypothesis.

Division of Two Decimal Numbers:
These experiments were conducted through exhaustive tests on all possible digit patterns. Here (a.b)/(c.d) represents division of a two-digit number (one digit to the left of the decimal point and one to the right) by another two-digit number.

(a.b)/(c.d) --- 44 out of 10,000
(0.ab)/(0.cd) --- 27 out of 10,000
(a.bc)/(d.ef) --- 344 out of 1,000,000
(ab.c)/(de.f) --- 422 out of 1,000,000

Multiplication of Two Numbers
Here we are multiplying a decimal number by another number which was computed as the reciprocal of a decimal number, as in scaling by a constant.

(a.b) * (1/(c.d)) --- 37 out of 10,000
(a.bc) * (1/(d.e)) --- 139 out of 100,000
(a.bc) * (1/(d.ef)) --- 434 out of 1,000,000

To summarize, for decimal calculations of the type given above, the probability of obtaining a result which falls into the category of a bad divisor is rather high. It appears to be somewhere between 1 in 3,000 and 1 in 250; let us say that it is of the order of 1 in 1,000. Furthermore, if the rounding mode corresponds to truncate, the probability of arriving at bad divisors increases significantly.
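As an illustration of how the addition/subtraction experiment above might be set up, here is a minimal Monte Carlo sketch in Python for the Case 1 shape. It reuses the bit-pattern check sketched earlier; the digit ranges, the random choice of sign and the run length of 1's are our assumptions about the setup, so the count it prints should only be read as being of the same general order as the figures reported above.

    import random
    import struct

    BAD_PREFIXES = {"0001", "0100", "0111", "1010", "1101"}  # the five bad table entries

    def is_at_risk(x: float, run_length: int = 20) -> bool:
        # Same heuristic as before: a bad four-bit prefix of the fraction
        # followed by a long run of 1's in the double's significand.
        if x == 0.0:
            return False
        (raw,) = struct.unpack(">Q", struct.pack(">d", x))
        bits = format(raw & ((1 << 52) - 1), "052b")
        return bits[:4] in BAD_PREFIXES and bits[4:4 + run_length] == "1" * run_length

    def random_amount() -> float:
        # A Case 1 style amount: one random digit of dollars and two digits of
        # cents, e.g. 3.47 (the exact digit ranges are our assumption).
        return random.randint(1, 9) + random.randint(0, 99) / 100.0

    def trial() -> float:
        # Randomly add or subtract ten such amounts, as in the experiment above.
        total = 0.0
        for _ in range(10):
            total += random.choice((1.0, -1.0)) * random_amount()
        return total

    trials = 100_000
    hits = sum(is_at_risk(trial()) for _ in range(trials))
    print(f"{hits} at-risk results out of {trials:,} trials")

Changing the digit ranges in random_amount gives the Case 2 and Case 3 shapes.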
The Dependency on the Numerator
Given a bad divisor, the divide error occurs for certain ranges of values of the numerator. If we were to take a totally random bit pattern for the numerator, the probability of error appears to be of the order of one in 100,000. This is a first-cut rough estimate and could probably be improved. The probabilities appear to be different for different table values; the table entry corresponding to '1.0001' seems to produce the most errors. For the numerator, too, there are bands of values where the error is much more likely, and these bands are more prominent near whole numbers. For example, if we were using (19.4 - 10.4) = '9' as a divisor (a bad one) and picked a random value between 6 and 6.01 as the numerator, then the chance of error increases to about one in 1,000. For the purpose of our simple analysis, we will use the figure of 1 in 100,000 for a bad numerator; this assumes that we are picking a random numerator. Using the value of 1 in 1,000 as the probability of a bad divisor, the overall probability of a 'typical' divide being incorrect seems to be of the order of 1 in 100 million. This is about two orders of magnitude higher than the Intel estimate of 1 in 9 billion.

Probability of a Divide Instruction
Let us assume that a Pentium operating at 90 MHz does an op in 1.2 cycles on average. That will give about 75 million ops per second of actual compute time. We will use a figure of 1 divide per 16,000 instructions, even though many estimates suggest a much higher frequency of divides. Using this conservative estimate of one divide per 16,000 instructions, we arrive at about 4,687 divides per second. Let us further assume that a typical spreadsheet user does only about 15 minutes of actual intensive computing per day. Then he is likely to do 4,687 * 900 = 4.2 million divides per day. Assuming an error rate of 1 in 100 million, it will take about 24 days for an error to occur for an individual user. Combining this with the fact that there are millions of Pentium users worldwide, we quickly come to the conclusion that on a typical day a large number of people are making mistakes in their computations without realizing it.

IBM Corporation
ibmstudy@watson.ibm.com