                          BIGNUM.PAS programmer's guide: 
                  Overview of source code Functions & Procedures 
                                        by 
                               Stephen A. Theberge.
             
                      BIGNUM.PAS is Copyright 1989,1990 by: 
            
                               Stephen A. Theberge
                                 76 Wheaton Drive
                         Attleboro, Massachusetts  02703
                                  (580-226-0447)
                                  CIS 73030,3644 
                                 GEnie S.THEBERGE
             
            All correspondence should be sent to this address.  
            
            The registered  user of BIGNUM.PAS and BIG_RPN.EXE understands 
       that  Stephen  A.  Theberge  has  a copyright on the  source  code.   
       The  user  further understands  that  use  of  this  source code is  
       restricted  by  the following: 
            
            1.  The registered user may compile the source code for his or 
       her own use on his or her own computer system. 
            
            2.   The registered user may copy the BIGNUM.PAS,   BIGNUM.TPU 
       BIG_RPN.EXE and BIG_RPN.PAS files  ONLY for system back-ups.    The 
       source code,    in  its entirety or  parts therein and no matter to 
       what  extent  it has been altered,  may not be distributed  without 
       express permission from the author. 
            
            3.     The  files on the  distribution   diskette   (ODDS.PAS)   
       is not copyrighted and may be distributed in any manner. 
            
            4.    The files BIGNUM.4,  BIGNUM.5   and/or BIGNUM.55 (on the 
       distribution   diskette)     may  only  be  distributed  by   those 
       authorized to do so by the copyright holder. 
            
            5.   The  distribution  diskette is defined  as  the  diskette 
       ordered  from  third party distributors as public  domain  or  user 
       supported software,  diskettes ordered directly from above  address 
       or any copies of the diskette obtained by any other means.  

            6. The buyer understands the software is sold as is. 

            
            The  remainder  of  this manual describes  the  functions  and 
       procedures in  the BIGNUM.PAS source code.   Refer to your diskette 
       which was supplied or the printout you may have ordered, or you may 
       print out your own copy from the diskette using your own printer.



                        CHANGES IN VERSION 2.01 of BIGNUM

            The basic principles and methods of version 2.01 of BIGNUM.PAS 
       version 2.01  are the same as those for version 1.00 except for the 
       following:
            
            1.  Variables t1, t2 and r,  outlined in "Global Data"  are no 
       longer  arrays of bytes,  but arrays of longints.   The "paper  and 
       pencil assumption"  rules are still valid.  It is more efficient to 
       process "digits" as groups rather than one at a time.   If you add, 
       subtract or multiply a 24 digit number, rather than going one digit 
       at  a  time,   one could go in groups of four.    This  saves  only 
       computer time.
            
            2.    The  speed  of the functions has  been  (on  the  whole) 
       increased due to the longint changes to 75% more efficient. 
            
            3.    Documentation  in BIGNUM.PAS reflects these  changes  as 
       well.
       
                 
             
                              THE INTERFACE SECTION 
                 
            The interface section shows you the functions  you can readily 
       access  with other Turbo Pascal Programs.   If you use Turbo Pascal 
       v.  4.0 rename BIGNUM.4 to BIGNUM.TPU on the distribution diskette.  
       If  you  use  Turbo  Pascal  v 5.0  rename *.5   to  *.tpu  on  the 
       distribution diskette.   You can, of course, compile the BIGNUM.PAS 
       unit yourself.  
            
            The *.4,  *.5  and *.55 files were compiled with the following 
       options in the interactive environment: 
            
            Range Checking ----- Off
            Stack Checking ----- On
            I/O Checking   ----- On (but it is not really needed). 
            Debug Information -- On
            Force Far Calls ---- Off
            Overlays Allowed --- On
            Align Date       --- Word
            Boolean Expressions  Short Circuit
            String Checking      Strict
            Stack Size           65520 
            Low Heap Size        0
            High Heap Size       655360 
             
            
            Functions plus,  minus and times all accept the arguments  and 
       take the form:
             
             x := function(s1,s2: string; data_ok: boolean): string; 

            Function divide accepts the arguments and takes the form:

            x  :=  divide(s1,s2:  string;  data_ok,no_decimals:  boolean): 
       string; 
            
            Functions fac, square and root take the form:
            
            x := function(s1: string): string; 
             

             We can define x in algebraic terms regarding these functions:
            
            For function plus,  x := s1 + s2.
            For function minus, x := s1 - s2.
            For function times, x := s1 x s2 or (s1 * s2). 
            For divide,         x := s1 / s2. 
            For fac,            x := s1! where s1 <146 and x > -1.
            For square,         x := s1^. 
            For root,           x := s1^. 
            
            
            The boolean  variable  data_ok is used to  tell  the functions 
       whether s1 and/or s2 are valid.  1.234 is valid, 999  is valid .123 
       is valid,  as are all integers and real numbers.   The functions do 
       not accept letters or special characters.   Scientific notation and 
       floating  point  representation  such  as 1.234E+00   are  not  yet 
       supported.  S1  and s2 contain (.01234567890-).   Of course,  minus 
       sign and decimal points can't be repeated more than once. 
            
            The  functions  do not round data but truncate.    The  number 
       9.9999999999999999999999999... would not round up to 10.   This is 
       why the squaring of a square root (in roottest.pas) does not always 
       yield 100%  returns.  For example,  the square root of 2 squared is 
       very close to, but not equal to 2.  (2^)  2. 
            
            This  should  also  be kept in mind when  using  the  variable 
       no_decimals in the function divide.  x := divide('1','2',true,true) 
       will return 0 and x := divide('1','2',true,false) will return .5.

            
                                   Global Data
             
            The variables t1,  t2  and r describe term one,  term two  and 
       result.    These  are  arrays  of  bytes.    Take  the  example   x 
       :=plus(s1,s2,true).   T1 is the array representation of s1,  t2  is 
       the  array representation of s2  and r is the  array representation 
       of a process. This process is the addition of t1 and t2.
            
            Carry  and  borrow  represent such situations as 9 +  1  where 
       carry would be set to 1 or   1  -  9 where borrow would be set to a 
       value  depending  on  the other values of t1.    Spos1   and  spos2 
       designate  the  position of a minus sign in a number.   Dpos1   and 
       dpos2 designate similar positions in a number.   Can you guess what 
       that might be?  Dpos1 and dpos2 designate the position of a decimal 
       sign in s1 and s2 respectively.
            
            
                                 FUNCTION VERIFY 

            This  function  returns TRUE if the string passed to it  is  a 
       valid number or FALSE if it is not a valid number.

            
                                 PROCEDURE STRIP 

            The  sole  purpose of this procedure is to remove the  decimal 
       point from the string passed to it.   This design decision was made 
       under the "paper and pencil assumption."   When we add,   subtract, 
       multiply, divide, etc. on paper and pencil,  we do not want to have 
       to  figure out what to do with the decimal point when we are  doing 
       math.  We make a mental note or make it obvious in other ways where 
       the decimal point is.


                               PROCEDURE DECICHECK 

            The  "paper  and  pencil  assumption"  also  applies  to  this 
       procedure.   If we add 1.5  and 222.123,  we generally would expand 
       1.5 to 1.500 to avoid confusing ourselves.  The coding of this also 
       avoided the necessity to check decimal cases when converting string 
       numbers into arrays.   It also alleviates the problem of what to do 
       when  you  add,  say a 9 to a numeric representation of  a  decimal 
       point.  

            The procedure exits early if neither string passed to it has a 
       decimal point.  Also,   since signs can only occur at the beginning 
       of numbers and the removal of them is straightforward,  a  separate 
       procedure  for  them was not written.   Such a  separate  procedure 
       would probably involve to much overhead. 


                      PROCEDURES OFFLEADZERO & OFFTRAILZERO
            
            These  procedure's names are straightforward as I believe  the 
       names  for most of the variables,  procedures and functions are  in 
       this unit.  Offleadzero takes leading zeros of a string.  Procedure 
       offtrailzero  does the reverse of this,  it takes trailing zeros of 
       of a decimal number like 1.00 or 3.0000000000000003000000000000000.  
       Since many numbers may be quite large, it is best to represent them 
       as tightly as possible.   This is why .5 is more common as a return 
       variable then 0.5.   In some cases the program  was also sped up by 
       getting rid of  this "garbage."   Since it is not very efficient to 
       ask question like, "are you adding zero?"  or the like in a program 
       and  since  such  if statements would further slow  down  execution 
       speed,   getting rid of leading and trailing zeros was  deemed  the 
       best solution. 
            
            It  is true that the procedures align strings to the same size 
       for the arrays,  it is possible that you might get fifteen zeros as 
       an answer to a problem.   If you added 35 to it, the computer would 
       have  to add leading zeros to 35  until it was  fifteen  characters 
       long.  That is why results are truncated in this fashion.


                        PROCEDURE PLACE & FUNCTION ADJUST 

            These two processes are dependent upon each other, at least in 
       terms  of  the whole unit.   Place puts a string in its  respective 
       array in numeric form and adjust retrieves the result from an array 
       into its respective string. 

            
                          RETURNING OF THE STRING: 'bad' 

            In general,  when overflow or underflow occurs, the procedures 
       will  return  the value 'bad'  which indicates the  operation  went 
       wrong.   There may be various causes for this,  but the most common 
       are:
            
            1.  Adding two 255 "digit" characters and carrying over to the 
       256 position. 
            
            2.   Subtraction  which would cause a borrow  from  the  256th 
       position of a string (if there was one). 
            
            3. Division by zero. 
            
            4. Multiplication weight error--described later.
            
            5. Square root of a negative number.
            
            6.   Factorial  greater  than 145  or less than zero  or  with 
       decimals.
            
            7. Any mathematical error. 
            
            8.  If you assume data_ok TRUE but this is not the case,   you 
       may get erroneous data.


                         IN BRIEF ON PLUS, MINUS & TIMES 

            The  basic principle of all these function is simple.   If you 
       can read Turbo Pascal code and do math on paper you should have few 
       problems  understanding  the code.   A  broad  description  of  the 
       algorithms would be:
            
            1. Make sure they are numbers.
            2. Make sure decimal points are aligned. 
            3. Pay attention to the sign of the numbers.
            4. Remove (but keep track of) signs and decimal points.
            5. From right to left- add, subtract or multiply, depending.
            6. If adding, be sure to carry.
            7. If subtracting, be sure to borrow.
            8. If multiplying, be sure to carry.
            9.  Stop if the number is ever to big,  if you borrow to  far, 
       etc.
            10. return the result, 'bad' if error occurred. 

            This algorithm is perhaps over simplified, but should give you 
       the  general  workings of these functions.   Of  greatest  interest 
       should  be  the addition and subtraction of  negative  numbers  and 
       adding  and subtracting zero to a number.   These cases are handled 
       early on in the functions to avoid complicated if statements later.  
       Let me outline the main points.
            
            The  first case in function plus takes care of s1  or s2  (but 
       not both)  being negative.  If s1 is  negative and s2  is not,  the 
       sign is taken from s1.   Plus is given the value of s2  -  s1   and 
       exits.   Try this for -5 and 3.  3 - 5 = - 2 just as -5 + 3  =  -2.  
       You  can try any example of this type.   It might be helpful to use 
       the interactive debugger of Turbo Pascal 5.0  to watch the flow  of 
       control.
            
            The  next  case  of negative numbers taken is  for  s1   being 
       positive and s2 being negative.  The sign is taken from s2 and plus 
       is given the value of s1  - s2.  Try the example 5 + (-3)  which is 
       equal  to 2.   You can also see that 5 - 3 is 2,  allowing the plus 
       function to rely on the minus function to handle signed numbers.
            
            If both s1  and s2 are negative,  plus falls through by taking 
       away  the  signs and waiting until the end of addition to  put  the 
       sign back.   Take the example (-5)  + (-3) which yields -8.   It is 
       just as easy to add 5 and 3 worrying about the sign later.
            
            
            In  function  minus  we also can assign some of  the  work  of 
       subtracting  negative  numbers to another function.   Just as  plus 
       uses the minus function,  minus similarly uses the plus function to 
       handle cases of subtracting negative numbers.
            
            In both cases (s1  being negative and s2 being positive, or s1 
       being positive and s2 being negative)  the sign is removed from the 
       negative number.  The variable temp is then given the value of s1 + 
       s2.  If the sign can be put back (when s1  <0 and s2  0), minus is 
       given the value of a minus sign concatenated to the variable  temp. 
       You can see this by trying the example (-5)  - 3 and 5 - (-3).   In 
       the  first  case 5 + 3 is equal to 8.   We then put the  sign  back 
       giving -8.   In the second case 5 + 3 is also 8,  but we DO NOT put 
       the sign back. 

            Note the variable "switched" in the function minus.   If s1<s2 
       then we switch s1 and s2.  Later we put a sign on the result of our 
       subtraction.   This implementation detail can also be understood by 
       example.    Take  9 - 99.   We know that a smaller number  minus  a 
       larger number will always be negative.   99  - 9  with a sign added 
       later will give the same result.   


            In function times a switch is made between s1  and s2, but the 
       variable  "switched"  is not used and the reason for the switch  is 
       different.  Look at the examples: 

                                        55555  x
                                 222222222222 
                                 ============ 
            
                                         and
            
                                 222222222222  x 
                                        55555
                                 ============ 
            

            Clearly they both will give the same product.  Which one would 
       you rather do on paper?   If you said the first one,  try them both 
       on  paper  and see which one lends itself to the  least  amount  of 
       errors.   In the second example carries can be placed over the next 
       '2'   to be multiplied,  but in the first you'd probably make  more 
       mistakes.    The coding for the second example takes care  of  both 
       cases.    It would be inefficient to consider both cases  when  the 
       multiplication was actually underway. 
            
            Note  that  in  multiplying there is a  slight  condition  for 
       errors different from all other functions.   If the length of  both 
       strings  (without the decimal point and sign)  are greater than 255 
       an error occurs.   Try this on paper.  In most cases the sum of the 
       number  of digits in the multiplicand and the number of  digits  in 
       the   multiplier  is   equal  to   the  number  of  digits  in  the 
       product.  
            
            There  is  also  a  condition with  multiplying  with  decimal 
       places.   With .1 times .1 you get .01.  This "totshift"  principal 
       and variable in function  times  is  used  to ensure that the total 
       shift of multiplying decimal points is not greater than 255. 
            

                                 FUNCTION DIVIDE 

                As with all of these functions and procedures,  the divide 
       function  attempts  to  simulate a paper and  pencil  model.   This 
       function was perhaps the hardest to get to a reasonable speed. 
       
            The example 12345678901234567890 / 999 should help illustrate.  
       On paper you try to see how many times 999 goes into 123.  You know 
       that  123  is less than 999  so you must bring down the four.   How 
       many times does 999  go into 1234?  You might have enough intuitive 
       skill to see that it only goes in once.   Try this example on paper 
       if you have the patience.  
            
            The grammar school way I learned was probably the best, but it 
       needed  some modification in this program.   Instead of putting 999 
       into 1234  and seeing what you have left over,  the computer does a 
       binary search like method.  First it assumes five times,  if that's 
       to  high  it  assumes  3.   If 5 is to low  it  assumes  7.    This 
       dramatically sped up the process.   It is much better than assuming 
       it goes in once,  than twice, etc.   Even though it does go in once 
       in this example (1234  /  999),  the performance is better with the 
       binary search method. 
            
            One note you should keep in mind if  you  expect  at least one 
       string  to be a negative number and no_decimals to be set to  TRUE.  
       Say you take -1 / 3 with these conditions.  You might be shocked to 
       see  that you get '-0'  as your return.   This was designed to tell 
       the  programmer  that the value is less than  zero  but  truncated.  
       Functions  root  and  fac assume this to be  erroneous.    Function 
       divide knows that you can't divide by '-0' but that '-0' divided by 
       anything is just like '0' divided by anything. 
            
            
                              FUNCTIONS FAC & SQUARE 
            
            These  functions are only  included for completeness.    Their 
       workings  should  be  clear to you at a glance given  the  previous 
       discussion. 
            
            
                                  FUNCTION ROOT 

            The  function  uses  a common method of finding  square  roots 
       found  in many algebra books.   To the left of the  decimal  point, 
       from  left to right,  group the number in pairs of two.   With  the 
       first group, find the largest square that goes into it, etc.
            
            Any  algebra book outlining this procedure should be  helpful.  
       It  is important to note that most will give you an outline  for  a 
       specific example.   I ran into problems when I started changing the 
       numbers.    With  experimentation and comparing how the method  was 
       described in various texts, I was able to derive this method.

            Also note that the square root function returns a 126  or  127 
       "digit"   number.   This is due partly to the grouping and also  to 
       allow  less  errors to occur if you want to use these  numbers  for 
       multiplications later.  Also,  square root is the slowest function, 
       and returning a 255  digit number is left up to you if you feel the 
       need for that many digits. 
            
            
                                     TIMINGS 

            The main objective of this program was to get an accuracy that 
       I  couldn't  otherwise obtain.   When I first got my  computer,   I 
       primarily used one form of BASIC.   I  wanted to verify some of the 
       odds quoted by lottery officials as to various games.
            
            One  game had 36  numbers with a pick of 6.   The  BASIC  only 
       allowed  me to calculate up to 33  factorial.   When  Turbo  Pascal 
       Version 4.0 came out I was filled with delight.  To my surprise the 
       real  type  only  was accurate up to 11  digits and  my  BASIC  was 
       accurate  for  up  to  16  digits.   As far  as  accuracy  in  some 
       statistics and large numbers, Pascal could be worse. 
            
            Determined not to give in and buy a co-processor,  I developed 
       the BIGNUM (TM) library.  I  knew string math was slow but worked hard 
       to make it as fast as possible.  This brings me to the  explanation 
       of the timings.  They are as follows: 
            
            
            PLUSTEST    .22 sec.  .33 in v. 2.01 
            MINTEST     .22 sec.  .33 in v. 2.01 
            MULTEST    3.18 sec. 1.37 in v. 2.01 
            SQUTEST    3.29 sec. 1.42 in v. 2.01 
            FACTEST   19.50 sec. 11.00 in v. 2.01 
            DIVTEST   73.44 sec. 45.00 in v. 2.01 
            ROOTTEST 100.41 sec. 75.77 in v. 2.01 
            
            These are for worst cases.  Especially note that these timings 
       were done on an 8088  at 8 MHz.   An AT or 80386  will  undoubtedly 
       yield faster execution speed. 
            
            For plustest, all 9's were used.   

            For mintest,  a 1  followed by all zeros was used,  then 1 was 
       subtracted from it.   This allows the maximum carries and the lowly 
       one which is subtracted must be padded with all zeros.
            
            For multest, all 9's were used.
            For squtest, all 9's were used. 
            For factest, a value of '145' was passed.
            
            For divtest, a one was divided by a one followed by all zeros, 
       except for the last position which was a 1.  (1  /  1.0000000...1). 
       This  ensures  almost  always  that the  binary  search  (described 
       earlier)   is at its worst case.   It also ensures that the  divide 
       function must pad with a maximum amount of zeros. 
            
            For roottest, a value of 2 was passed.  
            
            Functions  fac,  divide and root are the slowest because  they 
       rely  so heavily on the other functions.   Function times  is  slow 
       because  it  relies too much on the "paper and pencil assumptions."  
       Even in computer hardware, multiplications are slower than the adds 
       and subtracts.   The root function relies most heavily on the other 
       functions.  If a short cut could be found, I'd use it.
            
            I  don't  intend to imply that these functions can't  be  made 
       faster  or that they are the slowest around.   Indeed,  when I  was 
       developing the divide function, it used to take 400 to 500  seconds 
       to get the right answer.  
            
            I believe these routines can give you the accuracy you want in 
       a  reasonable amount of time.   I  am also convinced that  versions 
       1.00   and 2.01  of BIGNUM (TM)   are a big step in this direction.   
       My  goal is  to make these functions as fast (and accurate)   as  I 
       can.   I  welcome  any suggestions  you might have and look forward 
       for  an  even  bigger and better  new version   where there will be 
       faster adds,  minuses and some new ones thrown in. 

            
                           Thank you for your support! 
                                    Sincerely,     

                                      Steve. 

  Turbo Pascal is a registered trademark of Borland International Inc. 
  Turbo Pascal v. 4.0 and 5.0 are copyrighted by Borland International Inc. 

