 +----------------------------------+
 | Dale W. Harrison, Copyright 1991 |
 | 3815 Richmond Ave.  Box 111      |
 | Houston, Texas 77027             |
 | Phone: (713) 888-1479            |
 +----------------------------------+


                +----------------------------------------------+
                |          PARADOX QUERY PERFORMANCE           |
                +----------------------------------------------+

 A question that is sometimes asked is whether or not Paradox is capable
 of handling very large data sets.  One concern is whether there's some
 practical upper limit to the number of records in a table, above which
 Paradox's performance degrades to the point that it's no longer suitable.
 By this I mean, does there come a point where if we add more records to
 a table, Paradox will take an ever increasing amount of time to process
 each additional record during a query.  If this happens, then queries
 against tables of increasing size would soon take an impossibly long
 length of time to execute.

 To show how this could happen, let's take a hypothetical example.  Let's
 say that we have two different database products and we want to compare
 their performance when executing exactly the same query on exactly the
 same data.  Brand A's query time grows linearly with the number of
 records queried, and Brand B's query time grows polynomially.

 By linear performance degradation, I mean that every time the number of
 records being queried in Brand A's database doubles, the time it takes
 to execute the query also doubles.  By polynomial performance
 degradation, I mean that every time the number of records being queried
 in Brand B's database doubles, the query time increases by more than a
 factor of two.  For example, if we double the number of records in the
 table being queried by Brand B's database, we would discover that our
 second query takes not twice as long to execute as the first one did,
 but say four times as long.  Then, if we double the number of records
 again, our third query will take four times as long to complete as our
 second query did, and so forth.  There is a more precise mathematical
 distinction between linear and polynomial performance, but this
 definition will suffice for the current example.

 To continue, let's assume that we have a 2000 record table set up in
 Brand A's database and an identical 2000 record table set up in Brand B's
 database, and we execute exactly the same query on each of these two
 tables, and that each of these two queries takes 10 seconds to execute.
 Now let's see what happens to the length of time it takes for the queries
 to execute as we increase the number of records in each table.

          +-------------------+---------------+-----------------+
          |                   |  linear time  | polynomial time |
          | Number of Records |    BRAND A    |     BRAND B     |
          |   being Queried   |   Query Time  |    Query Time   |
          +-------------------+---------------+-----------------+
          |          2,000    |  10 seconds   |   10 seconds    |
          |          3,000    |  15 seconds   |   23 seconds    |
          |          4,000    |  20 seconds   |   40 seconds    |
          |         10,000    |  50 seconds   |    4 minutes    |
          |         50,000    |   4 minutes   |    2 hours      |
          |        100,000    |   8 minutes   |    7 hours      |
          |        250,000    |  20 minutes   |    2 days       |
          |        500,000    |  42 minutes   |    1 week       |
          |        750,000    |   1 hour      |    2 weeks      |
          |      1,000,000    | 1.5 hours     |    1 month      |
          +-------------------+---------------+-----------------+

 As we can see from the table, when we're only querying a few thousand
 records, the difference in performance isn't significant.  However, when
 the table size grows large, the performance difference becomes extremely
 significant: a one-and-a-half-hour wait for a million-record query in
 Brand A versus a one-month wait for the same query in Brand B.  Therefore,
 the question of whether a given database package exhibits linear-time
 performance or polynomial-time performance is a critical factor in
 determining its suitability for use with large data sets.  As will be
 demonstrated below, Paradox exhibits strictly linear-time performance,
 as illustrated in the following query timing test.
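 To make the arithmetic behind the hypothetical table concrete, here is a
 short sketch of my own (not part of the original comparison).  It models
 Brand A as linear and Brand B as quadratic, the simplest polynomial
 consistent with the figures above, with both scaled so that a
 2,000-record query takes 10 seconds:

```python
# Model of the hypothetical Brand A vs. Brand B comparison.
# Assumption: Brand A is linear and Brand B is quadratic in the
# record count; both take 10 seconds at the 2,000-record baseline.

BASELINE_RECORDS = 2_000
BASELINE_SECONDS = 10.0

def brand_a_time(records):
    """Linear: doubling the record count doubles the query time."""
    return BASELINE_SECONDS * (records / BASELINE_RECORDS)

def brand_b_time(records):
    """Quadratic: doubling the record count quadruples the query time."""
    return BASELINE_SECONDS * (records / BASELINE_RECORDS) ** 2

for n in (2_000, 4_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,} records   A: {brand_a_time(n):>9,.0f} s   "
          f"B: {brand_b_time(n):>12,.0f} s")
```

 At one million records the model gives Brand A about 5,000 seconds
 (roughly 1.5 hours) and Brand B about 2,500,000 seconds (roughly one
 month), matching the table.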

 The Paradox Query timing test was carefully designed to avoid any and all
 hardware dependencies or hidden advantages such as secondary indexes or
 residual RAM buffering of the Test Table.  Each timing point is the average
 of five runs rounded to the nearest second.  I exited to DOS and re-entered
 Paradox between each timing run.  The timing test was run on a 386-20MHz PC
 under IBM-DOS 3.2 with no disk caching and no TSR's loaded.  I also later
 ran the same test using a secondary index on the field being queried.
 These further results proved to be just as linear as the original set of
 timing tests with the difference being that each query was considerably
 faster (as one would expect).  The following script was used to measure
 the query times.

           +---------------------------------------------------+
           |  Query                                            |
           |                                                   |
           |    Test | Field_1 |   Field_2   | Field_3 |       |
           |         | Check   | Check aaaaa | Check   |       |
           |                                                   |
           |  EndQuery                                         |
           |                                                   |
           |  t0 = Time()                                      |
           |  Do_It!                                           |
           |  t1 = Time()                                      |
           |                                                   |
           |  Message t0+"  "+t1                               |
           |  retval = GetChar()                               |
           +---------------------------------------------------+
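 For readers without Paradox at hand, the harness above can be
 approximated in Python.  The run_query function and the generated rows
 are stand-ins of my own, not part of the original test; they mimic a
 query that checks [Field_2] = "aaaaa" against a 100,000-record table
 with a 2% hit rate:

```python
import time

def run_query(rows):
    """Stand-in for the Paradox query: select rows whose field_2 == 'aaaaa'."""
    return [r for r in rows if r["field_2"] == "aaaaa"]

# 100,000 rows, every 50th of which matches -- a 2% hit rate.
rows = [{"field_2": "aaaaa" if i % 50 == 0 else "zzzzz"}
        for i in range(100_000)]

t0 = time.perf_counter()          # t0 = Time()
answer = run_query(rows)          # Do_It!
t1 = time.perf_counter()          # t1 = Time()

print(f"{len(answer)} records selected in {t1 - t0:.3f} s")
```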


 I designed the test so that each Query would extract exactly 2% of the records
 in the Test Table.  I did this by loading the Test Table with a generated
 data set up front.  I created a starting table containing 50 records like
 the following (note that each Field_2 /Field_3 combination is unique):

              +------+---------+---------+---------+
              | Test | Field_1 | Field_2 | Field_3 |
              +------+---------+---------+---------+
              |   1  |         |  aaaaa  |  AAAAA  |
              |   2  |         |  bbbbb  |  BBBBB  |
              |   3  |         |  ccccc  |  CCCCC  |
              |   4  |         |  ddddd  |  DDDDD  |
              |   .  |         |    .    |    .    |
              |   .  |         |    .    |    .    |
              |  50  |         |  ,,,,,  |  <<<<<  |
              +------+---------+---------+---------+


 I then concatenated 2000 copies of this Table to generate a final Test Table
 with 100,000 records total containing 2000 records with [Field_2] = "aaaaa".
 I then ran the following script to set the value of [Field_1]:

           +---------------------------------------------------+
           |  Edit "Test"                                      |
           |  Scan                                             |
           |    [Field_1] = Format("w5,ar",StrVal(RecNo()))    |
           |  EndScan                                          |
           |  Do_It!                                           |
           +---------------------------------------------------+


 After which I did a {Restructure} to make [Field_1] a Primary Key. I then ran
 the 100k-Record timing tests. Then the following script was run to remove the
 last 5000 records in the Test Table:

           +---------------------------------------------------+
           |  Edit "Test"                                      |
           |  END                                              |
           |  For i From 1 to 5000                             |
           |    DEL                                            |
           |    Message StrVal(i)                              |
           |  EndFor                                           |
           |  Do_It!                                           |
           +---------------------------------------------------+


 This left a Table with 95,000 records, 1,900 of which had [Field_2] =
 "aaaaa", again 2% of the total records.  This process was then repeated for
 the next test point and so forth.  The point of the 2% Answer Table was to
 factor out the speed of the hard drive by making the time spent scanning the
 Test Table proportional to the time spent writing the Answer Table on each
 test run.
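 The construct-then-trim procedure above can be sketched in Python.  The
 50 Field_2 values used here are illustrative only (the original table
 cycles through 50 distinct 5-character strings, not necessarily these);
 the point is that deleting 5,000 records from the end always removes
 exactly 100 full copies of the 50-record base table, so the "aaaaa"
 records remain exactly 2% of the total at every test point:

```python
# Sketch of the Test Table construction, tracking only Field_2.
# Assumption: the 50 base values are "aaaaa", "bbbbb", ... (illustrative).

# 50-record starting table: each Field_2 value is unique.
base = [chr(ord("a") + i) * 5 for i in range(50)]

# Concatenate 2,000 copies -> 100,000 records, 2,000 of them "aaaaa".
test_table = base * 2_000

# Trim 5,000 records off the end per test point, as the DEL loop does,
# and confirm the "aaaaa" records are always exactly 2% of the table.
while len(test_table) >= 5_000:
    matches = test_table.count("aaaaa")
    assert matches == len(test_table) * 2 // 100
    test_table = test_table[:-5_000]
```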

 The exact hardware configuration isn't important to the outcome of the test
 because what I was trying to demonstrate was the fact that Paradox exhibits
 linear behavior out to at least 100,000 records and does not hit any
 performance wall.  Faster hardware would have demonstrated the same linear
 behavior; the slope of the timing line would just have been steeper.
 However, the exact hardware configuration is as follows:

           +----------------------------------+
           |   386-20MHz                      |
           |   4 Meg total RAM                |
           |   110 Meg Hard Drive             |
           |   9 Meg free on Drive C:         |
           |   IBM DOS 3.2                    |
           |   No Disk Cache                  |
           |   No TSR's loaded                |
           +----------------------------------+



 The test results are as follows:
          +-----------------+---------------+
          | Number of Recs  | Query Process |
          |    to Query     |     Time      |
          +-----------------+---------------+
          |     100,000     |    57 secs    |
          |      95,000     |    55 secs    |
          |      90,000     |    52 secs    |
          |      85,000     |    49 secs    |
          |      80,000     |    46 secs    |
          |      75,000     |    43 secs    |
          |      70,000     |    40 secs    |
          |      65,000     |    38 secs    |
          |      60,000     |    35 secs    |
          |      55,000     |    32 secs    |
          |      50,000     |    29 secs    |
          |      45,000     |    26 secs    |
          |      40,000     |    23 secs    |
          |      35,000     |    20 secs    |
          |      30,000     |    17 secs    |
          |      25,000     |    14 secs    |
          |      20,000     |    12 secs    |
          |      15,000     |     9 secs    |
          |      10,000     |     6 secs    |
          |       5,000     |     3 secs    |
          |       3,000     |    ~1 sec     |
          +-----------------+---------------+

          Test Conditions:

            Structure of the Test Table:
                Field_1  A5*
                Field_2  A5
                Field_3  A5

            Where: - Field_1 was a unique key set to the RecNo()
                   - Field_2 and Field_3 were filled with 5-char
                     alpha strings
                   - No secondary indexes were used
                   - The Query was structured so that the Answer Table
                     generated each time was exactly 2% of the records
                     in the Test Table
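 As a cross-check of my own (not part of the original article), an
 ordinary least-squares fit of the timings above against record count
 comes out almost perfectly linear: roughly 0.57 seconds per 1,000
 records, with an intercept near zero.

```python
# Least-squares fit of the measured query times against record count,
# using the twenty (records, seconds) pairs from the results table.

records = [100_000, 95_000, 90_000, 85_000, 80_000, 75_000, 70_000,
           65_000, 60_000, 55_000, 50_000, 45_000, 40_000, 35_000,
           30_000, 25_000, 20_000, 15_000, 10_000, 5_000]
seconds = [57, 55, 52, 49, 46, 43, 40, 38, 35, 32,
           29, 26, 23, 20, 17, 14, 12, 9, 6, 3]

n = len(records)
mean_x = sum(records) / n
mean_y = sum(seconds) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(records, seconds))
         / sum((x - mean_x) ** 2 for x in records))
intercept = mean_y - slope * mean_x

print(f"slope:     {slope * 1000:.3f} seconds per 1,000 records")
print(f"intercept: {intercept:.2f} seconds")
```

 A polynomial-time product would show the per-record cost climbing as the
 table grows; here it stays essentially constant from 5,000 records to
 100,000.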

          [Figure: "Paradox Query Timing Test" -- a plot of Records
           (10k to 100k, vertical axis) against Query Process Time
           (0 to 60 seconds, horizontal axis), charting the timing
           points from the results table above.]


                              ======= The End =======
