TUTORIAL.TXT
---------------------

Until I get around to constructing some more decent documentation (probably in 
the form of on-line help), this tutorial file will have to do.  I'm basically going to 
just write down whatever I think is useful to know, using actual examples.  I have
included two example data files: 1994.xls is an Excel 4.0 worksheet and 1994.txt
is a tab-delimited text file.  These are the only file types TimeStat can handle
for now.  The Formula One VBX can handle .VTS files, a proprietary
format supplied by Visual Tools, Inc.  I do not consider it useful.  The two 
example files contain identical numerical data: 1994 daily closing prices of 
DJIA, DJTA, DJUA, DJBA, S&P and some other information.  This data is 
supplied by NeuroVe$t Journal and is available on selected BBS's.  There is an 
additional file, BDSQUANT.XLS, which contains the small-sample quantile
tables for the BDS test.  This file is for reference when conducting BDS analysis.

1. Open file 
Select File/Open or click the file open button on the toolbar (second from the left).
Locate and open either 1994.xls or 1994.txt.  You can also select File/New to start
with a blank worksheet (leftmost button on the toolbar).

2(a). Quick charting 
Use the arrow keys or the mouse to place the active cell box anywhere in the second
column (labeled DJIA at the top), then click the chart button on the toolbar
(rightmost) or select Window/New Chart from the menu.  Everything in the Chart Setup dialog is
self-explanatory.  The 'Data Arrangement' list box at right shows one choice 
since there is only one way to handle one column of data.  Make your choices 
and click OK.  A chart appears.  You can resize the chart window and the chart 
will resize in the same manner.  You can double-click on the chart to bring up 
the Chart Setup dialog again to change any option except Data Arrangement.

To plot only part of the data in a column, use the mouse to click and drag
across the range you want, or choose Sheet/Select Range from the menu
and enter a range in standard spreadsheet format.  For example: enter B2:B101
to select the first 100 days of the 1994 DJIA.  Clicking the chart button at this point
results in a plot of the first 100 values from column B.

2(b). Printing
The print, print setup, and page setup items under File menu all work with the 
current spreadsheet pretty much as one expects.  Print selections prints only the
selected cells.  You might have to experiment a little to get the output you want.
My HP 660C DeskJet works fine.  To print a chart, double-click on the chart to
bring up the Chart Setup dialog.  In the lower right corner there are check boxes
for printing to the printer and printing to a file.  Checking one or both of these and
clicking the <OK> button initiates printing.  The graphics server sends the chart to
the current default Windows printer and I honestly do not know how to set 
options.  The color output from my printer is quite good.  Your best bet for 
customization is to print to file.  Checking the print to file box brings up a file 
save dialog for saving to a .WMF file.  I apologize for the quirky chart printing 
interface, but I did not want to spend too much time on this.

3. Selecting ranges
Go to the rightmost data columns.  The 3 rightmost columns contain the Hi, Lo, 
and Close of the S&P 500 index.  Click and hold down the mouse button on the header
of column G (SP500H), drag across to column I and release.  Clicking the header of a
column or row selects (highlights) the whole column or row; clicking and
dragging selects consecutive columns (rows); clicking the upper-leftmost cell
in the spreadsheet selects the whole sheet.  With columns G, H, and I selected,
click the chart button.  Notice the Data Arrangement options list now gives 3
choices.  You can designate all columns as y-data, designate the first column as
x-data and the rest as y-curves, or arrange the chosen columns as (x1,y1), (x2,y2),
etc.  If an odd number of columns is chosen, the last column is ignored.  For the
SP500 Hi, Lo, and Close we want the first choice (all y-data).  Click OK to see three colored
curves on the same plot.

To select two non-adjacent columns or non-contiguous ranges: first select the
column/range as usual, then hold the <Ctrl> key down and select all subsequent
ranges.  For example: click the header of column G to select SP500H, then, while
holding down the <Ctrl> key, click the header of column I to select SP500C.  Now
you can plot these two disjoint columns as before.

4. Calculations
All calculations (under the Calculations and Analysis menu headings) expect to 
work on entire columns.  More precisely, the program scans the column starting
from the top, starts reading data at the first row with a valid number, and stops
reading data as soon as a row without a valid number is encountered.  Thus the data
within a column can have an arbitrary number of blank and text-filled cells
above and below it.  But if there are two or more disjoint data groups
separated by blank or text cells, only the first (topmost) group will be read in.
To perform calculations on only part of a column, you will need to copy the
data of interest to a new column.  To do this, first select the cells
of interest with the mouse, choose Edit/Copy from the menu, create a new empty
column somewhere (using Sheet/Insert Column from the menu), and Edit/Paste or
Edit/Paste Values into the new column.

5. Single column calculations
These are operations that always work on the column containing the active cell.  
The results are placed in new column(s) to the immediate right of the source column
along with helpful labels.  These labels are entered as formulas; therefore, they can
recalculate themselves if the source column's header should change because of
column deletions/additions to the left.

Example: under the Calculations menu, 'Statistics' evaluates the data set 
distribution in terms of mean, deviation, skew, and other parameters.  'Delta'
calculates the daily changes; '% Delta' gives the percent change; 'Delta Log' gives
the log10() of the (today/yesterday) ratio; 'z-Score' subtracts the mean and divides
by the standard deviation to give a data set with a mean of zero and a standard
deviation of 1.
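
For readers who want the arithmetic spelled out, here is a rough sketch of what these
single-column operations compute (Python/NumPy, purely for illustration; TimeStat does
this internally, and details such as whether the sample or population standard
deviation is used are assumptions of the sketch, not the program's conventions):

    import numpy as np

    x = np.array([100.0, 102.0, 101.5, 103.0])    # arbitrary illustrative values

    delta     = np.diff(x)                        # Delta: today - yesterday
    pct_delta = 100.0 * np.diff(x) / x[:-1]       # % Delta: percent change
    delta_log = np.log10(x[1:] / x[:-1])          # Delta Log: log10(today/yesterday)
    z_score   = (x - x.mean()) / x.std(ddof=1)    # z-Score: mean 0, std dev 1 (sample std assumed)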

The Fractional Delta (fractional differencing) is calculated from a binomial 
series expansion which converges very slowly.  I implemented this for
ARFIMA-type analysis.  One can think of the slow convergence as a reflection of
the long memory of the series involved.  The value at any one point is calculated 
from all previous points in the series in a recursive manner.  It may take up to 
100 points or more to get some semblance of convergence.  Try this: for any 
column of data, do fractional delta with d=0.25 then do d=-0.25 on the result, 
subtract the final result from the original series, and plot the difference; the plot
clearly shows that the error decays only as 1/n in the series calculation.  The dialog box
will allow a fraction from -0.5 to 0.5; to get some other fraction you can always 
use integer differencing (using Calculations/Delta) to transform the series first.
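
A minimal sketch of the fractional differencing just described, using the binomial
weight recursion (Python/NumPy; illustrative only, not TimeStat's actual routine):

    import numpy as np

    def frac_delta(x, d):
        """Fractional difference (1-B)^d of series x via the binomial expansion.
        Weights: w[0] = 1, w[k] = w[k-1] * (k - 1 - d) / k; each output point
        uses all previous points, which is why convergence is slow."""
        n = len(x)
        w = np.empty(n)
        w[0] = 1.0
        for k in range(1, n):
            w[k] = w[k - 1] * (k - 1 - d) / k
        y = np.empty(n)
        for t in range(n):
            y[t] = np.dot(w[:t + 1], x[t::-1])    # sum_k w[k] * x[t-k]
        return y

    # The round-trip experiment from the text: d = 0.25 followed by d = -0.25
    # should roughly recover the series, with an error that shrinks only slowly.
    x = np.cumsum(np.random.randn(256))
    err = frac_delta(frac_delta(x, 0.25), -0.25) - x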

6. Calculations/Correlation menu item 
Calculates the correlation between two series (the first two columns, if more than
two are chosen) for a range of time lags up to half the length of the shorter
of the two series.  Positive time lags represent a lag of column 1 relative to
column 2 (data set 1 shifted to the right on the x axis).  Negative lags mean the
opposite.  If only one column is chosen, the autocorrelation is calculated, and
only zero and positive lags need be presented.  I pad the series with enough zeroes
to prevent spurious aliasing.
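
The lag convention is easier to see in code.  Below is an illustrative sketch
(Python/NumPy) that computes a plain Pearson correlation on the overlapping portion at
each lag; TimeStat's zero-padded implementation will differ in detail, so treat the
sketch as a statement of the sign convention only:

    import numpy as np

    def lag_correlation(a, b, max_lag):
        """Correlation of series a against series b at integer lags.  A positive
        lag shifts a to the right relative to b, matching the convention above."""
        n = min(len(a), len(b))
        a = np.asarray(a[:n], float)
        b = np.asarray(b[:n], float)
        lags = np.arange(-max_lag, max_lag + 1)
        r = np.empty(len(lags))
        for idx, k in enumerate(lags):
            if k >= 0:
                x, y = a[:n - k], b[k:]       # a lags b by k
            else:
                x, y = a[-k:], b[:n + k]      # b lags a by -k
            r[idx] = np.corrcoef(x, y)[0, 1]
        return lags, r

    # Autocorrelation is just a series correlated against itself:
    # lags, acf = lag_correlation(series, series, len(series) // 2)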

Example 1: Select the DJIA column.  Select 'Calculations/Correlation' menu 
item.  Two adjacent columns are produced to the right.  These represent the 
autocorrelation coefficients for various time lags.  Zero time lag is always 1 
(obviously!).  To see a plot of this, select the Lag and autocorrelation columns
and click the chart button.  Be sure to choose an appropriate Data Arrangement list
item; in this case the second (X,Y1,Y2...) and third (X1,Y1,X2,Y2...) items
work the same.  Note the autocorrelation drops off rapidly to near zero at a lag of
20 to 30 days.

Example 2: Select DJIA and then SP500C columns.  Select 
'Calculations/Correlation' from menu and plot the correlation.  As might be 
expected, correlation is near 1 for lag=0 and drops off rapidly.  The curve is 
symmetric about lag=0, again to be expected.  Doing the same for DJBA and
DJUA gives similar results; this is also logical since utilities tend to move
closely with bonds.  How about stocks and bonds?  Try DJIA and DJBA.  Note the
correlation is less than 0.4 at lag=0, but has a larger magnitude of -0.47 at lag=60.
The positive lag and negative correlation might imply that DJIA (stocks) tends
to move in the opposite direction to DJBA (bonds) with a lag of about 60 trading
sessions (roughly 3 months).  Keep in mind that we are looking at 1994 only;
these relationships may or may not be the same for other years.

The newly implemented partial autocorrelation (PACF) is actually a by-product
of a Burg-algorithm linear prediction routine hidden in the code.  The routine
also produces additional useful information: the AR coefficients.  I did not
present these because I have not decided on an appropriate user interface for them
(maybe next version).  PACF is useful for ARMA-type model identification and
many books on time series analysis discuss its use.

7(a). The fast Fourier Transform (FFT) works on a single column of real data.  
Since the input is real, the negative-frequency half of the spectrum is redundant, so we
only need to consider zero and positive frequencies.  The output is arranged in two
columns, the real and imaginary parts at frequencies 0, 1f, 2f, ..., Nf/2, where N is
the original number of real data points, f = 1/(Nd), and d is the sampling period
(1 day in the example below).  The 'Periodogram' is the power spectrum, which is simply
the sum of squares of the real and imaginary parts of the FFT output.  For financial
data, the power spectrum rarely seems to provide any useful information.  I do not pad
the series with zeroes for the FFT; the series length need not be an integer power of
2, but it does need to be even.  If the series length is odd, I add a zero at the end.
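
A small sketch of the same output arrangement using NumPy's real-input FFT
(illustrative; TimeStat's scaling and sign conventions are not guaranteed to match):

    import numpy as np

    x = np.random.randn(252)               # any even-length real series
    d = 1.0                                 # sampling period: 1 day
    X = np.fft.rfft(x)                      # complex values at 0, 1f, 2f, ..., Nf/2
    freqs = np.fft.rfftfreq(len(x), d)      # the frequencies, with f = 1/(N*d)
    periodogram = X.real**2 + X.imag**2     # power spectrum: sum of squares of Re and Im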

If we select the two output columns generated by an FFT and then choose
Calculations/Inverse FFT from the menu, a column of data will be produced that is
basically identical to the original data that was used as input to the FFT
(applying the FFT and then the inverse FFT does nothing overall).  This means that the
Inverse FFT function expects to work on the FFT output of a real series.  I
use the FFT and its inverse to perform filtering.  For example, perform an FFT
on a data set, set all rows of the output columns above a chosen frequency to zero,
and perform the inverse FFT; the result is a low-pass filtered version of the input
data.  This technique allows one to filter out either fast changes (daily fluctuations)
or longer-term weekly and monthly trends in the frequency domain.
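
The filtering recipe above, sketched in Python/NumPy (illustrative only; the cutoff
index of 20 is just an example, and keeping the zero-frequency term in the high-pass
case follows the worked example below rather than any TimeStat default):

    import numpy as np

    x = np.random.randn(256)                 # stand-in for a z-scored price series
    X = np.fft.rfft(x)

    cutoff = 20
    low = X.copy();  low[cutoff:] = 0        # low-pass: zero everything above the cutoff
    high = X.copy(); high[1:cutoff] = 0      # high-pass: zero the slow components (keep bin 0)

    smooth  = np.fft.irfft(low,  n=len(x))   # inverse FFT back to the time domain
    wiggles = np.fft.irfft(high, n=len(x))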

Example 1:  Select the DJIA column.  Select 'Calculations/Z-Score' menu item.  
This shifts the DJIA series to mean=0 and scales it to variance=1.  Select the 
newly created z-score column (should be to immediate right of DJIA).  Select 
'Calculations/FFT'.  Two new columns, Real and Imaginary parts, are created to 
the immediate right.  The top row is frequency=0.  The bottom row is the largest
frequency, which is 1/2 the sampling rate (known as the Nyquist critical
frequency, in this case 1/(2 days)).  If you select these two columns and then
select 'Calculations/Inverse FFT', you will get back the original DJIA z-score 
data.

Example 2: We can perform low-pass or high-pass filtering.
Low-pass filter: (remove the high frequency components)  Set all cells from row 
21 to the last row (128) in both the real and imaginary columns to 0.  Select 
'Calculations/Inverse FFT'.  Now select and plot DJIA z-score column and the 
newly created inverse FFT column; note the new curve closely matches the 
original curve except it is much smoother.
High-pass filter: (remove the low frequency components)  Set rows 2 to 20 to
zero.  Perform the inverse FFT and compare a plot with the original data.  Note that high
frequency daily to weekly changes are still preserved but longer trends are 
eliminated.

Note that an easy way to set a large range of cells to 0 is: set the active cell to
any cell in the range by clicking on it, use the edit bar to set the content to 0,
select 'Edit/Copy' (or use the toolbar button), select the 'Sheet/Select Range' menu
item and enter the desired range (example: c21:d128), then select 'Edit/Paste' or use
the toolbar button.  Note that using the 'Edit/Clear Range' menu item clears the cells
of any content; it does not set them to 0.

7(b). The fast Hartley transform (FHT) is similar to the FFT in that it is a transform
from the time domain to the frequency domain.  This well-known transform from digital
signal processing (DSP) has the advantage that it works entirely in the real domain
(it maps R^n -> R^n).  As with the FFT, the forward transform is exactly the same as
the inverse transform except for a normalization factor.  For an input series of length
N (N a power of 2), the FHT maps it to N frequencies.  In TimeStat, the first row is
frequency 0 (the constant offset of the series), the next rows are 1f, 2f, and so on
just as described for the FFT above, up to Nf/2 (the Nyquist critical frequency).  The
values from (N/2 + 1)f to (N-1)f are actually negative frequencies mapped from
-(N/2 - 1)f to -f.  For example, to filter out the zero frequency and the 3 lowest
frequencies, set rows 1, 2, 3, 4, (N-2), (N-1), and N to zero and perform the inverse
transform.  If you are lost at this
point, I would not worry too much.  FHT is really not that useful for time series 
analysis anyway...in my humble opinion.  Those who really need it will not even 
need my rambling here.
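
For the curious, here is a naive O(N^2) sketch of the underlying discrete Hartley
transform (not the fast algorithm), showing the cas kernel and the self-inverse
property; it is only illustrative and is not the routine used in TimeStat:

    import numpy as np

    def dht(x):
        """Discrete Hartley transform: H[k] = sum_n x[n] * cas(2*pi*n*k/N),
        where cas(t) = cos(t) + sin(t).  The inverse is the same sum divided by N."""
        x = np.asarray(x, float)
        n = len(x)
        k = np.arange(n)
        arg = 2.0 * np.pi * np.outer(k, k) / n
        return (np.cos(arg) + np.sin(arg)) @ x

    x = np.random.randn(64)
    back = dht(dht(x)) / len(x)     # applying the transform twice and dividing by N recovers x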

8. The edit bar works just like an Excel edit bar.  Many common functions are
supported (sin, cos, ln, log10, sum, stdev, etc.).  I'll see about making up a
list of these functions from Formula One's manual without violating their
copyright.

9.  Distribution statistics - the 'Analysis/Cumulative Percentile' menu item is
a single-column operation.  It sorts the column's data and lists the sorted values
against their cumulative percentiles in two new adjacent columns.  The distribution
profiles of two different series can be compared using their z-scores.
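
A sketch of what the two output columns contain (the exact percentile convention
TimeStat uses is an assumption here):

    import numpy as np

    x = np.random.randn(253)                                  # e.g. a z-scored series
    sorted_vals = np.sort(x)
    percentile = 100.0 * np.arange(1, len(x) + 1) / len(x)    # cumulative % (convention assumed)
    # Plotting sorted_vals against percentile gives the cumulative percentile profile.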

Example 1: Select the DJIA column.  Select 'Calculations/Z-Score'.  Select the 
newly created DJIA z-score column.  Select 'Analysis/Cumulative Percentile'.  
Two new columns are created with percentile and sorted DJIA z-score values.  
Repeat all of the above using the DJBA.  Now select the 4 columns of percentile
and sorted z-score values for DJIA and DJBA and plot them using the
'X1,Y1,X2,Y2,...' Data Arrangement list option.  Note the two cumulative 
percentile plots look quite different.

Example 2: Continue with the example above.  We can quantify the difference 
between the DJIA and the DJBA.  Select the DJIA column.  Select 
'Analysis/Frequency Histogram' menu item.  Three new columns are produced: 
'Sigma' is the deviation from the mean in units of standard deviation, 'Gaussian
Expect.' is the expected value for a particular bin under a normal (Gaussian)
distribution, and 'Freq. Hist.' is the frequency histogram of the values in the series
sorted into bins according to how far each is from the mean.  Select the three
columns and plot them using the first column ('Sigma') as the X value.  The DJIA
distribution certainly looks pretty close to normal.  How close?  Select the
'Gaussian Expect.' and 'Freq. Hist.' columns.  Select 'Analysis/Chi-Square'.
When asked if the first column is the expectation, answer 'yes'.  The chi-square is
35.7 with a probability of 30% that the DJIA distribution is normal.  Doing the
same for DJBA produces a chi-square of 242.6 with vanishing probability.
Therefore, DJBA is definitely not normal, which is somewhat obvious from
comparing a plot of the DJBA distribution with a plot of the normal
distribution.  We can do a chi-square analysis using the DJIA z-score and DJBA
z-score directly; in this case, answer 'no' to the question about whether the first
column is the expectation value.  The result is a chi-square > 700 and a probability of 0.
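
The chi-square comparison itself is simple enough to sketch (Python/SciPy; since the
binning and degrees-of-freedom conventions here are assumptions, the numbers will not
necessarily match the 35.7 and 30% quoted above):

    import numpy as np
    from scipy.stats import chi2

    def chi_square_vs_expectation(observed, expected):
        """Chi-square of an observed frequency histogram against an expected one,
        plus the probability of a deviation at least this large arising by chance.
        Degrees of freedom = (number of non-empty bins - 1) is assumed."""
        observed = np.asarray(observed, float)
        expected = np.asarray(expected, float)
        mask = expected > 0                    # skip bins with zero expectation
        stat = np.sum((observed[mask] - expected[mask]) ** 2 / expected[mask])
        dof = int(mask.sum()) - 1
        prob = chi2.sf(stat, dof)              # P(chi-square >= stat)
        return stat, prob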

Example 3: Quite often, just one or two data points far from the mean can 
drastically alter the chi-square value.  Select the NYVD(000) column (NYSE 
daily volume).  Select the 'Analysis/Frequency Histogram' menu item.  Select the
newly created 'Gaussian Expect.' and 'Freq. Hist.' columns.  Select
'Analysis/Chi-Square'.  When asked if the first column is the expectation, answer
'yes'.  The chi-square is 146 with vanishing probability.  But note that there is
one data point at -4 sigma.  Remove that point from the frequency histogram
(set the cell to the right of -4 sigma to 0) and recalculate the chi-square.  Lo and
behold, the chi-square is down to 29.4 with a probability of 60%.  Looking at the
statistics of NYVD(000), we can see that the -4 sigma point occurred on
11/25/94, the Friday after Thanksgiving.  So the light volume is not surprising.
In this case there is a logical explanation for the data point; for a less artificial
example, select the SP500C column and choose the menu item 'Calculations/Delta
Log'.  Now repeat the Frequency Histogram and Chi-Square analyses on the
Delta Log of SP500C.  Note that the presence or absence of a data point at -3.75
sigma again makes a large difference in the chi-square probability.  The data
point is in row 26 (2/4/94), the date of the first Fed rate hike.  In general time series
analysis, determining the significance and subsequent treatment of such outliers 
can be a difficult and subtle problem.  Deviations from normality and linearity 
lie at the heart of modern applications of non-linear dynamics and chaos theory 
to financial time series.

10. Principal components analysis (PCA) - this is a powerful and well known 
technique from multivariate statistics.  Users should consult any book from that
field for proper understanding, usage, and interpretation.  This technique
'rotates' the sample data sets in multi-dimensional space onto new axes of
maximum variance.  It is useful for reducing a large number of variables to a
smaller number of linear combinations (the principal components) which contain
nearly the same variance (which may be interpreted as information) as the
original variables.  With fewer input variables, neural networks would be more
efficient and might train faster with better results.  As always, this is a tool, not 
black magic, YMMV (your mileage may vary).

Example 1: Open the file '1994.xls'; select columns B to G inclusive (DJIA to
SP500H).  Select the menu item Analysis/Principal Components Analysis.  In the pop-up
dialog note that you can perform PCA using either the covariance or the correlation
matrix.  The default is the correlation matrix because it normalizes the input
variables.  Analysis using the covariance matrix will give heavier weight to variables
with higher variances.  This may or may not be desirable; stick with the default
correlation matrix unless you know what you are doing.  Clicking the 'Calculate' button
causes the program to calculate the matrix, diagonalize it, and list the eigenvalues.
The eigenvalues, in descending order, are the resulting variances of the principal
components.  Note that the first three of the six eigenvalues already
capture 97% of all the variance (or variation) in the 6 original inputs.  The
check box group in the lower left part of the dialog box lists the output
options; the default is to output the correlation (or covariance) matrix and its 
eigenvalues and eigenvectors.  The other two boxes give options to output 
selected principal components series and one or more of the original variables as 
reconstructed from the selected principal components.  In the 'Principal
Components Values' list box in the upper right, select the third item (0.93777
(15.6%)), indicating that we want to keep and work with the first 3 principal
components (containing 97% of the total variance).  In the 'Reconstruction Series
Selection' list box in the lower right, select B and G, indicating that we want to
reconstruct these columns (DJIA and SP500H) from the first 3 principal
components (make the selections by clicking on B and then on G, with no need to hold
down the <SHIFT> key; click a selection again to un-select it).  Make sure that all
output option boxes in the lower left are checked and then click the 'OK' button to see
the result.  Looking at the symmetric matrix in the output, we can see that 3
components are able to capture most of the variability in the 6 input series.  The
correlation coefficients are very high (> 0.9) for (DJTA, DJBA, and DJUA), and
also for (DJIA and SP500H), implying that 2 series from the first group and 1 from
the second may be providing similar information.  None of this is surprising
except perhaps the correlation between DJTA and DJBA (or DJUA).  To the 
right of the matrix are the selected principal components series (unnormalized), 
and the selected reconstructed series.  Select column B (DJIA) and then scroll 
right and select column R ("B2:B253 PCA Reconst.") while holding the Ctrl 
key.  Click chart button to compare the two series.  As you can see, the 
reconstruction is not too bad over the entire range.  Try the same for columns C
and S (SP500H and its reconstruction).  PCA does not always give clear-cut
results and interpretations; as with any other tool, employ it with a healthy dose
of common sense.

Some remarks about the PCA output: for the analysis using the correlation 
matrix as above, the input data series are normalized to mean of 0 and variance 
of 1.  The output PCAn (n=1,2,...) series are linear combinations of the
normalized inputs (with the coefficients given by the corresponding eigenvector);
they must have mean = 0 by definition, but the variance is not normalized to 1.
In fact, the variance of the PCAn series is just the nth eigenvalue of the 
correlation matrix.  The reconstructions of input series are linear combinations 
of the selected PCA series, with coefficients given by the appropriate 
components of the eigenvectors.  In our example above, column B 
reconstruction (the 1st input variable) is a linear combination of columns PCA1, 
PCA2, and PCA3 with coefficients given by the first components of the first 3 
correlation matrix eigenvectors (in this case, located in cells I18, J18, and K18).
The resulting series is then multiplied by the standard deviation of the original
input series and the mean is added back, for direct comparison.  In a similar manner,
the G column reconstruction uses the 6th components of the first 3 eigenvectors
(cells I23, J23, and K23).  Note that if we had chosen to use all principal
components, generating PCA1 through PCA6, the reconstruction would be perfect--the
reconstructed series would be exact copies of the original input series, since the
operations performed amount to multiplying the input vector by a 6x6 matrix and
then by its inverse, an identity operation.
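
The whole procedure (correlation-matrix PCA plus reconstruction on the original scale)
fits in a few lines of Python/NumPy.  This is an illustrative sketch, not TimeStat's
code; eigenvector signs and minor conventions may differ:

    import numpy as np

    def pca_correlation(data, keep):
        """PCA on the correlation matrix of `data` (rows = observations, columns =
        variables), keeping the first `keep` components and reconstructing every
        variable on its original scale, as described in the text above."""
        mu = data.mean(axis=0)
        sd = data.std(axis=0, ddof=1)
        z = (data - mu) / sd                        # normalize: mean 0, variance 1
        corr = np.corrcoef(z, rowvar=False)         # correlation matrix
        evals, evecs = np.linalg.eigh(corr)
        order = np.argsort(evals)[::-1]             # sort eigenvalues in descending order
        evals, evecs = evals[order], evecs[:, order]
        pcs = z @ evecs[:, :keep]                   # PCAn series; var(PCAn) = nth eigenvalue
        recon = (pcs @ evecs[:, :keep].T) * sd + mu # back to the original scale
        return evals, evecs, pcs, recon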

All of the above linear combinations for PCA and reconstruction series are 
conveniently expressed as proper spreadsheet formulae in TimeStat.  Move the 
active cell to a number under a PCAn column or a reconstructed series column 
and check the edit bar above the spreadsheet to see the formula.  This is useful 
for quickly calculating the PCAn series for fresh data in the original input series.  
Simply place the new data at the bottom of each input series column, then copy 
down the formulae for each PCAn column into the new rows.  The spreadsheet 
will take care of the rest.

11. Discrete wavelet transform (DWT) - the wavelet transform has been a hot 
topic in math, science, engineering, and more recently, economics and finance.  
Broad references are generally abundant and easy to find; references on 
applications in economics and finance are few right now but increasing in 
number.  I cannot possibly do justice to its richness and complexity here or in 
the program.  My implementation does allow you to play with the 1-dimensional 
DWT and hopefully gain some understanding through actual examples.  The 
algorithm itself is actually much simpler than the mathematical concepts in my 
opinion.  The actual DWT code in C is probably no more than 100 lines.  I 
adapted this version from Bob Lewis' Imager Wavelet Library (see 
README.TXT), so some of the bases implemented are probably more suitable 
for image processing.  Basically, wavelets allow simultaneous decomposition of 
a time series into components (bases) which are localized in both time and 
frequency.  This is unlike the FFT, where the component sine and cosine waves are
localized in frequency but totally unlocalized in time.

Example: With file '1994.xls' loaded, select column B (DJIA), select menu item 
Calculations/Z-Score to normalize the series to zero mean and variance one 
(this is not necessary for DWT but it makes the charting below easier).  Select 
the newly created column C (the Z-score); select menu item Analysis/Discrete 
Wavelet Transform.  The DWT dialog allows you to choose from a selection of 
bases and perform thresholding on the coefficients.  For now leave the threshold 
slide bar at 0% (no thresholding), choose the Daubechies 4 basis, check the 
forward transform box (default) and check the 'Freq. Band Decomposition' box.  
Click 'OK'.  On the spreadsheet you now have a new column of Daub4 DWT 
coefficients arranged in the standard manner: rows 130 to 257 represent the 
upper frequency band detail (roughly Fc/2 to Fc, where Fc is the Nyquist critical 
frequency or 1/2 the sampling rate of the input series.  In this case, the sampling 
rate is 1/trading day so Fc is 1/(2 trading days)).  Rows 66 to 129 represent the 
next band (Fc/4 to Fc/2), and so on until the first coefficient represents the 
scaling function.  This arrangement comes from the multiresolution analysis 
(MRA) of wavelet transform; look up any introduction to wavelets for a 
discussion.  The next 7 columns represent the spectral components of the input 
series.  Summing the 7 columns row by row would reproduce the original input 
series.  As the column titles indicate, each column represents the 'information 
content' of the frequency bands as described above.  At this point one can throw 
away the high (or the low) frequency bands similar to filtering using FFTs.  
One recently proposed technique involves using separate neural networks to 
forecast each individual decomposed series and recombine the results to obtain a 
forecast for the original series.  Select column D (Daub4 DWT) and select menu 
item Analysis/Discrete Wavelet Transform.  Uncheck the Forward/Reverse 
check box (to get the reverse transform).  Clicking 'OK' at this point would just 
give us back the original input series (Z-score of DJIA).  Move the 'Quantile 
threshold' slide bar to 75%, make sure the basis option is set for 'Daubechies 4' 
and click 'OK'.  The program now sets those coefficients that represent the 
smallest 75% in magnitude to zero, and then performs the inverse transform.  
Select columns C and E and chart them.  Note that, with only 25% of the
wavelet coefficients, we get a surprisingly good reproduction of the original
series.  Note also that this 'approximation' is different from the usual smoothing
or averaging in that small rapid oscillations are eliminated but all sharp turns of
significant magnitude are faithfully captured with great accuracy.  This is one of
the advantages of wavelets over the Fourier transform; you would need a large
number of FFT coefficients to accurately reproduce large, rapid, and isolated
variations, while it takes only a few compact wavelets to do the same job.
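
For those who want to see the mechanics, here is a generic textbook-style sketch of the
Daubechies 4 pyramid transform with quantile thresholding (Python/NumPy, periodic
wraparound at the ends).  It is only illustrative: the coefficient ordering and the
fact that the recursion stops at a 2-coefficient smooth block are choices of this
sketch and may not match TimeStat's row layout exactly:

    import numpy as np

    # Standard Daubechies-4 filter coefficients.
    C = np.array([1 + 3**0.5, 3 + 3**0.5, 3 - 3**0.5, 1 - 3**0.5]) / (4 * 2**0.5)

    def daub4_step(a, forward=True):
        """One pyramid stage (smooth + detail halves) with periodic boundaries."""
        n = len(a)
        half = n // 2
        out = np.empty(n)
        if forward:
            for i in range(half):
                j = 2 * i
                w = a[[j, (j + 1) % n, (j + 2) % n, (j + 3) % n]]
                out[i]        = C[0]*w[0] + C[1]*w[1] + C[2]*w[2] + C[3]*w[3]   # smooth
                out[half + i] = C[3]*w[0] - C[2]*w[1] + C[1]*w[2] - C[0]*w[3]   # detail
        else:
            s, d = a[:half], a[half:]
            for i in range(half):
                p = (i - 1) % half
                out[2*i]     = C[0]*s[i] + C[3]*d[i] + C[2]*s[p] + C[1]*d[p]
                out[2*i + 1] = C[1]*s[i] - C[2]*d[i] + C[3]*s[p] - C[0]*d[p]
        return out

    def dwt_daub4(x, forward=True):
        """Full multiresolution transform: the forward pass keeps halving the smooth
        block (down to length 2); the inverse pass builds it back up."""
        y = np.array(x, float)
        if forward:
            n = len(y)
            while n >= 4:
                y[:n] = daub4_step(y[:n], True)
                n //= 2
        else:
            n = 4
            while n <= len(y):
                y[:n] = daub4_step(y[:n], False)
                n *= 2
        return y

    # 75% quantile thresholding followed by the inverse transform, as in the example.
    x = np.cumsum(np.random.randn(256))
    coef = dwt_daub4(x, True)
    coef[np.abs(coef) < np.quantile(np.abs(coef), 0.75)] = 0.0
    approx = dwt_daub4(coef, False)      # close to x despite keeping only 25% of coefficients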

Some final comments about the DWT.  To actually see what the wavelet bases 
look like, first inverse transform a series with only one non-zero coefficient.  
For example, fill an empty column with zeroes from row 1 to row 256, set row
12 to 1, inverse transform this column with a basis of your choice, and then
chart the output series to see what the wavelet looks like.  What is the downside
to using the DWT?  Like the FFT, the DWT suffers from aliasing effects at the
ends.  The algorithm assumes the series is periodic, just as the FFT does.  Padding
with zeroes alleviates but does not solve the problem.  For forecasting and
financial time series analysis, the ending points of the series are precisely where
the most important information may lie.  The solution to this problem is to use
orthonormal bases which 'live' in finite intervals.  I hope to implement such
wavelets on the interval sometime.  Which basis should you use?  There is no
'wrong' basis--any basis will transform any data series.  You can choose a
'best' basis based on the data pattern.  For example, I used the Daub4 basis
above because these wavelets have sharp corners (mathematically, discontinuities
in the first derivative) which happen to suit the stock data well.  If I had
used smoother wavelets (Daub8 or Daub16 for example), the inverse transform
with 75% thresholding would show rounded corners.  There are more systematic
ways to understand and choose bases, but that is beyond our scope here.

12. Moving Averages (MA) - under the 'Calculations' menu, there are 3 types
of moving averages: simple (SMA), exponential (XMA), and adaptive (AMA).  
Choosing any of these produces a dialog box prompting you to enter an
MA window length, n; the positive integer entered should be >1 and <=500.
SMA is just that, an average of the past n days.  XMA gives more weight to more
recent days so that sharp features will not have excessive influence far
into the future.  AMA attempts to reduce the 'lag' inherent in all moving
averages (see the NeuroVe$t July '95 issue).  AMA involves subtracting a multiple
of the previous XMA from the current XMA and smoothing the result with an XMA.
I note here that in the June '95 issue of Technical Analysis of Stocks and 
Commodities, the interview with Perry Kaufman also presented an 'Adaptive 
Moving Average'.  It is different from (and more complicated than) what we 
have here.  Those interested can of course look it up, enter the relevant 
Excel formulae and compare.
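
Minimal sketches of SMA and XMA (Python/NumPy).  The smoothing constant
alpha = 2/(n+1) is a common convention and an assumption here, and the AMA recipe
described above is not reproduced:

    import numpy as np

    def sma(x, n):
        """Simple moving average of the past n days (first n-1 values left undefined)."""
        out = np.full(len(x), np.nan)
        for i in range(n - 1, len(x)):
            out[i] = np.mean(x[i - n + 1 : i + 1])
        return out

    def xma(x, n):
        """Exponential moving average; more weight on recent days."""
        alpha = 2.0 / (n + 1)              # assumed convention, not necessarily TimeStat's
        out = np.empty(len(x))
        out[0] = x[0]
        for i in range(1, len(x)):
            out[i] = alpha * x[i] + (1 - alpha) * out[i - 1]
        return out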

13. BDS test - First let me cover my behind: I do not pretend to even begin to 
gain a grasp of the vast field of non-linear dynamics, chaos, and their 
applications in time series.  BDS test is a widely used technique for detecting 
non-linearity (or more precisely: deterministic structure) in time series.  I have
basically taken the C code provided by Prof. Blake LeBaron of the University of
Wisconsin, hacked and compiled it into TimeStat, adding only a bare user
interface for the input parameters.  It is impossible to use BDS statistics without
some knowledge of the theory and the background.  Unfortunately both are way
beyond the scope of these notes.  BDS tests the null hypothesis that the input
series is independent.  The analysis is NOT going to spit out a yes or no
answer (more like maybe yes, perhaps no, good chance that...).  To generate
the BDS statistics, select a column and choose the Analysis/BDS menu item.  A
dialog box appears for the user to enter two parameters: a real parameter, epsilon,
expressed as a % of the standard deviation of the series, and an integer
parameter, m.  Epsilon is the nearest-neighbor cutoff for the correlation integral
calculation (two points in space are neighbors if they are within this distance of 
each other).  m is the maximum embedding dimension to be considered.  A time 
series {x1, x2, x3, ..., xn} is embedded into spatial dimension m by forming the 
m-tuples {(x1,x2,...,xm), (x2,x3,...,x(m+1)),...}.  So embedding in m=2 would 
consist of forming ordered pairs {(x1,x2), (x2,x3), (x3,x4),...} in 2d space.  The 
correlation integral measures the number of pairs of embedded points within 
distance epsilon of each other.  For a truly independent series, the correlation
integral has a definite asymptotic behavior, giving us the null hypothesis for the BDS
analysis.  For a given m entered, the program will calculate BDS statistics for
embedding dimensions 2, 3, ..., m.  Clicking the <OK> button initiates the
calculation, which can be lengthy depending on the series data size and on the
embedding dimension; on my 486/66 under Windows NT, a 1000-point series
took about 40 seconds for m=5 and 2.5 minutes for m=10.  Under Linux and 32-bit NT,
and using the faster algorithm, the times are down to seconds.  The results are
presented in a new column to the right of the series.  The new column first lists
the series standard deviation and then the actual value of epsilon used (if 100%
was entered, then epsilon = std. dev.), followed by the integer m and finally (m-
1) real numbers representing the BDS statistics calculated for embedding
dimensions 2, 3, ..., m in order.  That is it for the easy part; the much harder part
is the proper interpretation of the statistics.  The statistics generated for each
embedding dimension should be asymptotically (meaning for a large number of data
points) Gaussian normal with mean 0 and standard deviation 1 for a truly
independent series (random walk).
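
A brute-force sketch of the embedding and correlation integral that the BDS statistic
is built from (Python/NumPy).  The full statistic also requires an estimated standard
deviation in the denominator, which is omitted here, so this is only an illustration
of the ingredients and is no substitute for the LeBaron code compiled into TimeStat:

    import numpy as np

    def correlation_integral(x, m, eps):
        """Fraction of pairs of m-dimensional embedded points lying within distance
        eps of each other (max norm), i.e. C_m(eps)."""
        x = np.asarray(x, float)
        n = len(x) - m + 1
        emb = np.column_stack([x[i:i + n] for i in range(m)])   # the m-tuples
        count, pairs = 0, 0
        for i in range(n - 1):
            dist = np.max(np.abs(emb[i + 1:] - emb[i]), axis=1) # max-norm distances
            count += np.sum(dist < eps)
            pairs += n - 1 - i
        return count / pairs

    # Raw ingredient of the BDS statistic: sqrt(n) * (C_m - C_1**m).  The proper
    # statistic divides this by an estimated standard deviation (omitted here).
    x = np.random.randn(1000)
    eps = np.std(x)                        # epsilon = 100% of the series std. dev.
    c1 = correlation_integral(x, 1, eps)
    for m in range(2, 6):
        raw = np.sqrt(len(x)) * (correlation_integral(x, m, eps) - c1 ** m)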

Example: in any column generate 1000 random numbers using the built-in 
rand() function (type =rand() in a cell and copy to others).  Select the column 
and select the Analysis/BDS Statistics menu item.  Accept the default values in the
pop-up dialog.  The hourglass cursor will be around for 20 to 50 seconds and
then the results will be presented in the column to the right.  The results for all
embedding dimensions should be within -1 to +1, indicating acceptance of the null
hypothesis (independent series).  Copy the entire column and do an Edit/Paste
Values to copy the random numbers (but not the formula) to a new column.  Sort
the new column with Sheet/Sort (accept the default settings).  Run BDS again on the
new column.  You should get some large numbers far from the norm, indicating
rejection of the null hypothesis (series not independent).

The deceptively simple procedure above hides the complicated background, 
theory, calculations, and interpretations involved in BDS analysis.  To begin 
with, non-linear/chaos studies and analyses really need a large amount of data just
to get started most of the time.  The BDS test will spit out some result for a data
series as short as 100 points, but the interpretation is much trickier and most likely
suspect.  The authors of the reference below conducted extensive Monte Carlo 
tests to study the small sample behavior of BDS.  Results indicate that for small 
samples the BDS statistics are typically not Gaussian normal.  They provided 
some tables as guidelines for interpreting small samples.  I reproduced the 
relevant tables in the included spreadsheet BDSQUANT.XLS.  The tables are 
provided for 100, 250, and 500 datapoints for embedding dimensions 2, 3, 4, 5, 
and sometimes 10.  Each table also has a column of standard normal quantiles 
for comparison.  Generally, for N data points and maximum embedding dimension m, one
can use the Gaussian distribution if N/m > 200.

Example: Open 1994.XLS, select the DJIA column and run BDS.  The series
is 253 points long, so we will use the 250-point table.  The calculated BDS
statistics are very large numbers, in fact well off the tabulated values.  The
implication is that the DJIA series is not independent.  No real big surprise; we
could have reached the same conclusion by just looking at the chart.  Difference the
series once (Delta) and do BDS again.  This time the BDS statistics are very
small numbers (.57, .51, 1.52, 2.35 for m=2,3,4,5).  Looking at the second part of
Table 2 in BDSQUANT.XLS, we can see that these statistics are within the 10-
90% quantiles, with the exception of m=5, which is not too far off.  The
interpretation here might then be that the once-differenced DJIA series has very
little forecastable structure (recall how many traders got whipsawed badly that
year?).  Try the same thing for the once-differenced DJBA.  This time the statistics
are all beyond the 97.5% quantile, allowing us to reject with some confidence the
null hypothesis of independence (you will not catch me saying something like 
'tradeable').

The examples I gave above almost certainly oversimplified and trivialized a
complex subject and a powerful tool.  Beyond the caveats, here are some final
remarks:
- for series of length < 500, do not use m > 7.  The calculations will likely run
into underflow problems because there are simply too few non-overlapping
points in m-dimensional space.
- for those who are studs in numerical/statistical methods, proper use of bootstrapping
can get better results for short series.
- BDS is also well suited for testing for remaining structure in the residuals of
time series forecasting models (ARMA, ARCH, GARCH, NN, and so on).
- anyone serious about using this should read the reference given below and look
at the source code.


Reference: Nonlinear Dynamics, Chaos, and Instability by Brock, Hsieh, and 
LeBaron, MIT Press 1991.

