Newsgroups: comp.sources.misc
From: Donald Tveter
Subject: v22i073: back-prop - Fast Backpropagation, Part01/04
Message-ID:
X-Md4-Signature: b2d7e2bed3d9aa8488e8622e25962850
Date: Wed, 28 Aug 1991 03:20:00 GMT
Approved: kent@sparky.imd.sterling.com

Submitted-by: Donald Tveter
Posting-number: Volume 22, Issue 73
Archive-name: back-prop/part01
Environment: UNIX
Supersedes: back-prop: Volume 14, Issue 84-87

The programs described below were produced for my own use in studying
back-propagation and for doing some examples that are found in my
introductory Artificial Intelligence textbook, The Basis of Artificial
Intelligence, to be published by Computer Science Press.  I have
copyrighted these files but I hereby give permission to anyone to use
them for experimentation, educational purposes or to redistribute them
on a not-for-profit basis.  Anyone who wants to use these programs for
business or commercial purposes should contact me at the address in the
README for permission.

There are four simulators that can be constructed from the included
files.  The program, rbp, does back-propagation using real weights and
arithmetic.  The program, bp, does back-propagation using 16-bit integer
weights, 16 and 32-bit integer arithmetic and some floating point
arithmetic.  The program, sbp, uses 16-bit integer symmetric weights but
only allows two-layer networks.  The program, srbp, does the same using
real weights.  The purpose of sbp and srbp is to produce networks that
can be used with the Boltzmann machine relaxation algorithm (not
included).  See the README and Changes files for more information.

Dr. Donald R. Tveter
5228 N. Nashville Ave.
Chicago, Illinois 60656
USENET: drt@chinet.chi.il.us
-----
#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file".
if test -f 'README' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'README'\"
else
echo shar: Extracting \"'README'\"
sed "s/^X//" >'README' <<'END_OF_FILE'
X.ce
XFast Back-Propagation
X.ce
XCopyright (c) 1991 by Donald R. Tveter
X
X
X.ce
X1. Introduction
X
X   The programs described below were produced for my own use in studying
Xback-propagation and for doing some examples that are found in my
Xintroductory Artificial Intelligence textbook, \fIThe Basis of
XArtificial Intelligence\fR, to be published by Computer Science Press.
X(One would hope some time before the sun burns out or before the Cubs
Xwin the World Series or before Congress balances the budget or ... .)
XI have copyrighted these files but I hereby give permission to
Xanyone to use them for experimentation, educational purposes or to
Xredistribute them on a not-for-profit basis.  Anyone who wants to
Xuse these programs for business or commercial purposes will be charged
Xa small fee.  You should contact me by mail at:
X
X.na
X.nf
XDr. Donald R. Tveter
X5228 N. Nashville Ave.
XChicago, Illinois 60656
XUSENET: drt@chinet.chi.il.us
X.ad
X.fi
X
XAlso, I would be interested in hearing from anyone with suggestions,
Xbug reports and major successes or failures.
X
X   Note: this is "use at your own risk" software: there is no guarantee
Xthat it is bug-free.  Use of this software constitutes acceptance for
Xuse in an "as is" condition.  There are no warranties with regard to
Xthis software.  In no event shall the author be liable for any damages
Xwhatsoever arising out of or in connection with the use or performance
Xof this software.
X
X   There is also a good chance that I will soon produce and sell DOS
Xand extended DOS binary versions with more capabilities than there are
Xin this UNIX (tm of AT&T Bell Labs) source code version.  They will
Xprobably include Quickprop, SuperSAB and Cascade Correlation at the
Xvery least and perhaps also DSM, LVQ1 and counter-propagation as well.
X
X   There are four simulators that can be constructed from the included
Xfiles.  The program, rbp, does back-propagation using real weights and
Xarithmetic.  The program, bp, does back-propagation using 16-bit
Xinteger weights, 16 and 32-bit integer arithmetic and some floating
Xpoint arithmetic.  The program, sbp, uses 16-bit integer symmetric
Xweights but only allows two-layer networks.  The program, srbp, does
Xthe same using real weights.  The purpose of sbp and srbp is to produce
Xnetworks that can be used with the Boltzmann machine relaxation
Xalgorithm (not included).
X
X   In most cases, the 16-bit integer programs are the most useful,
Xbecause they are the fastest.  Unfortunately, sometimes 16-bit
Xinteger weights don't have enough range or precision and then using the
Xfloating point versions may be necessary.  Many other speed-up
Xtechniques are included in these programs.
X
X.ce
XUpdated Version
X
X   This version contains a number of improvements and changes to
Xcommand names compared with the September 1990 version.  For a summary
Xof the changes and additions, see the file Changes.
X
X.ce
X2. Making the Simulators
X
X   This code has been written to use either 32-bit floating point
X(float) or 64-bit floating point (double) arithmetic.  On System V
Xmachines the standard seems to be that all floating point arithmetic
Xis done with double precision, so double arithmetic is faster than
Xfloat and double is therefore the default.  Other versions of C
X(e.g. ANSI C) will do single precision real arithmetic.  To get 32-bit
Xfloating point, set the compiler flag FLOAT in the makefile.  The
Xfunction, exp, defined in real.c is double since System V specifies it
Xas double.  If your C uses float, change this definition as well.
X
X   One option exists for bp and sbp.  If your compiler isn't smart
Xenough to divide by 1024 by shifting, remove the SMART flag in the
Xmakefile.
X
X   To make a particular executable file, use the makefile given
Xwith the data files and make any or all of them like so:
X
X.ce
Xmake bp
X.ce
Xmake sbp
X.ce
Xmake rbp
X.ce
Xmake srbp
X
X   To make a record of all the input and output from the programs,
Xthe following small UNIX command file I call "record" can be used:
X
X.na
X.nf
Xtrap "" 2                                 # ignore interrupts (ctrl-C)
Xoutfile="${1}.record"
Xif test -f $outfile                       # start with a fresh record file
X   then
X   rm $outfile
X   fi
Xecho $outfile
X(tee -a $outfile | $*) | tee -a $outfile  # log both the input and output
Xprocess=`ps | grep tee | cut -c1-6`       # find and kill the leftover tee
Xkill $process
X.ad
X.fi
X
XFor example, to make a record of all the input and output from the
Xprogram bp using the data file xor, use:
X
X.ce
Xrecord bp xor
X
X.ce
X3. A Simple Example
X
X   Each version would normally be called with the name of a file to
Xread commands from, as in:
X
X.ce
Xbp xor
X
XWhen no file name is specified, bp will take commands from the keyboard
X(the UNIX stdin file).  After the data file has been read, commands are
Xthen taken from the keyboard.
X
X   The commands are one-letter commands and most of them have optional
Xparameters.  The `A', `B', `d' and `f' commands allow a number of
Xsub-commands on a line.  The maximum length of any line is 256
Xcharacters.  An `*' begins a comment and it can be used to make the
Xremainder of the line a comment.
XHere is an example of a data file to do the xor problem:
X
X.na
X.nf
X* input file for the xor problem
X
Xm 2 1 1    * make a 2-1-1 network
Xc 1 1 3 1  * add this extra connection
Xc 1 2 3 1  * add this extra connection
Xs 7        * seed the random number function
Xk 0 1      * give the network random weights
X
Xn 4        * read four new patterns into memory
X1 0 1
X0 0 0
X0 1 1
X1 1 0
X
Xe 0.5      * set eta to 0.5 (and eta2 to 0.5)
Xa 0.9      * set alpha to 0.9
X.ad
X.fi
X
XFirst, in this example, the m command will make a network with 2 units
Xin the input layer, 1 unit in the second layer and 1 unit in the third
Xlayer.  The following c commands create extra connections from layer 1
Xunit 1 to layer 3 unit 1 and from layer 1 unit 2 to layer 3 unit 1.
XThe `s' command sets the seed for the random number function.  The `k'
Xcommand then gives the network random weights.  The `k' command has
Xanother use as well.  It can be used to try to kick a network out of a
Xlocal minimum.  Here, the meaning of "k 0 1" is to examine all the
Xweights in the network and for every weight equal to 0 (and they all
Xstart out at 0), add in a random number between -1 and +1.  The `n'
Xcommand specifies four new patterns to be read into memory.  With
Xthe `n' command, any old patterns that may have been present are
Xremoved.  There is also an `x' command that behaves like the `n'
Xcommand, except that the `x' command \fIadds\fR the extra patterns to
Xthe current training set.  The input pattern comes first, followed by
Xthe output pattern.  The statement, e 0.5, sets eta, the learning rate
Xfor the upper layer, to 0.5 and eta2, for the lower layers, to 0.5 as
Xwell.  The last line sets alpha, the momentum parameter, to 0.9.
X
X   After these commands are executed, the following messages and
Xprompt appear:
X
X.na
X.nf
X.ne3
XFast Back-Propagation Copyright (c) 1991 by Donald R. Tveter
Xtaking commands from stdin now
X[?!*AaBbCcdefHhiklmnOoPpQqRrSstWwx]?
X.ad
X.fi
X
XThe characters within the square brackets are a list of the possible
Xcommands.  To run 100 iterations of back-propagation and print out the
Xstatus of the learning every 20 iterations, type "r 100 20" at the
Xprompt:
X
X[?!*AaBbCcdefHhiklmnOoPpQqRrSstWwx]? r 100 20
X
XThis gives:
X
X.na
X.nf
Xrunning . . .
X   20 iterations    0.00% right  (0 right  4 wrong)  0.49927 error/unit
X   40 iterations    0.00% right  (0 right  4 wrong)  0.43188 error/unit
X   60 iterations   75.00% right  (3 right  1 wrong)  0.09033 error/unit
X   62 iterations  100.00% right  (4 right  0 wrong)  0.07129 error/unit
Xpatterns learned to within 0.10 at iteration^G 62
X.ad
X.fi
X
XThe program immediately prints out the "running . . ." message.  After
Xeach 20 iterations, a summary of the learning process is printed,
Xgiving the percentage of patterns that are right, the number right and
Xwrong and the average value of the absolute values of the errors of the
Xoutput units.  The program stops when each output for each pattern has
Xbeen learned to within the required tolerance, in this case the default
Xvalue of 0.1.  A ctrl-G is normally printed out as well to sound the
Xbell.  If the second number defining how often to print out a summary
Xis omitted, the summaries will not be printed.  Sometimes the integer
Xversions will do a few extra iterations before declaring the problem
Xdone because of truncation errors in the arithmetic done to check for
Xconvergence.  The status figures for iteration i are computed when
Xmaking the forward pass of the iteration and before the weights are
Xupdated, so these values are one iteration out of date.
XThis saves on CPU time; however, if you really need up-to-date
Xstatistics, use the u+ option described in the format specifications.
X
X.ce
XListing Patterns
X
X   To get a listing of the status of each pattern, use the `P' command
Xto give:
X
X.na
X.nf
X[?!*AaBbCcdefHhiklmnOoPpQqRrSstWwx]? P
X  1  0.90  (0.098) ok
X  2  0.05  (0.052) ok
X  3  0.94  (0.062) ok
X  4  0.07  (0.074) ok
X   62 iterations  100.00% right  (4 right  0 wrong)  0.07129 error/unit
X.ad
X.fi
X
XThe numbers in parentheses give the sum of the absolute values of the
Xoutput errors for each pattern.  An `ok' is given to every pattern that
Xhas been learned to within the required tolerance.  To get the status
Xof one pattern, say, the fourth pattern, type "P 4" to give:
X
X  0.07  (0.074) ok
X
XTo get a summary without the complete listing, use "P 0".  To get the
Xoutput targets for a given pattern, say pattern 3, use "O 3".
X
X   A particular test pattern can be input to the network with the `p'
Xcommand, as in:
X
X[?!*AaBbCcdefHhiklmnOoPpQqRrSstWwx]? p 1 0
X   0.90
X
X.ce
XExamining Weights
X
X   It is often interesting to see the values of some particular weights
Xin the network.  To see a listing of all the weights in a network, use
Xthe save weights command described below and then cat the file weights.
XTo see just the weights leading into a particular node, say unit 1 in
Xlayer 2, use the w command, as in:
X
X.na
X.nf
X[?!*AaBbCcdefHhiklmnOoPpQqRrSstWwx]? w 2 1
Xlayer unit  unit value    weight    input from unit
X  1     1    1.00000     9.53516     9.53516
X  1     2    0.00000    -8.40332     0.00000
X  2     t    1.00000     4.13086     4.13086
X                            sum =   13.66602
X.ad
X.fi
X
XThis listing also gives data on how the current activation value of
Xthe node is computed using the weights and the activation values of
Xthe nodes feeding into unit 1 of layer 2.  The `t' unit is the
Xthreshold unit.
X
X.ce
XThe Help Command
X
X   To get a short description of any command, type `h' followed by
Xthe letter of the command.  Here, we type h h for help with help:
X
X.na
X.nf
X.ne3
X[?!*AaBbCcdefHhiklmnOoPpQqRrSstWwx]? h h
X
Xh <letter> gives help for command <letter>.
X.ad
X.fi
X
XTo list the status of all the parameters in the program, use `?'.
X
X.ce
XTo Quit the Program
X
X   Finally, to end the program, the `q' (for quit) command is entered:
X
X[?!*AaBbCcdefHhiklmnOoPpQqRrSstWwx]? q
X
X.ce
X4. The Format Command
X
X   There are several ways to input and output patterns, numbers and
Xother values and there is one format command, `f', that is used to set
Xthese options.  In the format command, a number of options can be
Xspecified on a single line, as for example in:
X
X.ce
Xf b+ ir oc s- wB
X
X.ce
XInput Patterns
X
X   The programs are able to read pattern values in two different
Xformats.  The default input format is the compressed format.  In it,
Xeach value is one character and it is not necessary to have blanks
Xbetween the characters.  For example, in compressed format, the
Xpatterns for xor could be written out in either of the following ways:
X
X.ce
X101        10 1
X.ce
X000        00 0
X.ce
X011        01 1
X.ce
X110        11 0
X
XThe second example is preferable because it makes it easier to see the
Xinput and the output patterns.  Compressed format can also be used to
Xinput patterns with the `p' command.  In addition to using 1 and 0 as
Xinput, the character, `?', can be used.  This character is initially
Xdefined to be 0.5, but it can be redefined using the Q command like so:
X
X.ce
XQ -1
X
XThis sets the value of ? to -1.  Other valid input characters are the
Xletters, `h', `i', `j' and `k'.  The `h' stands for `hidden'.
XIts meaning in an input string is that the value at this point in the
Xstring should be taken from the next unit in the second layer of the
Xnetwork.  This notation is useful for specifying simple recurrent
Xnetworks.  Naturally, `i', `j' and `k' stand for taking input
Xvalues from the third, fourth and fifth layers (if they exist).  A
Xsimple example of a recurrent network is given later.
X
X   The other input format for numbers is real.  The number portion
Xmust start with a digit (.35 is not allowed, but 0.35 is).  Exponential
Xnotation is not allowed.  Real numbers have to be separated by a space.
XThe `h', `i', `j', `k' and `?' characters are also allowed with real
Xinput patterns.  To take input in the real format, it is necessary
Xto set the input format to be real using the `f' (format) command as in:
X
X.ce
Xf ir
X
XTo change back to the compressed format, use:
X
X.ce
Xf ic
X
X.ce
XOutput of Patterns
X
X   Output format is controlled with the `f' command as in:
X
X.ce
Xf or
X.ce
Xf oc
X.ce
Xf oa
X
XThe first sets the output to real numbers.  The second sets the output
Xto be compressed mode, where the value printed will be a `1' when the
Xunit value is greater than 1.0 - tolerance, a `^' when the value is
Xabove 0.5 but less than 1.0 - tolerance and a `v' when the value is
Xless than 0.5 but greater than the tolerance.  Below the tolerance
Xvalue, a `0' is printed.  (A small sketch of this mapping is given at
Xthe end of this section.)  The tolerance can be changed using the `t'
Xcommand (not a part of the format command).  For example, to make all
Xvalues greater than 0.8 print as `1' and all values less than 0.2 print
Xas `0', use:
X
X.ce
Xt 0.2
X
XOf course, this same tolerance value is also used to check to see if
Xall the patterns have converged.  The third output format is meant to
Xgive "analog compressed" output.  In this format, a `c' is printed when
Xa value is close enough to its target value.  Otherwise, if the answer
Xis close to 1, a `1' is printed, if the answer is close to 0, a `0' is
Xprinted, if the answer is above the target but not close to 1, a `^' is
Xprinted and if the answer is below the target but not close to 0, a `v'
Xis printed.  This output format is designed for problems where the
Xoutput is a real number, as for instance, when the problem is to make a
Xnetwork learn sin(x).
X
X   For the sake of convenience, the output format (and only the
Xoutput format) can be set without using the `f', so that:
X
X.ce
Xor
X
Xwill also make the output format real.
X
X.ce
XBreaking up the Output Values
X
X   In the compressed formats, the default is to print a blank after
Xevery 10 values.  This can be altered using the `b' (for inserting
Xbreaks) command.  The use for this command is to separate output values
Xinto logical groups to make the output more readable.  For instance,
Xyou may have 24 output units where it makes sense to insert blanks
Xafter the 4th, 7th and 19th positions.  To do this, specify:
X
X.ce
Xb 4 7 19
X
XThen, for example, the output will look like:
X
X.na
X.nf
X 1  10^0 10^ ^000v00000v0 01000  (0.17577)
X 2  1010 01v 0^0000v00000 ^1000  (0.16341)
X 3  0101 10^ 00^00v00000v 00001  (0.16887)
X 4  0100 0^0 000^00000v00 00^00  (0.19880)
X.ad
X.fi
X
XThe `b' command allows up to 20 break positions to be specified.  The
Xdefault output format is the real format with 10 numbers per line.  For
Xthe output of real values, the `b' command specifies when to print a
Xcarriage return rather than when to print a blank.  (Note: the `b'
Xcommand is not part of the `f' command.)
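X
X   As a concrete illustration, here is a small C sketch (mine, not
Xcode taken from the simulators; the function name is made up) of the
Xcompressed output mapping described above:
X
X.na
X.nf
X/* Map a unit's output value to its compressed-format character.
X   "tol" is the tolerance set with the t command (default 0.1). */
Xchar compressed_char(double value, double tol)
X{
X    if (value > 1.0 - tol) return '1';  /* close enough to 1 */
X    if (value > 0.5)       return '^';  /* high but not close to 1 */
X    if (value > tol)       return 'v';  /* low but not close to 0 */
X    return '0';                         /* close enough to 0 */
X}
X.ad
X.fi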
X
X.ce
XPattern Formats
X
X   There are two different types of problems that back-propagation can
Xhandle: the general type of problem, where every output unit can take
Xon an arbitrary value, and the classification type of problem, where
Xthe goal is to turn on output unit i and turn off all the other output
Xunits when the pattern is of class i.  The xor problem is an example of
Xthe general type of problem.  For an example of a classification
Xproblem, suppose you have a number of data points scattered about
Xthrough two-dimensional space and you have to classify the points as
Xeither class 1, class 2 or class 3.  For a pattern of class 1 you can
Xalways set up the output: "1 0 0", for class 2: "0 1 0" and for
Xclass 3: "0 0 1"; however, doing the translation to bit patterns can be
Xannoying, so another notation can be used.  Instead of specifying the
Xbit patterns you can set the pattern format option to classification
X(as opposed to the default value of general) like so:
X
X.ce
Xf pc
X
Xand then the program will read data in the form:
X
X 1.33 3.61 1    * shorthand for 1 0 0
X 0.42 -2.30 2   * shorthand for 0 1 0
X -0.31 4.30 3   * shorthand for 0 0 1
X
Xand translate it to the bit string form when the pattern is loaded
Xonto the output units (a small sketch of this expansion is given
Xbelow).  To switch to the general form, use "f pg".
X
X   In addition to controlling the input of data, the p command within
Xthe format command is used to control the output of patterns from a
Xset of test patterns kept on a file.  If the format is either c or g,
Xthen when the test set is run thru the network you will only get a
Xsummary of how many patterns are correct.  If the format is either C
Xor G, you will get a listing of the output values for each pattern as
Xwell as the summary.  When reading patterns, C works the same as c and
XG works the same as g.
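X
X   The sketch mentioned above (again mine, with a made-up function
Xname, and not the program's source) of the classification expansion:
X
X.na
X.nf
X/* Expand the classification shorthand: class i (numbered from 1)
X   becomes a target with output unit i on and all others off. */
Xvoid class_to_target(int class, double *target, int nunits)
X{
X    int j;
X    for (j = 0; j < nunits; j++)
X        target[j] = 0.0;
X    target[class - 1] = 1.0;   /* e.g. class 2 of 3 -> 0 1 0 */
X}
X.ad
X.fi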
X
X.ce
XControlling Summaries
X
X   When the program is learning patterns, you can have it print out
Xthe status of the learning process at regular intervals.  The default
Xis to print out only a summary of how learning is going; however, you
Xcan also print out the status of every pattern at regular intervals.
XTo get the status of the whole set of patterns, use "f s-" to turn off
Xthe summary format and "f s+" to go back to summarizing.
X
X.ce
XRinging the Bell
X
X   To ring the bell when the learning has been completed, use "f b+"
Xand to turn off the bell, use "f b-".
X
X.ce
XEchoing Input
X
X   When you are reading commands from a file, it is often worthwhile
Xto see those commands echoed on the screen.  To do this, use "f e+"
Xand to turn off the echoing, use "f e-".
X
X.ce
XUp-To-Date Statistics
X
X   During the ith pass thru the network the program will collect
Xstatistics on how many patterns are correct and how much error there
Xis overall.  These numbers are gathered before the weights are updated
Xand so the results listed for iteration i really show the situation
Xafter the weight update for iteration i-1.  To complicate matters more,
Xthe weight updates can be done continuously instead of after all the
Xpatterns have been presented, so the statistics you get here are skewed
Xeven more.  If you want to have up-to-date statistics with either
Xmethod, use "f u+" and to go back to statistics that are out of date,
Xuse "f u-".  The penalty with "f u+" is that the program needs to do
Xanother forward pass.  When using the continuous update method it is
Xhighly advisable to use "f u+", at least when you get close to
Xcomplete convergence, because the default method of checking may claim
Xthe learning is finished when it isn't, or it may continue training
Xafter the tolerances have been met.
X
X.ce
X5. Taking Training and Testing Patterns from Files
X
X   In the xor example given above, the four patterns were part of the
Xdata file and the following lines were used to read them in:
X
X.na
X.nf
Xn 4
X1 0 1
X0 0 0
X0 1 1
X1 1 0
X.ad
X.fi
X
XHowever, it is also convenient to take patterns from a file that
Xcontains nothing but a list of patterns (and possibly comments).
XTo read a new set of patterns from some file, patterns, use:
X
X.ce
Xn f patterns
X
XTo add an extra bunch of patterns to the current set you can use:
X
X.ce
Xx f patterns
X
X   In addition to keeping a set of training patterns, you can take
Xtesting patterns from a file as well.  To specify the file, you can
Xinvoke the program with a second file name, as in:
X
X.ce
Xbp xor xtest
X
XIn addition, if you do the following:
X
X.ce
Xt f xtest
X
Xthe program will set xtest as the test file and immediately do the
Xtesting.  Once the file has been defined, you can test the patterns
Xon the test file by "t f" or just "t".  (This leaves the t command
Xdoing double duty, since "t <real>" will set the tolerance to <real>.)
XThe test file can also be set without being tested by using
X
X.ce
XB t f xtest
X
Xas explained in the benchmarking section.
X
X
X.ce
X6. Saving and Restoring Weights and Related Values
X
X   Sometimes the amount of time and effort needed to produce a set of
Xweights to solve a problem is so great that it is more convenient to
Xsave the weights rather than constantly recalculate them.  Weights can
Xbe saved as real values in an ASCII format (the default) or as binary,
Xto save space.  To save the weights, enter the command, `S'.  The
Xweights are written on a file called "weights".  The following file
Xcomes from the xor problem:
X
X.na
X.nf
X62r   file = ../xor3
X    9.5351562500
X   -8.4033203125
X    4.1308593750
X    5.5800781250
X   -4.9755859375
X  -11.3095703125
X    8.0527343750
X.ad
X.fi
X
XTo write the weights, the program starts with the second layer, writes
Xout the weights leading into these units in order with the threshold
Xweight last.  Then it moves on to the third layer, and so on.  To
Xrestore these weights, type an `R' for restore.  At this time, the
Xprogram reads the header line and sets the total number of iterations
Xthe program has gone through to be the first number it finds on the
Xheader line.  It then reads the character immediately after the number.
XThe `r' indicates that the weights will be real numbers represented as
Xcharacter strings.  If the weights were binary, the character would be
Xa `b' rather than an `r'.  Also, if the character is `b', the next
Xcharacter is read.  This next character indicates how many bytes are
Xused per value.  The integer versions, bp and sbp, write files with 2
Xbytes per weight, while the real versions, rbp and srbp, write files
Xwith 8 bytes per weight for double precision reals and 4 bytes per
Xweight for single precision reals.  With this notation, weight files
Xwritten by one program can be read by the other.  A binary weight
Xformat is specified within the `f' command by using "f wb".  A real
Xformat is specified by using "f wr".  If your program specifies that
Xweights should be written in one format, but the weight file you read
Xfrom is different, a notice will be given.  There is no check made to
Xsee if the number of weights on the file equals the number of weights
Xin the network.
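X
X   A small C sketch (mine, not the distribution's code; the function
Xname is made up) of reading this header line:
X
X.na
X.nf
X#include <stdio.h>
X
X/* Parse a weights-file header such as "62r" or "62b2": the total
X   iteration count, a format character and, for binary files, a
X   digit giving the number of bytes per weight. */
Xint read_header(FILE *fp, long *iterations, int *format, int *bytes)
X{
X    if (fscanf(fp, "%ld", iterations) != 1)
X        return 0;              /* no iteration count found */
X    *format = getc(fp);        /* 'r' = real as text, 'b' = binary */
X    *bytes = 0;
X    if (*format == 'b')
X        *bytes = getc(fp) - '0';
X    return 1;
X}
X.ad
X.fi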
X
X   The above formats specify that only weights are written out and
Xthis is all you need once the patterns have converged.  However, if
Xyou're still training the network and want to break off training and
Xpick up the training from exactly the same point later on, you need to
Xsave the old weight changes when using momentum, and the parameters
Xfor the delta-bar-delta method if you are using this technique.  To
Xsave these extra parameters on the weights file, use "f wR" to write
Xthe extra values as real and "f wB" to write the extra values as
Xbinary.
X
X   In the above example, the command, S, was used to save the weights
Ximmediately.  Another alternative is to save weights at regular
Xintervals.  The command, S 100, will automatically save the weights
Xevery 100 iterations the program does.  The default rate at which to
Xsave weights is set at 100,000, which generally means that no weights
Xwill ever be saved.
X
X.ce
X7. Initializing Weights and Giving the Network a `Kick'
X
X   All the weights in the network initially start out at 0.  In
Xsymmetric networks, then, no learning may result because the error
Xsignals cancel themselves out.  Even in non-symmetric networks, the
Xtraining process will usually converge faster if the weights start out
Xat small random values.  To do this, the `k' command will take the
Xnetwork and alter the weights in the following ways.  Suppose the
Xcommand given is:
X
X.ce
Xk 0 0.5
X
XNow, if a weight is exactly 0, then the weight will be changed to a
Xrandom value between +0.5 and -0.5.  The above command can therefore
Xbe used to initialize the weights in the network.  A more complex use
Xof the `k' command is to decrease the magnitude of large weights in
Xthe network by a certain random amount.  For instance, in the
Xfollowing command:
X
X.ce
Xk 2 8
X
Xall the weights in the network that are greater than or equal to 2
Xwill be decreased by a random number between 0 and 8.  Weights less
Xthan or equal to -2 will be increased by a random number between 0
Xand 8.  (A sketch of both uses is given at the end of this section.)
XThe seed to the random number generator can be changed using the `s'
Xcommand, as in "s 7".  The integer parameter in the `s' command is of
Xtype unsigned.
X
X   Another method of giving a network a kick is to add hidden layer
Xunits.  The command:
X
X.ce
XH 2 0.5
X
Xadds one unit to layer 2 of the network and all the weights that are
Xcreated are initialized to between -0.5 and +0.5.
X
X   The subject of kicking a back-propagation network out of local
Xminima has barely been studied and there is no guarantee that the
Xabove methods are very useful in general.
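X
X   Here is a rough C sketch (mine, not the program's source; frand and
Xkick are made-up names) of the two uses of the `k' command:
X
X.na
X.nf
X#include <stdlib.h>
X
Xdouble frand(double lo, double hi)  /* uniform random in [lo,hi] */
X{
X    return lo + (hi - lo) * (rand() / (double) RAND_MAX);
X}
X
X/* k v r: with v == 0, set weights that are exactly 0 to random
X   values in [-r,+r]; with v > 0, move weights of magnitude v or
X   more back toward 0 by a random amount in [0,r]. */
Xvoid kick(double *w, int nweights, double v, double r)
X{
X    int i;
X    for (i = 0; i < nweights; i++) {
X        if (v == 0.0 && w[i] == 0.0)
X            w[i] = frand(-r, r);
X        else if (v > 0.0 && w[i] >= v)
X            w[i] -= frand(0.0, r);
X        else if (v > 0.0 && w[i] <= -v)
X            w[i] += frand(0.0, r);
X    }
X}
X.ad
X.fi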
X
X.ce
X8. Setting the Algorithm to Use
X
X   A number of different variations on the original back-propagation
Xalgorithm have been proposed in order to speed up convergence and some
Xof these have been built into these simulators.  These options are set
Xusing the `A' command and a number of options can go on one line.
X
X.ce
XActivation Functions
X
X   To set the activation function, use:
X
X A al * the linear activation function
X A ap * the piece-wise activation function
X A as * the smooth activation function
X A at * the piecewise near-tanh function that runs from -1 to +1
X A aT * the continuous near-tanh function that runs from -1 to +1
X
XWhen using the linear activation function, it is only appropriate to
Xuse the differential step-size derivative and a two-layer network.
XThe smooth activation function is:
X
X.ce
Xf = 1.0 / (1.0 + exp(-x))
X
Xwhere x is the input to a unit.  The piece-wise function is an
Xapproximation to the function f, and it will normally save some CPU
Xtime even though it may increase the number of iterations you need to
Xsolve the problem.  The continuous near-tanh function is 2f - 1 and
Xthe piece-wise version approximates this function with a series of
Xstraight lines.
X
X.ce
XSharpness (or Gain)
X
X   The sharpness (or gain) is the parameter, D, in the function:
X
X.ce
X1.0 / (1.0 + exp(-D*x)).
X
XA sharper sigmoid shaped activation function (larger D) will produce
Xfaster convergence (see "Speeding Up Back Propagation" by Yoshio Izui
Xand Alex Pentland in the Proceedings of \fIIJCNN-90-WASH-DC\fR,
XLawrence Erlbaum Associates, 1990).  To set this parameter to, say, 8,
Xuse "A D 8".  The default value is 1.  Unfortunately, too large a
Xvalue for D will hurt convergence, so this is not a perfect solution
Xto speeding up learning.  Sometimes the best value for D may be less
Xthan 1.0.  A larger D is also useful in the integer version of
Xback-propagation, where the weights are limited to between -32 and
X+31.999.  A larger D value in effect magnifies the weights and makes
Xit possible for the weights to stay smaller.  Values of D less than
Xone may be useful in extracting a network from a local minimum (see
X"Handwritten Numeral Recognition by Multi-layered Neural Network with
XImproved Learning Algorithm" by Yamada, Kami, Temma and Tsukumo in
XProceedings of the 1989 IJCNN, IEEE Press).  A smaller value of D will
Xalso force the weights and the weight changes to be larger and this
Xmay be of value when the weight changes become less than the weight
Xresolution of 0.001 in the integer version.
X
X.ce
XThe Derivatives
X
X   The correct derivative for the standard activation function is
Xs(1-s), where s is the activation value of a unit; however, when s is
Xnear 0 or 1 this term will give only very small weight changes during
Xthe learning process.  To counter this problem, Fahlman proposed the
Xfollowing one for the output layer:
X
X.ce
X0.1 + s(1-s)
X
X(For the original description of this method, see "Faster Learning
XVariations of Back-Propagation: An Empirical Study", by Scott E.
XFahlman, in \fIProceedings of the 1988 Connectionist Models Summer
XSchool\fR, Morgan Kaufmann, 1989.)  Besides Fahlman's derivative and
Xthe original one, the differential step size method (see "Stepsize
XVariation Methods for Accelerating the Back-Propagation Algorithm", by
XChen and Mars, in \fIIJCNN-90-WASH-DC\fR, Lawrence Erlbaum, 1990)
Xtakes the derivative to be 1 in the layer going into the output units
Xand uses the correct derivative for all other layers.  The learning
Xrate for the inner layers is normally set to some smaller value.  To
Xset a value for eta2, give two values in the `e' command, as in:
X
X.ce
Xe 0.1 0.01
X
XTo set the derivative, use the `A' command, as in:
X
X.ne4
X A dd * use the differential step size derivative (default)
X A dF * use Fahlman's derivative in only the output layer
X A df * use Fahlman's derivative in all layers
X A do * use the original, correct derivative
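X
X   For reference, here is a small C sketch (mine, not the simulator
Xsource; the function names are invented) of the smooth activation
Xfunction with sharpness D and of the derivative variants named above:
X
X.na
X.nf
X#include <math.h>
X
Xdouble smooth(double x, double D)   /* A as, with sharpness D */
X{
X    return 1.0 / (1.0 + exp(-D * x));
X}
X
Xdouble deriv_original(double s)     /* A do: s is a unit's activation */
X{
X    return s * (1.0 - s);
X}
X
Xdouble deriv_fahlman(double s)      /* A dF, A df */
X{
X    return 0.1 + s * (1.0 - s);
X}
X
X/* A dd: the differential step size method uses 1 for the layer
X   going into the output units and s*(1-s) for all other layers. */
X.ad
X.fi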
X
X.ce
XUpdate Methods
X
X   The choices are the periodic (batch) method, the continuous
X(online) method and Jacobs' delta-bar-delta method.  The following
Xcommands set the update methods:
X
X A uc * for the continuous update method
X A ud * for the delta-bar-delta method
X A up * for the original periodic update method (default)
X
XThe delta-bar-delta method uses a number of special parameters and
Xthese are set using the `d' command.  Delta-bar-delta can be used with
Xany of the derivatives and the algorithm will find its own value of
Xeta for each weight.
X
X.ce
XOther Algorithm Options
X
X   The `b' command controls whether or not to backpropagate error for
Xunits that have learned their response to within a given tolerance.
XThe default is to always backpropagate error.  The advantage to not
Xbackpropagating error is that this can save computer time.  This
Xparameter can be set like so:
X
X A b+ * always backpropagate error
X A b- * don't backpropagate error when close
X
X   The `s' sub-command will set the number of times to skip a pattern
Xwhen the pattern has been learned to within the desired tolerance.  To
Xskip 3 iterations, use "A s 3"; to reset so that no patterns are
Xskipped, use "A s 0".
X
X   The `t' sub-command will take the given pattern (only one at a
Xtime) out of the training set so that you can then train the other
Xpatterns and test the network's response to this one pattern that was
Xremoved.  To test pattern 3, use "A t 3" and to reset to use all the
Xpatterns, use "A t 0".
X
X.ce
X9. The Delta-Bar-Delta Method
X
X   The delta-bar-delta method attempts to find a learning rate,
Xeta, for each individual weight.  The parameters are the initial
Xvalue for the etas, the amount by which to increase an eta that seems
Xto be too small, the rate at which to decrease an eta that is
Xtoo large, a maximum value for each eta and a parameter used in
Xkeeping a running average of the slopes.  Here are examples of setting
Xthese parameters:
X
X.na
X.nf
Xd d 0.5    * sets the decay rate to 0.5
Xd e 0.1    * sets the initial etas to 0.1
Xd k 0.25   * sets the amount to increase etas by (kappa) to 0.25
Xd m 10     * sets the maximum eta to 10
Xd n 0.005  * an experimental noise parameter
Xd t 0.7    * sets the history parameter, theta, to 0.7
X.ad
X.fi
X
XThese settings can all be placed on one line:
X
X.ce
Xd d 0.5 e 0.1 k 0.25 m 10 t 0.7
X
XThe version implemented here does not use momentum.  The symmetric
Xversions, sbp and srbp, do not implement delta-bar-delta.
X
X   The idea behind the delta-bar-delta method is to let the program
Xfind its own learning rate for each weight.  The `e' sub-command sets
Xthe initial value for each of these learning rates.  When the program
Xsees that the slope of the error surface averages out to be in the
Xsame direction for several iterations for a particular weight, the
Xprogram increases the eta value by an amount, kappa, given by the `k'
Xparameter.  The network will then move down this slope faster.  When
Xthe program finds the slope changes signs, the assumption is that the
Xprogram has stepped over to the other side of the minimum, so it cuts
Xdown the learning rate by the decay factor given by the `d' parameter.
XFor instance, a d value of 0.5 cuts the learning rate for the weight
Xin half.  The `m' parameter specifies the maximum allowable value for
Xan eta.  The `t' parameter (theta) is used to compute a running
Xaverage of the slope of the weight and must be in the range
X0 <= t < 1.  The running average at iteration i, a\di\u, is defined
Xas:
X
X.ce
Xa\di\u = (1 - t) slope\di\u + t a\di-1\u,
X
Xso small values for t make the most recent slope more important than
Xthe previous average of the slope.
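X
X   As a sketch of the whole rule for one weight (my C fragment, not
Xthe program's source; all the names are invented):
X
X.na
X.nf
Xtypedef struct {
X    double eta;   /* this weight's learning rate */
X    double bar;   /* running average of the slope, a(i-1) */
X} dbd;
X
Xvoid dbd_update(dbd *p, double slope, double kappa,
X                double decay, double theta, double etamax)
X{
X    if (p->bar * slope > 0.0)       /* slope keeps its sign: */
X        p->eta += kappa;            /* speed up, linearly */
X    else if (p->bar * slope < 0.0)  /* sign change, overshot: */
X        p->eta *= decay;            /* slow down, geometrically */
X    if (p->eta > etamax)
X        p->eta = etamax;
X    /* a(i) = (1 - theta) * slope(i) + theta * a(i-1) */
X    p->bar = (1.0 - theta) * slope + theta * p->bar;
X}
X.ad
X.fi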
X
X   Determining the learning rate for back-propagation automatically
Xis, of course, very desirable and this method often speeds up
Xconvergence by quite a lot.  Unfortunately, bad choices for the
Xdelta-bar-delta parameters give bad results and a lot of
Xexperimentation may be necessary.  If you have n patterns in the
Xtraining set, try starting e and k around 1/n.  The n parameter is an
Xexperimental noise term that is only used in the integer version.  It
Xchanges a weight in the wrong direction by the amount indicated when
Xthe previous weight change was 0 and the new weight change would be 0
Xand the slope is non-zero.  (I found this to be effective in an
Xinteger version of quickprop, so I tossed it into delta-bar-delta as
Xwell.  If you find this helps, please let me know.)  For more on
Xdelta-bar-delta, see "Increased Rates of Convergence Through Learning
XRate Adaptation" by Robert A. Jacobs, in \fINeural Networks\fR,
XVolume 1, Number 4, 1988.
X
X.ce
X10. Recurrent Networks
X
X   Recurrent back-propagation networks take values from higher level
Xunits and use them as activation values for lower level units.  This
Xgives a network a simple kind of short-term memory, possibly a little
Xlike human short-term memory.  For instance, suppose you want a
Xnetwork to memorize the two short sequences, "acb" and "bcd".  In the
Xmiddle of both of these sequences is the letter, "c".  In the first
Xcase you want a network to take in "a" and output "c", then take in
X"c" and output "b".  In the second case you want a network to take in
X"b" and output "c", then take in "c" and output "d".  To do this, a
Xnetwork needs a simple memory of what came before the "c".
X
X   Let the network be a 7-3-4 network, where input units 1-4 and
Xoutput units 1-4 stand for the letters a-d.  Furthermore, let there be
X3 hidden layer units.  The hidden units will feed their values back
Xdown to the input units 5-7, where they become input for the next
Xstep.  To see why this works, suppose the patterns have been learned
Xby the network.  Inputting the "a" from the first string produces some
Xrandom pattern of activation, p1, on the hidden layer units and "c" on
Xthe output units.  The pattern p1 is copied down to units 5-7 of the
Xinput layer.  Second, the letter, "c", is presented to the network
Xtogether with p1 now on units 5-7.  This will give "b" on the output
Xunits.  However, if the "b" from the second string is presented first,
Xthere will be a different random pattern, p2, on the hidden layer
Xunits.  These values are copied down to input units 5-7.  These values
Xcombine with the "c" to produce the output, "d".
X
X   The training patterns for the network can be:
X
X 1000 000 0010   * "a" prompts the output, "c"
X 0010 hhh 0100   * inputting "c" should produce "b"
X
X 0100 000 0010   * "b" prompts the output, "c"
X 0010 hhh 0001   * inputting "c" should produce "d"
X
Xwhere the first four values on each line are the normal input and the
Xmiddle three either start out all zeros or take their values from the
Xprevious values of the hidden units.  The code for taking these values
Xfrom the hidden layer units is "h".  The last set of values represents
Xthe output that should be produced.  To take values from the third
Xlayer of a network, the code is "i".  For the fourth and fifth layers
X(if they exist) the codes are "j" and "k".  Training recurrent
Xnetworks can take much longer than training standard networks and the
Xaverage error can jump up and down quite a lot.
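X
X   The copy step that the "h" code implies can be sketched in C like
Xthis (my illustration, with made-up names, not the program's source):
X
X.na
X.nf
X/* After each forward pass, copy the hidden-layer activations down
X   to the extra input units (here units 5-7 of a 7-3-4 network) so
X   they accompany the next pattern in the sequence. */
Xvoid copy_hidden_to_input(double *hidden, int nhidden,
X                          double *input, int first_extra)
X{
X    int j;
X    for (j = 0; j < nhidden; j++)
X        input[first_extra + j] = hidden[j];
X}
X.ad
X.fi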
X
X.ce
X11. The Benchmarking Command
X
X   The main purpose of the benchmarking command is to make it possible
Xto run a number of tests of a problem with different initial weights
Xand average the number of iterations and CPU time for networks that
Xconverged.  A second purpose is to run a training set thru the network
Xa number of times, and for each try, a test pattern or a test set can
Xbe checked at regular intervals.
X
X   A typical command to simply test the current parameters on a number
Xof networks is:
X
X.ce
XB g 5 m 15 k 1 r 1000 200
X
XThe "g 5" specifies that you'd like to set the goal of getting 5
Xnetworks to converge, but the "m 15" sets a maximum of 15 tries to
Xreach this goal.  The k specifies that each initial network will get a
Xkick by setting each weight to a random number between -1 and 1.  The
X"r 1000 200" portion specifies that you should run up to 1000
Xiterations on a network and print the status of learning every 200
Xiterations.  This follows the normal run command and the second
Xparameter defining how often to print the statistics can be omitted.
XFor example, here is some output from benchmarking with the xor
Xproblem:
X
X.na
X.nf
X[?!*AaBbCcdefHhiklmnOoPpQqRrSstWwx]? B g 5 m 5 k 1 r 200
X seed = 7; running . . .
Xpatterns learned to within 0.10 at iteration 62
X seed = 7; running . . .
X seed = 7; running . . .
Xpatterns learned to within 0.10 at iteration 54
X seed = 7; running . . .
Xpatterns learned to within 0.10 at iteration 39
X seed = 7; running . . .
Xpatterns learned to within 0.10 at iteration 44
X1 failures; 4 successes; average = 49.750000   0.333320 sec/network
X.ad
X.fi
X
XThe time/network includes however much time is used to print messages,
Xso to time the process effectively, all printing should be minimized.
XThe timing is done using the UNIX clock(3C) function and, on a UNIX PC
Xat least, the time returned by clock will overflow after 2147 seconds,
Xor about 36 minutes.  If your system has the same limitation, take
Xcare that ALL of the benchmarking you do in a single call of the
Xprogram adds up to less than 36 minutes.
X
X   In the above example, the seed that was used to set the random
Xvalues for the weights was set to 7 (outside the benchmarking
Xcommand); however, if you set a number of seeds, as in:
X
X.ce
Xs 3 5 7 18484 99
X
Xthe seeds will be taken in order for each network.  When there are
Xmore networks to try than there are seeds, the random values keep
Xcoming from the last seed value, so actually you can get by using a
Xsingle seed.  The idea behind allowing multiple seeds is so that if
Xone network does something interesting you can use that seed to run a
Xnetwork with the same initial weights outside of the benchmarking
Xcommand.
X
X   Once the benchmarking parameters have been set, it is only
Xnecessary to include the run portion in order to start the
Xbenchmarking process again; thus, "B r 200" will run benchmarking
Xagain using the current set of parameters.  Also, the parameters can
Xbe set without starting the benchmarking process by just not including
Xthe `r' parameters in the B command, as in:
X
X.ce
XB g 5 m 5 k 1
X
X   In addition to getting data on convergence, you can have the
Xprogram run test patterns thru the network at the print statistics
Xrate given in the `r' sub-command.
XTo specify the test file, test100, use:
X
X.ce
XB t f test100
X
XTo run the training data thru for up to 1000 iterations and test every
X200 iterations, use:
X
X.ce
XB r 1000 200
X
XIf the pattern format specification p is set to either c or g, you
Xwill get a summary of the patterns on the test file.  If p is either C
Xor G, you will get the results for each pattern listed as well as the
Xsummary.  To stop testing the data on the test file, use "B t 0".
X
X   Sometimes you may have so little data available that it is
Xdifficult to separate the patterns into a training set and a test set.
XOne solution is to remove each pattern from the training set, train
Xthe network on the remaining patterns and then test the network on the
Xpattern that was removed.  To remove a pattern, say pattern 1, from
Xthe training set, use:
X
X.ce
XB t 1
X
XTo systematically remove each pattern from the training set, use a
Xdata file with the following commands:
X
X.na
X.nf
XB t 1 r 200 50
XB t 2 r 200 50
XB t 3 r 200 50
X ... etc.
X.ad
X.fi
X
Xand the pattern will be tested every 50 iterations.  If, in the course
Xof training the network, all the patterns converge, the program will
Xprint out a line starting with a capital S followed by a test of the
Xtest pattern.  If the program hits the point where statistics on the
Xlearning process have to be printed and the network has not converged,
Xthen a capital F will print out, followed by a test of the test
Xpattern.  To stop this testing, use "B t 0".
X
X   It would be nice to have the program average up and tabulate all
Xthe data that comes out of the benchmarking command, but I thought I'd
Xleave that to users for now.  You can use the record command to save
Xthe output from the entire session and then run it thru some other
Xprogram, such as an awk program, in order to sort everything out.
X
X.ce
X12. Miscellaneous Commands
X
X   Below is a list of some miscellaneous commands, a short example of
Xeach and a short description of the command.
X
X.IP " !    !cmd    " 15
XAnything after `!' will be passed on to UNIX as a command to execute.
X
X.IP " C    C       " 15
XThe C command will clear the network of values, reset the number of
Xiterations to 0 and reset other values so that another run can be
Xmade.  The seed value is reset so you can run the same network over
Xwith the same initial weights but with different learning parameters.
XEven though the seed is reset, the weights are not initialized, so you
Xmust do this step after clearing the network.
X
X.IP " i    i f     " 15
XEntering "i f" will read commands from the file, f.  When there are
Xno more commands on the file, the program resumes reading from the
Xprevious file being used.  You can also have an `i' command within
Xthe file f; however, the depth to which you can nest the number of
Xactive files is 4 and stdin itself counts as the first one.  Once an
Xinput file has been specified, you can simply type "i" to read from
Xthe file again.
X
X.IP " l    l 2     " 15
XEntering "l 2" will print the values of the units on layer 2,
Xor whatever layer is specified.
X
X.IP " T    T -3    " 15
XIn sbp and srbp only, "T -3" sets all the threshold weights
Xto -3, or whatever value is specified, and freezes them at this value.
X
X.IP " W    W 0.9   " 15
XEntering "W 0.9" will remove (whittle away) all the weights with
Xabsolute values less than 0.9.
X.in-15
X
XIn addition, when a user-generated interrupt occurs (by typing DEL),
Xthe program will drop its current task and take the next command from
Xthe keyboard.
X
X.ce
X13. Limitations
X
X   Weights in the bp and sbp programs are 16-bit integer weights,
Xwhere the real value of the weight has been multiplied by 1024.  The
Xinteger versions cannot handle weights less than -32 or greater than
X31.999.  The weight changes are all checked for overflow, but there
Xare other places in these programs where calculations can possibly
Xoverflow as well and none of these places are checked.  Input values
Xfor the integer versions can run from -31.994 to 31.999.  Due to the
Xmethod used to implement recurrent connections, input values in the
Xreal version are limited to -31994.0 and above.
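X
X   For reference, the fixed-point representation just described can
Xbe sketched in C as follows (my illustration, not the simulator
Xsource; the function names are invented):
X
X.na
X.nf
X/* bp and sbp store a weight as a 16-bit integer equal to the real
X   value times 1024, so the range is -32 to +31.999 and the
X   resolution is 1/1024, about 0.001. */
Xshort to_fixed(double w)
X{
X    return (short) (w * 1024.0);
X}
X
Xdouble to_real(short iw)
X{
X    return iw / 1024.0;
X}
X.ad
X.fi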
END_OF_FILE
if test 43266 -ne `wc -c <'README'`; then
    echo shar: \"'README'\" unpacked with wrong size!
fi
# end of 'README'
fi
echo shar: End of archive 1 \(of 4\).
cp /dev/null ark1isdone
MISSING=""
for I in 1 2 3 4 ; do
    if test ! -f ark${I}isdone ; then
        MISSING="${MISSING} ${I}"
    fi
done
if test "${MISSING}" = "" ; then
    echo You have unpacked all 4 archives.
    rm -f ark[1-9]isdone
else
    echo You still need to unpack the following archives:
    echo " " ${MISSING}
fi
##  End of shell archive.
exit 0
exit 0 # Just in case...