Newsgroups: comp.sources.misc
From: Donald Tveter
Subject: v22i073: back-prop - Fast Backpropagation, Part01/04
Message-ID:
X-Md4-Signature: b2d7e2bed3d9aa8488e8622e25962850
Date: Wed, 28 Aug 1991 03:20:00 GMT
Approved: kent@sparky.imd.sterling.com

Submitted-by: Donald Tveter
Posting-number: Volume 22, Issue 73
Archive-name: back-prop/part01
Environment: UNIX
Supersedes: back-prop: Volume 14, Issue 84-87

The programs described below were produced for my own use in studying
back-propagation and for doing some examples that are found in my
introductory Artificial Intelligence textbook, The Basis of Artificial
Intelligence, to be published by Computer Science Press.  I have
copyrighted these files but I hereby give permission to anyone to use
them for experimentation, educational purposes or to redistribute them
on a not-for-profit basis.  Anyone who wants to use these programs for
business or commercial purposes should contact me at the address in the
README for permission.

There are four simulators that can be constructed from the included
files.  The program, rbp, does back-propagation using real weights and
arithmetic.  The program, bp, does back-propagation using 16-bit integer
weights, 16 and 32-bit integer arithmetic and some floating point
arithmetic.  The program, sbp, uses 16-bit integer symmetric weights but
only allows two-layer networks.  The program, srbp, does the same using
real weights.  The purpose of sbp and srbp is to produce networks that
can be used with the Boltzmann machine relaxation algorithm (not
included).  See the README and Changes files for more information.

Dr. Donald R. Tveter
5228 N. Nashville Ave.
Chicago, Illinois 60656
USENET: drt@chinet.chi.il.us
-----
#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file".
if test -f 'README' -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"'README'\"
else
echo shar: Extracting \"'README'\"
sed "s/^X//" >'README' <<'END_OF_FILE'
X.ce
XFast Back-Propagation
X.ce
XCopyright (c) 1991 by Donald R. Tveter
X
X
X.ce
X1. Introduction
X
X   The programs described below were produced for my own use in studying
Xback-propagation and for doing some examples that are found in my
Xintroductory Artificial Intelligence textbook, \fIThe Basis of
XArtificial Intelligence\fR, to be published by Computer Science Press.
X(One would hope some time before the sun burns out or before the Cubs
Xwin the World Series or before Congress balances the budget or ... .)
XI have copyrighted these files but I hereby give permission to
Xanyone to use them for experimentation, educational purposes or to
Xredistribute them on a not-for-profit basis.  Anyone who wants to
Xuse these programs for business or commercial purposes will be charged
Xa small fee.  You should contact me by mail at:
X
X.na
X.nf
XDr. Donald R. Tveter
X5228 N. Nashville Ave.
XChicago, Illinois 60656
XUSENET: drt@chinet.chi.il.us
X.ad
X.fi
X
XAlso, I would be interested in hearing from anyone with suggestions,
Xbug reports and major successes or failures.
X
X   Note: this is "use at your own risk" software: there is no guarantee
Xthat it is bug-free.  Use of this software constitutes acceptance for
Xuse in an "as is" condition.  There are no warranties with regard to
Xthis software.  In no event shall the author be liable for any damages
Xwhatsoever arising out of or in connection with the use or performance
Xof this software.
X
X   There is also a good chance that I will soon produce and sell DOS
Xand extended DOS binary versions with more capabilities than there are
Xin this UNIX (tm of AT&T Bell Labs) source code version.  They will
Xprobably include Quickprop, SuperSAB and Cascade Correlation at the
Xvery least and perhaps also DSM, LVQ1 and counter-propagation as well.
X
X   There are four simulators that can be constructed from the included
Xfiles.  The program, rbp, does back-propagation using real weights and
Xarithmetic.  The program, bp, does back-propagation using 16-bit
Xinteger weights, 16 and 32-bit integer arithmetic and some floating
Xpoint arithmetic.  The program, sbp, uses 16-bit integer symmetric
Xweights but only allows two-layer networks.  The program, srbp, does
Xthe same using real weights.  The purpose of sbp and srbp is to produce
Xnetworks that can be used with the Boltzmann machine relaxation
Xalgorithm (not included).
X
X   In most cases, the 16-bit integer programs are the most useful,
Xbecause they are the fastest.  Unfortunately, sometimes 16-bit
Xinteger weights don't have enough range or precision and then using the
Xfloating point versions may be necessary.  Many other speed-up
Xtechniques are included in these programs.
X
X.ce
XUpdated Version
X
X   This version contains a number of improvements and changes to
Xcommand names compared with the September 1990 version.  For a summary
Xof the changes and additions, see the file Changes.
X
X.ce
X2. Making the Simulators
X
X   This code has been written to use either 32-bit floating point
X(float) or 64-bit floating point (double) arithmetic.  On System V
Xmachines the standard seems to be that all floating point arithmetic
Xis done with double precision, so double arithmetic is faster than
Xfloat and double is therefore the default.  Other versions of C
X(e.g. ANSI C) will do single precision real arithmetic.  To get 32-bit
Xfloating point, set the compiler flag FLOAT in the makefile.  The
Xfunction, exp, defined in real.c is double since System V specifies it
Xas double.  If your C uses float, change this definition as well.
X
X   One option exists for bp and sbp.  If your compiler isn't smart
Xenough to divide by 1024 by shifting, remove the SMART flag in the
Xmakefile.
X
X   To make a particular executable file, use the makefile given
Xwith the data files and make any or all of them like so:
X
X.ce
Xmake bp
X.ce
Xmake sbp
X.ce
Xmake rbp
X.ce
Xmake srbp
X
X   To make a record of all the input and output from the programs,
Xthe following small UNIX command file I call "record" can be used:
X
X.na
X.nf
Xtrap "" 2                                 # ignore interrupts (ctrl-C)
Xoutfile="${1}.record"
Xif test -f $outfile                       # start with a fresh record file
X   then
X   rm $outfile
X   fi
Xecho $outfile
X(tee -a $outfile | $*) | tee -a $outfile  # log both the input and output
Xprocess=`ps | grep tee | cut -c1-6`       # find and kill the leftover tee
Xkill $process
X.ad
X.fi
X
XFor example, to make a record of all the input and output from the
Xprogram bp using the data file xor, use:
X
X.ce
Xrecord bp xor
X
X.ce
X3. A Simple Example
X
X   Each version would normally be called with the name of a file to
Xread commands from, as in:
X
X.ce
Xbp xor
X
XWhen no file name is specified, bp will take commands from the keyboard
X(the UNIX stdin file).  After the data file has been read, commands are
Xthen taken from the keyboard.
X
X   The commands are one-letter commands and most of them have optional
Xparameters.  The `A', `B', `d' and `f' commands allow a number of
Xsub-commands on a line.  The maximum length of any line is 256
Xcharacters.  An `*' begins a comment and it can be used to make the
Xremainder of the line a comment.
XHere is an example of a data file to do the xor problem:
X
X.na
X.nf
X* input file for the xor problem
X
Xm 2 1 1    * make a 2-1-1 network
Xc 1 1 3 1  * add this extra connection
Xc 1 2 3 1  * add this extra connection
Xs 7        * seed the random number function
Xk 0 1      * give the network random weights
X
Xn 4        * read four new patterns into memory
X1 0 1
X0 0 0
X0 1 1
X1 1 0
X
Xe 0.5      * set eta to 0.5 (and eta2 to 0.5)
Xa 0.9      * set alpha to 0.9
X.ad
X.fi
X
XFirst, in this example, the m command will make a network with 2 units
Xin the input layer, 1 unit in the second layer and 1 unit in the third
Xlayer.  The following c commands create extra connections from layer 1
Xunit 1 to layer 3 unit 1 and from layer 1 unit 2 to layer 3 unit 1.
XThe `s' command sets the seed for the random number function.  The `k'
Xcommand then gives the network random weights.  The `k' command has
Xanother use as well.  It can be used to try to kick a network out of a
Xlocal minimum.  Here, the meaning of "k 0 1" is to examine all the
Xweights in the network and for every weight equal to 0 (and they all
Xstart out at 0), add in a random number between -1 and +1.  The `n'
Xcommand specifies four new patterns to be read into memory.  With
Xthe `n' command, any old patterns that may have been present are
Xremoved.  There is also an `x' command that behaves like the `n'
Xcommand, except that the `x' command \fIadds\fR the extra patterns to
Xthe current training set.  The input pattern comes first, followed by
Xthe output pattern.  The statement, e 0.5, sets eta, the learning rate
Xfor the upper layer, to 0.5 and eta2, for the lower layers, to 0.5 as
Xwell.  The last line sets alpha, the momentum parameter, to 0.9.
X
X   After these commands are executed, the following messages and
Xprompt appear:
X
X.na
X.nf
X.ne3
XFast Back-Propagation Copyright (c) 1991 by Donald R. Tveter
Xtaking commands from stdin now
X[?!*AaBbCcdefHhiklmnOoPpQqRrSstWwx]?
X.ad
X.fi
X
XThe characters within the square brackets are a list of the possible
Xcommands.  To run 100 iterations of back-propagation and print out the
Xstatus of the learning every 20 iterations, type "r 100 20" at the
Xprompt:
X
X[?!*AaBbCcdefHhiklmnOoPpQqRrSstWwx]? r 100 20
X
XThis gives:
X
X.na
X.nf
Xrunning . . .
X   20 iterations    0.00% right  (0 right  4 wrong)  0.49927 error/unit
X   40 iterations    0.00% right  (0 right  4 wrong)  0.43188 error/unit
X   60 iterations   75.00% right  (3 right  1 wrong)  0.09033 error/unit
X   62 iterations  100.00% right  (4 right  0 wrong)  0.07129 error/unit
Xpatterns learned to within 0.10 at iteration^G 62
X.ad
X.fi
X
XThe program immediately prints out the "running . . ." message.  After
Xeach 20 iterations, a summary of the learning process is printed,
Xgiving the percentage of patterns that are right, the number right and
Xwrong and the average value of the absolute values of the errors of the
Xoutput units.  The program stops when each output for each pattern has
Xbeen learned to within the required tolerance, in this case the default
Xvalue of 0.1.  A ctrl-G is normally printed out as well to sound the
Xbell.  If the second number defining how often to print out a summary
Xis omitted, the summaries will not be printed.  Sometimes the integer
Xversions will do a few extra iterations before declaring the problem
Xdone because of truncation errors in the arithmetic done to check for
Xconvergence.  The status figures for iteration i are computed when
Xmaking the forward pass of the iteration and before the weights are
Xupdated, so these values are one iteration out of date.
XThis saves on CPU time; however, if you really need up-to-date
Xstatistics, use the u+ option described in the format specifications.
X
X.ce
XListing Patterns
X
X   To get a listing of the status of each pattern, use the `P' command
Xto give:
X
X.na
X.nf
X[?!*AaBbCcdefHhiklmnOoPpQqRrSstWwx]? P
X  1  0.90  (0.098) ok
X  2  0.05  (0.052) ok
X  3  0.94  (0.062) ok
X  4  0.07  (0.074) ok
X   62 iterations  100.00% right  (4 right  0 wrong)  0.07129 error/unit
X.ad
X.fi
X
XThe numbers in parentheses give the sum of the absolute values of the
Xoutput errors for each pattern.  An `ok' is given to every pattern that
Xhas been learned to within the required tolerance.  To get the status
Xof one pattern, say, the fourth pattern, type "P 4" to give:
X
X  0.07  (0.074) ok
X
XTo get a summary without the complete listing, use "P 0".  To get the
Xoutput targets for a given pattern, say pattern 3, use "O 3".
X
X   A particular test pattern can be input to the network with the `p'
Xcommand, as in:
X
X[?!*AaBbCcdefHhiklmnOoPpQqRrSstWwx]? p 1 0
X   0.90
X
X.ce
XExamining Weights
X
X   It is often interesting to see the values of some particular weights
Xin the network.  To see a listing of all the weights in a network, use
Xthe save weights command described below and then cat the file weights.
XTo see just the weights leading into a particular node, say unit 1 in
Xlayer 2, use the w command, as in:
X
X.na
X.nf
X[?!*AaBbCcdefHhiklmnOoPpQqRrSstWwx]? w 2 1
Xlayer unit  unit value    weight    input from unit
X  1     1    1.00000     9.53516     9.53516
X  1     2    0.00000    -8.40332     0.00000
X  2     t    1.00000     4.13086     4.13086
X                            sum =   13.66602
X.ad
X.fi
X
XThis listing also gives data on how the current activation value of
Xthe node is computed using the weights and the activation values of
Xthe nodes feeding into unit 1 of layer 2.  The `t' unit is the
Xthreshold unit.
X
X.ce
XThe Help Command
X
X   To get a short description of any command, type `h' followed by
Xthe letter of the command.  Here, we type h h for help with help:
X
X.na
X.nf
X.ne3
X[?!*AaBbCcdefHhiklmnOoPpQqRrSstWwx]? h h
X
Xh <letter> gives help for command <letter>.
X.ad
X.fi
X
XTo list the status of all the parameters in the program, use `?'.
X
X.ce
XTo Quit the Program
X
X   Finally, to end the program, the `q' (for quit) command is entered:
X
X[?!*AaBbCcdefHhiklmnOoPpQqRrSstWwx]? q
X
X.ce
X4. The Format Command
X
X   There are several ways to input and output patterns, numbers and
Xother values and there is one format command, `f', that is used to set
Xthese options.  In the format command, a number of options can be
Xspecified on a single line, as for example in:
X
X.ce
Xf b+ ir oc s- wB
X
X.ce
XInput Patterns
X
X   The programs are able to read pattern values in two different
Xformats.  The default input format is the compressed format.  In it,
Xeach value is one character and it is not necessary to have blanks
Xbetween the characters.  For example, in compressed format, the
Xpatterns for xor could be written out in either of the following ways:
X
X.ce
X101        10 1
X.ce
X000        00 0
X.ce
X011        01 1
X.ce
X110        11 0
X
XThe second example is preferable because it makes it easier to see the
Xinput and the output patterns.  Compressed format can also be used to
Xinput patterns with the `p' command.  In addition to using 1 and 0 as
Xinput, the character, `?', can be used.  This character is initially
Xdefined to be 0.5, but it can be redefined using the Q command like so:
X
X.ce
XQ -1
X
XThis sets the value of ? to -1.  Other valid input characters are the
Xletters, `h', `i', `j' and `k'.  The `h' stands for `hidden'.
XIts meaning in an input string is that the value at this point in the
Xstring should be taken from the next unit in the second layer of the
Xnetwork.  This notation is useful for specifying simple recurrent
Xnetworks.  Naturally, `i', `j' and `k' stand for taking input
Xvalues from the third, fourth and fifth layers (if they exist).  A
Xsimple example of a recurrent network is given later.
X
X   The other input format for numbers is real.  The number portion
Xmust start with a digit (.35 is not allowed, but 0.35 is).  Exponential
Xnotation is not allowed.  Real numbers have to be separated by a space.
XThe `h', `i', `j', `k' and `?' characters are also allowed with real
Xinput patterns.  To take input in the real format, it is necessary
Xto set the input format to be real using the `f' (format) command as in:
X
X.ce
Xf ir
X
XTo change back to the compressed format, use:
X
X.ce
Xf ic
X
X.ce
XOutput of Patterns
X
X   Output format is controlled with the `f' command as in:
X
X.ce
Xf or
X.ce
Xf oc
X.ce
Xf oa
X
XThe first sets the output to real numbers.  The second sets the output
Xto be compressed mode, where the value printed will be a `1' when the
Xunit value is greater than 1.0 - tolerance, a `^' when the value is
Xabove 0.5 but less than 1.0 - tolerance and a `v' when the value is
Xless than 0.5 but greater than the tolerance.  Below the tolerance
Xvalue, a `0' is printed.  (A small sketch of this mapping is given at
Xthe end of this section.)  The tolerance can be changed using the `t'
Xcommand (not a part of the format command).  For example, to make all
Xvalues greater than 0.8 print as `1' and all values less than 0.2 print
Xas `0', use:
X
X.ce
Xt 0.2
X
XOf course, this same tolerance value is also used to check to see if
Xall the patterns have converged.  The third output format is meant to
Xgive "analog compressed" output.  In this format, a `c' is printed when
Xa value is close enough to its target value.  Otherwise, if the answer
Xis close to 1, a `1' is printed, if the answer is close to 0, a `0' is
Xprinted, if the answer is above the target but not close to 1, a `^' is
Xprinted and if the answer is below the target but not close to 0, a `v'
Xis printed.  This output format is designed for problems where the
Xoutput is a real number, as for instance, when the problem is to make a
Xnetwork learn sin(x).
X
X   For the sake of convenience, the output format (and only the
Xoutput format) can be set without using the `f', so that:
X
X.ce
Xor
X
Xwill also make the output format real.
X
X.ce
XBreaking up the Output Values
X
X   In the compressed formats, the default is to print a blank after
Xevery 10 values.  This can be altered using the `b' (for inserting
Xbreaks) command.  The use for this command is to separate output values
Xinto logical groups to make the output more readable.  For instance,
Xyou may have 24 output units where it makes sense to insert blanks
Xafter the 4th, 7th and 19th positions.  To do this, specify:
X
X.ce
Xb 4 7 19
X
XThen, for example, the output will look like:
X
X.na
X.nf
X 1  10^0 10^ ^000v00000v0 01000  (0.17577)
X 2  1010 01v 0^0000v00000 ^1000  (0.16341)
X 3  0101 10^ 00^00v00000v 00001  (0.16887)
X 4  0100 0^0 000^00000v00 00^00  (0.19880)
X.ad
X.fi
X
XThe `b' command allows up to 20 break positions to be specified.  The
Xdefault output format is the real format with 10 numbers per line.  For
Xthe output of real values, the `b' command specifies when to print a
Xcarriage return rather than when to print a blank.  (Note: the `b'
Xcommand is not part of the `f' command.)
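X
X   As a concrete illustration, here is a small C sketch (mine, not
Xcode taken from the simulators; the function name is made up) of the
Xcompressed output mapping described above:
X
X.na
X.nf
X/* Map a unit's output value to its compressed-format character.
X   "tol" is the tolerance set with the t command (default 0.1). */
Xchar compressed_char(double value, double tol)
X{
X    if (value > 1.0 - tol) return '1';  /* close enough to 1 */
X    if (value > 0.5)       return '^';  /* high but not close to 1 */
X    if (value > tol)       return 'v';  /* low but not close to 0 */
X    return '0';                         /* close enough to 0 */
X}
X.ad
X.fi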
X
X.ce
XPattern Formats
X
X   There are two different types of problems that back-propagation can
Xhandle: the general type of problem, where every output unit can take
Xon an arbitrary value, and the classification type of problem, where
Xthe goal is to turn on output unit i and turn off all the other output
Xunits when the pattern is of class i.  The xor problem is an example of
Xthe general type of problem.  For an example of a classification
Xproblem, suppose you have a number of data points scattered about
Xthrough two-dimensional space and you have to classify the points as
Xeither class 1, class 2 or class 3.  For a pattern of class 1 you can
Xalways set up the output: "1 0 0", for class 2: "0 1 0" and for
Xclass 3: "0 0 1"; however, doing the translation to bit patterns can be
Xannoying, so another notation can be used.  Instead of specifying the
Xbit patterns you can set the pattern format option to classification
X(as opposed to the default value of general) like so:
X
X.ce
Xf pc
X
Xand then the program will read data in the form:
X
X 1.33 3.61 1    * shorthand for 1 0 0
X 0.42 -2.30 2   * shorthand for 0 1 0
X -0.31 4.30 3   * shorthand for 0 0 1
X
Xand translate it to the bit string form when the pattern is loaded
Xonto the output units (a small sketch of this expansion is given
Xbelow).  To switch to the general form, use "f pg".
X
X   In addition to controlling the input of data, the p command within
Xthe format command is used to control the output of patterns from a
Xset of test patterns kept on a file.  If the format is either c or g,
Xthen when the test set is run thru the network you will only get a
Xsummary of how many patterns are correct.  If the format is either C
Xor G, you will get a listing of the output values for each pattern as
Xwell as the summary.  When reading patterns, C works the same as c and
XG works the same as g.
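X
X   The sketch mentioned above (again mine, with a made-up function
Xname, and not the program's source) of the classification expansion:
X
X.na
X.nf
X/* Expand the classification shorthand: class i (numbered from 1)
X   becomes a target with output unit i on and all others off. */
Xvoid class_to_target(int class, double *target, int nunits)
X{
X    int j;
X    for (j = 0; j < nunits; j++)
X        target[j] = 0.0;
X    target[class - 1] = 1.0;   /* e.g. class 2 of 3 -> 0 1 0 */
X}
X.ad
X.fi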
X
X.ce
XControlling Summaries
X
X   When the program is learning patterns, you can have it print out
Xthe status of the learning process at regular intervals.  The default
Xis to print out only a summary of how learning is going; however, you
Xcan also print out the status of every pattern at regular intervals.
XTo get the status of the whole set of patterns, use "f s-" to turn off
Xthe summary format and "f s+" to go back to summarizing.
X
X.ce
XRinging the Bell
X
X   To ring the bell when the learning has been completed, use "f b+"
Xand to turn off the bell, use "f b-".
X
X.ce
XEchoing Input
X
X   When you are reading commands from a file, it is often worthwhile
Xto see those commands echoed on the screen.  To do this, use "f e+"
Xand to turn off the echoing, use "f e-".
X
X.ce
XUp-To-Date Statistics
X
X   During the ith pass thru the network the program will collect
Xstatistics on how many patterns are correct and how much error there
Xis overall.  These numbers are gathered before the weights are updated
Xand so the results listed for iteration i really show the situation
Xafter the weight update for iteration i-1.  To complicate matters more,
Xthe weight updates can be done continuously instead of after all the
Xpatterns have been presented, so the statistics you get here are skewed
Xeven more.  If you want to have up-to-date statistics with either
Xmethod, use "f u+" and to go back to statistics that are out of date,
Xuse "f u-".  The penalty with "f u+" is that the program needs to do
Xanother forward pass.  When using the continuous update method it is
Xhighly advisable to use "f u+", at least when you get close to
Xcomplete convergence, because the default method of checking may claim
Xthe learning is finished when it isn't, or it may continue training
Xafter the tolerances have been met.
X
X.ce
X5. Taking Training and Testing Patterns from Files
X
X   In the xor example given above, the four patterns were part of the
Xdata file and the following lines were used to read them in:
X
X.na
X.nf
Xn 4
X1 0 1
X0 0 0
X0 1 1
X1 1 0
X.ad
X.fi
X
XHowever, it is also convenient to take patterns from a file that
Xcontains nothing but a list of patterns (and possibly comments).
XTo read a new set of patterns from some file, patterns, use:
X
X.ce
Xn f patterns
X
XTo add an extra bunch of patterns to the current set you can use:
X
X.ce
Xx f patterns
X
X   In addition to keeping a set of training patterns, you can take
Xtesting patterns from a file as well.  To specify the file, you can
Xinvoke the program with a second file name, as in:
X
X.ce
Xbp xor xtest
X
XIn addition, if you do the following:
X
X.ce
Xt f xtest
X
Xthe program will set xtest as the test file and immediately do the
Xtesting.  Once the file has been defined, you can test the patterns
Xon the test file by "t f" or just "t".  (This leaves the t command
Xdoing double duty, since "t <real>" will set the tolerance to <real>.)
XThe test file can also be set without being tested by using
X
X.ce
XB t f xtest
X
Xas explained in the benchmarking section.
X
X
X.ce
X6. Saving and Restoring Weights and Related Values
X
X   Sometimes the amount of time and effort needed to produce a set of
Xweights to solve a problem is so great that it is more convenient to
Xsave the weights rather than constantly recalculate them.  Weights can
Xbe saved as real values in an ASCII format (the default) or as binary,
Xto save space.  To save the weights, enter the command, `S'.  The
Xweights are written on a file called "weights".  The following file
Xcomes from the xor problem:
X
X.na
X.nf
X62r   file = ../xor3
X    9.5351562500
X   -8.4033203125
X    4.1308593750
X    5.5800781250
X   -4.9755859375
X  -11.3095703125
X    8.0527343750
X.ad
X.fi
X
XTo write the weights, the program starts with the second layer, writes
Xout the weights leading into these units in order with the threshold
Xweight last.  Then it moves on to the third layer, and so on.  To
Xrestore these weights, type an `R' for restore.  At this time, the
Xprogram reads the header line and sets the total number of iterations
Xthe program has gone through to be the first number it finds on the
Xheader line.  It then reads the character immediately after the number.
XThe `r' indicates that the weights will be real numbers represented as
Xcharacter strings.  If the weights were binary, the character would be
Xa `b' rather than an `r'.  Also, if the character is `b', the next
Xcharacter is read.  This next character indicates how many bytes are
Xused per value.  The integer versions, bp and sbp, write files with 2
Xbytes per weight, while the real versions, rbp and srbp, write files
Xwith 8 bytes per weight for double precision reals and 4 bytes per
Xweight for single precision reals.  With this notation, weight files
Xwritten by one program can be read by the other.  A binary weight
Xformat is specified within the `f' command by using "f wb".  A real
Xformat is specified by using "f wr".  If your program specifies that
Xweights should be written in one format, but the weight file you read
Xfrom is different, a notice will be given.  There is no check made to
Xsee if the number of weights on the file equals the number of weights
Xin the network.
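X
X   A small C sketch (mine, not the distribution's code; the function
Xname is made up) of reading this header line:
X
X.na
X.nf
X#include <stdio.h>
X
X/* Parse a weights-file header such as "62r" or "62b2": the total
X   iteration count, a format character and, for binary files, a
X   digit giving the number of bytes per weight. */
Xint read_header(FILE *fp, long *iterations, int *format, int *bytes)
X{
X    if (fscanf(fp, "%ld", iterations) != 1)
X        return 0;              /* no iteration count found */
X    *format = getc(fp);        /* 'r' = real as text, 'b' = binary */
X    *bytes = 0;
X    if (*format == 'b')
X        *bytes = getc(fp) - '0';
X    return 1;
X}
X.ad
X.fi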
X
X   The above formats specify that only weights are written out and
Xthis is all you need once the patterns have converged.  However, if
Xyou're still training the network and want to break off training and
Xpick up the training from exactly the same point later on, you need to
Xsave the old weight changes when using momentum, and the parameters
Xfor the delta-bar-delta method if you are using this technique.  To
Xsave these extra parameters on the weights file, use "f wR" to write
Xthe extra values as real and "f wB" to write the extra values as
Xbinary.
X
X   In the above example, the command, S, was used to save the weights
Ximmediately.  Another alternative is to save weights at regular
Xintervals.  The command, S 100, will automatically save the weights
Xevery 100 iterations the program does.  The default rate at which to
Xsave weights is set at 100,000, which generally means that no weights
Xwill ever be saved.
X
X.ce
X7. Initializing Weights and Giving the Network a `Kick'
X
X   All the weights in the network initially start out at 0.  In
Xsymmetric networks, then, no learning may result because the error
Xsignals cancel themselves out.  Even in non-symmetric networks, the
Xtraining process will usually converge faster if the weights start out
Xat small random values.  To do this, the `k' command will take the
Xnetwork and alter the weights in the following ways.  Suppose the
Xcommand given is:
X
X.ce
Xk 0 0.5
X
XNow, if a weight is exactly 0, then the weight will be changed to a
Xrandom value between +0.5 and -0.5.  The above command can therefore
Xbe used to initialize the weights in the network.  A more complex use
Xof the `k' command is to decrease the magnitude of large weights in
Xthe network by a certain random amount.  For instance, in the
Xfollowing command:
X
X.ce
Xk 2 8
X
Xall the weights in the network that are greater than or equal to 2
Xwill be decreased by a random number between 0 and 8.  Weights less
Xthan or equal to -2 will be increased by a random number between 0
Xand 8.  (A sketch of both uses is given at the end of this section.)
XThe seed to the random number generator can be changed using the `s'
Xcommand, as in "s 7".  The integer parameter in the `s' command is of
Xtype unsigned.
X
X   Another method of giving a network a kick is to add hidden layer
Xunits.  The command:
X
X.ce
XH 2 0.5
X
Xadds one unit to layer 2 of the network and all the weights that are
Xcreated are initialized to between -0.5 and +0.5.
X
X   The subject of kicking a back-propagation network out of local
Xminima has barely been studied and there is no guarantee that the
Xabove methods are very useful in general.
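X
X   Here is a rough C sketch (mine, not the program's source; frand and
Xkick are made-up names) of the two uses of the `k' command:
X
X.na
X.nf
X#include <stdlib.h>
X
Xdouble frand(double lo, double hi)  /* uniform random in [lo,hi] */
X{
X    return lo + (hi - lo) * (rand() / (double) RAND_MAX);
X}
X
X/* k v r: with v == 0, set weights that are exactly 0 to random
X   values in [-r,+r]; with v > 0, move weights of magnitude v or
X   more back toward 0 by a random amount in [0,r]. */
Xvoid kick(double *w, int nweights, double v, double r)
X{
X    int i;
X    for (i = 0; i < nweights; i++) {
X        if (v == 0.0 && w[i] == 0.0)
X            w[i] = frand(-r, r);
X        else if (v > 0.0 && w[i] >= v)
X            w[i] -= frand(0.0, r);
X        else if (v > 0.0 && w[i] <= -v)
X            w[i] += frand(0.0, r);
X    }
X}
X.ad
X.fi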
X
X.ce
X8. Setting the Algorithm to Use
X
X   A number of different variations on the original back-propagation
Xalgorithm have been proposed in order to speed up convergence and some
Xof these have been built into these simulators.  These options are set
Xusing the `A' command and a number of options can go on one line.
X
X.ce
XActivation Functions
X
X   To set the activation function, use:
X
X A al * the linear activation function
X A ap * the piece-wise activation function
X A as * the smooth activation function
X A at * the piecewise near-tanh function that runs from -1 to +1
X A aT * the continuous near-tanh function that runs from -1 to +1
X
XWhen using the linear activation function, it is only appropriate to
Xuse the differential step-size derivative and a two-layer network.
XThe smooth activation function is:
X
X.ce
Xf = 1.0 / (1.0 + exp(-x))
X
Xwhere x is the input to a unit.  The piece-wise function is an
Xapproximation to the function f, and it will normally save some CPU
Xtime even though it may increase the number of iterations you need to
Xsolve the problem.  The continuous near-tanh function is 2f - 1 and
Xthe piece-wise version approximates this function with a series of
Xstraight lines.
X
X.ce
XSharpness (or Gain)
X
X   The sharpness (or gain) is the parameter, D, in the function:
X
X.ce
X1.0 / (1.0 + exp(-D*x)).
X
XA sharper sigmoid shaped activation function (larger D) will produce
Xfaster convergence (see "Speeding Up Back Propagation" by Yoshio Izui
Xand Alex Pentland in the Proceedings of \fIIJCNN-90-WASH-DC\fR,
XLawrence Erlbaum Associates, 1990).  To set this parameter to, say, 8,
Xuse "A D 8".  The default value is 1.  Unfortunately, too large a
Xvalue for D will hurt convergence, so this is not a perfect solution
Xto speeding up learning.  Sometimes the best value for D may be less
Xthan 1.0.  A larger D is also useful in the integer version of
Xback-propagation, where the weights are limited to between -32 and
X+31.999.  A larger D value in effect magnifies the weights and makes
Xit possible for the weights to stay smaller.  Values of D less than
Xone may be useful in extracting a network from a local minimum (see
X"Handwritten Numeral Recognition by Multi-layered Neural Network with
XImproved Learning Algorithm" by Yamada, Kami, Temma and Tsukumo in
XProceedings of the 1989 IJCNN, IEEE Press).  A smaller value of D will
Xalso force the weights and the weight changes to be larger and this
Xmay be of value when the weight changes become less than the weight
Xresolution of 0.001 in the integer version.
X
X.ce
XThe Derivatives
X
X   The correct derivative for the standard activation function is
Xs(1-s), where s is the activation value of a unit; however, when s is
Xnear 0 or 1 this term will give only very small weight changes during
Xthe learning process.  To counter this problem, Fahlman proposed the
Xfollowing one for the output layer:
X
X.ce
X0.1 + s(1-s)
X
X(For the original description of this method, see "Faster Learning
XVariations of Back-Propagation: An Empirical Study", by Scott E.
XFahlman, in \fIProceedings of the 1988 Connectionist Models Summer
XSchool\fR, Morgan Kaufmann, 1989.)  Besides Fahlman's derivative and
Xthe original one, the differential step size method (see "Stepsize
XVariation Methods for Accelerating the Back-Propagation Algorithm", by
XChen and Mars, in \fIIJCNN-90-WASH-DC\fR, Lawrence Erlbaum, 1990)
Xtakes the derivative to be 1 in the layer going into the output units
Xand uses the correct derivative for all other layers.  The learning
Xrate for the inner layers is normally set to some smaller value.  To
Xset a value for eta2, give two values in the `e' command, as in:
X
X.ce
Xe 0.1 0.01
X
XTo set the derivative, use the `A' command, as in:
X
X.ne4
X A dd * use the differential step size derivative (default)
X A dF * use Fahlman's derivative in only the output layer
X A df * use Fahlman's derivative in all layers
X A do * use the original, correct derivative
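X
X   For reference, here is a small C sketch (mine, not the simulator
Xsource; the function names are invented) of the smooth activation
Xfunction with sharpness D and of the derivative variants named above:
X
X.na
X.nf
X#include <math.h>
X
Xdouble smooth(double x, double D)   /* A as, with sharpness D */
X{
X    return 1.0 / (1.0 + exp(-D * x));
X}
X
Xdouble deriv_original(double s)     /* A do: s is a unit's activation */
X{
X    return s * (1.0 - s);
X}
X
Xdouble deriv_fahlman(double s)      /* A dF, A df */
X{
X    return 0.1 + s * (1.0 - s);
X}
X
X/* A dd: the differential step size method uses 1 for the layer
X   going into the output units and s*(1-s) for all other layers. */
X.ad
X.fi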
X
X.ce
XUpdate Methods
X
X   The choices are the periodic (batch) method, the continuous
X(online) method and Jacobs' delta-bar-delta method.  The following
Xcommands set the update methods:
X
X A uc * for the continuous update method
X A ud * for the delta-bar-delta method
X A up * for the original periodic update method (default)
X
XThe delta-bar-delta method uses a number of special parameters and
Xthese are set using the `d' command.  Delta-bar-delta can be used with
Xany of the derivatives and the algorithm will find its own value of
Xeta for each weight.
X
X.ce
XOther Algorithm Options
X
X   The `b' command controls whether or not to backpropagate error for
Xunits that have learned their response to within a given tolerance.
XThe default is to always backpropagate error.  The advantage to not
Xbackpropagating error is that this can save computer time.  This
Xparameter can be set like so:
X
X A b+ * always backpropagate error
X A b- * don't backpropagate error when close
X
X   The `s' sub-command will set the number of times to skip a pattern
Xwhen the pattern has been learned to within the desired tolerance.  To
Xskip 3 iterations, use "A s 3"; to reset so that no patterns are
Xskipped, use "A s 0".
X
X   The `t' sub-command will take the given pattern (only one at a
Xtime) out of the training set so that you can then train the other
Xpatterns and test the network's response to this one pattern that was
Xremoved.  To test pattern 3, use "A t 3" and to reset to use all the
Xpatterns, use "A t 0".
X
X.ce
X9. The Delta-Bar-Delta Method
X
X   The delta-bar-delta method attempts to find a learning rate,
Xeta, for each individual weight.  The parameters are the initial
Xvalue for the etas, the amount by which to increase an eta that seems
Xto be too small, the rate at which to decrease an eta that is
Xtoo large, a maximum value for each eta and a parameter used in
Xkeeping a running average of the slopes.  Here are examples of setting
Xthese parameters:
X
X.na
X.nf
Xd d 0.5    * sets the decay rate to 0.5
Xd e 0.1    * sets the initial etas to 0.1
Xd k 0.25   * sets the amount to increase etas by (kappa) to 0.25
Xd m 10     * sets the maximum eta to 10
Xd n 0.005  * an experimental noise parameter
Xd t 0.7    * sets the history parameter, theta, to 0.7
X.ad
X.fi
X
XThese settings can all be placed on one line:
X
X.ce
Xd d 0.5 e 0.1 k 0.25 m 10 t 0.7
X
XThe version implemented here does not use momentum.  The symmetric
Xversions, sbp and srbp, do not implement delta-bar-delta.
X
X   The idea behind the delta-bar-delta method is to let the program
Xfind its own learning rate for each weight.  The `e' sub-command sets
Xthe initial value for each of these learning rates.  When the program
Xsees that the slope of the error surface averages out to be in the
Xsame direction for several iterations for a particular weight, the
Xprogram increases the eta value by an amount, kappa, given by the `k'
Xparameter.  The network will then move down this slope faster.  When
Xthe program finds the slope changes signs, the assumption is that the
Xprogram has stepped over to the other side of the minimum, so it cuts
Xdown the learning rate by the decay factor given by the `d' parameter.
XFor instance, a d value of 0.5 cuts the learning rate for the weight
Xin half.  The `m' parameter specifies the maximum allowable value for
Xan eta.  The `t' parameter (theta) is used to compute a running
Xaverage of the slope of the weight and must be in the range
X0 <= t < 1.  The running average at iteration i, a\di\u, is defined
Xas:
X
X.ce
Xa\di\u = (1 - t) slope\di\u + t a\di-1\u,
X
Xso small values for t make the most recent slope more important than
Xthe previous average of the slope.
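X
X   As a sketch of the whole rule for one weight (my C fragment, not
Xthe program's source; all the names are invented):
X
X.na
X.nf
Xtypedef struct {
X    double eta;   /* this weight's learning rate */
X    double bar;   /* running average of the slope, a(i-1) */
X} dbd;
X
Xvoid dbd_update(dbd *p, double slope, double kappa,
X                double decay, double theta, double etamax)
X{
X    if (p->bar * slope > 0.0)       /* slope keeps its sign: */
X        p->eta += kappa;            /* speed up, linearly */
X    else if (p->bar * slope < 0.0)  /* sign change, overshot: */
X        p->eta *= decay;            /* slow down, geometrically */
X    if (p->eta > etamax)
X        p->eta = etamax;
X    /* a(i) = (1 - theta) * slope(i) + theta * a(i-1) */
X    p->bar = (1.0 - theta) * slope + theta * p->bar;
X}
X.ad
X.fi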
X
X   Determining the learning rate for back-propagation automatically
Xis, of course, very desirable and this method often speeds up
Xconvergence by quite a lot.  Unfortunately, bad choices for the
Xdelta-bar-delta parameters give bad results and a lot of
Xexperimentation may be necessary.  If you have n patterns in the
Xtraining set, try starting e and k around 1/n.  The n parameter is an
Xexperimental noise term that is only used in the integer version.  It
Xchanges a weight in the wrong direction by the amount indicated when
Xthe previous weight change was 0 and the new weight change would be 0
Xand the slope is non-zero.  (I found this to be effective in an
Xinteger version of quickprop, so I tossed it into delta-bar-delta as
Xwell.  If you find this helps, please let me know.)  For more on
Xdelta-bar-delta, see "Increased Rates of Convergence Through Learning
XRate Adaptation" by Robert A. Jacobs, in \fINeural Networks\fR,
XVolume 1, Number 4, 1988.
X
X.ce
X10. Recurrent Networks
X
X   Recurrent back-propagation networks take values from higher level
Xunits and use them as activation values for lower level units.  This
Xgives a network a simple kind of short-term memory, possibly a little
Xlike human short-term memory.  For instance, suppose you want a
Xnetwork to memorize the two short sequences, "acb" and "bcd".  In the
Xmiddle of both of these sequences is the letter, "c".  In the first
Xcase you want a network to take in "a" and output "c", then take in
X"c" and output "b".  In the second case you want a network to take in
X"b" and output "c", then take in "c" and output "d".  To do this, a
Xnetwork needs a simple memory of what came before the "c".
X
X   Let the network be a 7-3-4 network, where input units 1-4 and
Xoutput units 1-4 stand for the letters a-d.  Furthermore, let there be
X3 hidden layer units.  The hidden units will feed their values back
Xdown to the input units 5-7, where they become input for the next
Xstep.  To see why this works, suppose the patterns have been learned
Xby the network.  Inputting the "a" from the first string produces some
Xrandom pattern of activation, p1, on the hidden layer units and "c" on
Xthe output units.  The pattern p1 is copied down to units 5-7 of the
Xinput layer.  Second, the letter, "c", is presented to the network
Xtogether with p1 now on units 5-7.  This will give "b" on the output
Xunits.  However, if the "b" from the second string is presented first,
Xthere will be a different random pattern, p2, on the hidden layer
Xunits.  These values are copied down to input units 5-7.  These values
Xcombine with the "c" to produce the output, "d".
X
X   The training patterns for the network can be:
X
X 1000 000 0010   * "a" prompts the output, "c"
X 0010 hhh 0100   * inputting "c" should produce "b"
X
X 0100 000 0010   * "b" prompts the output, "c"
X 0010 hhh 0001   * inputting "c" should produce "d"
X
Xwhere the first four values on each line are the normal input and the
Xmiddle three either start out all zeros or take their values from the
Xprevious values of the hidden units.  The code for taking these values
Xfrom the hidden layer units is "h".  The last set of values represents
Xthe output that should be produced.  To take values from the third
Xlayer of a network, the code is "i".  For the fourth and fifth layers
X(if they exist) the codes are "j" and "k".  Training recurrent
Xnetworks can take much longer than training standard networks and the
Xaverage error can jump up and down quite a lot.
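X
X   The copy step that the "h" code implies can be sketched in C like
Xthis (my illustration, with made-up names, not the program's source):
X
X.na
X.nf
X/* After each forward pass, copy the hidden-layer activations down
X   to the extra input units (here units 5-7 of a 7-3-4 network) so
X   they accompany the next pattern in the sequence. */
Xvoid copy_hidden_to_input(double *hidden, int nhidden,
X                          double *input, int first_extra)
X{
X    int j;
X    for (j = 0; j < nhidden; j++)
X        input[first_extra + j] = hidden[j];
X}
X.ad
X.fi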
X
X.ce
X11. The Benchmarking Command
X
X   The main purpose of the benchmarking command is to make it possible
Xto run a number of tests of a problem with different initial weights
Xand average the number of iterations and CPU time for networks that
Xconverged.  A second purpose is to run a training set thru the network
Xa number of times, and for each try, a test pattern or a test set can
Xbe checked at regular intervals.
X
X   A typical command to simply test the current parameters on a number
Xof networks is:
X
X.ce
XB g 5 m 15 k 1 r 1000 200
X
XThe "g 5" specifies that you'd like to set the goal of getting 5
Xnetworks to converge, but the "m 15" sets a maximum of 15 tries to
Xreach this goal.  The k specifies that each initial network will get a
Xkick by setting each weight to a random number between -1 and 1.  The
X"r 1000 200" portion specifies that you should run up to 1000
Xiterations on a network and print the status of learning every 200
Xiterations.  This follows the normal run command and the second
Xparameter defining how often to print the statistics can be omitted.
XFor example, here is some output from benchmarking with the xor
Xproblem:
X
X.na
X.nf
X[?!*AaBbCcdefHhiklmnOoPpQqRrSstWwx]? B g 5 m 5 k 1 r 200
X seed = 7; running . . .
Xpatterns learned to within 0.10 at iteration 62
X seed = 7; running . . .
X seed = 7; running . . .
Xpatterns learned to within 0.10 at iteration 54
X seed = 7; running . . .
Xpatterns learned to within 0.10 at iteration 39
X seed = 7; running . . .
Xpatterns learned to within 0.10 at iteration 44
X1 failures; 4 successes; average = 49.750000   0.333320 sec/network
X.ad
X.fi
X
XThe time/network includes however much time is used to print messages,
Xso to time the process effectively, all printing should be minimized.
XThe timing is done using the UNIX clock(3C) function and, on a UNIX PC
Xat least, the time returned by clock will overflow after 2147 seconds,
Xor about 36 minutes.  If your system has the same limitation, take
Xcare that ALL of the benchmarking you do in a single call of the
Xprogram adds up to less than 36 minutes.
X
X   In the above example, the seed that was used to set the random
Xvalues for the weights was set to 7 (outside the benchmarking
Xcommand); however, if you set a number of seeds, as in:
X
X.ce
Xs 3 5 7 18484 99
X
Xthe seeds will be taken in order for each network.  When there are
Xmore networks to try than there are seeds, the random values keep
Xcoming from the last seed value, so actually you can get by using a
Xsingle seed.  The idea behind allowing multiple seeds is so that if
Xone network does something interesting you can use that seed to run a
Xnetwork with the same initial weights outside of the benchmarking
Xcommand.
X
X   Once the benchmarking parameters have been set, it is only
Xnecessary to include the run portion in order to start the
Xbenchmarking process again; thus, "B r 200" will run benchmarking
Xagain using the current set of parameters.  Also, the parameters can
Xbe set without starting the benchmarking process by just not including
Xthe `r' parameters in the B command, as in:
X
X.ce
XB g 5 m 5 k 1
X
X   In addition to getting data on convergence, you can have the
Xprogram run test patterns thru the network at the print statistics
Xrate given in the `r' sub-command.
XTo specify the test file, test100, use:
X
X.ce
XB t f test100
X
XTo run the training data thru for up to 1000 iterations and test every
X200 iterations, use:
X
X.ce
XB r 1000 200
X
XIf the pattern format specification p is set to either c or g, you
Xwill get a summary of the patterns on the test file.  If p is either C
Xor G, you will get the results for each pattern listed as well as the
Xsummary.  To stop testing the data on the test file, use "B t 0".
X
X   Sometimes you may have so little data available that it is
Xdifficult to separate the patterns into a training set and a test set.
XOne solution is to remove each pattern from the training set, train
Xthe network on the remaining patterns and then test the network on the
Xpattern that was removed.  To remove a pattern, say pattern 1, from
Xthe training set, use:
X
X.ce
XB t 1
X
XTo systematically remove each pattern from the training set, use a
Xdata file with the following commands:
X
X.na
X.nf
XB t 1 r 200 50
XB t 2 r 200 50
XB t 3 r 200 50
X ... etc.
X.ad
X.fi
X
Xand the pattern will be tested every 50 iterations.  If, in the course
Xof training the network, all the patterns converge, the program will
Xprint out a line starting with a capital S followed by a test of the
Xtest pattern.  If the program hits the point where statistics on the
Xlearning process have to be printed and the network has not converged,
Xthen a capital F will print out, followed by a test of the test
Xpattern.  To stop this testing, use "B t 0".
X
X   It would be nice to have the program average up and tabulate all
Xthe data that comes out of the benchmarking command, but I thought I'd
Xleave that to users for now.  You can use the record command to save
Xthe output from the entire session and then run it thru some other
Xprogram, such as an awk program, in order to sort everything out.
X
X.ce
X12. Miscellaneous Commands
X
X   Below is a list of some miscellaneous commands, a short example of
Xeach and a short description of the command.
X
X.IP " !    !cmd    " 15
XAnything after `!' will be passed on to UNIX as a command to execute.
X
X.IP " C    C       " 15
XThe C command will clear the network of values, reset the number of
Xiterations to 0 and reset other values so that another run can be
Xmade.  The seed value is reset so you can run the same network over
Xwith the same initial weights but with different learning parameters.
XEven though the seed is reset, the weights are not initialized, so you
Xmust do this step after clearing the network.
X
X.IP " i    i f     " 15
XEntering "i f" will read commands from the file, f.  When there are
Xno more commands on the file, the program resumes reading from the
Xprevious file being used.  You can also have an `i' command within
Xthe file f; however, the depth to which you can nest the number of
Xactive files is 4 and stdin itself counts as the first one.  Once an
Xinput file has been specified, you can simply type "i" to read from
Xthe file again.
X
X.IP " l    l 2     " 15
XEntering "l 2" will print the values of the units on layer 2,
Xor whatever layer is specified.
X
X.IP " T    T -3    " 15
XIn sbp and srbp only, "T -3" sets all the threshold weights
Xto -3, or whatever value is specified, and freezes them at this value.
X
X.IP " W    W 0.9   " 15
XEntering "W 0.9" will remove (whittle away) all the weights with
Xabsolute values less than 0.9.
X.in-15
X
XIn addition, when a user-generated interrupt occurs (by typing DEL),
Xthe program will drop its current task and take the next command from
Xthe keyboard.
X
X.ce
X13. Limitations
X
X   Weights in the bp and sbp programs are 16-bit integer weights,
Xwhere the real value of the weight has been multiplied by 1024.  The
Xinteger versions cannot handle weights less than -32 or greater than
X31.999.  The weight changes are all checked for overflow, but there
Xare other places in these programs where calculations can possibly
Xoverflow as well and none of these places are checked.  Input values
Xfor the integer versions can run from -31.994 to 31.999.  Due to the
Xmethod used to implement recurrent connections, input values in the
Xreal version are limited to -31994.0 and above.
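X
X   For reference, the fixed-point representation just described can
Xbe sketched in C as follows (my illustration, not the simulator
Xsource; the function names are invented):
X
X.na
X.nf
X/* bp and sbp store a weight as a 16-bit integer equal to the real
X   value times 1024, so the range is -32 to +31.999 and the
X   resolution is 1/1024, about 0.001. */
Xshort to_fixed(double w)
X{
X    return (short) (w * 1024.0);
X}
X
Xdouble to_real(short iw)
X{
X    return iw / 1024.0;
X}
X.ad
X.fi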
END_OF_FILE
if test 43266 -ne `wc -c <'README'`; then
    echo shar: \"'README'\" unpacked with wrong size!
fi
# end of 'README'
fi
echo shar: End of archive 1 \(of 4\).
cp /dev/null ark1isdone
MISSING=""
for I in 1 2 3 4 ; do
    if test ! -f ark${I}isdone ; then
        MISSING="${MISSING} ${I}"
    fi
done
if test "${MISSING}" = "" ; then
    echo You have unpacked all 4 archives.
    rm -f ark[1-9]isdone
else
    echo You still need to unpack the following archives:
    echo " " ${MISSING}
fi
##  End of shell archive.
exit 0
exit 0 # Just in case...