INTRODUCTION.
This program was written to compare two protein or DNA files. When proteins are compared, it can do this either straightforwardly, by scoring identical amino acids, or by using one of the scoring tables. Two of these tables are standard and cannot be changed; the other three can be renamed and their scores altered. The various options of the program are discussed in the following sections.

INDEX.
Defaults.
   Extensions.
   Windows and Scores.
Scoring-tables.
Limits of DOTPLOT.
   Size.
   Characters.
The Picture and Panel.
   Picture.
      Zoom-in.
      Show Homology.
   Panel.
      Shift.
      Expand.
      Change.
      Conditions.
      Parameters.
      Output.
      Another run.

Defaults.

Extensions.
When you choose to compare DNA files, the program automatically selects files with the extension ".DNA". This default can be changed to any other letter combination with the program "DTPLT_ED.PRG". So if your DNA files use the ".SEQ" extension, you can change the default to "SEQ". The program will still look for these files in a folder called "DNA"; this cannot be changed. Protein files have the extension ".EIW" by default; this can also be changed with "DTPLT_ED.PRG". These files, however, must be located in a folder called "PROTEIN". Both the DNA and the PROTEIN folder must be on the same drive as the DOTPLOT program itself.

Windows and Scores.
To compare two files, a window is slid along every possible diagonal of the matrix formed by all characters (amino acids or bases) of the two files. If the score within the window exceeds a minimum value, a line is drawn. The length of this line corresponds to the window size, scaled to the lengths of the two files.
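The window-and-score idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DOTPLOT's own machine-code routine: it uses identity scoring (as for DNA), the function name `dot_plot_hits` is hypothetical, and it counts a window as a hit when its score is at least the minimum (the manual says "exceeds", so DOTPLOT may use a strict comparison instead). Screen scaling and the "tracks" used for large files are not modelled.

```python
from typing import List, Tuple


def dot_plot_hits(horizontal: str, vertical: str,
                  window: int = 21, min_score: int = 14) -> List[Tuple[int, int]]:
    """Return the (i, j) start of every window scoring at least min_score.

    Hypothetical helper. i indexes the horizontal sequence, j the vertical one;
    each hit stands for a diagonal line of length `window` starting at (i, j).
    Identity scoring: +1 for every position where the two characters match.
    Assumption: a hit means score >= min_score.
    """
    hits = []
    n, m = len(horizontal), len(vertical)
    # Each diagonal is identified by its offset d = i - j; only diagonals
    # long enough to hold one full window are visited.
    for d in range(-(m - window), n - window + 1):
        i0, j0 = max(d, 0), max(-d, 0)   # first cell of this diagonal
        length = min(n - i0, m - j0)     # number of cells on the diagonal
        if length < window:
            continue
        matches = [horizontal[i0 + k] == vertical[j0 + k] for k in range(length)]
        # Score the first window, then slide it one cell at a time by adding
        # the cell that enters and subtracting the cell that leaves.
        score = sum(matches[:window])
        for k in range(length - window + 1):
            if k > 0:
                score += matches[k + window - 1] - matches[k - 1]
            if score >= min_score:
                hits.append((i0 + k, j0 + k))
    return hits
```

Called with only two sequences, the sketch uses the DNA defaults of a window of 21 and a minimum score of 14; for example `dot_plot_hits("ACGT", "ACCT", window=4, min_score=3)` finds the single window at (0, 0). Updating the score incrementally keeps the cost of each window step constant instead of rescoring the whole window.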
For DNA the defaults are a window of 21 and a minimum score of 14. Both values can be changed with DTPLT_ED.PRG. Other values can also be entered in the DOTPLOT program itself; changing the defaults does not prevent you from running the program with another set. For proteins the defaults are 8 for the window and 5 for the score; these, too, can be changed with DTPLT_ED.PRG. The default windows and scores of all five scoring tables (see below) can be changed in this way, as can the names, comments and values of three of them.

Scoring-tables.
When DNA files are compared the scoring is simple: the score is incremented for every identical base. This method is also available for proteins, but there are other options as well. Some amino acids are chemically more closely related than others; glycine (side chain -H) is closer to alanine (-CH3) than to cysteine (-CH2-SH). This can be scored either as fractions or as equality within a group. Another approach is to score for evolutionary relatedness. This means that two processes have to be considered and expressed as a number. First, the chance of a certain codon mutating into another has to be calculated; second, the fitness of this mutation has to be graded. Both the chance and the fitness then have to be expressed in a single number.

Both the chemical and the evolutionary method are incorporated into DOTPLOT. The chemical scoring table is called "JIMENEZ". It does not score individual amino acids but groups: all amino acids within a group score equally (1), and amino acids from different groups score 0. The groups are:

   PAGST     neutral, weakly hydrophobic
   QNEDBZ    hydrophilic, acid and amide
   HKR       hydrophilic, basic
   VILM      hydrophobic
   FYW       hydrophobic, aromatic
   C         cross-link forming

The evolutionary approach is represented by "DAYHOFF", a fully individual scoring table: the relatedness of every amino acid to every other amino acid is expressed as a number between 0 and 2.73.
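Group scoring in the style of the JIMENEZ table can be sketched as follows. This is a hedged illustration, not DOTPLOT's code: the group letters are taken from the table above, while the names `JIMENEZ_GROUPS`, `jimenez_score` and `window_score` are hypothetical, and letters that belong to no group (such as "X") are assumed to score 0, which the manual does not state.

```python
# Groups as listed for the JIMENEZ table; 22 of the 26 capitals appear.
JIMENEZ_GROUPS = ["PAGST", "QNEDBZ", "HKR", "VILM", "FYW", "C"]

# Map each one-letter code to the index of its group.
GROUP_OF = {aa: g for g, members in enumerate(JIMENEZ_GROUPS) for aa in members}


def jimenez_score(a: str, b: str) -> int:
    """Return 1 if a and b fall in the same group, else 0.

    Assumption: letters outside the six groups (e.g. X) always score 0.
    """
    ga, gb = GROUP_OF.get(a), GROUP_OF.get(b)
    return 1 if ga is not None and ga == gb else 0


def window_score(h: str, v: str, i: int, j: int, window: int = 8) -> int:
    """Score one diagonal window starting at h[i] and v[j].

    The default window of 8 is the protein default mentioned in the manual.
    """
    return sum(jimenez_score(h[i + k], v[j + k]) for k in range(window))
```

A window would then count as a hit when its score reaches the protein default minimum of 5 (again assuming a "score >= minimum" rule). A DAYHOFF-style table would replace `jimenez_score` with a lookup of fractional values for every pair; those values are not reproduced here.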
There are three more tables available in DOTPLOT; the values in these tables, as well as their names, defaults and comments, can all be changed. So if you have an improved scoring system, you can adapt one of these tables to it, complete with an appropriate name and defaults.

Limits of DOTPLOT.

Size.
For version 5 of DOTPLOT a very fast subroutine was written in machine code; as is usual with such speed-ups, it uses a lot of memory. Large files would need more memory than is available even on a Mega 4 computer, so in these cases the complete picture is built up from several "tracks". You don't have to worry about the actual amount of memory used; DOTPLOT adjusts the number of tracks to the free memory available. The maximum size of your files is 64 kbytes; this counts only the bases or amino acids, so the total file, including comments, may be longer.

Characters.
When run in DNA mode, DOTPLOT recognizes only five characters: A, C, G, T and U. Characters such as R, Y, S, W and the other nucleotide symbols therefore cannot be assayed; they will not, however, crash DOTPLOT. In protein mode all twenty-six capital letters can be used. Any other character in the files is ignored. For unknown bases use "N", for unknown amino acids "X".

The Picture and Panel.
When DOTPLOT has finished, a control panel is drawn over the right part of the screen. When the mouse is moved into this part of the screen it is shown as an arrow; in the left part the mouse is represented by a cross.

Zoom-in and show-homology.

Zoom-in.
With the mouse a small part of the picture can be selected: position the cross, press the LEFT mouse button and drag to the right and down while keeping the button pressed. The selected part is then boxed, and this area is blown up to full size. If the borders result in segments smaller than the window size, the box jumps back and no harm is done.

Show Homology.
When the cross is positioned over a stretch of similarity and the RIGHT mouse button is pressed, this stretch is translated back into the two sequences, which are then shown with their homologies clearly marked. The similarity is scored according to the table you selected previously for the whole proteins. If you want to keep this output, you can print it or select the file option to create an ASCII file for later processing with a word processor. To go back, either press the RIGHT mouse button or click on exit.

The panel.
The panel is divided into sections in which related functions are grouped together.

Borders. Divided into:

Shift : When the arrows are black, the picture shown was not made with the total length of the sequence file. With the arrows the range can be shifted in the direction indicated by half of its current length. This means that if you have selected a range from 150 to 300 for the horizontal sequence and you click on shift-right, the new range will be 225 to 375. NO PICTURE IS MADE UNTIL THE EXECUTE COMMAND IS GIVEN!!!!

Expand : Zoom out, if only part of the sequences is used. If the selected range is 100 to 200 in both directions, clicking on "2x" horizontal and on "2x" vertical results in the range 50 to 250. If the "total" option is selected, the whole sequence is used for the picture. NO PICTURE IS MADE UNTIL THE EXECUTE COMMAND IS GIVEN!!!!

Change : When the Hor. or Vert. button is clicked, you can swap the horizontal or the vertical sequence for a new one. The start and end of the new sequence can also be changed.

Conditions : The conditions you can change depend on whether you are running DNA or protein files. For DNA the condition is:
   Reverse : This lets you change the orientation of one or both of the files you are currently running.
For protein files the condition is:
   Homology : This lets you change the scoring table.
   The defaults that go with the new table are also loaded.

Parameters : The window size and the minimum score can be changed by clicking in the boxes marked "<" or ">". In both cases the inner boxes change the value by one; the outer boxes change it by 10% + 1. IN EITHER CASE NO PICTURE IS MADE UNTIL THE EXECUTE COMMAND IS GIVEN!!!!

Output :
   Print : two options are available, portrait or landscape.
   Save : two kinds of files can be created, Degas and Doodle.

Another run : Here the more drastic commands can be found:
   New run : start again at the DNA/Protein choice.
   Quit : the last resort if you can't imagine anything left to compare. This takes you back to GEMDOS.

Christiaan Karreman
KARREMAN@RULLF2.LeidenUniv.nl