




                          CHAPTER  2

                      MANAGING YOUR DATA  



  The ability of the SPPC program to input and process very large
raw-score  and  summary  data  files  represents  one of its more 
powerful features.  This chapter of the  User's  Manual  explains
how you may create data files by using your own word processor or
by using the "Data Manager" that accompanies the program.


DATA INPUT   

  Although  many of the procedures in the SPPC allow you to enter 
data from the keyboard, its real power arises from the fact  that
it will process truly large data files.  If you have a very large
number  of  cases  you may wish to store your data in two or more 
files having different names.  If you wish to do that,  the  SPPC
modules  that accept raw data files will also accept continuation 
files.  You may use as many continuation files as you wish, and
they may be spread over two or more diskettes.  

  Should you choose to use continuation files,  it  is  ESSENTIAL
that  they each have exactly the same format or structure.  While 
each may have a different number of 'cases', each MUST  have  the
same  number  of  variables  in the same order and possessing the
same names.


MISSING VALUES 

  Each  of  the  SPPC  modules  that processes raw data from disk 
files will also permit the processing of a missing  values  code.
The  missing  values  code  that you select MUST be a positive or
negative real number.  It cannot be a character.  For example, if
your data file contains both positive and negative values you may 
wish to use as your missing  values  code  some  extremely  large
value  that will always exceed the value of valid numeric data -- 
say, 1.0E88 (entered as 1e88).  On the other hand,  if  you  know
that  all  your  valid  data values are positive numbers, you may 
wish to use as your missing values code a number  as  simple  as,
say,  -1.   You  may  use  any real number as your missing values
code.  
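  The idea behind a missing values code can be shown with a short
Python sketch: any value equal to the code is set aside before
statistics are computed.  This is an illustration only, not SPPC
code; the function name valid_values is hypothetical, and the
default code of -1 is simply the example used above.

```python
MISSING = -1.0  # hypothetical missing values code (the -1 example above)

def valid_values(values, missing=MISSING):
    """Return only the values that are not equal to the missing values code."""
    # Exact equality is safe here because the code and the data are both
    # read from text the same way, e.g. "-1" -> -1.0 and "1e88" -> 1e88.
    return [v for v in values if v != missing]

scores = [12.0, -1.0, 15.5, 9.0, -1.0]
print(valid_values(scores))  # [12.0, 15.5, 9.0]
```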


CREATING DATA FILES

  An  especially  convenient  feature of the SPPC arises from the
fact that you may create your data files by using any text editor 
or word processing program that stores text files as  true  ASCII
data.   Or,  you  can  create  data files by using the procedures 
available through the 'Manage data' option of  the  program.   If
you  choose  to  create data files by using a text editor or word 
processing program, be sure  it  does  not  store  your  file  in
document  form.   That  is, the data file must be in strict ASCII
format.

  The next two sections of this chapter explain how to create raw
and  summary  data files using your word processor.  This is done 
so that you may use your word processor  and  so  that  you  will
understand  how  both  raw and summary data files are structured. 
Once these topics are  covered,  the  remainder  of  the  chapter
explains how to use the 'Manage data' option of the program.


CREATING A RAW SCORE DATA FILE WITH YOUR WORD PROCESSOR 

  1.  The first line of any input data file  MUST  consist  of  a
'header'  statement  that  has  at least one non-blank character. 
Ordinarily this first line (up  to  80  characters)  is  used  to
document  the file. It may contain any message that you wish.  If 
the first line of your file is completely blank,  the  file  will
not  be  read.   However,  blank  lines  may  be used to separate
subsequent data lines.   

  2.  The next line of the raw data input file must contain  four
entries:  the  sample  size,  the  number of variables, the 'data 
type', and the missing values code.  'Data type' must  equal  'r'
or 'R' for a raw score data file.  

  For  example,  if  you were to enter a raw data file based on a 
sample size of 308, with 19 variables and a missing  values  code
of -1, the second line of the input file would be shown as:

308 19 R -1

  These four 'values' may be placed anywhere on the line provided
they are in the order 'Sample Size', 'Number of Variables', 'Data
Type', and 'Missing Value' and are separated by at least one
blank space.  

  3. The next one or more lines of  your  input  data  file  MUST
contain names for your variables -- one name for each variable --
shown in the order in which the variables are entered.  Each name
may  consist of one to fifteen characters, but each variable must 
be named.  Each name must be separated  by  at  least  one  blank
space.   

  The lines following the variable names  must  contain  the  raw
score  values  of the variables.  Each value must be separated by 
at least one blank space.  The data line(s) for each case  should
begin  on  a  new line.  Each data value may have as many decimal
positions as desired, but no more than 15 will be used.
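  To make the layout concrete, the following Python sketch builds
the text of a raw score data file from a header, a list of
variable names, and a list of cases.  It is an illustration only
and is not part of the SPPC; the function name format_raw_file is
hypothetical, and the sketch assumes that silently truncating a
header longer than 80 characters is acceptable.

```python
def format_raw_file(header, names, cases, missing):
    """Build the text of a raw score data file in the layout described above."""
    if not header.strip():
        raise ValueError("the header line needs at least one non-blank character")
    if any(not name or " " in name for name in names):
        raise ValueError("variable names must be non-empty and contain no blanks")
    lines = [header[:80]]                                 # line 1: header (assumed cut at 80)
    lines.append(f"{len(cases)} {len(names)} R {missing}")  # line 2: N, k, type, missing code
    lines.append(" ".join(names))                         # variable names, blank-separated
    for case in cases:
        if len(case) != len(names):
            raise ValueError("each case needs exactly one value per variable")
        lines.append(" ".join(str(v) for v in case))      # each case starts on a new line
    return "\n".join(lines) + "\n"

print(format_raw_file("Test file", ["X", "Y"], [[1, 2.5], [3, -1]], -1))
```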

  The following is a complete example of a raw data file.  

 The LONGLEY Raw Data: Byte Magazine (Nov. 1983), pp. 560-570 
 16 7 R -1
 Employed GNP_Deflator  GNP  Unemployed  Armed_Forces  Population 
 Year 
 60.323   83.0  234.289  235.6  159.0  107.608  1947
 61.122   88.5  259.426  232.5  145.6  108.632  1948
 60.171   88.2  258.054  368.2  161.6  109.773  1949
 61.187   89.5  284.599  335.1  165.0  110.929  1950
 63.221   96.2  328.975  209.9  309.9  112.075  1951
 63.639   98.1  346.999  193.2  359.4  113.270  1952
 64.989   99.0  365.385  187.0  354.7  115.094  1953
 63.761  100.0  363.112  357.8  335.0  116.219  1954
 66.019  101.2  397.469  290.4  304.8  117.388  1955
 67.857  104.6  419.180  282.2  285.7  118.734  1956
 68.169  108.4  442.769  293.6  279.8  120.445  1957
 66.513  110.8  444.546  468.1  263.7  121.950  1958
 68.655  112.6  482.704  381.3  255.2  123.366  1959
 69.564  114.2  502.601  393.1  251.4  125.368  1960
 69.331  115.7  518.173  480.6  257.2  127.852  1961
 70.551  116.9  554.894  400.7  282.7  130.081  1962
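  Because every entry after the header line is separated by blanks,
a file like the one above can be read as one stream of "words".
The Python sketch below shows one way to do that.  It is not the
SPPC's own reader: the function name parse_raw_file is
hypothetical, it handles a single file only (no continuation
files), and it does not check that each case begins on a new line.
Treating the file as a token stream is what lets blank separator
lines and variable names spread over two lines (as in the LONGLEY
example) be read correctly.

```python
def parse_raw_file(text):
    """Parse a raw score data file; return (header, names, cases, missing)."""
    lines = text.splitlines()
    if not lines or not lines[0].strip():
        raise ValueError("the first line must be a non-blank header")
    header = lines[0]
    # Everything after the header is a stream of blank-separated tokens.
    tokens = " ".join(lines[1:]).split()
    n, k = int(tokens[0]), int(tokens[1])      # sample size, number of variables
    if tokens[2].upper() != "R":
        raise ValueError("data type must be 'r' or 'R' for a raw score file")
    missing = float(tokens[3])                 # missing values code
    names = tokens[4:4 + k]                    # one name per variable
    values = [float(t) for t in tokens[4 + k:4 + k + n * k]]
    if len(names) != k or len(values) != n * k:
        raise ValueError("file holds fewer entries than the second line promises")
    cases = [values[i * k:(i + 1) * k] for i in range(n)]
    return header, names, cases, missing
```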


CREATING A SUMMARY DATA FILE USING YOUR WORD PROCESSOR

  1.  The first line of any input file MUST consist of a 'header'
statement   that  contains  at  least  one  non-blank  character. 
Ordinarily this first line (up  to  80  characters)  is  used  to
document  the file. It may contain any message that you wish.  If 
the first line of your file is completely blank,  the  file  will
not  be  read.   However,  blank  lines  may  be used to separate
subsequent data lines. 

  2.  The next line of the input file must contain three entries:
the sample size, the number of variables, and the 'data type'.
'Data type' must equal 's' or 'S' for a summary data file.

  For  example, if you were to enter a summary data file based on 
a sample size of 308 and having 19 variables, the second line  of
the input file would be shown as:

308 19 S

  These three 'values' may be placed anywhere on the line provided
they are in the order 'Sample Size', 'Number of Variables', and
'Data Type' and are separated by at least one blank space.

  3.  The  next  one  or  more lines of your input data file MUST
contain names for your variables -- one name for each variable -- 
shown in the order in which the variables are entered. Each  name
may  consist of one to fifteen characters, but each variable must 
be named.  Each name must be separated  by  at  least  one  blank
space.

  4.   The  next one or more lines of your summary data file must
contain the variable means (one for each variable), and each must
be separated by at least one blank space. 

  5. The next one or more lines of your summary  data  file  must
contain the variable standard deviations (one for each variable),
and each must be separated by at least one blank space.  

  6.   The  next one or more lines of your summary data file must
contain all of the correlations among the variables. However, you 
must enter ONLY the upper triangular portion of  the  correlation
matrix.  Each correlation must be separated by at least one blank
space.

  You should note that whenever you use a summary data file, all
standard deviations MUST be unbiased estimates, i.e., they must
be computed using N-1 in the denominator.
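  If you compute your standard deviations with another program, it
is worth checking which denominator it uses.  As an illustration,
Python's standard statistics module offers both forms: stdev
divides by N-1 (the form required here) and pstdev divides by N.
A standard deviation computed with N can be converted by
multiplying it by the square root of N/(N-1).  The scores below
are made-up values chosen only to keep the arithmetic simple.

```python
import math
import statistics

scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up example data

# statistics.stdev uses N - 1 in the denominator: the form to enter.
sd_n_minus_1 = statistics.stdev(scores)
# statistics.pstdev uses N in the denominator: do NOT enter this value.
sd_n = statistics.pstdev(scores)

# Converting an N-denominator value to the N - 1 form.
n = len(scores)
converted = sd_n * math.sqrt(n / (n - 1))

print(sd_n)                     # 2.0
print(round(sd_n_minus_1, 4))   # 2.1381
```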

  All  means,  standard  deviations,  and  correlations should be 
entered with  as  many  decimal  positions  of  accuracy  as  are
available  to  you.  However, decimal positions beyond 15 will be
ignored. 

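  The following is an example of a complete summary data input
file.  The labels in parentheses are shown only for explanation
and should not be typed into your own file.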


  This is the 'header' line for a summary data file.

  308 4 S 

  Depression  Self_Esteem  Income  Education 

  45.87  61.36  11842.34  15.6        (Means)
  11.67  8.98  897.34  3.6            (Standard Deviations) 

   1.0000  0.7436  0.1247  0.2178     (Correlations) 
           1.0000  0.1742  0.0921
                   1.0000  0.4634
                           1.0000


  The correlations in a summary file do not have to be entered in
'triangular' patterns as shown above.  The triangular pattern was
presented to highlight the fact that you must enter only the
upper triangular portion of the correlation matrix.  The above
correlations could just as well have been entered as follows:

1.0000 0.7436 0.1247 0.2178 1.0000 0.1742  0.0921  1.0000  0.4634 
1.0000 
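  Since only the upper triangle (diagonal included) is stored, a
file with k variables holds k(k+1)/2 correlations, which is 10 for
the four variables above.  The Python sketch below, which is not
SPPC code and uses the hypothetical name expand_upper_triangle,
shows how such a row-by-row list can be rebuilt into the full
symmetric matrix by mirroring each entry into the lower half.

```python
def expand_upper_triangle(values, k):
    """Rebuild a full k x k correlation matrix from its upper triangle.

    `values` holds the upper triangle read row by row, diagonal included,
    so it must contain exactly k * (k + 1) / 2 numbers.
    """
    if len(values) != k * (k + 1) // 2:
        raise ValueError("expected k*(k+1)/2 correlations")
    matrix = [[0.0] * k for _ in range(k)]
    it = iter(values)
    for i in range(k):
        for j in range(i, k):
            matrix[i][j] = matrix[j][i] = next(it)  # mirror into the lower half
    return matrix

flat = [1.0000, 0.7436, 0.1247, 0.2178, 1.0000, 0.1742, 0.0921,
        1.0000, 0.4634, 1.0000]          # the example correlations above
r = expand_upper_triangle(flat, 4)
print(r[3][2])  # 0.4634, the Income/Education correlation
```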




                   USING THE SPPC DATA MANAGER


  The SPPC Data Manager has been designed to present and
manipulate your data file in the most natural way possible.  We
think that the data manager is closer to being a "data processor"
than a data manager.  The remainder of this chapter describes the
operation and special features of the data manager.

The data manager allows you to perform three main operations with
regard to your data file:
   
      Creating new raw data files
      Editing existing data files 
      Utility functions 

The description of each operation is detailed below.


CREATING NEW RAW DATA FILES 

  When you select this option of the main menu you will be asked
to enter the header record information: the number of cases, the
number of variables, and the missing values code you wish to use.
You may recall that this is the same information, in the same
order, as was described at the beginning of this chapter.

  Next you will be asked to enter a one-line description of your
data file.  This line must be 80 characters or less in length and
contain at least one non-blank character.  

  Following the input of the header record you will  be  prompted
for the name of your output file.  At this point you may redirect
the entire file to the printer or to any valid floppy diskette or
hard  disk  file.   File  names  must  conform  to DOS standards.
Examples of valid file names are

      A:DATAFILE
      B:MYFILE.FIL
      C:\DATA\PROJECT1.DAT
      \FILES\DATA\DATAFILE.DAT 

  All of the data files included with the SPPC use the file
extension ".DAT".  This convention indicates the nature of the
file at a glance.    

  After you enter the name of your output file you will be
prompted for the names of the variables you wish to use.  These
variable names may contain upper- or lowercase letters and digits
as well as any other printable characters.  Variable names or
labels may be up to 16 characters in length.  It is ESSENTIAL
that you not include blank spaces in variable names, because the
data manager and all of the other SPPC statistical modules use the
blank character as a delimiter between variable names.  Should any
of your variable names contain blank spaces, the SPPC will more
than likely read erroneous information and thus display erroneous
results.

  Examples of valid variable names:

      GROSS_PRODUCT
      VARIABLE_ONE
      DOLLAR$
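  If you prepare variable names outside the SPPC, a quick check
like the following Python sketch can catch the problems described
above before they cause trouble.  It is an illustration only; the
function name check_variable_name is hypothetical, and the
15-character limit is an assumption taken from the word-processor
instructions earlier in this chapter (adjust max_len if your
version accepts 16).

```python
MAX_NAME_LENGTH = 15  # assumption: the 1-to-15 character limit given earlier

def check_variable_name(name, max_len=MAX_NAME_LENGTH):
    """Return a list of problems with a proposed variable name (empty if none)."""
    problems = []
    if not name:
        problems.append("name is empty")
    if len(name) > max_len:
        problems.append(f"name is longer than {max_len} characters")
    if any(ch.isspace() for ch in name):
        problems.append("name contains a blank; use '_' instead")
    if not name.isprintable():
        problems.append("name contains a non-printable character")
    return problems

print(check_variable_name("GROSS_PRODUCT"))  # []
print(check_variable_name("GROSS PRODUCT"))  # ["name contains a blank; use '_' instead"]
```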

  The data input screen will present the above information as you
have entered it along the top of the video page, on the first
three lines.  A case number will be indicated along the left
margin of the screen and the cursor will be positioned under the
first variable, ready for input of the data.  


Data Input Screen  

  The input of data into your data file is accomplished using a
special screen.  This screen displays five variables across and
15 cases down.  Should your data file contain more than 5
variables or more than 15 cases, the data manager will scroll the
screen right or down to accommodate the input of data.  


Order Of Data Input 

  Once the header records are completed and the data input screen
is presented to you, you must enter your data in a "casewise"
fashion.  That is, you fill in the data file case by case for all
variables in your new data file.  After you enter a full case of
data the data manager will highlight the next case and you may
continue entering your data.


EDITING EXISTING DATA FILES

  Selecting this option from the main menu places the data manager
in its edit mode.  You will then be prompted for the name of the
data file you wish to edit.  Valid file names, directories and
paths were discussed above.  Type in the name of the data file you
wish to edit and press the carriage return key.  The data manager
will read the header records and ask you to confirm your selection
of the data file.  If this is not the data file you intended to
access, press 'N' and reenter the name of the data file you would
like to edit.  

  If you enter an incorrect data  file  name  or  path  the  data
manager will tell you that it cannot find the indicated file.  In
this case you may need to insert the correct data diskette, check
the  path  or directory of the file you wish and reenter the name
of the correct data file.

  After you select the correct data file you will be presented
with another menu containing the following edit options.


                   Add variables              
                   Change variable names      
                   Delete Variables           
                   Edit data file             
                   Input new cases   
                   Select data file
                   Replace missing values  


  Moreover, the first two header records of the data file will be 
presented at the top of the screen.  The name of the  data  file,
including  drive  and path, are presented directly under the edit
file menu.

ADD VARIABLES

  The option to add variables to a data file consists of entering
the name of the new variable and then entering the data elements
for the new variable for each case in the data file.  Although
this procedure adds only one new variable at a time, it may be
called repeatedly.   

  Inputting data for an added variable is identical to inputting
data when creating a new data file, discussed above.
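  Conceptually, adding a variable appends one new column, with
exactly one value for every existing case.  The Python sketch
below illustrates that operation on data held in memory; it is not
how the data manager itself works (the data manager operates on
disk), and the function name add_variable is hypothetical.

```python
def add_variable(names, cases, new_name, new_values):
    """Append one variable (a new column) to an in-memory copy of a data file."""
    if len(new_values) != len(cases):
        raise ValueError("supply exactly one value per case")
    new_names = names + [new_name]
    # Each case gains one value at the end, in case order.
    new_cases = [case + [value] for case, value in zip(cases, new_values)]
    return new_names, new_cases

names, cases = add_variable(["A"], [[1.0], [2.0]], "B", [3.0, -1.0])
print(names, cases)  # ['A', 'B'] [[1.0, 3.0], [2.0, -1.0]]
```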


CHANGE VARIABLE NAMES

  You  may  rename any or all variables in an existing data file. 
Merely use the arrow keys to select the variable  name  that  you
wish to change and type the new variable name.

  Should  you  start to rename a variable and haven't entered the 
change (pressed the return key) you may  "undo"  your  change  by
pressing the ESC key.  

  Once  you have made all the changes you wish to make and decide
to exit you will be asked if you wish to save the changes.


DELETING VARIABLES 

  You may delete up to n-1 variables in your data file, where n is
the actual number of variables in your data file.  For example,
if your data file has 12 variables you may delete any 11 of them.

  In order to delete variables, merely press 'y' or 'Y' for each
of the variables presented to you for deletion.  Note that
pressing any other key will NOT delete that variable. 

  Once  you have made all the changes you wish to make and decide 
to exit you will be asked if you wish to save the changes.   This
is also true if you reach the end of your list of variables.  
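  The n-1 rule simply guarantees that at least one column remains.
The Python sketch below expresses that rule for data held in
memory; it is an illustration only, and the function name
delete_variables is hypothetical.

```python
def delete_variables(names, cases, to_delete):
    """Remove the named variables (columns); at least one must remain."""
    doomed = set(to_delete)
    keep = [i for i, name in enumerate(names) if name not in doomed]
    if not keep:
        raise ValueError("you may delete at most n-1 of n variables")
    new_names = [names[i] for i in keep]
    new_cases = [[case[i] for i in keep] for case in cases]
    return new_names, new_cases

print(delete_variables(["A", "B", "C"], [[1, 2, 3], [4, 5, 6]], ["B"]))
# (['A', 'C'], [[1, 3], [4, 6]])
```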


EDITING A DATA FILE

  With  this  data editor you can alter the contents of your data 
set as easily as altering documents  with  your  word  processor.
The  descriptions  below will help you understand what you can do 
with the SPPC Data Manager and assist you with the  operation  of
the data manager. 


Moving about your data file  

  When you begin an edit session the data manager will display
your data in video pages of 15 cases and 5 variables at a time.
Should your data file have fewer than 15 cases or fewer than 5
variables, the data manager will present all data within these
boundaries.   

  When the editor is first  invoked  (brought  into  memory)  the
cursor  is  positioned  at the first variable and the first case. 
You may reposition the cursor over any  data  element  by  simply
using the cursor control keys on the numeric key pad.   

  For those data sets that have more than 5 variables you may
"page" left or "page" right by pressing Ctrl Left Arrow or Ctrl
Right Arrow.  That is, hold down the control key and press the
right arrow key on the numeric keypad (remember to toggle the
NUM LOCK key to the appropriate state).  Likewise, page left by
holding down the control key and pressing the left arrow key.
The screen is also "scrolled" left or right when you advance
beyond the boundaries of the screen by using the left and right
cursor control keys.  Similarly, you may press Ctrl End to get to
the last variable.  

  Important - the data manager will  quickly  page  down  through
your  data set but will not allow you to return to previous pages
once you display the next page of data.


Entering & Altering Data 

  Entering data  within  your  data  file  at  this  point  means
replacing  currently displayed values with new or correct values. 
To do this, locate the cursor over the value you wish to update
and enter the new value.  Once you have typed the new or correct
value at its proper position, you terminate and formally enter
the data value by pressing the carriage return key.  

  The data manager offers a "last ditch" facility for editing
ease.  This is an "undo" function that you may use if you type
in an incorrect value but have NOT entered it into the data
file.  If you begin to enter a data value and discover that you
are not in the correct position, or decide to retain the old
value, simply press the ESC key to restore the previous data
value.


Replacing Data With The Missing Value

  A  special  feature of the data manager is that you can replace 
data values with the pre-defined missing value at  a  single  key
stroke.   Position  the  cursor  over  the data value you wish to 
change, as described above, and  press  the  F1  key.   The  data
manager  will insert the missing values code listed in the header
block at the position you have indicated.


Help In The Data Manager 

  Pressing the '?' key or the ESC key within the edit mode of the
data manager will present the submenu illustrated below.  This
menu provides further features of the data manager that can be
invoked only through the edit facility.


                    Help                    
                    Delete Cases            
                    Change Header Statement  

  Selecting the Help option will present you with the  description
of the remaining menu options presented below.


Deleting Cases  

  The  data  manager  makes  deleting  cases  a snap.  Select the 
Delete Cases option of the HELP menu and enter the  starting  and
ending  case  numbers  that  you  wish  to delete.  Note that the 
delete operation is inclusive.  That is, both the  starting  case
and  the  ending  case  will  be  deleted  along with those cases
between the end points.  For example, should you decide to delete
cases 5 through 8, the data manager will delete cases 5, 6, 7,
and 8.  If, on the other hand, you wish to delete only one case,
just specify that case as both the starting case and the ending
case to delete.  The data manager will not allow you to delete the
entire display page of data. 

  The  data  manager  will  check  your  input for a valid range, 
renumber the remaining data cases and  return  you  to  the  edit
mode.
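  The inclusive range and the renumbering can be illustrated with
a short Python sketch that works on cases held in a list.  It is
not SPPC code: the function name delete_cases is hypothetical, and
the sketch does not model the data manager's rule against deleting
an entire display page.  After deletion, a case's new number is
simply its position in the remaining list, counting from 1.

```python
def delete_cases(cases, start, end):
    """Delete cases start..end inclusive, using 1-based case numbers.

    The remaining cases are renumbered implicitly: list position 0 is
    the new case 1, position 1 is the new case 2, and so on.
    """
    if not (1 <= start <= end <= len(cases)):
        raise ValueError("invalid case range")
    return cases[:start - 1] + cases[end:]

data = [[c] for c in range(1, 11)]     # ten one-variable cases numbered 1..10
remaining = delete_cases(data, 5, 8)   # removes cases 5, 6, 7 and 8
print([row[0] for row in remaining])   # [1, 2, 3, 4, 9, 10]
```

To delete a single case, pass the same number as both start and
end, exactly as with the data manager.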


Change Header Statement 

  Within  the  edit  mode  of  the data manager you may alter the 
header statement any time and as many times as  you  like.   From
the  HELP  menu  select  the  Change  Header Statement option and 
change the header statement to read as you like.  You  may  abort
this  process  at  any time prior to pressing the carriage return 
key by pressing the ESC key.  This will restore the header
statement to its previous state.  Only 80 characters are allowed
to describe your data file.


INPUT NEW CASES

  On  selection of the menu option, the data manager will ask you 
for the number of new cases you would like to enter.   Once  this
information  has  been completed the data manager will advance to 
the end of the data file and present  you  with  the  data  input
screen.  You may then enter the additional data cases.

SELECT A DATA FILE

  This edit menu item allows you to change the data file you wish
to edit without returning to the master data manager screen.  

  Enter  the name of the new data file you wish to edit according
to the guidelines outlined above.


REPLACE MISSING VALUES 

  Selecting this option will allow you to replace all occurrences
of the current missing values code with a new missing values code.
Simply enter the new missing values code  and  the  data  manager
will  alter  those  data  elements containing the old value.  The 
current number of data replacements made  will  be  displayed  on
screen.

  Once the data manager has completed searching and replacing the
old  missing  values  you  will  be  asked  to  confirm  the file
alteration before it is made permanent.
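  The search-and-replace step, including the count of replacements,
can be sketched in Python as follows.  This is an illustration
only; the function name replace_missing is hypothetical.  Choose a
new code that never occurs as a valid data value, since after the
replacement there is no way to tell the two apart.

```python
def replace_missing(cases, old_code, new_code):
    """Swap every occurrence of old_code for new_code; return (cases, count)."""
    count = 0
    updated = []
    for case in cases:
        row = []
        for value in case:
            if value == old_code:
                row.append(new_code)   # replace the old missing values code
                count += 1
            else:
                row.append(value)      # keep valid data unchanged
        updated.append(row)
    return updated, count

new_cases, n_replaced = replace_missing([[1.0, -1.0], [-1.0, 2.0]], -1.0, 1e88)
print(n_replaced)  # 2
```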


FILE UTILITIES 

  The remainder of this chapter will discuss the last option  of
the main menu of the data manager.  

  This  procedure  is  designed  to  aid  you in manipulating and 
viewing contents of your data disk through the use of 'DOS'  type
functions.   In  keeping  with the conventions set up by DOS this
procedure offers the following operations.

                       
                      Rename a file        
                      Copy a file          
                      Kill a file          
                      View a data file     
                      Select disk drive    
                      Print a data file    
                      Available diskspace  
                      Disk directory          


  Most of the operations listed above should be familiar  to  the
user  and  therefore  we  will  not  comment  on  all of the menu
options.  

Available diskspace

  Selecting  this item will display the number of bytes available
on the currently logged in drive.

Select disk drive   

  This option will allow you to log onto a different disk drive.
You need only press the drive's designation letter.  For example,
if  you  are logged onto drive "A" and you wish to log onto drive
"B", press the letter "b" or "B".  

Disk directory 

  Selecting this option will allow you to view the contents of any
valid DOS hard disk or diskette path.  This command works much
like the DIR command at the DOS prompt.

Examples are:

   a:*.dat
   c:\data\*.fil
   \????files.*
   \data\ 

  A word of caution: the data manager will not restore the path
or drive that the SPPC was invoked from, so you are quite able to
"pull the rug" out from under yourself if you are not careful to
restore the correct path or drive before returning to the SPPC's
master menu.   

  One final note: like all of the SPPC modules, the data manager
will only process those data files containing 200 or fewer
variables.  However, the number of cases may be unlimited; in
this respect you are restricted only by the capacity of your disk
or diskette.  Moreover, the data manager does not directly
manipulate the original data file; rather, it creates a second,
temporary data file in which all of your edits and changes are
recorded.  Once your editing session is completed the data manager
deletes the original data file and renames the temporary data
file.  This is done so that should you decide to abort the editing
process you retain the original data file.  Because the data
manager operates on the hard disk or diskette, it performs many
reads and writes; consequently, the more variables in your data
file, the longer it takes the data manager to process data.
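  The same "edit a copy, then swap" strategy can be sketched in
modern Python.  This is an illustration of the general technique,
not the data manager's code; the function name edit_via_temp_file
is hypothetical.  One difference is deliberate: instead of
deleting the original and then renaming the copy in two steps, the
sketch uses os.replace, which performs the swap in a single step.
Like the data manager, it streams the file line by line rather
than holding it all in memory.

```python
import os
import tempfile

def edit_via_temp_file(path, transform_line):
    """Rewrite a text file line by line through a temporary copy.

    The original file is replaced only after the copy has been written
    completely, so an aborted edit leaves the original untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp, open(path, "r") as src:
            for line in src:                  # many small reads and writes
                tmp.write(transform_line(line))
        os.replace(temp_path, path)           # swap the copy into place
    except BaseException:
        os.remove(temp_path)                  # abort: discard the partial copy
        raise
```

For example, edit_via_temp_file("MYFILE.DAT", str.upper) would
rewrite every line in uppercase, while an error raised partway
through would leave MYFILE.DAT exactly as it was.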