CONDUCTING ONE-WAY ANALYSIS OF VARIANCE

  The multiple regression program can easily be used to analyze
data with the technique of 'analysis of variance', or ANOVA. This
section briefly explains how to conduct a one-way ANOVA; more
complex ANOVA designs are described in subsequent sections.

  If  you  are  not  familiar  with  the conduct of ANOVA using a
multiple regression program, you may wish to consult Chapter 5 in 
the  text  by  Cohen,  J.   &  Cohen,   P.    'Applied   Multiple
Regression/Correlation  Analysis  for  the  Behavioral Sciences',
(2nd ed). Hillsdale, NJ: Lawrence Erlbaum, 1983.

  In  order to illustrate the conduct of a one-way ANOVA, suppose 
you have a set of data consisting of  scores  on  a  'depression'
scale  and  'marital  status'.   The  data  are shown in the next
screen.
 

    Depression           Marital           Marital Status
      Score              Status                 Code

       24                Single                   1
       20                Single                   1
       26                Single                   1
       11                Married                  2
       14                Married                  2
       15                Married                  2
       38                Divorced                 3
       34                Divorced                 3
       37                Divorced                 3
       11                Other                    4
       12                Other                    4
       09                Other                    4
       13                Other                    4


  As you can see from the data, the depression scores represent a
continuous variable and marital status represents a
classification variable that has four subclasses. Each subclass
can be represented by a verbal descriptor, such as 'single', or
by a numeral, as shown in the column labeled 'Marital Status
Code'.

  In order to conduct a one-way ANOVA, you will have to create  a
separate  variable  for each of the subclasses of marital status. 
Then, each of these new variables will be scored as either  0  or
1.  A score of 1 will mean that the individual is a member of the
particular subclass, and a score of 0 means the individual is not
a  member  of  the  subclass.   This  procedure  can  be used for 
virtually any classification variable.  That is, create  one  new
variable for each subclass and then score each of them as 0 or 1.

  For the above example,  we  shall  create  four  variables  for
marital status (one for each subclass) and we shall give them the
following names.

   MS1 = Single
   MS2 = Married
   MS3 = Divorced
   MS4 = Other

  Now,  the  first  person  in  the  sample  was single and had a 
depression score of 24.  Thus,  we  will  represent  that  person
using the scores of

    Depression
      Score       MS1     MS2     MS3     MS4

       24          1       0       0       0


  By  so  scoring everyone in the sample, we obtain the following
data which will be used to conduct the one-way ANOVA.

    Depression     Marital Status Variables
      Score       MS1     MS2     MS3     MS4

       24          1       0       0       0
       20          1       0       0       0
       26          1       0       0       0
       11          0       1       0       0
       14          0       1       0       0
       15          0       1       0       0
       38          0       0       1       0
       34          0       0       1       0
       37          0       0       1       0
       11          0       0       0       1
       12          0       0       0       1
       09          0       0       0       1
       13          0       0       0       1
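
  The coding step described above can be sketched in a few lines
of Python. This is only an illustration and is not part of the
SPPC program; the function name indicator_columns is our own
invention, and the scores and codes are copied from the data
shown earlier.

```python
# Depression scores and marital status codes from the data table.
scores = [24, 20, 26, 11, 14, 15, 38, 34, 37, 11, 12, 9, 13]
codes = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4]


def indicator_columns(codes, n_subclasses):
    """Return one row of 0/1 scores per case, one column per subclass."""
    return [[1 if code == k else 0 for k in range(1, n_subclasses + 1)]
            for code in codes]


rows = indicator_columns(codes, 4)

# Print the recoded data in the same layout as the table above.
print("Score  MS1  MS2  MS3  MS4")
for score, row in zip(scores, rows):
    print(f"{score:5d}  " + "  ".join(f"{v:3d}" for v in row))
```

  The same function should work for any classification variable
whose subclasses are coded 1, 2, 3, and so on.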


  You are now ready to enter the above five variables and conduct 
a one-way ANOVA.  The depression scores will be  treated  as  the
dependent  variable and the four marital status variables will be
treated as the independent variables.


  THERE IS ONE VERY IMPORTANT PROBLEM THAT MUST BE DEALT WITH!!

  When you conduct the regression analysis, it is ESSENTIAL that
one of the above marital status variables be deleted. It does
not matter, mathematically, which one you delete -- but one of
them MUST be deleted. You can handle that problem either by
entering only three of them or by entering all four and then
deleting the one you prefer.
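
  One way to see why a variable must be deleted is to notice that
every person belongs to exactly one subclass, so the four 0/1
variables always add up to 1. That sum duplicates the constant
term used for the intercept, and the regression equations then
have no unique solution. The short Python sketch below (our own
illustration, not part of the SPPC program) checks this for the
example data.

```python
codes = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4]
rows = [[1 if code == k else 0 for k in (1, 2, 3, 4)] for code in codes]

# With all four variables, every row sums to 1, which is exactly the
# constant column that the regression uses for the intercept.
row_sums = [sum(row) for row in rows]
print(row_sums)

# With MS4 deleted, the sum is 1 for some cases and 0 for others,
# so it no longer duplicates the constant column.
reduced_sums = [sum(row[:3]) for row in rows]
print(reduced_sums)
```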

  There are  good  reasons  for  entering  all  of  the  subclass
variables  and  then  deleting one of them.  We shall explain why
after analyzing the data.  

  If you wish to analyze these data, you can choose the raw data
option and use the file named ONEWAYR.DAT on SPPC Disk 1, or you
can choose the summary data option and use the file named
ONEWAYS.DAT. In either case you must delete one of the 'marital
status' (MS) variables.

  A partial analysis of the data is shown below. In this
analysis the decision was to delete the fourth marital status
variable, MS4.

  Make  a note of the b-coefficients obtained from this solution. 
They are important  and  useful  in  the  interpretation  of  the
analysis.

            SIMULTANEOUS MULTIPLE REGRESSION RESULTS

               Raw Score b-  Standardized
Variable Name  Coefficients      Beta       t-ratio      p <=
-------------  ------------  ------------  ----------  --------
    Intercept      11.25000
          MS1      12.08330       0.50584     7.09501    0.0001
          MS2       2.08333       0.08721     1.22328    0.2514
          MS3      25.08330       1.05007    14.72830    0.0000
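
  If you would like to check the b-coefficients outside the
program, the sketch below fits the same model by ordinary least
squares in plain Python. It is an illustration under our own
assumptions: the helper solve is a small hypothetical routine
written for this example, and the SPPC program may use a
different computational method. The printed values should agree
with the table above to the precision shown there.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


scores = [24, 20, 26, 11, 14, 15, 38, 34, 37, 11, 12, 9, 13]
codes = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4]

# Design matrix: constant term plus MS1, MS2, MS3 (MS4 is deleted).
X = [[1] + [1 if code == k else 0 for k in (1, 2, 3)] for code in codes]

# Normal equations (X'X) b = X'y.
p = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)]
       for i in range(p)]
xty = [sum(row[i] * y for row, y in zip(X, scores)) for i in range(p)]

b = solve(xtx, xty)
for name, value in zip(["Intercept", "MS1", "MS2", "MS3"], b):
    print(f"{name:>9}  {value:10.5f}")
```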


  As you saw from the foregoing screen, the intercept of the
regression model has a value of 11.25. If you compute the mean
depression score for the fourth marital subclass, you will
discover it to be exactly equal to 11.25. That is no accident.

  Whenever you use this coding scheme for the subclasses of a
classification variable, the intercept of the regression model
will ALWAYS be exactly equal to the mean score of the subclass
whose variable you delete.

  Now compute the mean depression score for the 'single' persons.
You  will  find  it  to be 23.3333.  Now subtract the mean of the 
deleted subclass from the  mean  for  the  single  persons.   The
difference is 23.3333 - 11.25 = 12.0833 which is identical to the
b-coefficient for the 'single' subclass variable, MS1.

  In short, each b-coefficient will ALWAYS be identically equal
to the difference between a subclass mean and the intercept.
Stated differently, each b-coefficient is a simple contrast that
is computed as the difference between the retained subclass mean
and the deleted subclass mean. In this case, with MS1 through MS4
standing for the subclass mean depression scores, we have

   b0 = MS4        (Because we deleted the fourth subclass)
   b1 = MS1 - MS4
   b2 = MS2 - MS4
   b3 = MS3 - MS4
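
  The relationships listed above can be verified directly from
the subclass means, without running a regression. The following
Python sketch is our own illustration; the function names
subclass_means and contrasts are hypothetical and do not belong
to the SPPC program.

```python
from statistics import mean

scores = [24, 20, 26, 11, 14, 15, 38, 34, 37, 11, 12, 9, 13]
codes = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4]


def subclass_means(scores, codes):
    """Mean depression score for each marital status code."""
    return {k: mean(s for s, c in zip(scores, codes) if c == k)
            for k in sorted(set(codes))}


def contrasts(means, deleted):
    """Intercept and b-coefficients expected when `deleted` is dropped."""
    intercept = means[deleted]
    b = {k: m - intercept for k, m in means.items() if k != deleted}
    return intercept, b


means = subclass_means(scores, codes)
b0, b = contrasts(means, deleted=4)

print(f"b0 (mean of deleted subclass) = {b0:.5f}")
for k, value in b.items():
    print(f"b for MS{k} = {value:.5f}")
```

  Calling contrasts with a different value of deleted shows how
the intercept and the b-coefficients change when another subclass
variable is removed.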

  As  an  exercise,  re-analyze  the  data  using ONEWAYS.DAT and 
delete MS3.  If you do that you will see that the  b-coefficients
are defined (and computed) in terms of the depression score means
where

   b0 = MS3        (Because we deleted the third subclass)
   b1 = MS1 - MS3
   b2 = MS2 - MS3
   b3 = MS4 - MS3
 

  Below  you  will  see  the  analysis of variance table for this
example.  The traditional omnibus F-ratio is the one having three
degrees of freedom (for the set).

  The  single  degree  of  freedom  tests can be ignored for this 
analysis since none of them were 'planned'  contrasts.   However,
the  production  of  such  single degree of freedom tests is very
important for other types of analysis. 

  In this type of analysis you will see that the Multiple
Correlation is identical to the traditional Eta statistic and
represents the degree of 'correlation' between depression and
marital status.


                       ANALYSIS OF VARIANCE

                                            Hierarchical
 VARIANCE              SUM OF       MEAN       Step-down
  SOURCE        df    SQUARES     SQUARES        F-ratio   p <=
-------------  ---  ----------  ----------  ------------  ------
          MS1    1    35.70260    35.70260       7.18040  0.0242
          MS2    1   157.73300   157.73300      31.72290  0.0005
          MS3    1  1078.58000  1078.58000     216.92200  0.0000
For this set:    3  1272.02000   424.00600      85.27500  0.0000

        Error    9    44.75000     4.97222
        Total   12  1316.77000
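
  The 'For this set', Error and Total lines of the table can be
checked with the traditional one-way ANOVA computations. The
Python sketch below is our own illustration and is not taken from
the SPPC program. It does not reproduce the single degree of
freedom lines or the p-values.

```python
scores = [24, 20, 26, 11, 14, 15, 38, 34, 37, 11, 12, 9, 13]
codes = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4]

# Collect the depression scores for each marital status subclass.
groups = {}
for score, code in zip(scores, codes):
    groups.setdefault(code, []).append(score)

n = len(scores)
grand_mean = sum(scores) / n

# Total: squared deviations of every score from the grand mean.
ss_total = sum((s - grand_mean) ** 2 for s in scores)

# Error: squared deviations of every score from its own subclass mean.
ss_error = sum((s - sum(g) / len(g)) ** 2
               for g in groups.values() for s in g)

# Set (between subclasses): what remains of the total.
ss_set = ss_total - ss_error

df_set = len(groups) - 1
df_error = n - len(groups)
f_ratio = (ss_set / df_set) / (ss_error / df_error)

print(f"Set    df={df_set:2d}  SS={ss_set:10.3f}  MS={ss_set / df_set:9.3f}")
print(f"Error  df={df_error:2d}  SS={ss_error:10.3f}  MS={ss_error / df_error:9.3f}")
print(f"Total  df={n - 1:2d}  SS={ss_total:10.3f}")
print(f"Omnibus F-ratio = {f_ratio:.3f}")
```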


                      SUMMARY STATISTICS

        Dependent Variable           =      Depression

        Omnibus F-ratio              =       85.275030
        Significance Level,       p <=        0.000020

        Multiple R                   =        0.982860
        Squared Multiple R           =        0.966015
        Shrunken R                   =        0.977080
        Shrunken Squared R           =        0.954687
        Determinant of Rxx           =        0.676000

        Regression Sum of Squares    =     1272.019000
        Error Sum of Squares         =       44.750000
        Standard Deviation of y'     =       11.888460
        Standard Error of Regression =        2.229848
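
  Several of the summary statistics follow directly from the
sums of squares in the analysis of variance table. The Python
sketch below is our own illustration. The formula used for the
'shrunken' values is an assumption on our part (the usual
adjustment for degrees of freedom); the program output does not
state its formula, although this one appears to reproduce the
printed figures.

```python
from math import sqrt

# Values copied from the analysis of variance table.
ss_regression = 1272.019
ss_error = 44.75
n_cases = 13
n_predictors = 3  # MS1, MS2 and MS3

ss_total = ss_regression + ss_error

# Squared multiple R; for a one-way design this is also Eta squared.
r_squared = ss_regression / ss_total
multiple_r = sqrt(r_squared)

# Assumed adjustment formula (not confirmed by the program output).
df_error = n_cases - n_predictors - 1
shrunken_r_squared = 1 - (1 - r_squared) * (n_cases - 1) / df_error
shrunken_r = sqrt(shrunken_r_squared)

# Standard error of regression: square root of the error mean square.
standard_error = sqrt(ss_error / df_error)

print(f"Multiple R           = {multiple_r:.5f}")
print(f"Squared Multiple R   = {r_squared:.5f}")
print(f"Shrunken R           = {shrunken_r:.5f}")
print(f"Shrunken Squared R   = {shrunken_r_squared:.5f}")
print(f"Std Error of Regr.   = {standard_error:.5f}")
```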
 

  The major point of the foregoing is that you can conduct any
one-way ANOVA using this multiple regression program. You may
have up to 50 subclasses in the classification variable and you
can have as many cases as you like. Unequal subclass sample
sizes, as in this example, pose no problem for the solution.

  We recommend that you create one variable for every subclass
even though it is then necessary to delete one of them in order
to carry out the analysis. The advantage is that you can then
easily define or re-define the contrasts that you wish to
interpret. The omnibus F-ratio and Eta statistic will not be
affected by which variable you delete, but that choice can aid
your interpretation of the subclass means and their differences.
