MORE COMPLEX ANALYSIS OF VARIANCE

  This section briefly  introduces  a  simple  method  for  using
multiple  regression  as  a means of conducting N-way analysis of 
variance.  If you are not familiar with the use of regression  to
conduct  ANOVA  you should review the instructional materials for
'One-Way' and 'Two-Way Analysis of Variance' before proceeding. 

  The  conduct  of  N-way  ANOVA  is  largely an extension of the 
technique described for two-way ANOVA.  The essential  difference
is  that you now have one continuous dependent variable and three 
or more classification variables.  You therefore must create  new
variables   for  each  of  the  subclasses  within  each  of  the
classification variables.  This will be illustrated with a simple
example.

  Suppose  you  have a set of depression scores (DEP), as before, 
but you now have four classification variables.  The first  three
have  two  subclasses  and  the  fourth has three subclasses.  In 
traditional parlance this is  often  referred  to  as  a  2x2x2x3
crossed factorial design. 

  In this example, the first variable is gender,  the  second  is
marital  status,  the  third  is social status, and the fourth is
age.  As before, we create one special variable for each subclass 
within each of the classification variables.  We could  therefore
define these special subclass variables as shown below.

Depression       DEP  = Dependent Variable
Gender           G1 = Males,   G2 = Females
Marital Status   M1 = Single,  M2 = Other
Social Status    S1 = Lower,   S2 = Upper
Age              A1 = Young,   A2 = Middle,   A3 = Old 

  As  before,  each  subclass variable is scored as 1 to indicate 
that the person is a member of the subclass or as 0  to  indicate
that  the  person is NOT a member of the subclass.  Once you have 
defined the subclass variables and then coded  each  person,  you
must  then  eliminate  one  subclass  variable  for  each  of the 
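classification variables.  For this example we shall choose to
eliminate G2, M2, S2, and A3.  (Within each classification
variable a person's subclass codes always sum to 1, so keeping
all of them would make one of them redundant.)  We therefore
retain the coded variables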

DEP   G1   M1   S1   A1   A2

  Now suppose we have someone with a depression score of 37 who
is male, married (and therefore in the Other marital subclass),
upper class, and middle aged.  Such a person would then be coded as

DEP   G1   M1   S1   A1   A2

 37    1    0    0    0    1 
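The coding rule above can be sketched in Python. This is only an
illustration, not part of the program the text describes; the
function `code_person` and the lowercase subclass labels are
hypothetical. The last subclass listed for each classification
variable is treated as the omitted one (G2, M2, S2, A3), so a
complete set of subclass codes, which would always sum to 1, is
never kept.

```python
# Hypothetical subclass labels; the LAST label in each list is the omitted
# reference subclass (G2, M2, S2 and A3 in the text).
LEVELS = {
    "gender": ["male", "female"],
    "marital": ["single", "other"],
    "social": ["lower", "upper"],
    "age": ["young", "middle", "old"],
}
FACTOR_ORDER = ["gender", "marital", "social", "age"]


def code_person(dep, gender, marital, social, age):
    """Return [DEP, G1, M1, S1, A1, A2] using 1/0 subclass membership codes."""
    row = [dep]
    for factor, value in zip(FACTOR_ORDER, [gender, marital, social, age]):
        levels = LEVELS[factor]
        if value not in levels:
            raise ValueError(f"unknown {factor} subclass: {value!r}")
        # One 1/0 variable per subclass, skipping the reference (last) subclass.
        row.extend(1 if value == level else 0 for level in levels[:-1])
    return row


# A married person belongs to the "other" marital subclass.
print(code_person(37, "male", "other", "upper", "middle"))  # [37, 1, 0, 0, 0, 1]
```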

  The  complete  coded  data  for  48  individuals  are  shown as
follows:


DEP G1 M1 S1 A1 A2     DEP G1 M1 S1 A1 A2     DEP G1 M1 S1 A1 A2

 36  1  1  1  1  0      11  1  0  1  0  0      44  0  1  0  0  1 
 37  1  1  1  1  0      13  1  0  1  0  0      43  0  1  0  0  1
 41  1  1  1  0  1      21  1  0  0  1  0      56  0  1  0  0  0
 44  1  1  1  0  1      23  1  0  0  1  0      59  0  1  0  0  0
 46  1  1  1  0  0      78  1  0  0  0  1      42  0  0  1  1  0
 48  1  1  1  0  0      76  1  0  0  0  1      44  0  0  1  1  0
 23  1  1  0  1  0      28  1  0  0  0  0      68  0  0  1  0  1
 26  1  1  0  1  0      31  1  0  0  0  0      71  0  0  1  0  1
 39  1  1  0  0  1      41  0  1  1  1  0      16  0  0  1  0  0
 38  1  1  0  0  1      42  0  1  1  1  0      18  0  0  1  0  0
 51  1  1  0  0  0      46  0  1  1  0  1      26  0  0  0  1  0
 54  1  1  0  0  0      49  0  1  1  0  1      28  0  0  0  1  0
 37  1  0  1  1  0      51  0  1  1  0  0      83  0  0  0  0  1
 39  1  0  1  1  0      53  0  1  1  0  0      81  0  0  0  0  1
 63  1  0  1  0  1      28  0  1  0  1  0      33  0  0  0  0  0
 66  1  0  1  0  1      31  0  1  0  1  0      36  0  0  0  0  0
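As a check on the table, the Python sketch below reads the 48
records (assuming they are copied exactly as printed), confirms
that each of the 24 cells of the 2x2x2x3 design holds two people,
and computes the main-effect and within-cell sums of squares
directly from group means. Because the cell sizes are equal, these
simple formulas should reproduce the corresponding values in the
ANOVA table given later (300, 0.75, 4.08333, 5781.5, and an error
sum of squares of 67). The helper names are hypothetical.

```python
from collections import defaultdict

# The coded data table, copied verbatim: three (DEP G1 M1 S1 A1 A2) records per line.
TABLE = """
 36  1  1  1  1  0      11  1  0  1  0  0      44  0  1  0  0  1
 37  1  1  1  1  0      13  1  0  1  0  0      43  0  1  0  0  1
 41  1  1  1  0  1      21  1  0  0  1  0      56  0  1  0  0  0
 44  1  1  1  0  1      23  1  0  0  1  0      59  0  1  0  0  0
 46  1  1  1  0  0      78  1  0  0  0  1      42  0  0  1  1  0
 48  1  1  1  0  0      76  1  0  0  0  1      44  0  0  1  1  0
 23  1  1  0  1  0      28  1  0  0  0  0      68  0  0  1  0  1
 26  1  1  0  1  0      31  1  0  0  0  0      71  0  0  1  0  1
 39  1  1  0  0  1      41  0  1  1  1  0      16  0  0  1  0  0
 38  1  1  0  0  1      42  0  1  1  1  0      18  0  0  1  0  0
 51  1  1  0  0  0      46  0  1  1  0  1      26  0  0  0  1  0
 54  1  1  0  0  0      49  0  1  1  0  1      28  0  0  0  1  0
 37  1  0  1  1  0      51  0  1  1  0  0      83  0  0  0  0  1
 39  1  0  1  1  0      53  0  1  1  0  0      81  0  0  0  0  1
 63  1  0  1  0  1      28  0  1  0  1  0      33  0  0  0  0  0
 66  1  0  1  0  1      31  0  1  0  1  0      36  0  0  0  0  0
"""

values = [int(tok) for tok in TABLE.split()]
rows = [values[i:i + 6] for i in range(0, len(values), 6)]


def between_ss(rows, key):
    """Sum over groups of n * (group mean - grand mean)**2, grouping rows by key."""
    groups = defaultdict(list)
    for r in rows:
        groups[key(r)].append(r[0])
    grand = sum(r[0] for r in rows) / len(rows)
    return sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())


def within_cell_ss(rows):
    """Squared deviations of DEP about its cell mean, pooled over all cells."""
    cells = defaultdict(list)
    for r in rows:
        cells[tuple(r[1:])].append(r[0])  # a cell is one G1 M1 S1 A1 A2 pattern
    return sum(sum((y - sum(c) / len(c)) ** 2 for y in c) for c in cells.values())


cell_sizes = defaultdict(int)
for r in rows:
    cell_sizes[tuple(r[1:])] += 1

print(len(rows), len(cell_sizes), set(cell_sizes.values()))
print("gender ", between_ss(rows, lambda r: r[1]))
print("marital", between_ss(rows, lambda r: r[2]))
print("social ", between_ss(rows, lambda r: r[3]))
print("age    ", between_ss(rows, lambda r: (r[4], r[5])))
print("error  ", within_cell_ss(rows))
```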


  As was the case with the two-way ANOVA, the foregoing subclass
variables represent only the so-called main effects of the
2x2x2x3 ANOVA model.  To obtain the subclass variables that will
represent the interactions, we must form product terms.

  Since gender could interact with each of the remaining three
variables, we form the subclass variables for the two-way
interactions with gender by simply multiplying G1 by each of the
remaining subclass variables to produce

    G1xM1   G1xS1   G1xA1   G1xA2 

  However, to simplify the notation and conserve space, we  shall
denote these as

    GM   GS   GA1   GA2


  Of course, marital status can also interact with social status
and age, so we form those products as

    M1xS1   M1xA1   M1xA2

or, more simply for notation purposes, 

    MS   MA1   MA2

  Finally, social status can interact with age so we form

    S1xA1   S1xA2

or, for notation convenience,

    SA1   SA2
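The two-way products can be formed mechanically, as in the Python
sketch below. Products are taken only between variables that belong
to different classification variables, and the names follow the
short notation above. `FACTORS` and `two_way_products` are
hypothetical names; the example person is the DEP = 37 case coded
earlier.

```python
from itertools import combinations

# Retained subclass variables, grouped by classification variable.
FACTORS = {"G": ["G1"], "M": ["M1"], "S": ["S1"], "A": ["A1", "A2"]}


def two_way_products(codes):
    """Form every two-way product from a dict of main-effect codes."""
    products = {}
    for f1, f2 in combinations(FACTORS, 2):  # pairs of DIFFERENT factors only
        for v1 in FACTORS[f1]:
            for v2 in FACTORS[f2]:
                # Age is the last key, so it can only appear as f2; keep its
                # level digit (GA1, GA2, ...) because age has two variables.
                suffix = v2[1:] if f2 == "A" else ""
                products[f1 + f2 + suffix] = codes[v1] * codes[v2]
    return products


person = {"G1": 1, "M1": 0, "S1": 0, "A1": 0, "A2": 1}  # the DEP = 37 example
print(two_way_products(person))  # only GA2 equals 1 for this person
```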
 

  We have now accommodated the potential for all possible two-way
interactions.  However, there is also the possibility of four
different three-way interactions and one four-way interaction.
These are:

   gender x marital x social
   gender x marital x age
   gender x social x age
   marital x social x age
   gender x marital x social x age

  As before, these interaction variables are created by simply
multiplying the appropriate subclass variables.  They are shown
as follows without comment.


G1xM1xS1  or  GMS

G1xM1xA1  or  GMA1
G1xM1xA2  or  GMA2

G1xS1xA1  or  GSA1
G1xS1xA2  or  GSA2

M1xS1xA1  or  MSA1
M1xS1xA2  or  MSA2

G1xM1xS1xA1   or  GMSA1
G1xM1xS1xA2   or  GMSA2

  Thus, all the needed subclass variables for this analysis are:

DEP   G1 M1 S1 A1 A2   GM GS GA1 GA2   MS MA1 MA2   SA1 SA2 

GMS   GMA1 GMA2   GSA1 GSA2   MSA1 MSA2   GMSA1 GMSA2
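The whole list can also be generated mechanically, which guards
against omitting a term. The Python sketch below is illustrative
only; `FACTORS`, `design_terms`, and `expand` are hypothetical
names. It takes every combination of one, two, three, or four
classification variables and, within each combination, every
product of their retained subclass variables, giving 23 predictors
(one fewer than the 24 cells of the design).

```python
from itertools import combinations, product
from math import prod

# Retained subclass variables for each classification variable.
FACTORS = {"G": ["G1"], "M": ["M1"], "S": ["S1"], "A": ["A1", "A2"]}


def design_terms():
    """Return (name, component variables) for every main effect and interaction."""
    terms = []
    for order in range(1, len(FACTORS) + 1):
        for combo in combinations(FACTORS, order):
            for parts in product(*(FACTORS[f] for f in combo)):
                if order == 1:
                    name = parts[0]
                else:
                    # Short notation: factor letters plus the age level digit, if any.
                    age_digit = "".join(p[1:] for p in parts if p.startswith("A"))
                    name = "".join(combo) + age_digit
                terms.append((name, parts))
    return terms


def expand(codes):
    """Map one person's main-effect codes to the values of all predictors."""
    return {name: prod(codes[p] for p in parts) for name, parts in design_terms()}


names = [name for name, _ in design_terms()]
print(len(names))  # 23
print(" ".join(names))
```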
 

  Space  makes  it  difficult  to display all of the raw data for 
this analysis.  However, if you wish to examine the complete  set
of  raw  data  for this example, use your favorite word processor 
and examine the file on your SPPC Disk 1  named  FOURWAYR.DAT  or
use the 'type' command from DOS to examine that file. 

  Better  yet,  run the program using the raw data option and use 
the FOURWAYR.DAT file.   Of  course,  you  can  also  choose  the
summary  data option of the program and then use the FOURWAYS.DAT
data file. 

  A partial solution for this example is shown below along with a
limited discussion of the analysis.  For an excellent further
discussion of this approach to the solution of complex ANOVA
designs, you might wish to consult Chapter 5 of the Cohen & Cohen
text.

  In this example we show only the ANOVA Table and point out that
it  is  very  important  to  analyze  the  sources of variance by 
grouping the subclass variables  into  sets.   The  program  will
present an opportunity to do that.

  In defining sets of subclass variables, we chose first to
examine the main effects of the model.  Thus, the first three
sets each had one variable (for gender, marital status, and
social class), and the fourth set had two variables to represent
the main effect of age.

  This portion of the ANOVA Table is shown as follows:


                       ANALYSIS OF VARIANCE

MAIN EFFECTS:
                                            Hierarchical
   VARIANCE           SUM OF       MEAN       Step-down
    SOURCE       df   SQUARES     SQUARES      F-ratio     p <=
---------------  --  ----------- ----------- -----------  ------
Gender       G1   1    300.00000   300.00000   107.46300  0.0000
Marital      M1   1      0.75000     0.75000     0.26865  0.6146
Social Class S1   1      4.08333     4.08333     1.46269  0.2367

Age          A1   1   2460.37000  2460.37000   881.32800  0.0000
             A2   1   3321.13000  3321.13000  1189.66000  0.0000
For this set:     2   5781.50000  2890.75000  1035.49000  0.0000
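Each step-down F-ratio is the source's mean square divided by the
error mean square, which appears at the foot of the complete table
(67.0 on 24 df, or 2.79167). The Python check below recomputes the
main-effect F-ratios from the tabled sums of squares; it is a
sketch for verifying the arithmetic, not the program's own code,
and the names are hypothetical.

```python
# Sums of squares and df copied from the ANOVA table; the error term
# (SS = 67.0 on 24 df) is taken from the bottom of the full table.
SS_ERROR, DF_ERROR = 67.0, 24
MS_ERROR = SS_ERROR / DF_ERROR


def f_ratio(ss, df):
    """Step-down F = (SS / df) / MS_error."""
    return (ss / df) / MS_ERROR


MAIN_EFFECTS = {
    "Gender": (300.0, 1),
    "Marital": (0.75, 1),
    "Social Class": (4.08333, 1),
    "Age (set)": (5781.5, 2),
}

for source, (ss, df) in MAIN_EFFECTS.items():
    print(f"{source:<13} F = {f_ratio(ss, df):.5f}")
```

The printed values agree with the table to within its rounding.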
 

  It was not known whether any of the two-way interactions would
be significant, and we had no a priori hypotheses concerning
them.  Thus, the decision was to include all of the two-way
interactions as a single set.  Since the six two-way interactions
are represented by nine product variables, we indicated that the
fifth set would have nine variables.

  The portion of the ANOVA Table  that  deals  with  the  two-way
interactions is shown as follows:


TWO-WAY INTERACTIONS:
                                            Hierarchical
   VARIANCE           SUM OF       MEAN       Step-down
    SOURCE       df   SQUARES     SQUARES      F-ratio     p <=
---------------  --  ----------- ----------- -----------  ------
             GM   1      0.00000     0.00000     0.00000  0.9955
             GS   1      0.00000     0.00000     0.00000  0.9955
            GA1   1      0.00000     0.00000     0.00000  0.9955
            GA2   1      0.00000     0.00000     0.00000  0.9955
             MS   1    200.08300   200.08300    71.67160  0.0000
            MA1   1      3.37500     3.37500     1.20896  0.2822
            MA2   1   7021.13000  7021.13000  2515.03000  0.0000
            SA1   1   1276.04000  1276.04000   457.09000  0.0000
            SA2   1    105.12500   105.12500    37.65670  0.0000
For this set:     9   8605.75000   956.19400   342.51700  0.0000


  It was also not known whether any of the three-way interactions
would be significant, and we had no a priori hypotheses
concerning them.  Thus, the decision was to include all of the
three-way interactions as a single set.  Since the four three-way
interactions are represented by seven product variables, we
indicated that the sixth set would have seven variables.
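The set sizes follow a general rule: the degrees of freedom for an
effect, and hence the number of product variables that represent
it, equal the product of (number of subclasses - 1) over the
classification variables involved. The Python sketch below applies
that rule to the 2x2x2x3 design; `LEVELS` and `effects_by_order`
are hypothetical names.

```python
from itertools import combinations
from math import prod

# Number of subclasses for each classification variable in the example.
LEVELS = {"gender": 2, "marital": 2, "social": 2, "age": 3}


def effects_by_order(levels):
    """For each order, return (number of effects, total df) using prod(k - 1)."""
    table = {}
    for order in range(1, len(levels) + 1):
        combos = list(combinations(levels, order))
        df = sum(prod(levels[f] - 1 for f in combo) for combo in combos)
        table[order] = (len(combos), df)
    return table


summary = effects_by_order(LEVELS)
cells = prod(LEVELS.values())                     # cells in the design
model_df = sum(df for _, df in summary.values())  # total predictors
print(summary)                                    # {1: (4, 5), 2: (6, 9), 3: (4, 7), 4: (1, 2)}
print(cells, model_df, 48 - cells)                # 24 23 24  (error df with N = 48)
```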

  The  portion  of  the ANOVA Table that deals with the three-way
interactions is shown as follows:


THREE-WAY INTERACTIONS:
                                            Hierarchical
   VARIANCE           SUM OF       MEAN       Step-down
    SOURCE       df   SQUARES     SQUARES      F-ratio     p <=
---------------  --  ----------- ----------- -----------  ------
            GMS   1      0.00000     0.00000     0.00000  0.9955
           GMA1   1      0.00000     0.00000     0.00000  0.9955
           GMA2   1      0.00000     0.00000     0.00000  0.9955
           GSA1   1      0.00000     0.00000     0.00000  0.9955
           GSA2   1      0.00000     0.00000     0.00000  0.9955
           MSA1   1    222.04200   222.04200    79.53730  0.0000
           MSA2   1     10.12500    10.12500     3.62687  0.0658
For this set:     7    232.16700    33.16670    11.88060  0.0000


  The following shows that neither of the two variables
representing the four-way interaction was significant (the
four-way interaction was isolated by indicating that the seventh
set would have two variables).  Also shown is the error term for
the ANOVA model.

  Again, the reader is reminded that a hierarchical analysis is
very powerful in dealing with unequal and disproportionate
subclass sample sizes.  In this example the subclass sample sizes
were all equal, so an ordinary orthogonal partition would produce
the same results as the hierarchical analysis shown here.  Such
would not be the case, however, if subclass sample sizes were
unequal and disproportionate.  In that event the subclass
variables for different effects become correlated, so an ordinary
orthogonal partitioning would produce misleading results, whereas
the hierarchical analysis credits each set only with the variance
it explains beyond the sets entered before it.
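A minimal sketch of a hierarchical partition, assuming it means
entering sets of subclass variables in a fixed order and crediting
each set with the drop in residual sum of squares when it is added,
is shown below in Python. It is an illustration, not the program's
algorithm; the helper names and the two tiny two-factor data sets
are invented for the demonstration. With equal cell sizes the sum
of squares credited to a factor does not depend on its order of
entry; with disproportionate cell sizes it does.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(v)] for row, v in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def residual_ss(columns, y):
    """Residual SS after regressing y on a constant plus the given columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))


def stepdown_ss(sets, y):
    """Enter sets of columns in order; return the SS credited to each set."""
    entered, credited = [], []
    previous = residual_ss(entered, y)
    for variables in sets:
        entered = entered + variables
        current = residual_ss(entered, y)
        credited.append(previous - current)
        previous = current
    return credited


# Invented 2x2 example; a and b are 1/0 subclass variables.
a = [0, 0, 0, 0, 1, 1, 1, 1]

# Equal cell sizes (two per cell): a and b are uncorrelated.
b_equal = [0, 0, 1, 1, 0, 0, 1, 1]
y_equal = [3, 5, 8, 6, 4, 7, 12, 10]
print(stepdown_ss([[a], [b_equal]], y_equal)[0],   # a entered first: about 15.125
      stepdown_ss([[b_equal], [a]], y_equal)[1])   # a entered second: about 15.125

# Disproportionate cell sizes (3, 1, 1, 3): a and b are correlated.
b_unequal = [0, 0, 0, 1, 0, 1, 1, 1]
y_unequal = [0, 0, 0, 10, 0, 10, 10, 10]
print(stepdown_ss([[a], [b_unequal]], y_unequal)[0],  # a entered first: about 50
      stepdown_ss([[b_unequal], [a]], y_unequal)[1])  # a entered second: about 0
```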


FOUR-WAY INTERACTION:
                                            Hierarchical
   VARIANCE           SUM OF       MEAN       Step-down
    SOURCE       df   SQUARES     SQUARES      F-ratio     p <=
---------------  --  ----------- ----------- -----------  ------
          GMSA1   1      0.00000     0.00000     0.00000  0.9955
          GMSA2   1      0.00000     0.00000     0.00000  0.9955
For this set:     2      0.00000     0.00000     0.00000  0.9999

          Error  24     67.00000     2.79167
          Total  47  14991.20000
