Help System (web edition)

DISCR <data>,L                                          /   M.Korhonen
Discriminant analysis:
In discriminant analysis the observations (cases) are divided into
groups according to the values of a grouping variable. The grouping
variable may be on a nominal scale or have comparatively few distinct
values. The purpose of the analysis is to find classification
functions that best characterize the differences between the groups.
These functions, which are linear combinations of the original
variables, are also used for classifying new cases.
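
As an illustration of what "linear combinations of the original
variables" means here, the following Python sketch builds one linear
classification function per group from the group means and a pooled
within-group covariance matrix, and assigns a case to the group with
the highest score. It is a minimal sketch, not the DISCR
implementation: the toy data, the function names, the restriction to
two variables and the assumption of equal prior probabilities are all
assumptions made for the example.

```python
# Sketch: linear classification functions for two variables.
# Toy data and all names are illustrative assumptions, not taken from DISCR.

def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def pooled_covariance(groups):
    """Pooled within-group covariance matrix (two variables)."""
    p = 2
    w = [[0.0] * p for _ in range(p)]
    n_total = 0
    for rows in groups.values():
        m = mean_vector(rows)
        for r in rows:
            d = [r[j] - m[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    w[a][b] += d[a] * d[b]
        n_total += len(rows)
    dof = n_total - len(groups)  # degrees of freedom: cases minus groups
    return [[w[a][b] / dof for b in range(p)] for a in range(p)]

def invert_2x2(s):
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return [[s[1][1] / det, -s[0][1] / det],
            [-s[1][0] / det, s[0][0] / det]]

def classification_functions(groups):
    """Return {group: (constant, coefficients)}; equal priors are assumed."""
    s_inv = invert_2x2(pooled_covariance(groups))
    funcs = {}
    for g, rows in groups.items():
        m = mean_vector(rows)
        coef = [sum(s_inv[a][b] * m[b] for b in range(2)) for a in range(2)]
        const = -0.5 * sum(coef[a] * m[a] for a in range(2))
        funcs[g] = (const, coef)
    return funcs

def classify(funcs, x):
    """Assign x to the group whose linear function gives the highest score."""
    scores = {g: c + sum(b * v for b, v in zip(coef, x))
              for g, (c, coef) in funcs.items()}
    return max(scores, key=scores.get)

groups = {
    "A": [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (2.0, 2.5)],
    "B": [(6.0, 7.0), (7.0, 6.0), (6.5, 7.5), (7.5, 6.5)],
}
funcs = classification_functions(groups)
predicted = classify(funcs, (2.0, 2.0))
```

Each function is a constant plus a weighted sum of the variables, so
classifying a new case only requires evaluating one such sum per group.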

The discriminant analysis usually has the following two phases:

(1) First, the classification functions and the tests associated with
them are computed.

(2) Second, the cases of the original data or of another data set are
classified according to these functions.
The analysis is successful if few cases of the original data are
classified into wrong groups. However, the results tend to be
optimistic when a classification function is used to classify the
same cases that were used to compute it. This bias may be reduced by
using cross validation in the classification or by classifying
another data set with known groups. The classification may be based
on the classification functions obtained from the discriminant
analysis or on the original observations.
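
The optimistic bias can be seen in a small Python sketch that counts
misclassifications in two ways: by classifying the same cases that
were used to fit the rule, and by leave-one-out cross validation, where
each case is classified by a rule fitted without it. To keep the
example short, a nearest-group-mean rule stands in for the discriminant
classification functions; the rule, the toy data and all names are
assumptions for illustration and are not what DISCR computes.

```python
# Sketch: resubstitution error count versus leave-one-out cross validation.
# A nearest-group-mean rule is used as a simple stand-in classifier.

def group_means(cases):
    sums, counts = {}, {}
    for x, g in cases:
        s = sums.setdefault(g, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        counts[g] = counts.get(g, 0) + 1
    return {g: [v / counts[g] for v in s] for g, s in sums.items()}

def nearest_mean(means, x):
    def dist2(m):
        return sum((a - b) ** 2 for a, b in zip(x, m))
    return min(means, key=lambda g: dist2(means[g]))

def resubstitution_errors(cases):
    """Fit on all cases, then classify those same cases."""
    means = group_means(cases)
    return sum(nearest_mean(means, x) != g for x, g in cases)

def leave_one_out_errors(cases):
    """Classify each case with a rule fitted on the remaining cases."""
    errors = 0
    for i, (x, g) in enumerate(cases):
        training = cases[:i] + cases[i + 1:]
        errors += nearest_mean(group_means(training), x) != g
    return errors

# One variable, two groups; the case 4.0 lies close to the group boundary.
cases = [((1.0,), "A"), ((2.0,), "A"), ((4.0,), "A"),
         ((5.0,), "B"), ((6.0,), "B"), ((7.0,), "B")]

resub = resubstitution_errors(cases)
loo = leave_one_out_errors(cases)
```

With this toy data the borderline case is classified correctly when it
takes part in fitting the rule, but wrongly when it is left out, so the
cross-validated error count is the larger of the two.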

The general form of the DISCR operation is the following:

    DISCR <data>,L
    <the definition of the variables in the model> 
    <options for the printout and methods used> 

The variables used for forming the classification (discriminant)
functions may be defined either by the VARIABLES specification or
by masks X or A.
Correspondingly, the grouping variable may be defined by the
GROUPING specification or by mask G. The grouping structure of
the grouping variable is given in the same way as in the ANOVA and
MEANS operations. If the structure is not given, the program
examines the values of the grouping variable in the data file
and uses all distinct values found (which requires one extra pass
through the data). Example:

DISCR FISHER,END+2
VARIABLES=sepallen,sepalw,petallen,petalw
GROUPING=iristype  iristype=1(setosa),2(versicol),3(virginic)
RESULTS=CROSS

The option CROSS in the RESULTS specification causes the printout of
the within-group and between-group crossproduct matrices.
Alternatively, covariances (COVA) or correlations (CORR) may be printed.
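
For readers unfamiliar with these matrices, the Python sketch below
computes within-group and between-group crossproduct matrices using the
usual textbook definitions: deviations of the cases from their own
group mean, and deviations of the group means from the grand mean
weighted by group size. The toy data and all names are assumptions for
illustration; the exact layout and scaling of the DISCR printout are
not reproduced here.

```python
# Sketch: within-group (W) and between-group (B) crossproduct matrices.
# Toy data and names are illustrative assumptions.

def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def add_outer(matrix, d, weight=1.0):
    """Add weight * d d' to matrix in place."""
    for a in range(len(d)):
        for b in range(len(d)):
            matrix[a][b] += weight * d[a] * d[b]

def crossproducts(groups):
    p = len(next(iter(groups.values()))[0])
    all_rows = [r for rows in groups.values() for r in rows]
    grand = mean_vector(all_rows)
    within = [[0.0] * p for _ in range(p)]
    between = [[0.0] * p for _ in range(p)]
    for rows in groups.values():
        m = mean_vector(rows)
        for r in rows:
            # deviation of a case from its own group mean
            add_outer(within, [r[j] - m[j] for j in range(p)])
        # deviation of the group mean from the grand mean, weighted by size
        add_outer(between, [m[j] - grand[j] for j in range(p)], len(rows))
    return within, between

groups = {
    "A": [(1.0, 2.0), (3.0, 4.0)],
    "B": [(5.0, 8.0), (7.0, 6.0)],
}
within, between = crossproducts(groups)
```

The two matrices add up to the total crossproduct matrix of deviations
from the grand mean, and dividing the within-group matrix by its
degrees of freedom (cases minus groups) gives a pooled within-group
covariance matrix.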

Further information:
  1 = Definitions for grouping variables 
  2 = Classification of the cases 
  D = More on data analysis