SURVO MM Help System (web edition)

HCLUSTER <input>,<output_line>                    /    F. Åberg 11.5 1996

Performs hierarchical clustering of observations in the specified data
or on a distance matrix.
HCLUSTER lets you plot a dendrogram on the CRT or on a PostScript printer.

When data is used, activate the label variable with the letter L and
the variables from which the distances are computed with the letter A.
If no L-activated variable is found, the first activated variable is
used if it is of string type.
If no suitable label is found, the observation numbers are used as labels.
Note: the label variable must be of string (S) type.

The HCLUSTER module recognizes a distance matrix as input when the name
ends with the .MAT extension.
(E.g. the DIST and DISTV modules by S. Mustonen are useful for making
distance matrices.)

The available specifications are listed below.

Specification  Possible values      Abbreviation     Remarks
 METHOD         SINGLE_LINKAGE      SIN or 1          (default)
                COMPLETE_LINKAGE    COM or 2
                AVERAGE_LINKAGE     AVE or 3
                WEIGHTED_AVERAGE    WAV or 4
                CENTROID            CEN or 5
                WEIGHTED_CENTROID   WCE or 6
                MINIMUM_VARIANCE    MIN or 7         Also called Ward's method.
 SAVEDIST       <matrix>                             Default: no saving.
                <textfile>                           With extension .TXT
 DISTANCE       SQUARED_EUCLIDIAN   SQU or SQR or 1  (default)
                EUCLIDIAN           EUC or 2
                CITY_BLOCK          CIT or 3
                CANBERRA_METRIC     CAN or 4
 TREEDATA       <datafile>                           Default: #TREE#
                                                     Used also for PS file.
 RESULTS        0..10                                Short output.
                >10                                  Long output.
 PLOT           PS  or  POSTSCRIPT                   Output for PostScript.
                PS,LANDSCAPE                         Print format: Landscape

 SCALING        YES  (any value will do)             Standardizes variables
                                                     to zero mean and unit
                                                     variance before computing
                                                     distances.
 WEIGHTS        <weight matrix>                      Vector of weights: a Survo
                                                     matrix with 1 column and
                                                     m rows, in the same order
                                                     as the activated
                                                     variables.
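
The four DISTANCE options and the SCALING step can be illustrated with a
minimal Python sketch. It uses the textbook definitions of the measures and is
not the HCLUSTER source code. The function names are hypothetical, the use of
the population standard deviation in standardize() is an assumption, and
WEIGHTS is left out because this page does not state how the weights enter
the distances.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Scale each column to zero mean and unit variance.

    Assumption: population standard deviation; a constant column
    would cause a division by zero and is not handled here.
    """
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def euclidean(x, y):
    return math.sqrt(squared_euclidean(x, y))


def city_block(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    # Terms whose denominator is zero are skipped (a common convention).
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def distance_matrix(rows, dist):
    """Full symmetric matrix of pairwise distances between observations."""
    n = len(rows)
    return [[dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]
```

For example, distance_matrix(standardize(rows), city_block) gives city block
distances between standardized observations.
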
Examples:

HCLUSTER DECA,CUR+1    /  METHOD=MINIMUM_VARIANCE  SAVEDIST=MAT1
 The distance matrix is saved in matrix file MAT1.MAT. If n>90, the distances
 are saved as a text file MAT1.TXT.

HCLUSTER D.MAT,CUR+1   /  TREEDATA=C:TMPTREE1  RESULTS=0
 Performs cluster analysis based on distance matrix D. The data that contains
 the dendrogram is saved in data file TREE1.SVO in current datapath.
 Only the lines relevant for plotting the dendrogram are given as output.
 Note that TREEDATA and SAVEDIST can include a path name.

HCLUSTER MYDATA,CUR+1  /  DISTANCE=CIT  PLOT=PS  SAVEDIST=DIST1.TXT
 Uses the single linkage method (the default) with CITY BLOCK distances.
 The dendrogram is 'printed' to a PostScript file. The name (and path)
 is the same as in TREEDATA but with the .PS extension.
 The distance matrix is saved as a text file in DIST1.TXT (in datapath).
 Note that the distance matrix is not saved by default.


The HCLUSTER module uses an agglomerative algorithm.
Other distance measures can be used by making a
distance matrix with the DIST module.
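
As an illustration of agglomerative clustering, the following Python sketch
merges the two closest clusters repeatedly and updates the distances with the
Lance-Williams formula. This is one common textbook formulation and is not
taken from the HCLUSTER source. The function names are hypothetical, and
treating WEIGHTED_CENTROID as the median method is an assumption. The
centroid, weighted centroid and minimum variance updates are conventionally
applied to squared Euclidean distances.

```python
def lance_williams(method, ni, nj, nk):
    """Return (alpha_i, alpha_j, beta, gamma) for merging clusters i and j."""
    n = ni + nj
    if method == "SINGLE_LINKAGE":
        return 0.5, 0.5, 0.0, -0.5
    if method == "COMPLETE_LINKAGE":
        return 0.5, 0.5, 0.0, 0.5
    if method == "AVERAGE_LINKAGE":
        return ni / n, nj / n, 0.0, 0.0
    if method == "WEIGHTED_AVERAGE":
        return 0.5, 0.5, 0.0, 0.0
    if method == "CENTROID":
        return ni / n, nj / n, -ni * nj / n ** 2, 0.0
    if method == "WEIGHTED_CENTROID":  # assumed to be the median method
        return 0.5, 0.5, -0.25, 0.0
    if method == "MINIMUM_VARIANCE":
        t = n + nk
        return (ni + nk) / t, (nj + nk) / t, -nk / t, 0.0
    raise ValueError(f"unknown method: {method}")


def agglomerate(dist, method="SINGLE_LINKAGE"):
    """Cluster from a symmetric distance matrix (list of lists).

    Returns a list of merges (members_a, members_b, height), where the
    members are tuples of original observation indices.
    """
    n = len(dist)
    members = {i: (i,) for i in range(n)}
    # d maps a pair of active cluster ids (smaller id first) to a distance.
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(members) > 1:
        # Find the closest pair of active clusters.
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        ni, nj = len(members[i]), len(members[j])
        # Distance from every other cluster k to the merged cluster.
        for k in members:
            if k == i or k == j:
                continue
            ai, aj, b, g = lance_williams(method, ni, nj, len(members[k]))
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            d[(k, next_id)] = ai * dik + aj * djk + b * dij + g * abs(dik - djk)
        # Drop all distances that involve the two merged clusters.
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        merges.append((members[i], members[j], dij))
        members[next_id] = tuple(sorted(members.pop(i) + members.pop(j)))
        next_id += 1
    return merges
```

The merge heights returned by agglomerate() are the levels at which the
branches of a dendrogram would join.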

Note that HCLUSTER only works with dissimilarity measures.
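
If only similarities are available, they have to be converted before a
matrix is given to HCLUSTER. A minimal Python sketch of one simple
conversion follows, assuming similarities scaled to the range 0..1. The
function name is hypothetical and the conversion is not part of HCLUSTER.

```python
def similarity_to_dissimilarity(sim):
    """Convert similarities in [0, 1] to dissimilarities via d = 1 - s."""
    return [[1.0 - s for s in row] for row in sim]
```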

Literature used for programming the HCLUSTER module:
 Anderberg Michael R. : Cluster Analysis for Applications, NY & London, 1973
 Jain Anil K. : Algorithms for Clustering Data, 1988
 Everitt Brian S. : Cluster Analysis, 1983


