Agostino Tarsitano
Data sets

 
Ruspini data set This is a standard example consisting of 75 two-dimensional points that form 4 natural groups of 23, 20, 17, and 15 entities respectively. Any sound clustering technique ought to find them.
Kaufman L., Rousseeuw P.J. (1990). Finding groups in data: An introduction to cluster analysis. John Wiley & Sons, New York, p. 100.
Lubischew data set 1 Six variables were measured on the males of three species: Chaetocnema concinna, Ch. heikertingeri, and Ch. heptapotamica. The true composition of the groups is (21, 31, 22).
Lubischew A.A. (1962). On the use of discriminant functions in taxonomy. Biometrics, vol. 18, 455-477.
Lubischew data set 2 Two cryptic species of the flea beetle genus Haltica were separated (19 specimens of H. oleracea and 20 of H. carduorum) using 4 external characters. In this case the dispersion matrices of the two species are very different, and it is doubtful whether it is even legitimate to suppose that they share a common dispersion matrix.
Lubischew A.A. (1962). On the use of discriminant functions in taxonomy. Biometrics, vol. 18, 455-477.
Fossils data set Six variables were measured on each of the nummulitid specimens from the Eocene Yellow Limestone formation of northwestern Jamaica. According to Chernoff, the entities divide into three distinct clusters of sizes {40, 34, 13}, with one or two specimens that can be regarded as singletons or borderline cases. Algorithm "T", applied to the first four principal components (which account for 94.6% of the variability in the data set), recovered all the entities perfectly. It appears that the large cluster can be separated into subclusters, but their number is indeterminate.
Chernoff H. (1970). Metric considerations in cluster analysis. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, 621-629. University of California Press, Berkeley, CA.
Fisher Iris data There are 4 measurements on 50 plants from each of 3 species of Iris: setosa, versicolor, virginica. Algorithm "T" provides a separation into 3 clusters: (50, 0, 0), (0, 48, 1), (0, 2, 49), which agrees with the findings of Friedman and Rubin (1967) and Maronna and Jacovkis (1974). Three undecided cases are probably the best possible result with this data set (cf. Richards, 1972).
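The cluster composition quoted for the Iris data can be checked with a few lines of arithmetic. The sketch below is only an illustration in Python: the helper name `summarize` is hypothetical, and it assumes that each cluster is matched with the species that dominates it (the diagonal of the table). It is not part of Algorithm "T".

```python
# Cluster-by-species table quoted in the text: each row is one cluster,
# each column one species (setosa, versicolor, virginica).
table = [
    (50, 0, 0),
    (0, 48, 1),
    (0, 2, 49),
]


def summarize(table):
    """Return (total, matched, off_diagonal) for a square table.

    Assumption: cluster i is matched with species i, so the diagonal
    holds the entities assigned to the cluster of their own species.
    """
    total = sum(sum(row) for row in table)
    matched = sum(table[i][i] for i in range(len(table)))
    return total, matched, total - matched


# Column sums give the number of plants per species.
species_totals = [sum(col) for col in zip(*table)]

total, matched, off_diagonal = summarize(table)
print(species_totals)                # [50, 50, 50]
print(total, matched, off_diagonal)  # 150 147 3
```

The column sums reproduce the 50 plants per species, and the three off-diagonal entities correspond to the three undecided cases mentioned above.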
Pit serre vibonesi Statistics exercise
Hydrological data Statistics exercise

 
