Agostino Tarsitano
Data sets |
Ruspini data set | This is a standard example consisting of 75 two-dimensional points
that form 4 natural groups of 23, 20, 17, and 15 entities, respectively.
Any clustering technique ought to find them.
Kaufman L., Rousseeuw P.J. (1990), p. 100. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, New York. |
Lubischew data set 1 | Measurements were made on six variables in the males of three species:
Chaetocnema concinna, Ch. heikertingeri, and Ch. heptapotamica. The real
composition of the groups is (21, 31, 22).
Lubischew A.A. (1962). On the use of discriminant functions in taxonomy. Biometrics, vol. 18, 455-477. |
Lubischew data set 2 | Two cryptic species of the flea beetle genus Haltica were separated
(19 specimens of H. oleracea and 20 of H. carduorum) using 4 external
characters. In this case the dispersion matrices of the two species are very
different, and it is doubtful whether it is even legitimate to suppose
that they share a common dispersion matrix.
Lubischew A.A. (1962). On the use of discriminant functions in taxonomy. Biometrics, vol. 18, 455-477. |
Fossils data set | Six variables were measured on each of the nummulitid specimens from the Eocene
Yellow Limestone Formation of northwestern Jamaica. According to Chernoff,
the entities divide into three distinct clusters, {40, 34, 13}, with one
or two specimens that can be regarded as singletons or borderline cases.
Algorithm "T" applied to the first four principal components (accounting
for 94.6% of the variability contained in the data set) provided perfect
recovery of all the entities. It appears that the large cluster can be
separated into subclusters, but their number is indeterminate.
Chernoff H. (1970). Metric considerations in cluster analysis. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, 621-629. University of California Press, Berkeley, CA. |
Fisher Iris data | There are 4 measurements on 50 plants from each of 3 species of Iris: setosa, versicolor, and virginica. Algorithm T provides a separation into 3 clusters, (50, 0, 0), (0, 48, 1), and (0, 2, 49), which agrees with the findings of Friedman and Rubin (1967) and Maronna and Jacovkis (1974). Three undecided cases is probably the best possible result with this data set (cf. Richards, 1972). |
PIT Serre Vibonesi | Statistics exercise |
Hydrological data | Statistics exercise |
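
The cluster-by-species counts quoted in the Fisher Iris entry can be checked with a few lines of arithmetic. The sketch below is only an illustration: it copies the three triples from that entry, assumes that rows are the clusters found by Algorithm T and that columns are the species in the order setosa, versicolor, virginica, and uses a hypothetical helper name, `summarize`. It does not reproduce Algorithm T itself.

```python
# Cross-classification copied from the Fisher Iris entry.
# Assumed layout: rows = clusters, columns = setosa, versicolor, virginica.
table = [
    [50, 0, 0],
    [0, 48, 1],
    [0, 2, 49],
]


def summarize(table):
    """Return (total entities, entities in agreement, entities in disagreement).

    Each cluster is matched to its dominant species (the largest count in
    its row); this simple rule assumes the dominant species are distinct.
    """
    total = sum(sum(row) for row in table)
    agree = sum(max(row) for row in table)
    return total, agree, total - agree


def species_totals(table):
    """Column sums: how many plants of each species appear in the table."""
    return [sum(col) for col in zip(*table)]


if __name__ == "__main__":
    total, agree, disagree = summarize(table)
    print("entities:", total)
    print("in agreement:", agree)
    print("in disagreement:", disagree)
    print("plants per species:", species_totals(table))
```

The column sums give 50 plants per species, matching the description of the data, and the off-diagonal counts add up to the three cases mentioned in the entry.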