A. Tarsitano
Publications: Reports
 

The Friedman-Rubin approach to cluster analysis
 

Summary
The essence of cluster analysis is the estimation of an unknown parameter vector which maps the set of entities to the set of cluster labels. This task can be fully accomplished even if the mean vectors and the variance-covariance matrices of the various clusters are poorly estimated. In this sense, the Friedman-Rubin approach offers a reasonable compromise between the flexibility of the cluster shapes and the number of unknown parameters to be estimated.
 The present paper assumes that the data set has clusters which tend to take the form of hyperellipsoids of various sizes, but with the same orientation (in practice, the Mahalanobis distance is used to measure the dissimilarity between entities).
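As an illustration of the dissimilarity measure just mentioned, the following sketch (a hypothetical Python/NumPy fragment, not taken from the paper) computes the squared Mahalanobis distance between two entities under the assumption of a common covariance matrix shared by all clusters:

```python
import numpy as np

def mahalanobis_sq(x, y, cov):
    """Squared Mahalanobis distance between entities x and y.

    cov is the common variance-covariance matrix assumed for all
    clusters (the "same orientation" assumption); with cov equal to
    the identity, the measure reduces to squared Euclidean distance.
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solving the linear system is preferable to forming the
    # explicit inverse of cov.
    return float(d @ np.linalg.solve(cov, d))
```

Using `np.linalg.solve` rather than `np.linalg.inv` keeps the computation numerically stabler when the covariance matrix is nearly singular.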
 The Friedman-Rubin approach has given satisfactory practical results, although it is possible that for a particular application it will perform poorly. However, both the initialization methods and the stopping rules usually employed by k-means algorithms must be adapted to this context.
 The main result of this article is the development of an efficient algorithm which is able to find a solution that is seldom very far from the best solution, even for uncertain data sets. Moreover, several techniques for determining the starting partition of the algorithm have been examined and adjusted to the minimization of the determinant of the sample within-group dispersion matrix. Finally, the best-known stopping rules have been analyzed and re-interpreted in terms of the Friedman-Rubin framework by looking at various real and simulated applications.
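A minimal sketch of a transfer algorithm of this kind (a hypothetical Python/NumPy illustration under the stated criterion, not the author's implementation): each entity is tentatively reassigned to every other cluster, and a move is kept only when it lowers the determinant of the pooled within-group dispersion matrix W; passes are repeated until no transfer improves the criterion.

```python
import numpy as np

def within_scatter(X, labels, k):
    # Pooled within-group dispersion matrix:
    # W = sum over groups g of sum over i in g of (x_i - m_g)(x_i - m_g)'.
    p = X.shape[1]
    W = np.zeros((p, p))
    for g in range(k):
        Xg = X[labels == g]
        if len(Xg) > 0:
            d = Xg - Xg.mean(axis=0)
            W += d.T @ d
    return W

def transfer_min_det(X, labels, k, max_passes=100):
    # Greedy single-entity transfers minimizing det(W).  This is a
    # local search, so the quality of the starting partition matters.
    labels = labels.copy()
    best = np.linalg.det(within_scatter(X, labels, k))
    for _ in range(max_passes):
        improved = False
        for i in range(len(X)):
            current = labels[i]
            if np.sum(labels == current) == 1:
                continue  # moving i would empty its cluster
            best_g = current
            for g in range(k):
                if g == current:
                    continue
                labels[i] = g  # tentative transfer
                val = np.linalg.det(within_scatter(X, labels, k))
                if val < best - 1e-12:
                    best, best_g = val, g
            labels[i] = best_g  # keep the best assignment found
            if best_g != current:
                improved = True
        if not improved:
            break
    return labels, best
```

Recomputing W from scratch at every tentative move is wasteful; an efficient implementation such as the one the article describes would instead update det(W) incrementally when a single entity changes cluster.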
 

Keywords
non-hierarchical classification, Mahalanobis distance, stopping rules, initialization methods, transfer algorithms.

 