The Friedman-Rubin approach to cluster analysis
Summary
is the estimation of an unknown parameter vector which maps the set of
entities to the set of cluster labels. This task can be fully accomplished
even if the mean vectors and the variance-covariance matrices of the
various clusters are poorly estimated. In this sense, the Friedman-Rubin
approach offers a reasonable compromise between the flexibility of the
cluster shapes and the number of unknown parameters to be estimated.
The present paper assumes
that the data set contains clusters which tend to take the form of
hyperellipsoids of various sizes, but with the same orientation (in practice,
the Mahalanobis distance is used to measure the dissimilarity between entities).
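For concreteness, the Mahalanobis distance mentioned above can be computed as in the sketch below (a minimal illustration only; the function name and the use of a single pooled covariance matrix S are assumptions, not taken from the paper):

```python
import numpy as np

def mahalanobis_sq(x, y, S):
    """Squared Mahalanobis distance (x - y)^T S^{-1} (x - y),
    where S is the common variance-covariance matrix of the clusters."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve S z = d instead of inverting S explicitly (more stable).
    return float(d @ np.linalg.solve(S, d))
```

When S is the identity matrix, this reduces to the squared Euclidean distance; a non-spherical S stretches the metric along the common orientation of the clusters.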
The Friedman-Rubin approach
has given satisfactory practical results, although it is possible that it
will perform poorly for a particular application. However, both the initialization
methods and the stopping rules usually employed by k-means algorithms must
be adapted to this context.
The main result of this article
is the development of an efficient algorithm which is able to find a solution
that is seldom very far from the best solution, even for uncertain data
sets. Moreover, several techniques for determining the starting partition
of the algorithm have been examined and adjusted to the minimization of
the determinant of the sample within-group dispersion matrix. Finally,
the best-known stopping rules have been analyzed and re-interpreted in terms
of the Friedman-Rubin framework by looking at various real and simulated
applications.
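The criterion just described, minimizing the determinant of the within-group dispersion matrix W by transferring entities between clusters, can be sketched as follows (a minimal greedy illustration under stated assumptions; the function names and the specific transfer scheme are not taken from the paper):

```python
import numpy as np

def within_group_dispersion(X, labels):
    """Pooled within-group dispersion matrix:
    W = sum over clusters g of sum_{i in g} (x_i - m_g)(x_i - m_g)^T."""
    d = X.shape[1]
    W = np.zeros((d, d))
    for g in np.unique(labels):
        Xg = X[labels == g]
        diffs = Xg - Xg.mean(axis=0)
        W += diffs.T @ diffs
    return W

def friedman_rubin_transfer(X, labels, max_iter=50):
    """Greedy transfer sketch: move an entity to another cluster whenever
    the move lowers det(W); stop when a full pass yields no improvement."""
    labels = labels.copy()
    best = np.linalg.det(within_group_dispersion(X, labels))
    for _ in range(max_iter):
        improved = False
        for i in range(len(X)):
            current = labels[i]
            if np.sum(labels == current) == 1:
                continue  # do not empty a cluster
            for g in np.unique(labels):
                if g == current:
                    continue
                labels[i] = g  # tentative transfer
                cand = np.linalg.det(within_group_dispersion(X, labels))
                if cand < best:
                    best = cand
                    current = g
                    improved = True
            labels[i] = current  # keep the best assignment found
        if not improved:
            break
    return labels, best
```

Like any greedy transfer algorithm, this sketch can stop at a local minimum of det(W), which is exactly why the initialization methods and stopping rules discussed in the paper matter.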
Keywords: non-hierarchical classification,
Mahalanobis distance, stopping rules, initialization methods, transfer
algorithms.