Clustering is an important unsupervised classification technique where a set of patterns,
usually vectors in multidimensional space, are grouped into clusters based
on some similarity or dissimilarity criteria. In crisp clustering, each pattern is assigned
to exactly one cluster, whereas in fuzzy clustering, each pattern is given a
membership degree to each class. Fuzzy clustering is inherently more suitable for
handling imprecise and noisy data with overlapping clusters. Clustering techniques
aim to find a suitable grouping of the input dataset so that some criteria are optimized.
Hence the problem of clustering can be posed as an optimization problem.
The objectives to be optimized may represent different characteristics of the clusters,
such as compactness, separation, and connectivity. A straightforward way to
pose clustering as an optimization problem is to optimize some cluster validity index
that reflects the goodness of the clustering solutions. All possible partitionings
of the dataset and the corresponding values of the validity index define the complete
search space. Traditional partitional clustering techniques, such as K-means and
fuzzy C-means, employ greedy search techniques over the search space to optimize
the compactness of the clusters. Although these algorithms are computationally efficient,
they often get stuck at some local optima depending on the choice of the initial
cluster centers.Moreover, they optimize a single cluster validity index (compactness
in this case), and therefore do not cover different characteristics of the datasets.
To overcome the problem of local optima, some global optimization tools such
as Genetic Algorithms (GAs) have been widely used to reach the global optimum
value of the chosen validity measure. GAs are randomized search and optimization
techniques guided by the principles of evolution and natural genetics, and have a
large amount of implicit parallelism. GAs perform multimodal search in complex
landscapes and provide near-optimal solutions for the objective or fitness function
of an optimization problem. ConventionalGA-based clustering techniques use some
validity measure as the fitness value. However, no single validity measure works
equally well for different kinds of datasets. Thus it is natural to simultaneously
optimize multiple such measures for capturing different characteristics of the data.