The purpose of this book is to present recent progress in both Multiple Criteria
Programming (MCP) and Support Vector Machines (SVMs), which have become powerful
tools in the field of data mining. Most of the content in this book is drawn directly
from the research and application activities that our research group has conducted
over the last ten years.

Although the data mining community is familiar with Vapnik’s SVM [206] in
classification, using optimization techniques to deal with data separation and data
analysis goes back more than fifty years. In the 1960s, O.L. Mangasarian formulated
the principle of large margin classifiers and tackled it using linear programming.
He and his colleagues have since reformulated his approaches within the SVM framework [141]. In the 1970s,
A. Charnes and W.W. Cooper initiated Data Envelopment Analysis, where linear or
quadratic programming is used to evaluate the efficiency of decision-making units
in a given training dataset. Starting in the 1980s, F. Glover proposed a number
of linear programming models to solve the discriminant problem on small
datasets [75]. Since 1998, the author and co-authors of this book have not only
proposed and extended a series of optimization-based classification models
via Multiple Criteria Programming (MCP), but also improved a number of SVM-related
classification methods. These methods differ from statistical methods, decision
tree induction, and neural networks in the techniques they use to separate data.

When MCP is used for classification, there are two common criteria. The first
is the degree of overlap (e.g., a norm of all overlapping) with respect to the
separating hyperplane. The lower this degree, the better the classification. The second
is the distance from a point to the separating hyperplane. The larger the sum of
these distances, the better the classification. Accordingly, in linear cases, the objective
of classification is either to minimize the sum of all overlapping or to maximize
the sum of the distances. MCP can also be viewed as an extension of SVM. Under
the framework of mathematical programming, both MCP and SVM share the same
advantage of using a hyperplane to separate the data. On a certain interpretation,
MCP measures all possible distances from the training samples to the separating hyperplane,
while SVM considers only a fixed distance from the support vectors. This
makes MCP approaches an alternative for data separation.
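
The two criteria described above can be illustrated with a minimal sketch that evaluates them for one fixed candidate hyperplane. It is not a model taken from this book: the function name `criteria`, the labels +1 and -1, the sign convention, and the toy data are all assumptions made for illustration, and the sketch only scores a given hyperplane rather than solving the underlying optimization problem.

```python
def criteria(points, labels, w, b):
    """Score the hyperplane w.x = b on the two criteria (illustrative only).

    Assumed convention: labels are +1 or -1, and a point lies on its
    correct side when label * (w.x - b) >= 0. Deviations are not divided
    by the norm of w, so they are Euclidean distances only if w has unit
    length.
    """
    total_overlap = 0.0   # first criterion: smaller is better
    total_distance = 0.0  # second criterion: larger is better
    for x, y in zip(points, labels):
        score = y * (sum(wi * xi for wi, xi in zip(w, x)) - b)
        if score < 0:
            # Point is on the wrong side: it contributes to the overlap.
            total_overlap += -score
        else:
            # Point is on the correct side: it contributes to the distance.
            total_distance += score
    return total_overlap, total_distance


# Hypothetical toy data: two points of class -1 and three of class +1.
points = [(1.0, 1.0), (2.0, 0.0), (4.0, 4.0), (5.0, 3.0), (2.0, 2.0)]
labels = [-1, -1, 1, 1, 1]

# Candidate hyperplane x1 + x2 = 5.
overlap, distance = criteria(points, labels, (1.0, 1.0), 5.0)
print(overlap, distance)  # prints 1.0 12.0
```

Here the point (2, 2) falls on the wrong side of the candidate hyperplane and accounts for the entire overlap, while the other four points each contribute 3 to the sum of distances. An optimization model would search over `w` and `b` to trade these two quantities off against each other.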