The rapidly growing amount of data, available from different technologies in the field of bio-sciences, high-energy physics, economy, climate analysis, and in several other scientific disciplines, requires a new generation of machine learning and statistical methods to deal with their complexity and heterogeneity. As data collections becomes easier, data analysis is required to be more sophisticated in order to extract useful information from the available data.
Even if data can be represented in several ways, according to their structural characteristics, ranging from strings, lists, trees to graphs and other more complex data structures, in most applications they are typically represented as a matrix whose rows correspond to measurable characteristics called features, attributes, variables, depending on the considered discipline and whose columns correspond to examples (cases, samples, patterns). In order to avoid confusion, we will talk about features and examples. In real-world tasks, there can be many more features than examples (cancer classification based on gene expression levels in bioinformatics) or there can be many more examples than features (intrusion detection in computer/network security). In addition, each example can be either labeled or not. Attaching labels allows to distinguish members of the same class or group from members of other classes or groups. Hence, one can talk about supervised and unsupervised tasks that can be solved by machine learning methods.
Since it is widely accepted that no single classifier or clustering algorithm can be superior to the others, ensembles of supervised and unsupervised methods are gaining popularity. A typical ensemble includes a number of classifiers/clusterers whose predictions are combined together according to a certain rule, e.g. majority vote.
Statistical, algorithmical, representational, computational and practical reasons can explain the success of ensemble methods. In particular several empirical results have demonstrated that ensembles often provide a better solution to the problem than any single method.