In the last three decades, machine learning research and practice have
focused on batch learning, usually with small datasets. In batch learning, the
whole training data is available to the algorithm, which outputs a decision
model after processing the data, often multiple
times. The rationale behind this practice is that examples are generated at
random according to some stationary probability distribution. Most learners
use a greedy, hill-climbing search in the space of models. They are prone
to high-variance and overfitting problems. Brain and Webb (2002) pointed
out the relation between variance and data sample size. When learning from
small datasets, the main problem is variance reduction, while learning from
large datasets may be more effective when using algorithms that place greater
emphasis on bias management.
In most challenging applications, learning algorithms act in dynamic environments,
where the data are collected over time. A desirable property of
these algorithms is the ability to incorporate new data. Some supervised
learning algorithms are naturally incremental, for example k-nearest neighbors
and naive Bayes. Others, like decision trees, require substantial changes
to support incremental induction. Moreover, if the process is not strictly stationary
(as in most real-world applications), the target concept may gradually
change over time. Incremental learning is a necessary property, but it is not
sufficient. Incremental learning systems must have mechanisms to cope with
concept drift, forgetting outdated data and adapting to the most recent state
of nature.
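
To make the notions of natural incrementality and forgetting concrete, the following is a minimal sketch, not taken from the works cited here: a naive Bayes classifier over categorical features whose sufficient statistics (class and feature-value counts) are updated one example at a time, with an exponential decay factor serving as one simple, illustrative forgetting mechanism. The class name, the decay parameter, and the toy stream are assumptions made only for illustration, not a specific method from the literature.

    import math
    from collections import defaultdict


    class IncrementalNaiveBayes:
        """Naive Bayes over categorical features, updated one example at a time.

        A decay factor in (0, 1] geometrically down-weights old counts, giving a
        simple forgetting mechanism for a gradually drifting concept.
        """

        def __init__(self, decay=0.999, smoothing=1.0):
            self.decay = decay                          # 1.0 means no forgetting
            self.smoothing = smoothing                  # Laplace smoothing constant
            self.class_counts = defaultdict(float)      # y -> decayed count
            self.feature_counts = defaultdict(float)    # (i, value, y) -> decayed count
            self.feature_values = defaultdict(set)      # i -> distinct values seen

        def update(self, x, y):
            # Decay all sufficient statistics, then add the new example with weight 1.
            for key in self.class_counts:
                self.class_counts[key] *= self.decay
            for key in self.feature_counts:
                self.feature_counts[key] *= self.decay
            self.class_counts[y] += 1.0
            for i, value in enumerate(x):
                self.feature_counts[(i, value, y)] += 1.0
                self.feature_values[i].add(value)

        def predict(self, x):
            # Return the class that maximizes the smoothed log-posterior score.
            total = sum(self.class_counts.values())
            best_class, best_score = None, float("-inf")
            for y, cy in self.class_counts.items():
                score = math.log((cy + self.smoothing) /
                                 (total + self.smoothing * len(self.class_counts)))
                for i, value in enumerate(x):
                    num = self.feature_counts.get((i, value, y), 0.0) + self.smoothing
                    den = cy + self.smoothing * len(self.feature_values.get(i, ()))
                    score += math.log(num / den)
                if score > best_score:
                    best_class, best_score = y, score
            return best_class


    # Process a (potentially unbounded) stream one example at a time.
    model = IncrementalNaiveBayes(decay=0.995)
    stream = [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes"), (("sunny", "mild"), "yes")]
    for features, label in stream:
        model.update(features, label)
    print(model.predict(("sunny", "mild")))

Because the model is a set of counts, incorporating a new example is a constant-time update, which is what makes naive Bayes naturally incremental; the decay applied before each update is only one of many possible forgetting schemes (sliding windows and explicit drift detectors are common alternatives).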