Bayesian networks have received a lot of attention over the last few decades from both scientists and engineers, and across a number of fields, including artificial intelligence (AI), statistics, cognitive science, and philosophy.
Perhaps the largest impact that Bayesian networks have had is on the field of AI, where they were first introduced by Judea Pearl in the midst of a crisis that the field was undergoing in the late 1970s and early 1980s. This crisis was triggered by the surprising realization that a theory of plausible reasoning cannot be based solely on classical logic [McCarthy, 1977], as had been strongly believed within the field for at least two decades [McCarthy, 1959]. This discovery prompted a large number of responses by AI researchers, leading, for example, to the development of a new class of symbolic logics known as non-monotonic logics (e.g., [McCarthy, 1980; Reiter, 1980; McDermott and Doyle, 1980]). Pearl’s introduction of Bayesian networks, which is best documented in his book [Pearl, 1988], was actually part of his larger response to these challenges, in which he advocated the use of probability theory as a basis for plausible reasoning and developed Bayesian networks as a practical tool for representing and computing probabilistic beliefs.
From a historical perspective, the earliest traces of using graphical representations of probabilistic information can be found in statistical physics [Gibbs, 1902] and genetics [Wright, 1921]. However, the current formulations of these representations are of more recent origin and have been contributed by scientists from many fields. In statistics, for example, these representations are studied within the broad class of graphical models, which include Bayesian networks in addition to other representations such as Markov networks and chain graphs [Whittaker, 1990; Edwards, 2000; Lauritzen, 1996; Cowell et al., 1999]. Yet the semantics of these models differ enough to justify treating each of them independently. This is why we decided to focus this book on Bayesian networks instead of covering them in the broader context of graphical models, as is done by others [Whittaker, 1990; Edwards, 2000; Lauritzen, 1996; Cowell et al., 1999]. Our coverage is therefore more consistent with the treatments in [Jensen and Nielsen, 2007; Neapolitan, 2004], which are also focused on Bayesian networks.