An academic dynasty has come together to write an excellent textbook on information retrieval.
Stefan Büttcher, Charles Clarke, and Gordon Cormack make up three generations of stellar
information retrieval researchers with over fifty years of combined experience. Büttcher was
Clarke's doctoral student, and Clarke was Cormack's doctoral student. All three are known for
their deep research insights and their passion for building real search systems, a rare combination
in a field rich with world-class researchers.
The book covers the major components of a search engine, from crawling to indexing to query
processing. Major sections are devoted to the core topics of indexing, retrieval methods, and
evaluation. Emphasis is put on implementation and experimentation, allowing the reader to
learn about the low-level details of information retrieval systems, including index compression
and index update strategies, and to understand which methods work well in practice. Two
chapters on evaluation provide the methodological and the statistical foundations for evaluating
search engines, enabling the reader to decide, for instance, whether a change to the search
engine's ranking formula has a positive effect on the quality of the search results. A chapter
on classification provides an introduction to machine learning techniques that are useful for
advanced search operations, such as limiting one's query to documents in a specific language,
or filtering offensive material from the search results. Chapters on parallel information retrieval
and on Web search describe the changes necessary to transform a basic IR system into a
large-scale search service covering billions of documents and serving hundreds or thousands of users
at the same time.
The authors provide a tutorial overview of current information retrieval research, with
hundreds of references into the research literature, but they go well beyond the typical survey.
Using a running set of examples and a common framework, they describe in concrete terms
the important methods underlying each component: why they work, how they may be
implemented, and how they may be shown to work. For the purpose of this book, the authors have
implemented and tested nearly every important method, conducting hundreds of experiments
whose results augment the exposition. Exercises at the end of each chapter encourage you to
build and explore on your own.
This book is a must-read for all search academics and practitioners!