Syntax-Based Collocation Extraction is the first book to offer a comprehensive, up-to-date review of the theoretical and applied work on word collocations. Backed by solid theoretical results, the computational experiments it describes, based on data in four languages, support the book’s basic argument for using syntax-driven extraction, rather than the current cooccurrence-based techniques, to extract collocational data efficiently. The work described in Syntax-Based Collocation Extraction focuses on using linguistic tools for corpus-based identification of collocations. It takes advantage of recent advances in parsing to propose a novel collocation extraction method based on deep syntactic analysis, one that is applicable to a range of important core tasks in computational linguistics. The book is useful for anyone interested in computational analysis of texts, collocation phenomena, and multi-word expressions in general.
A large part of the vocabulary of a language is made up of phraseological units or multi-word expressions, complex lexical items that have “idiosyncratic interpretations that cross word boundaries” (Sag et al., 2002, 2). The importance of these units has been widely recognized both in theoretical linguistics, in which phraseology was recently established as an independent field of research (Cowie, 1998), and in computational linguistics, where growing attention is currently being paid to recognizing and processing multi-word units in various applications (Baldwin and Kim, 2010).