Our primary motivation in writing this book is to share our working experience and so bridge the gap between the knowledge of industry gurus and that of newcomers to the spoken language processing community. Many powerful techniques hide in conference proceedings and academic papers for years before becoming widely recognized by the research community or the industry. We spent many years pursuing spoken language technology research at Carnegie Mellon University before we started spoken language R&D at Microsoft. We fully understand that it is by no means a small undertaking to turn a state-of-the-art spoken language research system into a commercially viable product that can truly help people improve their productivity. Our experience in both industry and academia is reflected in the content of this book, which presents a contemporary and comprehensive description of both theoretical and practical issues in spoken language processing.
This book is intended for people of diverse academic and practical backgrounds. Speech scientists, computer scientists, linguists, engineers, physicists, and psychologists all have a unique perspective on spoken language processing. This book will be useful to all of these special interest groups.
Spoken language processing is a diverse subject that relies on knowledge at many levels, including acoustics, phonology, phonetics, linguistics, semantics, pragmatics, and discourse. The diverse nature of spoken language processing requires knowledge in computer science, electrical engineering, mathematics, syntax, and psychology. There are a number of excellent books on the sub-fields of spoken language processing, including speech recognition, text-to-speech conversion, and spoken language understanding, but there is no single book that covers both the theoretical and practical aspects of these sub-fields and of spoken language interface design. We devote many chapters to systematically introducing the fundamental theories needed to understand how speech recognition, text-to-speech synthesis, and spoken language understanding work. Even more important, the book highlights what works well in practice, which is invaluable if you want to build a practical speech recognizer, a practical text-to-speech synthesizer, or a practical spoken language system. Using numerous real examples from the development of Microsoft’s spoken language systems, we concentrate on showing how the fundamental theories can be applied to solve real problems in spoken language processing.