Taming Text: How to Find, Organize, and Manipulate It


Summary

Taming Text, winner of the 2013 Jolt Awards for Productivity, is a hands-on, example-driven guide to working with unstructured text in the context of real-world applications. This book explores how to automatically organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. The book guides you through examples illustrating each of these topics, as well as the foundations upon which they are built.

About this Book

There is so much text in our lives, we are practically drowning in it. Fortunately, there are innovative tools and techniques for managing unstructured information that can throw the smart developer a much-needed lifeline. You'll find them in this book.

Taming Text is a practical, example-driven guide to working with text in real applications. This book introduces you to useful techniques like full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. You'll explore real use cases as you systematically absorb the foundations upon which they are built. Written in a clear and concise style, this book avoids jargon, explaining the subject in terms you can understand without a background in statistics or natural language processing. Examples are in Java, but the concepts can be applied in any language.

Written for Java developers, the book requires no prior knowledge of GWT.

Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.

Winner of 2013 Jolt Awards: The Best Books—one of five notable books every serious programmer should read.

What's Inside

  • When to use text-taming techniques
  • Important open-source libraries like Solr and Mahout
  • How to build text-processing applications

About the Authors

Grant Ingersoll is an engineer, speaker, and trainer, a Lucene committer, and a cofounder of the Mahout machine-learning project. Thomas Morton is the primary developer of OpenNLP and Maximum Entropy. Drew Farris is a technology consultant, software developer, and contributor to Mahout, Lucene, and Solr.

"Takes the mystery out of very complex processes."—From the Foreword by Liz Liddy, Dean, iSchool, Syracuse University

Table of Contents

  1. Getting started taming text
  2. Foundations of taming text
  3. Searching
  4. Fuzzy string matching
  5. Identifying people, places, and things
  6. Clustering text
  7. Classification, categorization, and tagging
  8. Building an example question answering system
  9. Untamed text: exploring the next frontier

Life Changing: How Humans are Altering Life on Earth

In this post-natural history guide, Helen Pilcher invites us to meet key species that have been sculpted by humanity.

We are now living through the post-natural phase, where the fate of all living things is irrevocably intertwined with our own. We domesticated animals to suit our needs, and altered their...

The Fall of the Roman Empire: Film and History
For over a century, epic cinema has significantly shaped popular interest in Roman history and culture. The Fall of the Roman Empire (1964), the last of the silver-screen epics about ancient Rome before Gladiator, stands out as the only epic that attempted, and largely succeeded, to show the greatness of Roman civilization rather...
SQL Server 2008 Administration in Action

SQL Server 2008 Administration isn't a standard SQL Server tutorial; there are dozens of those to choose from. Instead, this book breaks down the role of "SQL Server Administrator" into its key focus areas and tasks and details the techniques and best practices that make an administrator effective. In this book, a reader can...


Joomla! 1.6: A User's Guide: Building a Successful Joomla! Powered Website (3rd Edition)

Joomla is an open source content management system (CMS) that anyone can download for free (see forge.joomla.org/sf/go/projects.joomla/frs). This makes it an ideal choice for small businesses. Don’t let the price tag fool you, though; Joomla is powerful and robust, and more big organizations are choosing to use open source software...

The Real MCTS/MCITP Exam 70-620 Prep Kit: Independent and Complete Self-Paced Solutions
Independent and unbiased self-paced study with fully integrated book, practice exams, and e-learning modules.

This exam is designed to validate proficiency in supporting the Windows Vista client. This exam will fulfill the Windows Vista Technology Specialist requirements of Exam 70-620.

The Microsoft Certified Technology
...
Mac OS X Leopard: The Missing Manual
With Leopard, Apple has unleashed the greatest version of Mac OS X yet, and David Pogue is back with another meticulous Missing Manual to cover the operating system with a wealth of detail. The new Mac OS X 10.5, better known as Leopard, is faster than its predecessors, but nothing's too fast for Pogue and this Missing Manual. It's just one of...