Taming Text – review
Well, I have to write something… I’m already past the date by which my review of Manning’s “Taming Text” (a book by Grant Ingersoll and Thomas Morton) was due, so let’s try…
The book is still under heavy development, and at the time of writing only four chapters (out of nine) are available. But “Taming Text” in its present shape is already attractive and contains a few juicy pieces, for example a chapter dedicated to the identification of people, places and things.
Among other things, one can find:
- a description of basic concepts necessary to understand how to process information written in natural language,
- information about the problems associated with creating effective full-text search engines,
- issues associated with using keywords to tag content,
- clustering text (this seems to be quite a hot topic; it was also covered in “Algorithms of the Intelligent Web”).
The authors emphasize that the book should deliver practical hints that allow readers to develop their own applications. The final chapter will provide a complete example incorporating all the features described in the book. This is nothing new compared to other books (like the aforementioned “Algorithms of the Intelligent Web”), but I guess this is a praiseworthy kind of plagiarism. It is also worth mentioning that all examples are written in Java and use widely known open source libraries such as Lucene, Solr and OpenNLP. Both authors are active FLOSS Java programmers: Grant Ingersoll is a Lucene committer, and Thomas Morton is the lead developer and maintainer of the OpenNLP project, so all the information in this book comes straight from the source (code).
In the domain of Natural Language Processing, computers are still far from real intelligence. Still, if you have ever wondered what modern NLP can offer an ordinary programmer, you may be interested in reading “Taming Text”. The book will be available at the beginning of 2010, so we have to be patient or try to get a draft from MEAP.
