blog.grejpfrut.org

Archive for the ‘in english’ Category

Taming Text – review

Well, I have to write something… I’ve already missed the deadline for my review of Manning’s “Taming Text” (by Grant Ingersoll and Thomas Morton), so let’s try… :) The book is still under heavy development and, at the time of writing, only four chapters (out of nine) are available. But “Taming Text” in its present shape is already attractive and contains a few juicy pieces, for example a chapter dedicated to the identification of people, places and things.

Among other things, one can find:

  • a description of the basic concepts necessary to understand how to process information written in natural language,
  • information about the problems associated with creating effective full-text search engines,
  • issues associated with using keywords to tag content,
  • clustering text (this seems to be quite a hot topic; it was also mentioned in “Algorithms of the intelligent web”).

The authors emphasize that the book should deliver practical hints that allow readers to develop their own applications. The final chapter will provide a complete example incorporating all the features described in the book. This is nothing new compared to other books (like the aforementioned “Algorithms of the intelligent web”), but I guess this is a praiseworthy kind of plagiarism :) . It is also worth mentioning that all examples are written in Java, using widely known open source libraries such as Lucene, Solr and OpenNLP. Both authors are active FLOSS Java programmers: Grant Ingersoll is a Lucene committer, and Thomas Morton is the lead developer and maintainer of the OpenNLP project, so all the information in this book comes straight from the source (code).

In the domain of Natural Language Processing, computers are still far from real intelligence. If you have ever wondered what modern NLP can offer an ordinary programmer, you may be interested in reading “Taming Text”. The book will be available at the beginning of 2010, so we have to be patient or try to get a draft through MEAP :) .

Written by admin

June 2nd, 2009 at 10:24 am

Posted in in english

Algorithms of the intelligent web – review

Thanks to MEAP and Poznań JUG I had a chance to read “Algorithms of the intelligent web” by Haralambos Marmanis and Dmitry Babenko. The content is organized into seven chapters, starting with a general introduction that gives a broad overview of the state of the art in modern web applications. The second chapter offers a few bites of theory and then a practical example of building a simple search engine. You can also find information about using classifiers, creating recommendation systems and clustering documents. The final chapter presents a complete example of a news portal which incorporates all the introduced techniques in a neat working solution.

Chapters two through six share a similar structure: they start with the theory necessary to understand the presented concepts and follow with clear examples of real-world usage. The examples are then extended with more advanced features, but everything remains perfectly understandable. Readers will learn how to adopt existing APIs (e.g. digg.com) and how to aggregate and transform content in order to create innovative mashups. After the practical part, readers will find some notes on using the presented solutions in production. The authors describe common mistakes that lead to dead ends when implementing modern intelligent web applications, and this is definitely one of the biggest advantages of this book. It is also worth mentioning that Marmanis and Babenko emphasize the quality of results and show general ways to evaluate the outcome. At the end of each chapter readers can find TODOs, a section with tasks that may be done in order to make better use of the presented solutions.

All examples are delivered in BeanShell and Java. Nowadays, in the age of frameworks like Grails or Ruby on Rails, the choice of BeanShell is quite unexpected. Examples in JRuby or Groovy could simplify the adoption of the presented solutions in real-life web applications. But this is a minor thing: BeanShell is very similar to Java, so no Java developer should have problems understanding the examples. In the MEAP copy of the book that I evaluated there was also no information about how to run the examples, nor any mention that knowledge of Java or BeanShell is required. I hope that will be improved in the final release (from the answer to my feedback, I understand these issues were addressed in the final version). The authors present quite a few open source libraries that can be easily used not only when creating intelligent web applications but also in the everyday work of a Java developer.

What’s missing? I would love to read more about the OpenSocial API, which is only mentioned in the first chapter of the book. Another thing that is missing is any reference to the so-called Web 3.0; I’m constantly looking for a comprehensive overview of semantic web applications (e.g. OpenCalais, Hakia). Creating a small semantic-enabled application would definitely be a plus.

“Algorithms of the intelligent web” is definitely worth recommending to all developers who want to learn some useful information retrieval and machine learning techniques. Those techniques are presented in a very clear and understandable way. The book contains universal methods and algorithms, and knowledge like this does not age as quickly as, for example, web frameworks do. I will definitely come back and read this book again.

Written by admin

May 19th, 2009 at 9:23 am

Posted in in english
