About me

Published by admin on 2008-12-29

Marco Campana is a software developer living in London. He received a master’s degree in Intelligent Web Technologies from Queen Mary University of London in 2008 and has worked for companies such as Yahoo! and Mobile Interactive Technology. In his spare time he contributes to research in natural language processing and personalization. He will present his new paper, “Incremental Personalised Summarisation with Novelty Detection”, at the FQAS ’09 conference in Denmark at the end of October ’09. His interests are Ruby on Rails, iPhone SDK development and writing.

2 Comments »

  1. Hi, for the Post summarizer plugin, I’d like to use Chinese and have already created a “ZH” subdirectory. But I don’t understand what you mean by:

    2. Create a class that extends the abstract class Document in lib/
    3. Implement the tokenize() and normalize() methods.
    4. Add config information to the config.php file

    Can you help me?

    Comment by Gracy — September 5, 2009 @ 5:12 pm
  2. Hi Gracy!

    If you want to use the summarizer plugin for posts written in Chinese, you will have to do some programming. As described in the plugin documentation, you have to implement:

    1) the tokenize() method, which is responsible for dividing the document into sentences.
    2) the normalize() method, which is responsible for removing inflections from words (such as gender, number and person). You can very likely find a suitable stemming algorithm with a web search.

    The best thing to do is to look at the existing methods for the English language to understand how they work, and then apply Chinese language rules to implement the two new methods.
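
    As a rough illustration of those two methods, here is a minimal sketch. It is written in Python only to keep it short; the plugin itself uses PHP files such as config.php, and the Document base class, the ChineseDocument name and the punctuation sets below are all assumptions made for this example, not the plugin's actual code. Since Chinese words carry little inflection, normalize() here only trims punctuation instead of stemming.

```python
import re
from abc import ABC, abstractmethod


class Document(ABC):
    """Hypothetical stand-in for the plugin's abstract Document class."""

    def __init__(self, text):
        self.text = text

    @abstractmethod
    def tokenize(self):
        """Split the document into a list of sentences."""

    @abstractmethod
    def normalize(self, word):
        """Reduce a word to a canonical form."""


class ChineseDocument(Document):
    # Assumption: a sentence ends at one of these full-width marks.
    _SENTENCE = re.compile(r"[^。！？]+[。！？]?")
    # Assumption: punctuation to trim from the edges of a word.
    _PUNCTUATION = " \t\n，。！？、；："

    def tokenize(self):
        # Each match is a run of text plus its closing mark, if any.
        sentences = self._SENTENCE.findall(self.text)
        return [s.strip() for s in sentences if s.strip()]

    def normalize(self, word):
        # No stemming is attempted; only surrounding punctuation is removed.
        return word.strip(self._PUNCTUATION)
```

    With this sketch, ChineseDocument("我喜欢写作。你呢？").tokenize() returns the two sentences as separate list items. A real implementation would also need word segmentation, which this example leaves out.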
    Hope it helps, enjoy!

    Comment by marco — September 8, 2009 @ 9:06 pm

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
(c) 2010 xterm.it | powered by WordPress based on Barecity