About me
Marco Campana is a software developer living in London. He received a master’s degree in Intelligent Web Technologies from Queen Mary University of London in 2008, and has worked for companies like Yahoo! and Mobile Interactive Technology. In his spare time he contributes to research in the field of natural language processing and personalization. He will present his new paper “Incremental Personalised Summarisation with Novelty Detection” at the FQAS ‘09 conference at the end of October ‘09 in Denmark. His interests are Ruby on Rails, iPhone SDK development and writing.




Hi, for the Post summarizer plugin, I’d like to use Chinese for the plugin and have already created a “ZH” subdirectory. But I don’t understand what you mean by:
2. Create a class that extends the abstract class Document in lib/
3. Implement the tokenize() and normalize() methods.
4. Add config information to the config.php file
Can you help me?
Hi Gracy!
If you want to use the summarizer plugin for posts written in Chinese you will have to do some programming. As described in the plugin documentation, you have to implement:
1) the tokenize() method, which is responsible for dividing the document into sentences.
2) the normalize() method, which is responsible for removing inflections from words (such as gender, number, person and so on). It’s very likely you can find a stemming algorithm if you google it.
The best thing to do is to have a look at the existing methods for the English language to understand how they work, and then apply Chinese language rules to implement the two new methods.
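To give a rough idea of the logic, here is a minimal sketch of what the two methods could do for Chinese text. It is written in Python purely as an illustration and is not the plugin's own code: the Document base class below is a hypothetical stand-in, and the method signatures, the ChineseDocument name and the punctuation sets are all assumptions. Since Chinese words carry very little inflection, normalize() here only strips punctuation and whitespace; that is a simplification, and proper word segmentation would need a dedicated segmenter.

```python
import re
from abc import ABC, abstractmethod


class Document(ABC):
    """Hypothetical stand-in for the plugin's abstract Document class."""

    def __init__(self, text):
        self.text = text

    @abstractmethod
    def tokenize(self):
        """Return the document as a list of sentences."""

    @abstractmethod
    def normalize(self, sentence):
        """Return a normalized form of one sentence."""


class ChineseDocument(Document):
    """Illustrative Chinese implementation (names and rules are assumptions)."""

    def tokenize(self):
        # A sentence is a run of non-terminator characters followed by
        # any sentence-ending marks (full-width or ASCII).
        parts = re.findall(r"[^。！？!?]+[。！？!?]*", self.text)
        return [p.strip() for p in parts if p.strip()]

    def normalize(self, sentence):
        # Assumption: no stemming is applied; only whitespace and common
        # Chinese/ASCII punctuation marks are removed.
        return re.sub(r"[\s。！？!?，、；：“”‘’（）]", "", sentence)


doc = ChineseDocument("今天天气很好。你吃饭了吗？我们走吧！")
for sentence in doc.tokenize():
    print(sentence, "->", doc.normalize(sentence))
```

The same two steps (split on sentence-ending punctuation, then clean up each sentence) would carry over to the plugin's own language once you have seen how the English class does it.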
Hope it helps, enjoy!