Gael Dias, Elsa Alves, Jose Gabriel Pereira Lopes
To address the unreliability of systems based on lexical repetition and the limited adaptability of language-dependent systems, we present a context-based topic segmentation system built on a new informative similarity measure that relies on word co-occurrence. In particular, our evaluation against the state of the art in the domain, i.e. the C99 and TextTiling algorithms, shows improved results both with and without the identification of multiword units.
Subjects: 1.10 Information Retrieval; 13. Natural Language Processing
Submitted: Apr 24, 2007