Unsupervised Topic Segmentation Based on Word Cooccurrence and Multi-Word Units for Text Summarization
- In Proceedings of the ELECTRA Workshop associated with the 28th Annual International ACM SIGIR Conference
Cited by 1 (1 self)
Topic Segmentation is the task of breaking documents into topically coherent multi-paragraph subparts. In particular, Topic Segmentation is extensively used in Passage Retrieval and Text Summarization to provide more coherent results by taking into account raw document structure. However, most methodologies either are based on lexical repetition, which shows evident reliability problems, or rely on harvesting linguistic resources that are usually available only for dominant languages and do not apply to less favored and emerging languages. Moreover, most systems have been evaluated using Choi’s data set [1], which is biased toward systems that rely mostly on lexical repetition. As a consequence, these systems have not been tested in real-world environments, and in practice they may yield worse results than those reported in the literature. To tackle these drawbacks, we present an innovative Topic Segmentation system based on a new informative similarity measure that relies on word co-occurrences, and we evaluate it on a set of web documents in which Multiword Units have previously been identified.
Language Independent Methodologies to Tackle Multilinguality
Until now, Natural Language Processing (NLP) research and development has mainly been conducted for the English-speaking community. However, the European Union, with its 25 member states, already has 22 different official languages. As a consequence, multilinguality is certainly the most important challenge of this century for the European NLP community. In this paper, we show how the Centre for Human Language Technology and Bioinformatics has been dealing with the problem of multilinguality by proposing language-independent systems instead of language-tailored architectures.

