Results 1 - 3 of 3
Non-contiguous word sequences for information retrieval
- In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Workshop on Multiword Expressions: Integrating Processing, 2004
Cited by 5 (2 self)
The growing amount of textual information available electronically has increased the need for high performance retrieval. The use of phrases was long seen as a natural way to improve retrieval performance over the common document models that ignore the sequential aspect of word occurrences in documents, considering them as “bags of words”. However, both statistical and syntactical phrases showed disappointing results for large document collections. In this paper we present a recent type of multi-word expression in the form of Maximal Frequent Sequences (Ahonen-Myka, 1999). These are mined phrases rather than statistical or syntactical phrases; their main strengths are that they form a very compact index and account for the sequentiality and adjacency of meaningful word co-occurrences by allowing for a gap between words. We introduce a method for using these phrases in information retrieval and present our experiments, which show a clear improvement over the well-known technique of extracting frequent word pairs.
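To make the baseline mentioned in this abstract concrete, the following is a minimal sketch of extracting frequent ordered word pairs while allowing a gap between the two words. It is not the Maximal Frequent Sequence algorithm of Ahonen-Myka (1999), and it is not taken from the paper: the function name `frequent_word_pairs`, the parameters `max_gap` and `min_support`, the whitespace tokenization, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter


def frequent_word_pairs(documents, max_gap=2, min_support=2):
    """Return ordered word pairs found in at least min_support documents.

    A pair (a, b) counts for a document if b follows a with at most
    max_gap other words in between. Names and defaults are hypothetical.
    """
    doc_freq = Counter()
    for text in documents:
        words = text.lower().split()  # naive whitespace tokenization (assumption)
        pairs_in_doc = set()
        for i, first in enumerate(words):
            # Look ahead at the next max_gap + 1 positions.
            for second in words[i + 1 : i + 2 + max_gap]:
                pairs_in_doc.add((first, second))
        # Count each pair once per document (document frequency).
        doc_freq.update(pairs_in_doc)
    return {pair: n for pair, n in doc_freq.items() if n >= min_support}


# Toy collection, invented for this example.
docs = [
    "information retrieval with phrases",
    "retrieval of textual information",
    "information and document retrieval",
]
pairs = frequent_word_pairs(docs, max_gap=2, min_support=2)
print(pairs)
```

With a gap allowed, the pair ("information", "retrieval") is found in both the first and third toy documents even though the words are not adjacent in the third; with `max_gap=0` only strictly adjacent pairs are counted and no pair reaches the support threshold.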
Tampere University of Technology at TREC 2001
- Proceedings of the Tenth Text REtrieval Conference (TREC 2001), NIST Special Publication 500-250
Cited by 1 (1 self)
In this paper we present the prototype-based text matching methodology used in the Routing Sub-Task of the TREC 2001 Filtering Track. The methodology examines texts on the word and sentence levels. On the word level, the methodology is based on word coding and transforming the codes into histograms by means of the Weibull distribution. On the sentence level, the word coding is done in a similar manner as on the word level, but instead of making histograms we use a simpler method: after the word coding, we transform the sentence vectors to sentence feature vectors using the Slant transform. The paper also includes a description of the TREC runs and some discussion of the results.
Exploring Key Phrases for Browsing an Online
This paper describes ongoing work on how to automatically identify and use key phrases extracted from items of a news feed available on the Internet. These phrases are used for two different tasks: users of mobile devices (e.g., cellular phones and personal digital assistants) will be able to subscribe to news in different categories, where the categorisation of the news is based on the extracted phrases; and by browsing through small portions of the news items (the phrases), a user can decide whether an item is interesting without having to download the whole text.

