An Analysis of Statistical and Syntactic Phrases
, 1997
Cited by 65 (2 self)
As the amount of textual information available through the World Wide Web grows, there is a growing need for high-precision IR systems that enable a user to find useful information from the masses of available textual data. Phrases have traditionally been regarded as precision-enhancing devices and have proved useful as content-identifiers in representing documents. In this study, we compare the usefulness of phrases recognized using linguistic methods and those recognized by statistical techniques. We focus in particular on high-precision retrieval. We discover that once a good basic ranking scheme is being used, the use of phrases does not have a major effect on precision at high ranks. Phrases are more useful at lower ranks where the connection between documents and relevance is more tenuous. Also, we find that the syntactic and statistical methods for recognizing phrases yield comparable performance.
Noun-Phrase Analysis in Unrestricted Text for Information Retrieval
, 1996
Cited by 64 (10 self)
Information retrieval is an important application area of natural-language processing where one encounters the genuine challenge of processing large quantities of unrestricted natural-language text. This paper reports on the application of a few simple, yet robust and efficient noun-phrase analysis techniques to create better indexing phrases for information retrieval. In particular, we describe a hybrid approach to the extraction of meaningful (continuous or discontinuous) subcompounds from complex noun phrases using both corpus statistics and linguistic heuristics. Results of experiments show that indexing based on such extracted subcompounds improves both recall and precision in an information retrieval system. The noun-phrase analysis techniques are also potentially useful for book indexing and automatic thesaurus extraction.
Term proximity scoring for keyword-based retrieval systems
- In Proc. of the 25th European Conf. on IR Research
, 2003
Cited by 42 (2 self)
This paper suggests the use of proximity measurement in combination with the Okapi probabilistic model. First, using the Okapi system, our investigation was carried out in a distributed retrieval framework to calculate the same relevance score as that achieved by a single centralized index. Second, by applying a term-proximity scoring heuristic to the top documents returned by a keyword-based system, our aim is to enhance retrieval performance. Our experiments were conducted using the TREC8, TREC9 and TREC10 test collections, and show that the suggested approach is stable and generally tends to improve retrieval effectiveness, especially at the top documents retrieved.
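The re-ranking step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything specific here is an assumption made for illustration: the function names `proximity_bonus` and `rerank`, the window of 5 tokens, the inverse-square distance weighting, and the `weight` parameter. The base scores stand in for whatever the keyword-based system (for example, Okapi) returned.

```python
from itertools import combinations

def proximity_bonus(doc_tokens, query_terms, window=5):
    """Sum 1/d^2 over co-occurrences of distinct query terms within `window` tokens.

    Illustrative heuristic only; window size and weighting are assumptions.
    """
    positions = {
        term: [i for i, tok in enumerate(doc_tokens) if tok == term]
        for term in set(query_terms)
    }
    bonus = 0.0
    # Consider every unordered pair of distinct query terms.
    for a, b in combinations(sorted(positions), 2):
        for i in positions[a]:
            for j in positions[b]:
                d = abs(i - j)  # never 0: distinct terms occupy distinct positions
                if d <= window:
                    bonus += 1.0 / (d * d)
    return bonus

def rerank(top_docs, query_terms, weight=1.0):
    """Re-rank (doc_id, base_score, tokens) triples by base score plus proximity bonus."""
    rescored = [
        (doc_id, base + weight * proximity_bonus(tokens, query_terms))
        for doc_id, base, tokens in top_docs
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

top_docs = [
    ("d1", 1.0, "term proximity scoring".split()),
    ("d2", 1.1, "term a b c d e f g proximity".split()),
]
print(rerank(top_docs, ["term", "proximity"]))
```

In this toy input, `d1` overtakes `d2` because its query terms are adjacent, while in `d2` they are more than five tokens apart and earn no bonus.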
Experiments on Chinese Text Indexing ---CLARIT TREC-5 Chinese Track Report
Cited by 2 (1 self)
The focus of the CLARIT™ Chinese Track Experiments is on investigating the effectiveness of different automatic indexing methods for retrieval over Chinese texts. In particular, we explored indexing using linguistic units (words, compound words, and phrases), single Chinese characters, and overlapping character bigrams. In addition to fully automatic processing of queries, we ran experiments with manually constructed term vector queries supplemented by Boolean type constraints. The constraints were used for selecting documents for CLARIT automatic feedback or as a means of refining the final set of retrieved documents [Milić-Frayling et al. 1997]. All the experiments were conducted using the CLARIT retrieval system [Evans & Lefferts 1995]. Since its current NLP component does not support the parsing of Chinese texts, we designed an appropriate parsing module and pre-processed the documents before submitting them for indexing and retrieval.
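As a rough illustration of one indexing unit named above, overlapping character bigrams can be produced as follows. This is a minimal sketch, not the CLARIT parsing module; the function name `char_bigrams` and the choice to skip whitespace are assumptions.

```python
def char_bigrams(text: str) -> list[str]:
    """Return overlapping character bigrams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    # Each bigram starts one character after the previous one, so they overlap.
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]

print(char_bigrams("信息检索"))  # → ['信息', '息检', '检索']
```

Because the bigrams overlap, no word segmentation is needed before indexing, at the cost of also indexing pairs such as 息检 that cross word boundaries.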
Experiments in Query Optimization
- The Sixth Text REtrieval Conference (TREC-6)
, 1998
Cited by 1 (0 self)
this report, one CLARITECH linguist, and two non-technical volunteers. The 50 ad-hoc topics were divided up Non-Relevant
CLARIT Compound Queries and Constraint-Controlled Feedback in TREC-5 Ad-Hoc Experiments
- In: D. Harman (Ed.) The Fifth Text REtrieval Conference (TREC-5). NIST Special Publication
, 1997
In this paper we present a detailed description and analysis of the experiments with feedback control. In Section 3 we discuss the official TREC-5 experiments, CLTHES and CLCLUS. We summarize our findings in Section 4. The Appendix contains information about system parameters and experiments performed in the TREC-5 Very Large Collection (VLC) track.

