Selectively Using Relations to Improve Precision in Question Answering
In Proceedings of the EACL-2003 Workshop on Natural Language Processing for Question Answering, 2003
Cited by 50 (6 self)
Despite the intuition that linguistically sophisticated techniques should be beneficial to question answering, real gains in performance have yet to be demonstrated empirically in a reliable manner. Systems built around sophisticated linguistic analysis generally perform worse than their linguistically-uninformed cousins. We believe that the key to effective application of natural language processing technology is to selectively employ it only when helpful, without abandoning simpler techniques. To this end, we identify two linguistic phenomena that current information extraction driven systems have difficulty with, and demonstrate how syntactic processing can help. By indexing syntactic relations that can be reliably extracted from corpus text and matching questions with documents at the relation level, we demonstrate that syntactic analysis enables a question answering system to successfully handle these phenomena, thereby improving precision.
An effective approach to document retrieval via utilizing wordnet and recognizing phrases
In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, 2004
Cited by 50 (9 self)
Noun phrases in queries are identified and classified into four types: proper names, dictionary phrases, simple phrases and complex phrases. A document has a phrase if all content words in the phrase are within a window of a certain size. The window sizes for different types of phrases are different and are determined using a decision tree. Phrases are more important than individual terms. Consequently, documents in response to a query are ranked with matching phrases given a higher priority. We utilize WordNet to disambiguate word senses of query terms. Whenever the sense of a query term is determined, its synonyms, hyponyms, words from its definition and its compound words are considered for possible additions to the query. Experimental results show that our approach yields between 23% and 31% improvements over the best-known results on the TREC 9, 10 and 12 collections for short (title only) queries, without using Web data.
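The window criterion in this abstract ("a document has a phrase if all content words in the phrase are within a window of a certain size") can be sketched roughly as follows. This is only an illustration: the function name `has_phrase` and the window sizes in the usage note are hypothetical, and the paper itself chooses window sizes per phrase type with a decision tree, which is not reproduced here.

```python
def has_phrase(doc_tokens, phrase_words, window):
    """Return True if every word of the phrase occurs inside some run of
    at most `window` consecutive document tokens (order is ignored)."""
    needed = {w.lower() for w in phrase_words}
    tokens = [t.lower() for t in doc_tokens]
    counts = {}  # phrase word -> occurrences inside the current window
    left = 0
    for right, tok in enumerate(tokens):
        if tok in needed:
            counts[tok] = counts.get(tok, 0) + 1
        # Slide the left edge so the window spans at most `window` tokens.
        while right - left + 1 > window:
            old = tokens[left]
            if old in needed:
                counts[old] -= 1
                if counts[old] == 0:
                    del counts[old]
            left += 1
        if len(counts) == len(needed):
            return True
    return False
```

For example, with the tokens of "stock prices fell on the exchange", the phrase words `["stock", "exchange"]` are not found with an assumed window of 3 but are found with a window of 6.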
Improving Browsing in Digital Libraries with Keyphrase Indexes
1998
Cited by 49 (9 self)
Browsing accounts for much of people's interaction with digital libraries, but it is poorly supported by standard search engines. Conventional systems often operate at the wrong level, indexing words when people think in terms of topics, and returning documents when people want a broader view. As a result, users cannot easily determine what is in a collection, how well a particular topic is covered, or what kinds of queries will provide useful results. We have built
A Non-Invasive Learning Approach to Building Web User Profiles
1999
Cited by 46 (4 self)
Recently, researchers have started to make web browsers more adaptive and personalized. A personalized web browser caters to the user's interests, and an adaptive one learns from the user's (potentially changing) access behavior. The goal is to help the user navigate the web. Lieberman's Letizia [13] monitors the user's browsing behavior, develops a user profile, and searches for potentially interesting pages to recommend. The user profile is developed without intervention from the user (but the details of how that is performed are not clear in [13]). While the user is reading a page, Letizia searches breadth-first from that location for pages that could be of interest to the user. Pazzani et al.'s Syskill & Webert [18, 19] asks the user to rank pages in a specific topic. Based on the content and ratings of pages, the system learns a user profile that predicts if pages are of interest to th
Term proximity scoring for keyword-based retrieval systems
In Proc. of the 25th European Conf. on IR Research, 2003
Cited by 42 (2 self)
This paper suggests the use of proximity measurement in combination with the Okapi probabilistic model. First, using the Okapi system, our investigation was carried out in a distributed retrieval framework to calculate the same relevance score as that achieved by a single centralized index. Second, we apply a term-proximity scoring heuristic to the top documents returned by a keyword-based system, with the aim of enhancing retrieval performance. Our experiments were conducted using the TREC8, TREC9 and TREC10 test collections, and show that the suggested approach is stable and generally tends to improve retrieval effectiveness, especially among the top documents retrieved.
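The general idea of re-scoring the top documents of a keyword-based ranking with a term-proximity bonus might look like the sketch below. The abstract does not give the scoring formula, so everything specific here is an assumption for illustration: the inverse-square distance weight, the `max_dist`, `top_k` and `weight` parameters, and the function names `proximity_bonus` and `rerank`.

```python
from itertools import combinations

def proximity_bonus(tokens, query_terms, max_dist=5):
    """Sum 1/d^2 over pairs of distinct query terms found d <= max_dist apart.
    The inverse-square weight is an assumed choice, not taken from the abstract."""
    positions = {}
    for i, tok in enumerate(tokens):
        if tok in query_terms:
            positions.setdefault(tok, []).append(i)
    bonus = 0.0
    for a, b in combinations(sorted(positions), 2):
        for i in positions[a]:
            for j in positions[b]:
                d = abs(i - j)  # never 0: distinct terms sit at distinct positions
                if d <= max_dist:
                    bonus += 1.0 / (d * d)
    return bonus

def rerank(ranked, query_terms, top_k=100, weight=1.0):
    """ranked: list of (doc_id, base_score, tokens), best first.
    Only the first top_k documents are re-scored; the tail is left untouched."""
    terms = set(query_terms)
    head = [
        (doc_id, score + weight * proximity_bonus(tokens, terms), tokens)
        for doc_id, score, tokens in ranked[:top_k]
    ]
    head.sort(key=lambda item: item[1], reverse=True)
    return head + ranked[top_k:]
```

Because the bonus is never negative, re-scored documents cannot drop below the untouched tail, which keeps the change confined to the order of the top documents.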
Shallow Morphological Analysis in Monolingual Information Retrieval for Dutch, German and Italian
Evaluation of Cross-Language Information Retrieval Systems, CLEF 2001, volume 2406 of Lecture Notes in Computer Science, 2001
Cited by 41 (13 self)
This paper describes the experiments of our team for CLEF 2001, which include both official and post-submission runs. We took part in the monolingual task, for Dutch, German, and Italian. The focus of our experiments was on the effects of morphological analyses such as stemming and compound splitting on retrieval effectiveness. Confirming earlier reports on retrieval in compound splitting languages such as Dutch and German, we found improvements to be around 25% for German and as much as 69% for Dutch. For Italian, lexicon-based stemming resulted in gains of up to 25%.
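As a rough illustration of what compound splitting involves, the sketch below splits a word into parts found in a lexicon, trying the longest prefix first and backtracking when the remainder cannot be covered. It is a generic stand-in, not the method used in the paper: the function name `split_compound`, the `min_len` parameter and the tiny lexicon in the usage note are hypothetical, and linking elements (such as a connecting "s") are not handled.

```python
def split_compound(word, lexicon, min_len=3):
    """Split `word` into lexicon entries, longest prefix first.
    Returns [word] (lowercased) when no complete split exists."""
    word = word.lower()

    def cover(rest):
        if not rest:
            return []  # nothing left to cover: successful split
        for end in range(len(rest), min_len - 1, -1):
            head = rest[:end]
            if head in lexicon:
                tail = cover(rest[end:])
                if tail is not None:
                    return [head] + tail
        return None  # no lexicon entry covers this remainder

    parts = cover(word)
    return parts if parts else [word]
```

With an assumed lexicon `{"fiets", "wiel"}`, the Dutch compound "fietswiel" splits into `["fiets", "wiel"]`, so a query for either part can match a document containing the compound.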
Learning to Extract Keyphrases from Text
1999
Cited by 39 (4 self)
Many academic journals ask their authors to provide a list of about five to fifteen key words, to appear on the first page of each article. Since these key words are often phrases of two or more words, we prefer to call them keyphrases. There is a surprisingly wide variety of tasks for which keyphrases are useful, as we discuss in this paper. Recent commercial software, such as Microsoft's Word 97 and Verity's Search 97, includes algorithms that automatically extract keyphrases from documents. In this paper, we approach the problem of automatically extracting keyphrases from text as a supervised learning task. We treat a document as a set of phrases, which the learning algorithm must learn to classify as positive or negative examples of keyphrases. Our first set of experiments applies the C4.5 decision tree induction algorithm to this learning task. The second set of experiments applies the GenEx algorithm to the task. We developed the GenEx algorithm specifically for this task. T...
Recognizing Acronyms and their Definitions
ISRI (Information Science Research Institute) UNLV, 1999
Cited by 35 (0 self)
This paper introduces an automatic method for finding acronyms and their definitions in free text. The method is based on an inexact pattern matching algorithm applied to text surrounding the possible acronym. Evaluation shows both high recall and precision for a set of documents randomly selected from a larger set of full text documents.
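A much simplified version of matching an acronym against the text around it is sketched below. It is not the paper's algorithm, whose details the abstract does not give: here the acronym's letters are matched right to left against the initials of the words immediately before it, and the only inexactness allowed is skipping a small, assumed list of stopwords. The names `find_definition`, `find_acronyms`, `STOPWORDS` and the `window` parameter are all hypothetical.

```python
import re

# Assumed list of words that may be skipped inside a definition.
STOPWORDS = {"of", "the", "and", "for", "in", "on", "a", "an"}

def find_definition(acronym, preceding_words):
    """Match acronym letters right to left against word initials.
    Returns the matched definition string, or None if matching fails."""
    letters = acronym.lower()
    i = len(letters) - 1
    j = len(preceding_words) - 1
    start = end = None
    while i >= 0 and j >= 0:
        word = preceding_words[j].lower()
        if word[0] == letters[i]:
            if end is None:
                end = j + 1  # last word of the definition
            start = j
            i -= 1
        elif word not in STOPWORDS:
            return None  # a content word that does not fit the acronym
        j -= 1
    if i >= 0:
        return None  # ran out of words before all letters matched
    return " ".join(preceding_words[start:end])

def find_acronyms(text, window=8):
    """Map each all-uppercase token (2 to 10 letters) to a candidate definition
    drawn from up to `window` words before it."""
    tokens = re.findall(r"[A-Za-z]+", text)
    results = {}
    for idx, tok in enumerate(tokens):
        if 2 <= len(tok) <= 10 and tok.isupper():
            definition = find_definition(tok, tokens[max(0, idx - window):idx])
            if definition:
                results[tok] = definition
    return results
```

A real system would also need to look at text after the acronym and tolerate definitions whose words contribute more than one letter, which is where a more flexible inexact matcher becomes necessary.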
Using NLP or NLP Resources for Information Retrieval Tasks
Natural Language Information Retrieval, 1997
Cited by 32 (1 self)
The impact of NLP on information retrieval tasks has largely been one of promise rather than substance. While there are exceptions to this, as some of the chapters in the present volume demonstrate, for the most part NLP and information retrieval have only recently started to dovetail together. In this chapter we will present a précis of our experiments in information retrieval using NLP, which have had mixed success over the last few years. We introduce the respective roles of NLP and IR and then we summarise our early experiments on using syntactic analysis to derive term dependencies and structured representations of term-term relationships. We then re-thought the role that NLP could have for IR tasks and decided to concentrate our efforts on using NLP resources rather than NLP tools in information retrieval; our more recent experiments in this area, in which we use WordNet, are summarised. Finally we present our conclusions and the status of our work.
Learning Implicit User Interest Hierarchy for Context in Personalization
In Proc. of International Conference on Intelligent User Interface (IUI, 2003
Cited by 32 (4 self)
To provide a more robust context for personalization, we desire to extract a continuum of general (long-term) to specific (short-term) interests of a user. Our proposed approach is to learn a user interest hierarchy (UIH) from a set of web pages visited by a user. We devise a divisive hierarchical clustering (DHC) algorithm to group words (topics) into a hierarchy where more general interests are represented by a larger set of words. Each web page can then be assigned to nodes in the hierarchy for further processing in learning and predicting interests. This approach is analogous to building a subject taxonomy for a library catalog system and assigning books to the taxonomy. Our approach does not need user involvement and learns the UIH "implicitly." Furthermore, it allows the original objects, web pages, to be assigned to multiple topics (nodes in the hierarchy). In this paper, we focus on learning the UIH from a set of visited pages. We propose a few similarity functions and dynamic threshold-finding methods, and evaluate the resulting hierarchies according to their meaningfulness and shape.

