Query Expansion by Mining User Logs
- IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 2003
Cited by 35 (4 self)
Queries to search engines on the Web are usually short. They do not provide sufficient evidence for an effective selection of relevant documents. Previous research has proposed the utilization of query expansion to deal with this problem. However, expansion terms are usually determined on term co-occurrences within documents. In this study, we propose a new method for query expansion based on user interactions recorded in user logs. The central idea is to extract correlations between query terms and document terms by analyzing user logs. These correlations are then used to select high-quality expansion terms for new queries. Compared to ...
Order-Theoretical Ranking
- JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE (JASIS)
, 2000
Cited by 15 (3 self)
Current best-match ranking (BMR) systems perform well but cannot handle word mismatch between a query and a document. The best known alternative ranking method, hierarchical clustering-based ranking (HCR), seems to be more robust than BMR with respect to this problem, but it is hampered by theoretical and practical limitations. We present an approach to document ranking that explicitly addresses the word mismatch problem by exploiting interdocument similarity information in a novel way. Document ranking is seen as a query-document transformation driven by a conceptual representation of the whole document collection, into which the query is merged. Our approach is based on the theory of concept (or Galois) lattices, which, we argue, provides a powerful, well-founded, and computationally tractable framework to model the space in which documents and query are represented and to compute such a transformation. We compared information retrieval using concept lattice-based ranking (CLR) to BMR and HCR. The results showed that HCR was outperformed by CLR as well as by BMR, and suggested that, of the two best methods, BMR achieved better performance than CLR on the whole document set while CLR compared more favorably when only the first retrieved documents were used for evaluation. We also evaluated the three methods' specific ability to rank documents that did not match the query, in which case the superiority of CLR over BMR and HCR (and that of HCR over BMR) was apparent.
Improving retrieval feedback with multiple term-ranking function combination
- ACM TRANSACTIONS ON INFORMATION SYSTEMS
, 2002
Cited by 15 (4 self)
In this paper we consider methods for automatic query expansion from top retrieved documents (i.e., retrieval feedback) which make use of various functions for scoring expansion terms within Rocchio’s classical reweighting scheme. An analytical comparison shows that the retrieval performance of methods based on distinct term-scoring functions is comparable on the whole query set but considerably differs on single queries, consistent with the fact that the ordered sets of expansion terms suggested for each query by the different functions are largely uncorrelated. Motivated by these findings, we argue that the results of multiple functions can be merged, by analogy with ensembling classifiers, and present a simple combination technique based on the rank values of the suggested terms. The combined retrieval feedback method is effective not only with respect to unexpanded queries but also to any individual method, with notable improvements on the system’s precision. Furthermore, the combined method is robust with respect to variation of experimental parameters and it is beneficial even when the same information needs are expressed with shorter queries.
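The rank-based merging idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' method: the function name combine_by_rank, the sum-of-ranks rule, and the penalty rank for terms a function did not suggest are all assumptions made for illustration.

```python
from collections import defaultdict


def combine_by_rank(rankings, top_k=10):
    """Merge ranked term lists by summing each term's rank positions.

    A lower summed rank is better. A term missing from a list receives
    that list's length as its rank (an assumed penalty, not from the paper).
    """
    vocabulary = set().union(*rankings)
    scores = defaultdict(int)
    for ranking in rankings:
        position = {term: i for i, term in enumerate(ranking)}
        penalty = len(ranking)
        for term in vocabulary:
            scores[term] += position.get(term, penalty)
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(vocabulary, key=lambda t: (scores[t], t))[:top_k]


# Hypothetical outputs of three term-scoring functions for one query.
rankings = [
    ["retrieval", "feedback", "query", "term"],
    ["query", "retrieval", "expansion", "term"],
    ["feedback", "query", "retrieval", "weight"],
]
combined = combine_by_rank(rankings, top_k=3)
```

Terms that several functions place near the top rise in the merged list, while a term favored by only one function is pushed down by the penalty ranks it collects elsewhere.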
Is Hillary Rodham Clinton the President? Disambiguating Names across Documents
- PROCEEDINGS OF THE ACL '99 WORKSHOP ON COREFERENCE AND ITS APPLICATIONS
, 1999
Cited by 10 (0 self)
A number of research and software development groups have developed name identification technology, but few have addressed the issue of cross-document coreference, or identifying the same named entities across documents. In a collection of documents, where there are multiple discourse contexts, there exists a many-to-many correspondence between names and entities, making it a challenge to automatically map them correctly. Recently, Bagga and Baldwin proposed a method for determining whether two names refer to the same entity by measuring the similarity between the document contexts in which they appear. Inspired by their approach, we have revisited our current cross-document coreference heuristics that make relatively simple decisions based on matching strings and entity types. We have devised an improved and promising algorithm, which we discuss in this paper.
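As a rough illustration of comparing the document contexts of two name mentions, the sketch below uses cosine similarity over bag-of-words vectors. The abstract does not specify the similarity measure, so the measure, the whitespace tokenization, the function names, and the 0.3 threshold are all assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_entity(context_a, context_b, threshold=0.3):
    """Guess whether two mentions co-refer from their surrounding text.

    The threshold is an illustrative value, not a tuned or published one.
    """
    vec_a = Counter(context_a.lower().split())
    vec_b = Counter(context_b.lower().split())
    return cosine(vec_a, vec_b) >= threshold


# Hypothetical contexts around two mentions of the same name string.
likely_same = same_entity("senator from new york", "new york senator campaign")
likely_different = same_entity("golf tournament winner", "senate budget vote")
```

A real system would combine such a score with the string and entity-type matching that the abstract mentions, rather than rely on context overlap alone.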
The talent system: Textract architecture and data model
- Natural Language Engineering 10 (2004) 307–326
, 2004
Cited by 10 (0 self)
We present the architecture and data model for TEXTRACT, a document analysis framework for text analysis components. The framework and components have been deployed in research and industrial environments for text analysis and text mining tasks.
Information Navigation by Clustering and Summarizing Query Results
- In Proceedings of the 33rd Hawaii International Conference on System Sciences, Maui
, 2000
Cited by 5 (2 self)
We have explored and evaluated a novel approach to information seeking grounded in the idea of summarizing query results through automated document clustering. The user starts with a natural language description of the needed information and navigates the information space through the interaction with the system. We implemented a prototype allowing searches of a significant portion of the entire World Wide Web. In a laboratory experiment, subjects searched the WWW for answers to a given set of questions. Our results indicate that our prototype improved search performance, presumably through better understanding of query results. In addition, we analyzed interaction patterns and the effects of such parameters as subject skills and task peculiarities.
Knowledge Navigation in Networked Digital Libraries
- the 11th European Workshop on Knowledge Acquisition, Modeling, and Management (EKAW’99)
, 1999
Cited by 4 (2 self)
Formulating precise and effective queries in document retrieval systems requires the users to predict which terms appear in documents relevant to their information needs. It is important that users do not retrieve a plethora of irrelevant documents due to underspecified queries or queries containing ambiguous search terms. Due to these reasons, networked digital libraries with rapid growth in their volume of documents, document diversity, and terminological variations are becoming increasingly difficult to manage. In this paper we consider the concept of knowledge navigation for federated digital libraries and explain how it can provide the kind of intermediary expert prompting required to enable purposeful searching and effective discovery of documents.
ELICITATION BEHAVIOR DURING MEDIATED INFORMATION RETRIEVAL
, 1998
Cited by 4 (2 self)
What elicitations or requests for information do search intermediaries make of users with information requests during an information retrieval (IR) interaction -- including prior to and during an IR interaction -- and for what purpose? These issues were investigated during a study of elicitations during 40 mediated IR interactions. A total of 1557 search intermediary elicitations were identified within 15 purpose categories. The elicitation purposes of search intermediaries included requests for information on search terms and strategies, database selection, search procedures, system's outputs and relevance of retrieved items, and users' knowledge and previous information-seeking. The transition sequences from one type of search intermediary elicitation to another were also investigated. These findings are compared with results from a study of end-user questions [Nahl D. & Tenopir C. (1996) Affective and cognitive searching behavior of novice and end-users of a full-text database. Journal of the American Society for Information Science, 47(4), 276-286] and a study of user elicitations of search intermediaries [Wu, Mei Mei (1993) Information interaction dialog: A study of patron elicitation in the information retrieval interaction. Ph.D.
Building a Digital Library of Newspaper Clippings: The LAURIN Project
, 2000
Cited by 3 (3 self)
The field of digital libraries has been attracting a lot of research efforts during the last years. Many interesting projects have been started, dealing with the various open issues arising in the field. However, no project has specifically taken into account the problem of building a digital library of newspaper clippings. It is well known that a huge part of cultural knowledge is stored in the newspapers of yesterday. Since newspapers are not always easily accessible, special clipping archives were created in the 20th century. People interested in newspaper information benefit from these archives because the work of selecting, cutting and indexing articles is done by specialists. In order to maintain their important position in the information market, clipping archives should be able to integrate their special skills (such as professional knowledge and experience in gathering and treating newspaper information) into the new technologies of the information society. The EU-funded LAURI...
Advanced Conceptual Network Usage in Library Database Queries
, 1998
Cited by 2 (1 self)
This paper describes the generic principles of Decomate-II's Concept Browser. It discusses the three main problems in using a thesaurus or conceptual network to help users formulate database queries: the network maintenance problem, the network navigation problem, and the network-to-database mapping problem. Possible solutions to all these problems are proposed, based on previous experiences with concept network systems. Care is taken to keep the resulting system suitable for production use in an existing library environment based on Boolean keyword retrieval from a large collection with uncontrolled vocabulary.
Keywords: Semantic network, thesaurus, conceptual network, knowledge navigation, lexicon, conceptual modeling, topic browsing, document retrieval, Decomate-II.
URL: http://infolab.kub.nl/prj/decomate
1 Introduction
Much work in the area of indexing and retrieval concentrates on constructing effective and efficient algorithms to find a set of `interesting' documents in a larg...

