Information fusion with subject-based information gathering method for intelligent multi-agent models
- In The Seventh International Conference on Information Integration and Web-Based Applications and Services, Kuala Lumpur, Malaysia, 2005
Cited by 1 (1 self)
This paper addresses the problem of information fusion using a multi-agent information gathering system. We present a hierarchical subject-based query expansion method, followed by a cooperative fusion algorithm for unstructured documents. We evaluate the performance using the traditional methods of precision and recall. The results show that the subject-based fusion method is promising and efficient.
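The precision and recall measures named in this abstract can be illustrated with a minimal sketch. The function name and the document identifiers below are hypothetical and are not taken from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: documents returned by the system (hypothetical ids)
    relevant:  documents judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Two of the four retrieved documents are relevant; three are relevant overall.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
```

Here precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved.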
Query-URL Bipartite Based Approach to Personalized Query Recommendation
Cited by 1 (0 self)
Query recommendation is considered an effective aid for enhancing keyword-based queries in search engines and Web search software. The conventional approach to query recommendation has focused on query-term based analysis of user access logs. In this paper, we argue that utilizing the connectivity of a query-URL bipartite graph to recommend relevant queries can significantly improve the accuracy and effectiveness of conventional query-term based query recommendation systems. We refer to the Query-URL Bipartite based query reCommendation approach as QUBIC. The QUBIC approach has two unique characteristics. First, instead of operating on the original bipartite graph directly using a biclique-based approach or graph clustering, we extract an affinity graph of queries from the initial query-URL bipartite graph. The affinity graph has only queries as its vertices, and its edges are weighted according to a query-URL vector based similarity (distance) measure. By utilizing the query affinity graph, we are able to capture the propagation of similarity from query to query by inducing an implicit topical relatedness between queries. We devise a novel ranking mechanism for ordering the related queries based on the merging distances of a hierarchical agglomerative clustering. We compare our proposed ranking algorithm with both naïve ranking, which uses the query-URL similarity measure directly, and the single-linkage based ranking method. In addition, we make it possible for users to participate interactively in the query recommendation process, to bridge the gap between the determinacy of actual similarity values and the indeterminacy of users' information needs. This allows the lists of related queries to change from user to user and query to query, thus personalizing the query recommendation on demand. The experimental results from two query collections demonstrate the effectiveness and feasibility of our approach.
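As a rough illustration of the affinity-graph step described in this abstract, the sketch below represents each query as a vector of click counts over URLs and weights query-query edges by cosine similarity. Cosine similarity, the function names, the threshold parameter, and the sample data are all assumptions made for illustration; the abstract says only that the edge weights come from a query-URL vector based similarity (distance) measure. The clustering-based ranking step is not shown.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def affinity_graph(clicks, min_sim=0.0):
    """Build a query-only graph from a query -> {url: click count} mapping.

    Returns {(query_a, query_b): similarity} for pairs whose similarity
    exceeds min_sim (a hypothetical cut-off, not from the paper).
    """
    edges = {}
    for q1, q2 in combinations(sorted(clicks), 2):
        sim = cosine(clicks[q1], clicks[q2])
        if sim > min_sim:
            edges[(q1, q2)] = sim
    return edges


# Hypothetical click log: u1..u3 stand in for clicked URLs.
clicks = {
    "jaguar car": {"u1": 3, "u2": 1},
    "jaguar price": {"u1": 2},
    "big cats": {"u3": 4},
}
graph = affinity_graph(clicks)
```

In this toy log only "jaguar car" and "jaguar price" share a clicked URL, so they are the only pair connected in the resulting graph.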
Learning Lexicon Models from Search Logs for Query Expansion
Cited by 1 (0 self)
This paper explores log-based query expansion (QE) models for Web search. Three lexicon models are proposed to bridge the lexical gap between Web documents and user queries. These models are trained on pairs of user queries and titles of clicked documents. Evaluations on a real-world data set show that the lexicon models, integrated into a ranker-based QE system, not only significantly improve document retrieval performance but also outperform two state-of-the-art log-based QE methods.
Modeling actions of PubMed users with n-gram language models
they provide insight into human information-seeking behavior. Second, log data can be used to train user models, which can then be applied to improve retrieval systems. This article presents a study of logs from PubMed®, the public gateway to the MEDLINE® database of bibliographic records from the medical and biomedical primary literature. Unlike most previous studies on general Web search, our work examines user activities with a highly-specialized search engine. We encode user actions as string sequences and model these sequences using n-gram language models. The models are evaluated in terms of perplexity and in a sequence prediction task. They help us better understand how PubMed users search for information and provide an enabler for improving users' search experience.
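The idea of encoding user actions as strings and scoring them with an n-gram model can be sketched as follows. This is a minimal bigram model with add-one smoothing, which is an assumption: the abstract does not state the n-gram order or the smoothing used. The action codes (Q for a query, A for an abstract view, N for a next-page click) and the function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sequences):
    """Count bigrams over action strings, padded with ^ (start) and $ (end)."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = "^" + seq + "$"
        vocab.update(padded[1:])  # symbols that can be predicted, including $
        for a, b in zip(padded, padded[1:]):
            contexts[a] += 1
            bigrams[a, b] += 1
    return contexts, bigrams, len(vocab)


def perplexity(model, seq):
    """Per-symbol perplexity of one action string under the bigram model.

    Symbols never seen in training still get a non-zero probability, but the
    distribution is then no longer exactly normalized.
    """
    contexts, bigrams, vocab_size = model
    padded = "^" + seq + "$"
    log_prob = 0.0
    for a, b in zip(padded, padded[1:]):
        # Add-one smoothing keeps unseen transitions from getting zero mass.
        p = (bigrams[a, b] + 1) / (contexts[a] + vocab_size)
        log_prob += math.log2(p)
    return 2 ** (-log_prob / (len(padded) - 1))


# Hypothetical sessions: Q = query, A = abstract view, N = next page.
model = train_bigram(["QA", "QQA", "QNA"])
typical = perplexity(model, "QA")
unusual = perplexity(model, "AQ")
```

A session that follows the patterns seen in training ("QA") receives a lower perplexity than one that does not ("AQ"), which is the sense in which perplexity measures how well the model predicts user behavior.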
Creation and Maintenance of Query Expansion
In an information retrieval system, a thesaurus can be used for query expansion, i.e. adding words to queries in order to improve recall. We propose a semi-automatic and interactive approach for the creation and maintenance of domain-specific thesauri for query expansion. Domain-specific thesauri are especially required in highly technical domains where the use of general thesauri for query expansion introduces more noise than useful results. Our semi-automatic approach to thesaurus creation constitutes a good compromise between fully manual approaches, which produce high-quality thesauri but at a prohibitively high cost, and fully automatic approaches, which are cheap but produce thesauri of limited quality. This article describes our approach and the architecture of the system implementing it, named Cannelle. It exploits user query logs and natural language processing to identify valuable synonymy candidates, and allows editors to interactively explore and validate these candidates in the context of a domain-specific searchable knowledge base. We evaluated the system in the domain of online troubleshooting, where the proposed method yielded an improvement in the quality of the search results obtained.
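Thesaurus-based query expansion, as defined in this abstract (adding words to a query to improve recall), can be sketched in a few lines. The thesaurus contents and the function name are hypothetical, and this is not the Cannelle system itself.

```python
def expand_query(query, thesaurus):
    """Append validated synonyms of each query term to the term list.

    thesaurus maps a lowercase term to a list of synonyms; original terms
    come first and duplicates are skipped.
    """
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for synonym in thesaurus.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


# Hypothetical domain thesaurus for a troubleshooting knowledge base.
thesaurus = {"reboot": ["restart"], "screen": ["display", "monitor"]}
expanded = expand_query("Reboot screen", thesaurus)
```

With this thesaurus the query "Reboot screen" is expanded to the terms reboot, screen, restart, display, and monitor.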
Motorola Labs
When a user performs a web search, the first query entered will frequently not return the required information. Thus, one needs to review the initial set of links and then modify the query or construct a new one. This incremental process is particularly frustrating and difficult to manage for a mobile user due to device limitations (e.g. keyboard, display). We present a query formulation architecture that employs the notion of context in order to automatically construct queries, where context refers to the article currently being viewed by the user. The proposed system uses semantic metadata extracted from the web page being consumed to automatically generate candidate queries. Novel methods are proposed to create and validate candidate queries. Further, two variants of query expansion and a post-expansion validation technique are described. Finally, insights into the effectiveness of our system are provided based on evaluation tests of its individual components.
Cross-Lingual Query Suggestion Using Query Logs of Different Languages
- Session 19: Multi-Lingual IR
Query suggestion aims to suggest relevant queries for a given query, which helps users better specify their information needs. Previously, the suggested terms were mostly in the same language as the input query. In this paper, we extend it to cross-lingual query suggestion (CLQS): for a query in one language, we suggest similar or relevant queries in other languages. This is important for scenarios such as cross-language information retrieval (CLIR) and cross-lingual keyword bidding for search engine advertisement. Instead of relying on existing query translation technologies for CLQS, we present an effective means to map the input query of one language to queries of the other language in the query log. Important monolingual and cross-lingual information, such as word translation relations and word co-occurrence statistics, is used to estimate the cross-lingual query similarity with a discriminative model. Benchmarks show that the resulting CLQS system significantly outperforms a baseline system based on dictionary-based query translation. In addition, the resulting CLQS system is tested on French-to-English CLIR tasks on TREC collections. The results demonstrate higher effectiveness than the traditional query translation methods.
Regularized query classification using search click information
- 2008
Hundreds of millions of users submit queries to Web search engines each day. User queries are typically very short, which makes query understanding a challenging problem. In this paper, we propose a novel approach for query representation and classification. By submitting the query to a web search engine, the query can be represented as a set of terms found on the web pages returned by the search engine. In this way, each query can be considered as a point in a high-dimensional space, and standard classification algorithms such as regression can be applied. However, traditional regression is too flexible in situations with large numbers of highly correlated predictor variables, and it may suffer from overfitting. By using search click information, the semantic relationship between queries can be incorporated into the learning system as a regularizer. Specifically, from all the functions which minimize the empirical loss on the labeled queries, we select the one which best preserves the semantic relationship between queries. We present experimental evidence suggesting that the regularized regression algorithm is able to use search click information effectively for query classification.
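One common way to write such a click-regularized regression is as a penalized least-squares objective with a graph-smoothness term. The form below is a sketch under assumptions, not the paper's stated formulation: the squared loss, the pairwise penalty, and all symbols are introduced here for illustration. The x_i are query representations, y_i are labels for the l labeled queries, w_ij is a click-derived similarity between queries i and j, and lambda trades off fit against smoothness.

```latex
\hat{f} = \arg\min_{f} \; \sum_{i=1}^{l} \bigl(f(x_i) - y_i\bigr)^2
          + \lambda \sum_{i,j} w_{ij}\,\bigl(f(x_i) - f(x_j)\bigr)^2
```

The second term is small when queries that are strongly related through clicks receive similar predictions, which is one way to express "preserving the semantic relationship between queries".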
Learning a Robust Relevance Model for Search Using Kernel Methods
This paper points out that many search relevance models in information retrieval, such as the Vector Space Model, BM25 and Language Models for Information Retrieval, can be viewed as a similarity function between pairs of objects of different types, referred to as an S-function. An S-function is specifically defined as the dot product between the images of two objects in a Hilbert space mapped from two different input spaces. One advantage of taking this view is that one can take a unified and principled approach to address the issues with regard to search relevance. The paper then proposes employing a kernel method to learn a robust relevance model as an S-function, which can effectively deal with the term mismatch problem, one of the biggest challenges in search. The kernel method exploits a positive semi-definite kernel referred to as an S-kernel. The paper shows that when using an S-kernel the model learned by the kernel method is guaranteed to be an S-function. The paper then gives more general principles for constructing S-kernels. A specific implementation of the kernel method is proposed using the Ranking SVM techniques and click-through data. The proposed approach is employed to learn a relevance model as an extension of BM25, referred to as Robust BM25. Experimental results on web search and enterprise search data show that Robust BM25 significantly outperforms baseline methods and can successfully tackle the term mismatch problem.
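The definition of an S-function given in this abstract can be written compactly. The symbol names below are assumptions chosen for illustration: the two input spaces (for example, queries and documents) are each mapped by their own feature map into a shared Hilbert space, and the S-function is the inner product of the two images.

```latex
s(x, y) = \langle \phi_{\mathcal{X}}(x),\, \phi_{\mathcal{Y}}(y) \rangle_{\mathcal{H}},
\qquad \phi_{\mathcal{X}} : \mathcal{X} \to \mathcal{H},\quad
\phi_{\mathcal{Y}} : \mathcal{Y} \to \mathcal{H}
```

Unlike an ordinary kernel, the two arguments come from different spaces, so the two feature maps need not coincide.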
Concept-Based Feature Generation and Selection for Information Retrieval
- In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, 2008
Traditional information retrieval systems use query words to identify relevant documents. In difficult retrieval tasks, however, one needs access to a wealth of background knowledge. We present a method that uses Wikipedia-based feature generation to improve retrieval performance. Intuitively, we expect that using extensive world knowledge is likely to improve recall but may adversely affect precision. High-quality feature selection is necessary to maintain high precision, but here we do not have the labeled training data for evaluating features that we have in supervised learning. We present a new feature selection method that is inspired by pseudo-relevance feedback. We use the top-ranked and bottom-ranked documents retrieved by the bag-of-words method as representative sets of relevant and non-relevant documents. The generated features are then evaluated and filtered on the basis of these sets. Experiments on TREC data confirm the superior performance of our method compared to the previous state of the art.
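The selection idea in this abstract, treating top-ranked documents as pseudo-relevant and bottom-ranked documents as pseudo-non-relevant, can be sketched as follows. The scoring rule used here (difference of mean feature weight between the two sets) is an assumption chosen for simplicity; the abstract does not say how features are evaluated. The function name and the concept features are hypothetical.

```python
def select_features(ranked_docs, k, n_keep):
    """Keep the n_keep features that best separate top from bottom documents.

    ranked_docs: list of {feature: weight} dicts, best-ranked first
    k:           size of the pseudo-relevant and pseudo-non-relevant sets (k >= 1)
    """
    top, bottom = ranked_docs[:k], ranked_docs[-k:]
    features = set().union(*(doc.keys() for doc in top + bottom))

    def mean(docs, feature):
        return sum(doc.get(feature, 0.0) for doc in docs) / len(docs)

    # Assumed score: how much more strongly a feature fires in the top set.
    scores = {f: mean(top, f) - mean(bottom, f) for f in features}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))[:n_keep]


# Hypothetical generated concept features for four ranked documents.
ranked = [
    {"Jaguar_Cars": 0.9, "Felidae": 0.1},
    {"Jaguar_Cars": 0.8},
    {"Felidae": 0.7},
    {"Felidae": 0.9, "Jaguar_Cars": 0.1},
]
kept = select_features(ranked, k=2, n_keep=1)
```

In this toy ranking the concept that dominates the top two documents is kept, while the concept that dominates the bottom two is filtered out.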

