Boosting web retrieval through query operations
- Proceedings ECIR 2005, 2005
Cited by 24 (2 self)
We explore the use of phrase and proximity terms in the context of web retrieval, which is different from traditional ad-hoc retrieval both in document structure and in query characteristics. We show that for this type of task, the usage of both phrase and proximity terms is highly beneficial for early precision as well as for overall retrieval effectiveness. We also analyze why phrase and proximity terms are far more effective for web retrieval than for ad-hoc retrieval.
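As a rough illustration of what phrase and proximity terms measure, the sketch below counts exact phrase occurrences and co-occurrences of query terms within a small window. The function names, the whitespace tokenization, and the window size of 5 are assumptions made for this example; the abstract does not specify how the paper defines or weights these terms.

```python
from itertools import combinations


def phrase_matches(query_terms, doc_tokens):
    """Count exact, in-order occurrences of the whole query as a phrase."""
    n = len(query_terms)
    target = list(query_terms)
    return sum(
        1
        for i in range(len(doc_tokens) - n + 1)
        if doc_tokens[i:i + n] == target
    )


def proximity_matches(query_terms, doc_tokens, window=5):
    """Count pairs of distinct query terms that occur fewer than
    `window` token positions apart (hypothetical window size)."""
    positions = {}
    for i, tok in enumerate(doc_tokens):
        if tok in query_terms:
            positions.setdefault(tok, []).append(i)
    count = 0
    for a, b in combinations(sorted(positions), 2):
        for pa in positions[a]:
            for pb in positions[b]:
                if abs(pa - pb) < window:
                    count += 1
    return count


doc = "web retrieval differs from ad hoc retrieval on the web".split()
query = ["web", "retrieval"]
print(phrase_matches(query, doc))      # exact phrase occurrences
print(proximity_matches(query, doc))   # nearby co-occurrences
```

In a retrieval system such counts would typically be folded into the ranking function as extra evidence alongside single-term scores; how that combination is done is not stated in the abstract.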
Usefulness of hyperlink structure for query-biased topic distillation
- In Proceedings of the 27th Annual International SIGIR Conference on Research and Development in Information Retrieval, 2004
Cited by 7 (4 self)
In this paper, we introduce an information theoretic method for estimating the usefulness of the hyperlink structure induced from the set of retrieved documents. We evaluate the effectiveness of this method in the context of an optimal Bayesian decision mechanism, which selects the most appropriate retrieval approaches on a per-query basis for two TREC tasks. The estimation of the hyperlink structure’s usefulness is stable when we use different weighting schemes, or when we employ sampling of documents to reduce the computational overhead. Next, we evaluate the effectiveness of the hyperlink structure’s usefulness in a realistic setting, by setting the thresholds of a decision mechanism automatically. Our results show that improvements over the baselines are obtained.
Using Clustering and Blade Clusters in the TeraByte task
- In Proceedings of the 13th Text Retrieval Conference (TREC), 2004
Cited by 1 (0 self)
Web search engines exploit conjunctive queries and special ranking criteria which differ from the disjunctive queries typically used for ad-hoc retrieval. We wanted to assess the effectiveness of those techniques in the TeraByte task, in particular scoring criteria such as link popularity, proximity boosting, home page score, descriptions and anchor text. Since conjunctive queries sometimes produce low recall, we tested a new approach to query expansion, which extracts additional query terms from a clustering of the snippets from the first query. The technique proved effective, almost doubling the Mean Average Precision. However, the improvement was just enough to compensate for the drop that was introduced, contrary to our expectations, by the proximity boost.
IJDAR DOI 10.1007/s10032-009-0089-5 ORIGINAL PAPER
When searching for blogs on a specific topic, information seekers prefer blogs that place a central focus on that topic over blogs whose mention of the topic is diffuse or incidental. In order to present users with better blog feed search results, we developed a measure of topical consistency that is able to capture whether or not a blog is topically focused. The measure, called the coherence score, is inspired by the genetics literature and captures the tightness of the clustering structure of a data set relative to a background collection. In a set of experiments on synthetic data, the coherence score is shown to provide a faithful reflection of topic clustering structure. The properties that make the coherence score more appropriate than lexical cohesion, a common measure of topical structure, are discussed. Retrieval experiments show that integrating the coherence score as a prior in a language modeling-based approach to blog feed search improves retrieval effectiveness. The coherence score must, however, be used judiciously in order to avoid boosting the ranking of irrelevant but topically focused blogs. To this end, we experiment with a series of weighting schemes that adjust the contribution of the coherence score according ...
This paper is a revised and extended version of [19].
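The abstract does not give the formula for the coherence score, so the following is only one plausible way to make "tightness of clustering" concrete: the fraction of post pairs in a blog whose cosine similarity reaches a threshold. The function names, the bag-of-words representation, and the fixed threshold are assumptions for illustration; in the spirit of the abstract, the threshold would be derived from a background collection rather than chosen by hand.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coherence(posts, threshold):
    """Fraction of post pairs whose similarity is at least `threshold`.

    `threshold` is a hypothetical parameter standing in for a value
    estimated from a background collection.
    """
    vectors = [Counter(p.lower().split()) for p in posts]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    similar = sum(1 for a, b in pairs if cosine(a, b) >= threshold)
    return similar / len(pairs)


focused = ["jazz guitar chords", "jazz guitar scales", "guitar jazz solos"]
diffuse = ["jazz guitar chords", "tax filing deadline", "garden soil compost"]
print(coherence(focused, threshold=0.5))
print(coherence(diffuse, threshold=0.5))
```

A topically focused blog yields a higher value than a diffuse one under this sketch, which is the property the abstract attributes to the coherence score.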

