Quality-Biased Ranking of Web Documents
Abstract
Cited by 3 (1 self)
Many existing retrieval approaches do not take into account the content quality of the retrieved documents, although link-based measures such as PageRank are commonly used as a form of document prior. In this paper, we present the quality-biased ranking method that promotes documents containing high-quality content and penalizes low-quality documents. The quality of the document content can be determined by its readability, layout and ease of navigation, among other factors. Accordingly, instead of using a single estimate for document quality, we consider multiple content-based features that are directly integrated into a state-of-the-art retrieval method. These content-based features are easy to compute, store and retrieve, even for large web collections. We use several query sets and web collections to empirically evaluate the performance of our quality-biased retrieval method. In each case, our method consistently improves, by a large margin, the retrieval performance of text-based and link-based retrieval methods that do not take into account the quality of the document content.
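A minimal Python sketch of the general idea, assuming the quality features are combined log-linearly into a query-independent prior that is added to a log-scale retrieval score. The feature names, weights and the parameter `lam` are hypothetical illustrations, not the features or estimation procedure used in the paper.

```python
def quality_prior(features, weights):
    """Log-linear combination of query-independent quality features (hypothetical)."""
    return sum(weights[name] * value for name, value in features.items())


def quality_biased_score(retrieval_score, features, weights, lam=1.0):
    """Add a weighted quality prior to a log-scale, query-dependent retrieval score."""
    return retrieval_score + lam * quality_prior(features, weights)


def rank(docs, weights, lam=1.0):
    """Return document ids sorted by quality-biased score, best first."""
    scored = {
        doc_id: quality_biased_score(score, feats, weights, lam)
        for doc_id, (score, feats) in docs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Two documents with identical retrieval scores but different (made-up) quality features.
docs = {
    "a": (-10.0, {"stopword_fraction": 0.4, "table_text_fraction": 0.1}),
    "b": (-10.0, {"stopword_fraction": 0.1, "table_text_fraction": 0.7}),
}
# Hypothetical weights: reward prose-like text, penalize table-heavy pages.
weights = {"stopword_fraction": 2.0, "table_text_fraction": -1.0}
ranking = rank(docs, weights)
```

Because the prior does not depend on the query, it can be computed once per document at indexing time, which is consistent with the abstract's point that such features are cheap to store and retrieve.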
Experiments in Blog and Enterprise Tracks with Terrier
Abstract
In TREC 2007, we participate in four tasks of the Blog and Enterprise tracks. We continue experiments using Terrier [14], our modular and scalable Information Retrieval (IR) platform, and the Divergence From Randomness (DFR) framework. In particular, for the Blog track opinion finding task, we propose a statistical term weighting approach to identify opinionated documents. An alternative approach based on an opinion identification tool is also utilised. Overall, a 15% improvement over a non-opinionated baseline is observed when applying the statistical term weighting approach. In the Expert Search task of the Enterprise track, we investigate the use of proximity between query terms and candidate name occurrences in documents.
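To make "proximity between query terms and candidate name occurrences" concrete, here is a hedged Python sketch that counts how many occurrences of a candidate's name fall within a fixed token window of any query term. The windowed count, the `window` parameter and the tokenisation by whitespace are assumptions for illustration; the abstract does not specify how proximity is measured or weighted in Terrier.

```python
def proximity_count(tokens, query_terms, candidate_name, window=5):
    """Count candidate-name occurrences with a query term within `window` tokens."""
    name = candidate_name.lower().split()
    n = len(name)
    query = {t.lower() for t in query_terms}
    toks = [t.lower() for t in tokens]
    query_positions = [i for i, t in enumerate(toks) if t in query]
    count = 0
    for i in range(len(toks) - n + 1):
        if toks[i:i + n] == name:
            # Distance from either end of the name span to the nearest query term.
            start, end = i, i + n - 1
            if any(min(abs(p - start), abs(p - end)) <= window for p in query_positions):
                count += 1
    return count


text = ("John Smith presented work on information retrieval evaluation . "
        "Later , unrelated notes mention John Smith again")
tokens = text.split()
query = ["retrieval", "evaluation"]
near_mentions = proximity_count(tokens, query, "John Smith", window=5)
```

Only the first mention of the name sits close to the query terms, so a proximity-aware expert score would credit it more than the distant second mention.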
Priors in Web Search
Abstract
Web search combines information obtained at query time with prior knowledge to form a posterior. This paper focuses on the prior, which we believe is interesting given the poverty of the query stimulus (many web queries are no more than a word or two). We propose a learning framework based on the Noisy Channel Model for combining prior evidence from multiple sources, including both the authors' perspectives (e.g., PageRank, the principal eigenvector of the web graph) and the readers' perspectives (e.g., click logs and toolbar activity). The framework is general enough that it can be applied to both documents and queries, both of which have strong priors. We show that even features that appear to depend on the combination of queries and documents, and are often used for learning a ranking function (such as relevance judgments or retrieval scores), can be included in the prior model using multiple mechanisms of aggregation (e.g., moments or entropy). More is more: the prior model improves with both more features and more aggregates. We conduct an empirical evaluation of the proposed framework, demonstrating its benefits over a diverse set of learning tasks, including (1) query difficulty estimation, (2) click-type prediction and (3) document ranking.
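The two ideas in this abstract, combining a prior with query-time evidence and turning query-dependent values into query-independent features by aggregation, can be sketched in Python as below. The choice of mean, population variance and normalised Shannon entropy as aggregates, and the plain log-space sum for the posterior, are illustrative assumptions; this is not the paper's learning framework.

```python
import math
from statistics import mean, pvariance


def entropy(values):
    """Shannon entropy (natural log) of non-negative values normalised to a distribution."""
    total = sum(values)
    if total <= 0:
        return 0.0
    return -sum((v / total) * math.log(v / total) for v in values if v > 0)


def aggregate_features(values):
    """Collapse per-query values observed for one document into query-independent aggregates."""
    return {"mean": mean(values), "variance": pvariance(values), "entropy": entropy(values)}


def log_posterior(log_likelihood, log_prior):
    """Bayes' rule in log space: log P(d|q) = log P(q|d) + log P(d) - log P(q).

    log P(q) is the same for every document, so it is dropped for ranking.
    """
    return log_likelihood + log_prior


# Hypothetical retrieval scores that one document received across four past queries.
past_scores = [1.0, 1.0, 1.0, 1.0]
prior_features = aggregate_features(past_scores)
```

Aggregating over many queries is what lets a query-document signal (such as a retrieval score) act as a document prior: the resulting features no longer depend on the current query.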
Tracks with Terrier
Abstract
... Feedback tracks. In all tracks, we continue the research and development of the Terrier platform, centred around extending state-of-the-art weighting models based on the Divergence From ...
Content-Based Relevance Estimation on the Web Using Inter-Document Similarities
Abstract
In adversarial and noisy search settings such as the Web, surface-level document-query similarity can be a highly misleading relevance signal. Devising content-based relevance estimation (ranking) approaches therefore becomes highly challenging. We address this challenge using two methods that utilize inter-document similarities in an initially retrieved list. The first removes from the list documents that exhibit high query similarity but lack sufficient additional support for relevance based on inter-document similarities. This method is based on a probabilistic model that decouples document-query similarities from relevance estimation. The second method re-ranks the list by "rewarding" documents that exhibit high similarity both to the query and to other documents in the list. Both methods also incorporate query-independent document quality estimates at the model level. Extensive empirical evaluation demonstrates the merits of our methods.
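A hedged Python sketch of the second idea, re-ranking a retrieved list so that a document needs both query similarity and support from the other retrieved documents. Cosine similarity over raw term-frequency vectors and the product of query similarity with mean similarity to the rest of the list are assumptions chosen for illustration; the paper's probabilistic model and its quality estimates are not reproduced here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rerank(query, docs):
    """Score = query similarity x mean similarity to the other docs (hypothetical scoring)."""
    qv = Counter(query.lower().split())
    vecs = {d: Counter(text.lower().split()) for d, text in docs.items()}
    scores = {}
    for d, v in vecs.items():
        others = [cosine(v, u) for e, u in vecs.items() if e != d]
        support = sum(others) / len(others) if others else 0.0
        scores[d] = cosine(qv, v) * support
    return sorted(scores, key=scores.get, reverse=True)


query = "jaguar speed"
docs = {
    "spam": "jaguar speed buy cheap pills now",
    "d1": "jaguar speed cat running fast savanna",
    "d2": "cat running fast savanna jaguar",
    "d3": "jaguar cat fast",
}
ranking = rerank(query, docs)
```

Here "spam" ties with "d1" on query similarity alone, but it shares little content with the rest of the list, so the inter-document support term pushes it to the bottom while "d1" stays on top.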

