## Clickthrough-Based Latent Semantic Models for Web Search (2011)


Venue: Proceedings of SIGIR

Citations: 9 (6 self)

### BibTeX

@INPROCEEDINGS{Gao11clickthrough-basedlatent,
  author = {Jianfeng Gao and Kristina Toutanova and Wen-tau Yih},
  title = {Clickthrough-Based Latent Semantic Models for Web Search},
  booktitle = {Proceedings of SIGIR},
  year = {2011}
}


### Abstract

This paper presents two new document ranking models for Web search, based on methods of semantic representation and the statistical translation-based approach to information retrieval (IR). Assuming that a query is parallel to the titles of the documents clicked on for that query, large numbers of query-title pairs are constructed from clickthrough data, and two latent semantic models are learned from this data. The first is a bilingual topic model within the language modeling framework. It ranks documents for a query by the likelihood of the query being a semantics-based translation of the document. The semantic representation is language independent and learned from query-title pairs, under the assumption that a query and its paired titles share the same distribution over semantic topics. The second is a discriminative projection model within the vector space modeling framework. Unlike Latent Semantic Analysis and its variants, the projection matrix in our model, which is used to map term vectors into the semantic space, is learned discriminatively such that the distance between a query and its paired title, both represented as vectors in the projected semantic space, is smaller than the distance between the query and the titles of other documents that have no clicks for that query. These models are evaluated on the Web search task using a real-world data set. Results show that they significantly outperform their corresponding state-of-the-art baseline models.

### Citations

8842 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...3.1, and in Eq. (3) are treated as parameters rather than hidden variables, as in the Bayesian inference methods. 3.1 MAP Estimation [plate diagram of BLTM omitted] We use the standard EM algorithm [10] to estimate the parameters of BLTM by maximizing the joint log-likelihood of the parallel corpus and the parameters, as shown in Eq. (3). The derivation of the updates is similar to that describe...

2968 | Indexing by latent semantic analysis
- Deerwester, Dumais, et al.
- 1990
Citation Context: ...we strive to effectively learn model parameters on clickthrough data for the application of Web search. 2.3 Linear Projection Models One of the most well-known linear projection models for IR is LSA [9]. LSA models the whole document collection using an n x d document-term matrix C, where n is the number of documents and d is the number of word types, and performs singular value decomposition (SVD) on C. T...
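The SVD-based projection described in this context can be sketched with a truncated SVD; the toy document-term matrix and the dimensionality `k=2` below are illustrative, not from the paper.

```python
import numpy as np

def lsa(C, k):
    """Truncated SVD of an n x d document-term matrix C, keeping the
    top-k singular directions as the latent semantic space."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    docs = U[:, :k] * s[:k]   # documents as k-dim latent vectors
    terms = Vt[:k, :].T       # d x k term-to-concept projection
    return docs, terms

# Toy collection: 4 documents over a 5-word vocabulary (illustrative only)
C = np.array([[2., 1., 0., 0., 0.],
              [1., 2., 1., 0., 0.],
              [0., 0., 1., 2., 1.],
              [0., 0., 0., 1., 2.]])
docs, terms = lsa(C, k=2)
```

Documents that share vocabulary end up close in the latent space even when their raw term overlap is small, which is the property LSA exploits for retrieval.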

2608 | Latent dirichlet allocation
- Blei, Ng, et al.
- 2003
Citation Context: ...e a query term is generated from a mixture of factors. While in translation models the factors are simply words in a document, in PLSA the factors are hidden topics. Latent Dirichlet Allocation (LDA) [4] generalizes PLSA to a proper generative model and places Dirichlet priors over the parameters. As a result, in LDA, instead of a single most likely topic vector for a document, a posterior distr...

1259 | The mathematics of statistical machine translation: parameter estimation
- Brown, Pietra, et al.
- 1993
Citation Context: ... (14) to (16). The only difference is that the document model in Eq. (16) is replaced by the word translation model P(q|d) = ∑_w P(q|w) P(w|d), where P(q|w) is the word translation probability assigned by IBM Model 1 [5], trained on query-title pairs using EM. The results in Table 1 suggest several conclusions. First, using PLSA alone as a document model hurts the ranking performance (Row 2 vs. Row 1). But a linear c...
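The word-level translation scoring mentioned here can be sketched as follows; the translation table `p_tr` is hypothetical, standing in for the probabilities IBM Model 1 would learn from query-title pairs via EM.

```python
import math

# Hypothetical translation table P(q|w): probability that document word w
# "translates" into query term q (illustrative values, not learned).
p_tr = {("car", "automobile"): 0.6, ("car", "car"): 0.9,
        ("cheap", "affordable"): 0.5, ("cheap", "cheap"): 0.9}

def translation_logprob(query, doc):
    """log P(query|doc) with P(q|doc) = sum_w P(q|w) * P(w|doc),
    where P(w|doc) is the maximum-likelihood unigram estimate."""
    score = 0.0
    for q in query:
        p_q = sum(p_tr.get((q, w), 0.0) * doc.count(w) / len(doc)
                  for w in set(doc))
        score += math.log(p_q + 1e-10)  # floor avoids log(0)
    return score

query = ["cheap", "car"]
doc_a = ["affordable", "automobile", "deals"]   # no literal term overlap
doc_b = ["weather", "forecast", "today"]
```

Note that `doc_a` outscores `doc_b` even though it shares no term with the query, which is exactly the lexical-gap problem the translation approach addresses.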

941 | A language modeling approach to information retrieval
- Ponte, Croft
- 1998

863 | Probabilistic latent semantic indexing
- Hofmann
- 1999
Citation Context: ...rase models and factored models have also been investigated [14, 23, 26]. 2.2 Generative Topic Models One of the first topic models widely used for IR is Probabilistic Latent Semantic Analysis (PLSA) [19]. Although Hofmann applied PLSA to IR in the VSM framework in the original paper, PLSA is by nature a generative model, and can be more straightforwardly incorporated into the language modeling framew...

753 | A study of smoothing methods for language models applied to ad hoc information retrieval
- Zhai, Lafferty
- 2001
Citation Context: ...nd model and document model, respectively. The two mixture weights are tuning parameters with values between 0 and 1. Setting one of them to its boundary value reduces the model to a unigram language model with Jelinek-Mercer smoothing [34], which is used as the baseline in our experiments; the other boundary setting makes the document model depend solely on BLTM. Also notice that the BLTM in Eq. (17) differs from the topic model of Eq. (2). Although in both models th...
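The Jelinek-Mercer smoothed unigram language model used as the baseline can be sketched directly; the toy documents and the weight `lam=0.5` are illustrative.

```python
import math
from collections import Counter

def jm_logprob(query, doc, collection, lam=0.5):
    """log P(query|doc) under a unigram language model with Jelinek-Mercer
    smoothing: P(q|d) = (1 - lam) * P_ml(q|d) + lam * P(q|collection)."""
    d, n_d = Counter(doc), len(doc)
    c, n_c = Counter(collection), len(collection)
    return sum(math.log((1 - lam) * d[q] / n_d + lam * c[q] / n_c + 1e-12)
               for q in query)

docs = [["cheap", "car", "deals"], ["weather", "today"]]
collection = [w for d in docs for w in d]   # background model over all docs
scores = [jm_logprob(["cheap", "car"], d, collection) for d in docs]
```

The collection term keeps the probability nonzero for query words missing from a document, so documents are ranked rather than zeroed out by a single unmatched term.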

706 | Statistical phrase-based translation
- Koehn, Och, et al.
- 2003
Citation Context: ...real query-document pairs for translation model training [14]. Given enough training data, more sophisticated translation models such as phrase models and factored models have also been investigated [14, 23, 26]. 2.2 Generative Topic Models One of the first topic models widely used for IR is Probabilistic Latent Semantic Analysis (PLSA) [19]. Although Hofmann applied PLSA to IR in the VSM framework in the or...

382 | Learning to rank using gradient descent
- Burges, Shaked, et al.
- 2005
Citation Context: ...our model learns a projection matrix, which maps the term vector of a document onto a lower-dimensional semantic space, using a supervised learning method. Inspired by the learning-to-rank framework [6], the projection matrix is learned discriminatively in such a way that the distance between a query and its paired title, both represented as vectors in a projected semantic space, is smaller than tha...

316 | IR evaluation methods for retrieving highly relevant documents
- Jarvelin, Kekalainen
- 2000
Citation Context: ...n the other half, and the global retrieval results are combined from those of the two sets. The performance of all the ranking models was measured by mean Normalized Discounted Cumulative Gain (NDCG) [21]. We report NDCG scores at truncation levels 1, 3, and 10. We also performed a significance test using the paired t-test. Differences are considered statistically significant when the p-value is less ...
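NDCG at a truncation level k can be sketched as below, using one common formulation of the rank discount (gain at 1-based rank i divided by log2(i + 1)); the paper's exact gain mapping is not specified in this excerpt.

```python
import math

def dcg(gains, k):
    """Discounted cumulative gain at truncation level k: the gain at
    1-based rank i is discounted by log2(i + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg(gains, k):
    """DCG normalized by the DCG of the ideal (gain-sorted) ranking."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0
```

A perfectly ordered ranking scores 1.0, and any misordering of the same gains scores strictly less, which is what makes NDCG suitable for comparing rankers at fixed cutoffs like 1, 3, and 10.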

283 | Information retrieval as statistical translation
- Berger, Lafferty
- 1999
Citation Context: ...lower-dimensional semantic space, can still have a high similarity even if they do not share any term. An alternative strategy to cope with the problem is the approach based on statistical translation [2]: A query term can be a translation of any word in a document that may be different from, but semantically related to, the query term; and the relevance of a document given a query is assumed proporti...

156 | LDA-based document models for ad-hoc retrieval
- Wei, Croft
Citation Context: ...conjugate Dirichlet prior which is the same for all documents. So, in theory LDA overcomes some problems of PLSA such as overfitting and the issues regarding generating queries from unseen documents [4, 32]. However, whether the theoretical superiority of LDA can be translated into significant empirical improvement over PLSA on realistic applications, such as Web search, remains to be demonstrated. The ...

95 | A correlated topic model of science
- Blei, Lafferty

93 | Topics in semantic representation
- Griffiths, Steyvers, et al.

74 | Posterior regularization for structured latent variable models
- Ganchev, Graca, et al.
Citation Context: ...27], we extend BLTM by constraining the paired query and title to have similar fractions of tokens assigned to each topic, and the constraint is enforced in expectation using posterior regularization [13]. BLTM with posterior regularization (BLTM-PR) is a variant of CPLSA [27] with two important modifications. First, while BLTM-PR assumes a pair of query and title to share the same topic distribution ...

62 | Automatic cross-linguistic information retrieval using latent semantic indexing
- Dumais, Landauer, et al.
- 1996
Citation Context: ...y at the semantic level rather than at the word level. The second line of previous work that lays the foundation of this study is the research on cross-lingual and multi-lingual latent semantic models [11, 25, 27]. In these earlier works, various extensions of Latent Semantic Analysis (LSA) or topic models are developed for applications such as cross-lingual IR [11] and retrieval of parallel Web pages [27]. In...

61 | Statistical Machine Translation: From Single-Word Models to Alignment Templates
- Och
- 2003
Citation Context: ...real query-document pairs for translation model training [14]. Given enough training data, more sophisticated translation models such as phrase models and factored models have also been investigated [14, 23, 26]. 2.2 Generative Topic Models One of the first topic models widely used for IR is Probabilistic Latent Semantic Analysis (PLSA) [19]. Although Hofmann applied PLSA to IR in the VSM framework in the or...

60 | On smoothing and inference for topic models
- Asuncion, Welling, et al.
- 2009
Citation Context: ...in [32] without directly comparing it to PLSA. [17] clarifies the relationship between LDA and PLSA in the context of IR, and concludes that PLSA is a maximum a posteriori (MAP) estimated LDA model. [1] shows that MAP inference performs comparably to the best Bayesian inference methods for LDA. Therefore, in our experiments all the topic models are implemented as PLSA, or equivalently, LDA with MAP ...

51 | Polylingual topic models
- Mimno, Wallach, et al.
- 2009
Citation Context: ...y at the semantic level rather than at the word level. The second line of previous work that lays the foundation of this study is the research on cross-lingual and multi-lingual latent semantic models [11, 25, 27]. In these earlier works, various extensions of Latent Semantic Analysis (LSA) or topic models are developed for applications such as cross-lingual IR [11] and retrieval of parallel Web pages [27]. In...

42 | Title language model for information retrieval
- Jin, Hauptmann, et al.
- 2002
Citation Context: ...a large number of query-document pairs, in each of which the document is judged as relevant to the query. Due to the lack of such training data, [2] resorts to some synthetic query-document pairs, and [22] simply uses title-document pairs as substitutes for training data. More recently, with the growing availability of search logs, it has become possible to mine implicit relevance judgments from clickthro...

30 | Smoothing clickthrough data for web search ranking
- Gao, Yuan, et al.
- 2009
Citation Context: ...tistically significant when the p-value is less than 0.05. In our experiments, the query-title pairs used for model training are extracted from one year of query log files using a procedure similar to [16]. First of all, a set of query sessions was extracted from the raw log files. A query session consists of a user-issued query and a ranked list of documents, each of which may or may not be clicked b...
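The pair-extraction step described here can be sketched as follows; the session records, field names, and the keep-clicked-titles-only rule are a minimal illustration of treating a clicked document's title as parallel text for the query, not the paper's full procedure.

```python
# Hypothetical session records: (query, [(doc_title, clicked), ...]).
sessions = [
    ("cheap car deals", [("affordable automobiles", True),
                         ("weather today", False)]),
    ("python tutorial", [("learn python programming", True)]),
]

def extract_query_title_pairs(sessions):
    """Keep (query, title) pairs only for documents the user clicked."""
    return [(query, title)
            for query, results in sessions
            for title, clicked in results
            if clicked]

pairs = extract_query_title_pairs(sessions)
```

The resulting pairs play the role of a parallel corpus: each clicked title is assumed to express the same intent as its query, which is what allows translation and topic models to be trained on them.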

23 | Exploring Web Scale Language Models for Search Query
- Huang
Citation Context: ...nts by literally matching terms in documents with those in a search query. However, lexical matching methods can be inaccurate due to the language discrepancy between Web documents and search queries [20, 31], i.e., a concept is often expressed using different vocabularies and language styles in documents and queries. In the last two decades, different latent semantic models have been proposed to address t...

22 | Model adaptation via model interpolation and boosting for web search ranking
- Gao, Wu, et al.
- 2009

14 | Clickthrough-based translation models for web search: from word models to phrase models
- Gao, He, et al.
- 2010
Citation Context: ...dent than by mapping them at the word level. Our work is based on two lines of previous research. The first is a set of clickthrough-based translation models for Web search presented and evaluated in [14], which is a significant extension of the original approach [2], motivated by the increasingly large amount of clickthrough data. Following [14], in this study we consider documents and queries as two...

13 | Translingual document representations from discriminative projections
- Platt, Toutanova, et al.
- 2010
Citation Context: ...y at the semantic level rather than at the word level. The second line of previous work that lays the foundation of this study is the research on cross-lingual and multi-lingual latent semantic models [11, 25, 27]. In these earlier works, various extensions of Latent Semantic Analysis (LSA) or topic models are developed for applications such as cross-lingual IR [11] and retrieval of parallel Web pages [27]. In...

7 | Adaptive Bayesian latent semantic analysis
- Chien, Wu
Citation Context: ...estimate the parameters of BLTM by maximizing the joint log-likelihood of the parallel corpus and the parameters, as shown in Eq. (3). The derivation of the updates is similar to that described in [7, 8]. In the E-step, the posterior probabilities for each term in the query and each term in its paired title are computed for the latent variables according to: [E-step update equations omitted] ...

7 | A machine learning approach for improved BM25 retrieval
- Svore, Burges
- 2009

6 | Multi-Style Language Model for Web Scale Information Retrieval
- Wang, Li, et al.
- 2010
Citation Context: ...nts by literally matching terms in documents with those in a search query. However, lexical matching methods can be inaccurate due to the language discrepancy between Web documents and search queries [20, 31], i.e., a concept is often expressed using different vocabularies and language styles in documents and queries. In the last two decades, different latent semantic models have been proposed to address t...

5 | Bayesian latent semantic analysis of multimedia databases
- Freitas, Barnard
- 2001
Citation Context: ...estimate the parameters of BLTM by maximizing the joint log-likelihood of the parallel corpus and the parameters, as shown in Eq. (3). The derivation of the updates is similar to that described in [7, 8]. In the E-step, the posterior probabilities for each term in the query and each term in its paired title are computed for the latent variables according to: [E-step update equations omitted] ...

5 | Learning discriminative projections for text similarity measures
- Yih, Toutanova, et al.
- 2011
Citation Context: ...proposed learning framework that learns discriminatively the projection matrix from pairs of related and unrelated documents. We briefly introduce the model below; interested readers can refer to [33] for more detail. S2Net treats the raw term vector as the input layer and the mapped concept vector as the output layer. The value of each node in the output layer is a linear sum of all the input nod...
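The forward mapping described here (each output node a linear sum of the input nodes) can be sketched as a single matrix product; the dimensions, the random matrix `A`, and the random vectors are purely illustrative, since S2Net learns `A` discriminatively from related/unrelated pairs, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 1000, 50                  # vocabulary size, concept dimension (illustrative)
A = rng.normal(size=(d, k))      # projection matrix; S2Net would learn this
                                 # discriminatively, random here just to show
                                 # the forward mapping

def project(term_vec, A):
    """Each output node is a linear sum of the input nodes: c = A^T x."""
    return term_vec @ A

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

q_vec = rng.random(d)            # raw term vector of a query
t_vec = rng.random(d)            # raw term vector of a title
sim = cosine(project(q_vec, A), project(t_vec, A))
```

Ranking then reduces to cosine similarity in the k-dimensional concept space; training adjusts `A` so that clicked query-title pairs score higher than unclicked ones.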

1 | On an equivalence between PLSA and LDA
- Girolami, Kaban
Citation Context: ...pirical improvement over PLSA on realistic applications, such as Web search, remains to be demonstrated. The effectiveness of LDA for IR is demonstrated in [32] without directly comparing it to PLSA. [17] clarifies the relationship between LDA and PLSA in the context of IR, and concludes that PLSA is a maximum a posteriori (MAP) estimated LDA model. [1] shows that MAP inference performs comparably to ...