## Predicting bounce rates in sponsored search advertisements (2009)

Venue: | In SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) |

Citations: | 18 - 3 self |

### BibTeX

@INPROCEEDINGS{Sculley09predictingbounce,

author = {D. Sculley and Robert Malkin and Sugato Basu and Roberto J. Bayardo},

title = {Predicting bounce rates in sponsored search advertisements},

booktitle = {In SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)},

year = {2009},

pages = {1325--1334}

}

### Abstract

This paper explores an important and relatively unstudied quality measure of a sponsored search advertisement: bounce rate. The bounce rate of an ad can be informally defined as the fraction of users who click on the ad but almost immediately move on to other tasks. A high bounce rate can lead to poor advertiser return on investment, and suggests search engine users may be having a poor experience following the click. In this paper, we first provide quantitative analysis showing that bounce rate is an effective measure of user satisfaction. We then address the question, can we predict bounce rate by analyzing the features of the advertisement? An affirmative answer would allow advertisers and search engines to predict the effectiveness and quality of advertisements before they are shown. We propose solutions to this problem involving large-scale learning methods that leverage features drawn from ad creatives in addition

### Citations

2711 | Indexing by Latent Semantic Analysis
- Deerwester, Dumais, et al.
- 1990
Citation Context ...er consideration, and 0 otherwise. The related terms were derived from the parsed terms using a transformation φ(·), using a proprietary process similar to term expansion via latent semantic analysis [11]. Cluster membership shows the strength of similarity of a given piece of content to a set of topical clusters M, as determined by a mapping function m(·, ·). These topical clusters M were found by a ... |

1241 | Combining labeled and unlabeled data with co-training
- Blum, Mitchell
- 1998
Citation Context ... 6, which group features by type rather than by source. This finding may enable the use of semi-supervised learning methods such as co-training that require informative but uncorrelated feature sets [4] to exploit unlabeled data. 6. RELATED WORK To our knowledge, this work is the first detailed study of bounce rate for sponsored search advertising. It also provides the first concrete proposal for p... |

700 | Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods
- Platt
- 1999
Citation Context ...ce rate has a range of [0,1], this prediction problem fits naturally within a regression framework. Logistic regression [3] or support vector machine (SVM) regression [15] with probability estimation [25] for bounce rate prediction requires a mapping x(·, ·, ·) ↦ R^n from a query, creative, landing page triple to an n-dimensional feature vector. The feature mapping explored in this paper, as detailed... |

399 | Learning with Kernels: Support Vector
- Scholkopf, Smola
- 2001
Citation Context ...ed in parallelizing stochastic gradient descent, see the recent talk by Delalleau and Bengio [12]. 4.2 ε-accurate SVM Regression SVM Regression is another state-of-the-art method for regression tasks [29]. However, SVM solvers typically scale poorly with large training set sizes. We considered the use of parallelized SVMs, but preferred a faster method that yields an ε-accurate model: the Pegasos (Pri... |

389 | Learning to Classify Text using Support Vector Machines
- Joachims
- 2002
Citation Context ...Brate(q,c,p). Since the true bounce rate has a range of [0,1], this prediction problem fits naturally within a regression framework. Logistic regression [3] or support vector machine (SVM) regression [15] with probability estimation [25] for bounce rate prediction requires a mapping x(·, ·, ·) ↦ R^n from a query, creative, landing page triple to an n-dimensional feature vector. The feature mapping ex... |

371 | Stochastic Approximation Algorithms and Applications
- Kushner, Yin
- 1997
Citation Context ...d via methods such as LBFGS [21]; however, for very large data sets, these methods do not scale well due to large matrix manipulations. We thus use stochastic gradient descent as a viable alternative [19], noting that the non-differentiability induced by the L1 penalty term can be handled by methods similar to truncated gradient descent [20]. To achieve scalability, we use a parallelized learning algo... |

295 | Accurately interpreting clickthrough data as implicit feedback - Joachims, Granka, et al. - 2005 |

279 | Pegasos: Primal Estimated sub-GrAdient SOlver for SVM
- Shalev-Shwartz, Singer, et al.
- 2007
Citation Context ...arge training set sizes. We considered the use of parallelized SVMs, but preferred a faster method that yields an ε-accurate model: the Pegasos (Primal Estimated sub-GrAdient SOlver for SVM) algorithm [30]. This iterative SVM solver is especially well-suited for learning from large datasets. It proposes a method that alternates between two steps: stochastic sub-gradient descent and projection of the hy... |
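
The two alternating steps named in this context can be illustrated with a minimal sketch. It assumes the standard Pegasos formulation for a linear binary classifier with hinge loss, step size 1/(λt), and projection onto the ball of radius 1/√λ; it is not the regression variant used in the paper. The function name `pegasos_train`, its parameters, and the toy data are hypothetical.

```python
import math
import random


def pegasos_train(examples, lam=0.1, iterations=1000, seed=0):
    """Sketch of standard Pegasos for labels in {-1, +1} (hypothetical helper)."""
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])
    radius = 1.0 / math.sqrt(lam)
    for t in range(1, iterations + 1):
        x, y = rng.choice(examples)      # draw one training example at random
        eta = 1.0 / (lam * t)            # decaying step size
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # Step 1: stochastic sub-gradient step on the regularized hinge loss.
        w = [(1.0 - eta * lam) * wi for wi in w]
        if margin < 1.0:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
        # Step 2: project w back onto the ball of radius 1/sqrt(lam).
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > radius:
            w = [wi * radius / norm for wi in w]
    return w


# Toy, linearly separable data (assumption for illustration only).
toy_examples = [
    ([2.0, 1.0], 1),
    ([1.5, 2.0], 1),
    ([-1.0, -1.5], -1),
    ([-2.0, -0.5], -1),
]
toy_weights = pegasos_train(toy_examples)
```

Each iteration touches a single example, which is why a solver of this kind suits large training sets.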

245 | Improving web search ranking by incorporating user behavior information
- Agichtein, Brill, et al.
- 2006
Citation Context ...ata set to eliminate outliers, and then the remaining values were normalized by the difference between the remaining maximum and minimum value. This results in each metric being rescaled to the range [0,1]. Popularity metrics for languages, keywords, and categories were computed similarly, but using the natural log of the raw frequencies. 3.2 Bounce Rate and Click Through Rate As described above, one t... |
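
The rescaling described in this context can be sketched as follows. The excerpt does not say how outliers were eliminated, so the symmetric `trim_fraction` parameter is purely an assumption, as are the function name and the `use_log` flag (which mirrors the natural-log variant mentioned for popularity metrics).

```python
import math


def rescale_to_unit_range(values, trim_fraction=0.0, use_log=False):
    """Hypothetical helper: trim outliers, then min-max rescale to [0, 1]."""
    if use_log:
        # Variant for popularity metrics: natural log of raw frequencies.
        values = [math.log(v) for v in values]
    ordered = sorted(values)
    # Assumed outlier rule: drop the same share from both ends.
    drop = int(len(ordered) * trim_fraction)
    kept = ordered[drop:len(ordered) - drop]
    low, high = kept[0], kept[-1]
    span = high - low
    if span == 0:
        return [0.0 for _ in kept]
    # Normalize by the difference between the remaining maximum and minimum.
    return [(v - low) / span for v in kept]
```

For example, `rescale_to_unit_range([2, 4, 6, 10])` maps the smallest value to 0.0 and the largest to 1.0, with the others placed proportionally in between.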

148 | Learning user interaction models for predicting Web search result preferences
- Agichtein, Brill, et al.
- 2006
Citation Context ..., aggregating clicks to get overall statistics (e.g., clickthrough rate) gives reliable estimates that can be used to re-rank search results for queries and get quality improvements. Agichtein et al. [2] also developed a model for relating user behavior to relevance, proposing a simple linear mixture model for relating observed post-search user behavior to the relevance of a search result. Huffman et... |

100 | Predicting clicks: estimating the click-through rate for new ads
- Richardson, Dominowska, et al.
Citation Context ...re types. As discussed in Section 6, this machine learning approach to bounce rate prediction is motivated by prior success in predicting CTR for sponsored search, as exemplified by Richardson et al. [28]. 2. BACKGROUND This section provides a brief background on sponsored search, gives a formal definition of bounce rate, and discusses methods for observing bounce rate non-intrusively. 2.1 Sponsored S... |

71 | On the limited memory method for large scale optimization
- Liu, Nocedal
- 1989
Citation Context ...lent to considering k·Brate(x) examples of x with y_x = 1 and k·(1 − Brate(x)) examples of x with y_x = 0, where k is a scaling constant. This optimization problem may be solved via methods such as LBFGS [21]; however, for very large data sets, these methods do not scale well due to large matrix manipulations. We thus use stochastic gradient descent as a viable alternative [19], noting that the non-differ... |
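
The equivalence stated in this context (a fractional bounce-rate target versus k duplicated binary examples) can be checked with a small sketch. It uses plain logistic loss and omits the L1 penalty and the parallelization mentioned in the excerpt; all function names, the learning rate, and the toy feature vector are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_label_gradient(w, x, bounce_rate):
    """Gradient of the logistic loss with a fractional target in [0, 1]."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - bounce_rate) * xi for xi in x]


def duplicated_gradient(w, x, bounce_rate, k):
    """Average gradient over k*Brate copies with y=1 and k*(1-Brate) with y=0."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    n_pos = round(k * bounce_rate)
    n_neg = k - n_pos
    grad = [0.0] * len(x)
    for label, count in ((1, n_pos), (0, n_neg)):
        for j, xj in enumerate(x):
            grad[j] += count * (p - label) * xj
    return [g / k for g in grad]


def sgd_fit(x, bounce_rate, learning_rate=0.1, steps=2000):
    """Plain stochastic gradient descent on a single (x, bounce_rate) pair."""
    w = [0.0] * len(x)
    for _ in range(steps):
        grad = soft_label_gradient(w, x, bounce_rate)
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad)]
    return w


features = [1.0, 2.0]   # hypothetical feature vector x(q, c, p)
fitted = sgd_fit(features, bounce_rate=0.3)
predicted = sigmoid(sum(wi * xi for wi, xi in zip(fitted, features)))
```

Because the two gradients agree up to the constant k, training on the fractional target avoids materializing the duplicated examples.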

68 | Good-Turing frequency estimation without tears
- Gale, Sampson
- 1995
Citation Context ...means that we assign weights of 1 to all items in the term vector. To avoid zero probabilities, we smoothed the probability distributions P and Q (i.e. the term vectors) using Good-Turing discounting [13] before computing the divergence. We also computed a normalized version of KLD whose range is [0,1], with the maximum KLD as the normalization factor. 4.4 Evaluation Given a feature mapping x(q,c,p) a... |
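
A rough sketch of the divergence computation described in this context follows. It swaps in additive (add-alpha) smoothing in place of Good-Turing discounting, purely to keep the example short, and it assumes the maximum KLD used for normalization is the largest value observed in a collection. All names and the toy term vectors are hypothetical.

```python
import math


def smoothed_distribution(term_weights, vocabulary, alpha=1.0):
    """Add-alpha smoothing (a stand-in for Good-Turing discounting)."""
    total = sum(term_weights.get(t, 0.0) for t in vocabulary) + alpha * len(vocabulary)
    return {t: (term_weights.get(t, 0.0) + alpha) / total for t in vocabulary}


def kl_divergence(p, q):
    """KLD(P || Q) over a shared vocabulary; smoothing keeps every q[t] > 0."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def normalize_by_max(divergences):
    """Rescale to [0, 1] using the maximum observed KLD (assumed definition)."""
    largest = max(divergences)
    if largest == 0:
        return [0.0 for _ in divergences]
    return [d / largest for d in divergences]


# Term vectors with weight 1 for every term present, as in the excerpt.
query_terms = {"cheap": 1.0, "flights": 1.0}
page_terms = {"flights": 1.0, "hotel": 1.0}
vocab = sorted(set(query_terms) | set(page_terms))

p_dist = smoothed_distribution(query_terms, vocab)
q_dist = smoothed_distribution(page_terms, vocab)
kld = kl_divergence(p_dist, q_dist)
```

Smoothing matters here because any term present in P but absent from Q would otherwise make the divergence infinite.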

63 | A Semantic Approach to Contextual Advertising. SIGIR '07
- Broder, Fontoura, et al.
Citation Context ...advertisements. Researchers in computational advertising have suggested various methods to address this issue in order to design good matching functions between publisher pages and ads. Broder et al. [7] found that while training a model to predict the relevance of an ad to a publisher page, it is useful to augment “syntactic” features obtained by matching key-... |

59 | Sparse online learning via truncated gradient
- Langford, Li, et al.
Citation Context ...thus use stochastic gradient descent as a viable alternative [19], noting that the non-differentiability induced by the L1 penalty term can be handled by methods similar to truncated gradient descent [20]. To achieve scalability, we use a parallelized learning algorithm where each machine handles a subset of the data. For a discussion of typical issues involved in parallelizing stochastic gradient des... |

39 | Impedance coupling in content-targeted advertising
- Ribeiro-Neto, Cristo, et al.
- 2005
Citation Context ... adding text features from the landing pages of ads, and improved the ranking of content ads shown on a publisher page by using a support vector machine (SVM) based ranking model. Ribeiro-Neto et al. [27] proposed a Bayesian network-based approach to impedance coupling, for better matching of ads to publisher pages in contextual advertising. They also proposed different strategies for improving releva... |

37 | Evaluating search engines by modeling the relationship between relevance and clicks
- Carterette, Jones
- 2008
Citation Context ...that predicts user satisfaction by incorporating features from the user’s first query into a relevance model. Their models were evaluated using relevance judgements of human raters. Carterette et al. [8] have used click information to evaluate the performance of search results – they proposed a model for predicting relevance of a search result to a user using clickthrough information. Such a model wo... |

30 | Search advertising using web relevance feedback
- Broder, Ciccolo, et al.
- 2008
Citation Context ...ching functions that use hand-tuned combinations of syntactic or semantic features from the ad or page text. Textual relevance has also been used for other problems in sponsored search. Broder et al. [6] have used terms from the search results to enhance query terms for selecting advertisements. They demonstrated that the careful addition of terms from the web search results (extracting relevant phra... |

30 | How well does result relevance predict session satisfaction
- Huffman, Hochster
- 2007
Citation Context ... developed a model for relating user behavior to relevance, proposing a simple linear mixture model for relating observed post-search user behavior to the relevance of a search result. Huffman et al. [14] examined the connection between search-result relevance in web search and users’ session-level satisfaction. They found a strong relationship between the relevance of the first query in a user session... |

23 | Contextual advertising by combining relevance with click feedback
- Chakrabarti, Agarwal, et al.
- 2008
Citation Context ...pedance coupling, for better matching of ads to publisher pages in contextual advertising. They also proposed different strategies for improving relevance-based matching functions. Chakrabarti et al. [9] used clicks on contextual ads to learn a matching function. They trained a logistic model for predicting ad clicks based on relevance features between the publisher page and the ad. By training the f... |

21 | Multi-armed bandit problems with dependent arms
- Pandey, Chakrabarti, et al.
- 2007
Citation Context ...e of a search result to a user using clickthrough information. Such a model would be useful for evaluating search results for which human relevance judgments have not been obtained yet. Pandey et al. [23] proposed a multi-arm bandit approach with dependent arms for more accurate clickthrough prediction, using historical observation along with other features such as textual similarity between ads. 7. C... |

19 | Online learning from click data for sponsored search
- Ciaramita, Murdock, et al.
- 2008
Citation Context ...has not been modeled by researchers in the past, other aspects of user clickthrough behavior have been studied in the context of evaluating the quality of both ad and search results. Ciaramita et al. [10] estimated predicted clickthrough rates of search ads from textual relevance features. They trained a logistic model whose features were learned using clickthrough data from logs. Their work demonstra... |

19 | Predictive user click models based on click-through history
- Piwowarski, Zaragoza
- 2007
Citation Context ...eatures were learned using clickthrough data from logs. Their work demonstrated that simple syntactic and semantic textual relevance features can be predictive of clickthrough rate. Piwowarski et al. [24] modeled user clickthrough on search results using specific click history (e.g., from users) or more general click history features (e.g., from user communities or global history). Agichtein et al. [1... |

18 | To swing or not to swing: learning when (not) to advertise
- Broder, Ciaramita, et al.
- 2008
Citation Context ...elevance between query and ad text can improve broad match while optimizing revenue. Other notable uses of relevance in computational advertising include learning when not to show ads. Broder et al. [5] trained an SVM model using relevance and cohesiveness features to address the decision problem of whether or not to show an ad. 6.2 Modeling User Behavior Another aspect of our work is modeling user ... |

8 | A noisy-channel approach to contextual advertising
- Murdock, Ciaramita, et al.
- 2007
Citation Context ...onvex linear combination of syntactic and semantic features had an improvement over syntactic features alone, with respect to a “golden ranking” produced by human relevance judgements. Murdock et al. [22] showed the benefit of using machine translation techniques to match text features extracted from ads to those obtained from publisher pages in order to address the impedance problem. They obtained be... |

4 | Parallel stochastic gradient descent. Talk presented at
- Delalleau, Bengio
- 2007
Citation Context ...arning algorithm where each machine handles a subset of the data. For a discussion of typical issues involved in parallelizing stochastic gradient descent, see the recent talk by Delalleau and Bengio [12]. 4.2 ε-accurate SVM Regression SVM Regression is another state-of-the-art method for regression tasks [29]. However, SVM solvers typically scale poorly with large training set sizes. We considered th... |

2 | Bounce rate as sexiest web metric ever
- Kaushik
- 2007
Citation Context ... (CTR) and conversion rate (CvR). Though less studied, another important metric of advertising effectiveness is bounce rate, which Avinash Kaushik of Google Analytics colorfully describes as follows [17, 18]: Bounce rate for a page is the number of people who entered the site on a page and left right away. They came, they said yuk and they were on their way. Kaushik claims bounce rate is important for ad... |

2 | Excellent analytics tip 11: Measure effectiveness of your web pages. Occam’s Razor (blog)
- Kaushik
- 2007
Citation Context ... (CTR) and conversion rate (CvR). Though less studied, another important metric of advertising effectiveness is bounce rate, which Avinash Kaushik of Google Analytics colorfully describes as follows [17, 18]: Bounce rate for a page is the number of people who entered the site on a page and left right away. They came, they said yuk and they were on their way. Kaushik claims bounce rate is important for ad... |

1 | Optimizing relevance and revenue in ad search: A query substitution approach
- Radlinski, Broder, et al.
- 2008
Citation Context ...retrieval of relevant ads. Their approach performed better when compared to a system that augments the query terms by phrases extracted from web users’ query rewrites in search logs. Radlinski et al. [26] also showed that relevance between query and ad text can improve broad match while optimizing revenue. Other notable uses of relevance in computational advertising include learning when not to show ... |