Ready to Buy or Just Browsing? Detecting Web Searcher Goals from Interaction Data
Cited by 9 (1 self)
An improved understanding of the relationship between search intent, result quality, and searcher behavior is crucial for improving the effectiveness of web search. While recent progress in user behavior mining has been largely focused on aggregate server-side click logs, we present a new class of search behavior models that also exploit fine-grained user interactions with the search results. We show that mining these interactions, such as mouse movements and scrolling, can enable more effective detection of the user’s search goals. Potential applications include automatic search evaluation, improving search ranking, result presentation, and search advertising. We describe extensive experimental evaluation over both controlled user studies, and logs of interaction data collected from hundreds of real users. The results show that our method is more effective than the current state-of-the-art techniques, both for detection of searcher goals, and for an important practical application of predicting ad clicks for a given search session.
PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce
Cited by 7 (0 self)
Classification and regression tree learning on massive datasets is a common data mining task at Google, yet many state-of-the-art tree learning algorithms require training data to reside in memory on a single machine. While more scalable implementations of tree learning have been proposed, they typically require specialized parallel computing architectures. In contrast, the majority of Google’s computing infrastructure is based on commodity hardware. In this paper, we describe PLANET: a scalable distributed framework for learning tree models over large datasets. PLANET defines tree learning as a series of distributed computations, and implements each one using the MapReduce model of distributed computation. We show how this framework supports scalable construction of classification and regression trees, as well as ensembles of such models. We discuss the benefits and challenges of using a MapReduce compute cluster for tree learning, and demonstrate the scalability of this approach by applying it to a real-world learning task from the domain of computational advertising.
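The idea of expressing one tree-learning step as a map phase followed by a reduce phase can be illustrated with a small sketch. This is not PLANET's actual implementation, which the abstract does not detail; it is a minimal single-process illustration that assumes a regression target, a fixed list of candidate thresholds, and squared-error as the split criterion. All function names (`map_split_stats`, `reduce_split_stats`, `best_threshold`) are hypothetical.

```python
from collections import defaultdict


def map_split_stats(rows, feature, thresholds):
    """Map phase (one data shard): for every candidate threshold, accumulate
    [count, sum(y), sum(y^2)] for the rows falling left and right of it."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for x, y in rows:
        for t in thresholds:
            side = "left" if x[feature] <= t else "right"
            s = stats[(t, side)]
            s[0] += 1
            s[1] += y
            s[2] += y * y
    return stats


def reduce_split_stats(partials):
    """Reduce phase: sum the per-shard statistics key by key."""
    total = defaultdict(lambda: [0, 0.0, 0.0])
    for partial in partials:
        for key, (n, s, ss) in partial.items():
            total[key][0] += n
            total[key][1] += s
            total[key][2] += ss
    return total


def sse(n, s, ss):
    """Sum of squared errors around the mean, from sufficient statistics."""
    return ss - s * s / n if n else 0.0


def best_threshold(total, thresholds):
    """Pick the threshold minimizing the combined left + right squared error."""
    return min(
        thresholds,
        key=lambda t: sse(*total[(t, "left")]) + sse(*total[(t, "right")]),
    )


# Two shards standing in for data held on different machines (made-up values).
shards = [
    [({"f": 1.0}, 1.0), ({"f": 2.0}, 1.0)],
    [({"f": 8.0}, 5.0), ({"f": 9.0}, 5.0)],
]
thresholds = [1.5, 5.0, 8.5]
total = reduce_split_stats(map_split_stats(s, "f", thresholds) for s in shards)
split = best_threshold(total, thresholds)
```

Because only counts and sums cross the shard boundary, no single machine needs the full training set in memory, which is the property the abstract emphasizes.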
Improving Ad Relevance in Sponsored Search
Cited by 2 (1 self)
We describe a machine learning approach for predicting sponsored search ad relevance. Our baseline model incorporates basic features of text overlap and we then extend the model to learn from past user clicks on advertisements. We present a novel approach using translation models to learn user click propensity from sparse click logs. Our relevance predictions are then applied to multiple sponsored search applications in both offline editorial evaluations and live online user tests. The predicted relevance score is used to improve the quality of the search page in three areas: filtering low quality ads, more accurate ranking for ads, and optimized page placement of ads to reduce prominent placement of low relevance ads. We show significant gains across all three tasks.
Incorporating post-click behaviors into a click model
- In Proceedings of SIGIR, 2010
Cited by 2 (1 self)
Much work has attempted to model a user’s click-through behavior by mining click logs. The task is not trivial due to the well-known position bias problem. Some breakthroughs have been made: two newly proposed click models, DBN and CCM, addressed this problem and improved document relevance estimation. However, to further improve the estimation, we need a model that can capture more sophisticated user behaviors. In particular, after clicking a search result, a user’s behavior (such as the dwell time on the clicked document, and whether there are further clicks on the clicked document) can be highly indicative of the relevance of the document. Unfortunately, such measures have not been incorporated in previous click models. In this paper, we introduce a novel click model, called the post-click click model (PCC), which provides an unbiased estimation of document relevance by leveraging both click behaviors on the search page and post-click behaviors beyond the search page. The PCC model is based on the Bayesian approach, and because of its incremental nature, it is highly scalable to large-scale and constantly growing log data. Extensive experimental results illustrate that the proposed method significantly outperforms state-of-the-art methods that rely solely on click logs.
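To make the notion of an incremental Bayesian estimate concrete, here is a generic sketch. It is not the PCC model, whose structure the abstract does not spell out, and it ignores position bias entirely. It assumes a simple Beta-Bernoulli stand-in in which a click counts as evidence of relevance when the dwell time exceeds a threshold; the class name `RelevanceEstimate` and the 30-second `min_dwell` default are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RelevanceEstimate:
    """Beta posterior over a document's relevance (illustrative stand-in,
    not the PCC model). Starts from a uniform Beta(1, 1) prior."""
    alpha: float = 1.0  # pseudo-count of "satisfied" clicks
    beta: float = 1.0   # pseudo-count of "unsatisfied" clicks

    def update(self, dwell_seconds: float, min_dwell: float = 30.0) -> None:
        """Fold one clicked-session observation into the posterior.
        min_dwell is an assumed threshold, not a value from the paper."""
        if dwell_seconds >= min_dwell:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        """Posterior mean relevance."""
        return self.alpha / (self.alpha + self.beta)


est = RelevanceEstimate()
for dwell in [45.0, 5.0, 60.0]:  # made-up dwell times in seconds
    est.update(dwell)
```

Each observation changes only two counters, so new log data can be absorbed without reprocessing earlier sessions. That is the sense in which an incremental update scales to constantly growing logs.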
Mean field equilibria of dynamic auctions with learning. SSRN eLibrary 1799085, 2011
Cited by 1 (0 self)
We study learning in a dynamic setting where identical copies of a good are sold over time through a sequence of second price auctions. Each agent in the market has an unknown independent private valuation which determines the distribution of the reward she obtains from the good; for example, in sponsored search settings, advertisers may initially be unsure of the value of a click. Though the induced dynamic game is complex, we simplify analysis of the market using an approximation methodology known as mean field equilibrium (MFE). The methodology assumes that agents optimize only with respect to long-run average estimates of the distribution of other players’ bids. We show a remarkable fact: in a mean field equilibrium, the agent has an optimal strategy where she bids truthfully according to a conjoint valuation. The conjoint valuation is the sum of her current expected valuation, together with an overbid amount that is exactly the expected marginal benefit to one additional observation about her true private valuation. Under mild conditions on the model, we show that an MFE exists, and that it is a good approximation to a rational agent’s behavior as the number of agents increases. We conclude by discussing the implications of the auction format and design on the auctioneer’s revenue. In particular, we establish a dynamic version of the revenue equivalence theorem, and discuss optimal selection of reserve prices in dynamic auctions.
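The verbal definition of the conjoint valuation can be written compactly. The symbols below are assumptions introduced only for illustration and do not come from the paper: $v$ is the agent's unknown private valuation, $h_t$ her observation history at time $t$, $\Delta(h_t)$ the overbid term, and $b_t$ her bid.

```latex
% Hypothetical notation; only the additive structure is taken from the abstract.
\begin{align*}
  \hat{v}(h_t) &= \mathbb{E}\left[\, v \mid h_t \,\right]
    && \text{(current expected valuation)} \\
  c(h_t) &= \hat{v}(h_t) + \Delta(h_t)
    && \text{(conjoint valuation)} \\
  b_t &= c(h_t)
    && \text{(truthful bid with respect to } c\text{)}
\end{align*}
```

Here $\Delta(h_t)$ stands for the expected marginal benefit of one additional observation about $v$, so the agent bids above her current estimate by the amount that one more observation is worth to her.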
Ad Quality On TV: Predicting Television Audience Retention
Cited by 1 (0 self)
This paper explores the impact of television advertisements on audience retention using data collected from television set-top boxes (STBs). In particular, we discuss how the accuracy of the retention score, a measure of ad quality, is improved by using the recent “click history” of the STBs tuned to the ad. These retention scores are related to – and are a natural extension of – other measures of ad quality that have been used in online advertising since at least 2005 [2]. Like their online counterparts, TV retention scores could be used to determine if an ad should be eligible to enter the inventory auction and, if it is, how highly the ad should be ranked [1]. A retention score (RS) could also be used by the auction system for pricing, or by the advertiser to compare different creatives for the same product.
4. Summary and Bibliography IR Overview
"... All slides and tutorial materials © 2009 by the authors ..."
• Audience
– Half have applied machine learning (perhaps as a black box)
– Half are IR researchers with casual machine learning exposure
– Some (10-15%) have developed new ML techniques and are looking for other target applications
• Goals
– Provide an overview of core Machine Learning (ML) methods
– Show how ML methods apply to important IR problems
– Survey recent high-impact ML contributions to IR
– Highlight IR areas with promising opportunities for ML
Is Pay-Per-Click Efficient? An Empirical Analysis of Click Values
Current sponsored search auctions adopt per-click bidding, which implicitly assumes that an advertiser treats all clicks as equally valuable. This is not always true in real-world situations: clicks that lead to conversions are clearly more valuable than fraudulent clicks. In this work, we use post-ad-click behavior to measure a click’s value and show empirically that, for a given advertiser, the values of different clicks vary widely. Thus, for many clicks, the advertiser’s single bid does not reflect the advertiser’s true valuation. This indicates that a sponsored search system under the pay-per-click (PPC) mechanism is not efficient; that is, it does not always give a slot to the advertiser who needs it most.
Dustin Hillard
Extraction of entities from ad creatives is an important problem that can benefit many computational advertising tasks. Supervised and semi-supervised solutions rely on labeled data, which is expensive, time-consuming, and difficult to procure for ad creatives. A small set of manually derived constraints on feature expectations over unlabeled data can be used to partially and probabilistically label large amounts of data. Utilizing recent work in constraint-based semi-supervised learning, this paper injects lightweight supervision specified as these “constraints” into a semi-Markov conditional random field model of entity extraction in ad creatives. Relying solely on the constraints, the model is trained on a set of unlabeled ads using an online learning algorithm. We demonstrate significant accuracy improvements on a manually labeled test set as compared to a baseline dictionary approach. We also achieve accuracy that approaches a fully supervised classifier.

