Learning Multiple Tasks with Kernel Methods
- Journal of Machine Learning Research, 2005
Cited by 96 (5 self)
Editor: John Shawe-Taylor. We study the problem of learning many related tasks simultaneously using kernel methods and regularization. The standard single-task kernel methods, such as support vector machines and regularization networks, are extended to the case of multi-task learning. Our analysis shows that the problem of estimating many task functions with regularization can be cast as a single-task learning problem if a family of multi-task kernel functions we define is used. These kernels model relations among the tasks and are derived from a novel form of regularizers. Specific kernels that can be used for multi-task learning are provided and experimentally tested on two real data sets. In agreement with past empirical work on multi-task learning, the experiments show that learning multiple related tasks simultaneously using the proposed approach can significantly outperform standard single-task learning, particularly when there are many related tasks but few data per task.
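As a rough illustration of how a multi-task kernel lets a single-task solver handle several tasks at once, the sketch below uses a separable kernel of the form K((x, s), (z, t)) = B[s, t] * <x, z>. The coupling matrix B, the parameter `lam`, the ridge value, and all function names are assumptions made for this example, not the kernels defined in the paper; kernel ridge regression stands in for the regularization methods the abstract mentions.

```python
import numpy as np

def task_coupling(n_tasks, lam):
    """Hypothetical task-coupling matrix (an assumption for this sketch).

    lam = 0 treats all tasks as one pooled task; lam = 1 makes them independent.
    For 0 <= lam <= 1 the matrix is positive semidefinite.
    """
    return (1.0 - lam) * np.ones((n_tasks, n_tasks)) + lam * np.eye(n_tasks)

def multitask_kernel(X1, t1, X2, t2, B):
    """Separable multi-task kernel: K((x, s), (z, t)) = B[s, t] * <x, z>."""
    return B[np.ix_(t1, t2)] * (X1 @ X2.T)

def fit_predict(X, t, y, X_new, t_new, B, ridge=1e-2):
    """Kernel ridge regression over (input, task) pairs, solved as ONE problem."""
    K = multitask_kernel(X, t, X, t, B)
    alpha = np.linalg.solve(K + ridge * np.eye(len(y)), y)
    return multitask_kernel(X_new, t_new, X, t, B) @ alpha

# Toy usage: two tasks with three examples each (synthetic data).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
t = np.array([0, 0, 0, 1, 1, 1])
y = rng.normal(size=6)
predictions = fit_predict(X, t, y, X, t, task_coupling(2, lam=0.5))
```

Intermediate values of `lam` let each task borrow strength from the others, which is the regime the abstract describes as most useful when there are many related tasks but few data per task.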
Regularized multi-task learning, 2004
Cited by 92 (1 self)
This paper provides a foundation for multi-task learning using reproducing kernel Hilbert spaces of vector-valued functions. In this setting, the kernel is a matrix-valued function. Some explicit examples will be described which go beyond our earlier results in [7]. In particular, we characterize classes of matrix-valued kernels which are linear and are of the dot product or the translation invariant type. We discuss how these kernels can be used to model relations between the tasks and present linear multi-task learning algorithms. Finally, we present a novel proof of the representer theorem for a minimizer of a regularization functional which is based on the notion of minimal norm interpolation.
Bayesian Statistics and Marketing
- Marketing Science, 2005
Cited by 34 (3 self)
Statistical research in marketing is heavily influenced by the availability of different types of data. The last ten years have seen an explosion in the amount and variety of data available to market researchers. Demand data from scanning equipment has now become routinely available in the packaged goods industries. Data from e-commerce and direct marketing is growing at an exponential rate and provides coverage of a wide assortment of different products. Web-based technology has dramatically lowered the cost of survey research. Web-browsing data is an important new source of information about consumer tastes and preferences which is becoming available for a large fraction of the total consumer population. In this vignette, we explore some of the implications of this data explosion for the development of statistical methodology in marketing with primary emphasis on the explosion in demand data. Scanning equipment has provided the market researcher with a national panel of stores in addition to panels of households, altering the focus of marketing research. This data has stimulated a large literature on applied demand and discrete choice modeling. Demand models at the store level typically take the form of multivariate regression models in which demand for a vector of products is related to marketing variables such as prices, displays and various forms of advertising. At the household level, demand is discrete and a wide variety of multinomial logit and probit models have been applied to the data. Early experience with scanner data revealed that households have very different patterns of buying behavior that cannot be explained just by differences in the marketing environment. Some households, for example, exhibit strong brand loyalties while other households readily switch brands when prices are lowered. Even at the store level, large differences have been detected in price and local advertising sensitivity.
Initial observations of store and consumer heterogeneity created considerable interest in models of observed and unobservable heterogeneity, primarily of the random effects form. The development and application of random effect models in marketing has been dictated in large degree by the available inference technology. The first paper in this area by Kamakura and Russell (1989) used a finite mixture model of heterogeneity in a logit framework. Kamakura and Russell postulate a discrete
Competitive price discrimination strategies in a vertical channel using aggregate retail data
- Management Science, 2003
Cited by 13 (1 self)
We explore opportunities for targeted pricing for a retailer that only tracks weekly store-level aggregate sales and marketing-mix information. We show that it is possible, using these data, to recover essential features of the underlying distribution of consumer willingness to pay. Knowledge of this distribution may enable the retailer to generate additional profits from targeting by using choice information at the checkout counter. In estimating demand we incorporate a supply-side model of the distribution channel that captures important features of competitive price-setting behavior of firms. This latter aspect helps us control for the potential endogeneity generated by unmeasured product characteristics in aggregate data. The channel controls for competitive aspects both between manufacturers and between manufacturers and a retailer. Despite this competition, we find that targeted pricing need not generate the prisoner’s dilemma in our data. This result contrasts with the findings of theoretical models, owing to the flexibility of the empirical model of demand. The demand system we estimate captures richer forms of product differentiation, both vertical and horizontal, as well as a more flexible distribution of consumer heterogeneity.
A review of recent research in metareasoning and metalearning
- AI Magazine, 2007
Cited by 12 (3 self)
Recent years have seen a resurgence of interest in the use of metacognition in intelligent systems. This essay is part of a small section meant to give interested researchers an overview and sampling of the kinds of work currently being pursued in this broad area. The current essay offers a review of recent research in two main topic areas: the monitoring and control of reasoning (metareasoning) and the monitoring and control of learning (metalearning).
What is metacognition in computation? Rosie (the robot maid from the TV show The Jetsons) spends her days cooking, cleaning, ironing, and attending to the usual household tasks of late 21st century life. Because of a bug in one of her memory chips, however, she almost always forgets to buy dog food when she goes out. She has an adequate recovery plan for this: she simply feeds Astro some of the Jetsons’ dinner. But 21st century human food is expensive, so this strategy is wasteful. Realizing this, and recognizing that she has forgotten several times, Rosie adopts a special strategy to help her remember: she sticks the spare dog collar in her
A Bayesian model to forecast new product performance in domestic and international markets
- Marketing Science, 115–136, 1999
Cited by 11 (0 self)
This paper attempts to shed light on the following research questions: When a firm introduces a new product (or service), how can it effectively use the different information sources available to generate reliable new product performance forecasts? How can the firm account for varying information availability at different stages of the new product launch and generate forecasts at each stage? We address these questions in the context of the sequential launches of motion pictures in international markets. Players in the motion picture industry require forecasts at different stages of the movie launch process to aid decision-making, and the information sets available to generate such forecasts vary at different stages. Despite the importance of such forecasts, the industry struggles to understand and predict
A comparison of hierarchical Bayes and maximum simulated likelihood for mixed logit, 2001
An empirical comparison of logit choice models with discrete versus continuous representations of heterogeneity
- Journal of Marketing Research, 2002
Cited by 9 (0 self)
Currently, there is an important debate about the relative merits of models with discrete and continuous representations of consumer heterogeneity. In a recent JMR study, Andrews, Ansari, and Currim (2002; hereafter AAC) compared metric conjoint analysis models with discrete and continuous representations of heterogeneity and found no differences between the two models with respect to parameter recovery and prediction of ratings for holdout profiles. Models with continuous representations of heterogeneity fit the data better than models with discrete representations of heterogeneity. The goal of the current study is to compare the relative performance of logit choice models with discrete versus continuous representations of heterogeneity in terms of the accuracy of household-level parameters, fit, and forecasting accuracy. To accomplish this goal, the authors conduct an extensive simulation experiment with logit models in a scanner data context, using an experimental design based on AAC and other recent simulation studies. One of the main findings is that models with continuous and discrete representations of heterogeneity recover household-level parameter estimates and predict holdout choices about equally well except when the number of purchases per household is small, in which case the models with continuous representations perform very poorly. As in the AAC study, models with continuous representations of heterogeneity fit the data better.
NEARLY OPTIMAL PRICING FOR MULTIPRODUCT FIRMS, 2008
Cited by 5 (1 self)
In principle, a multiproduct firm can set separate prices for all possible bundled combinations of its products (i.e., "mixed bundling"). However, this is impractical for firms with more than a few products, because the number of prices increases exponentially with the number of products. In this study we show that simple pricing strategies are often nearly optimal: with surprisingly few prices a firm can obtain 99% of the profit that would be earned by mixed bundling. Specifically, we show that bundle-size pricing (setting prices that depend only on the size of the bundle purchased) tends to be more profitable than offering the individual products priced separately, and tends to closely approximate the profits from mixed bundling. These findings are based on an array of numerical experiments covering a broad range of demand and cost scenarios, as well as an empirical analysis of the pricing problem for an 8-product firm (a theater company).
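To make the scale of the problem concrete, the following sketch counts the prices each scheme requires. It assumes mixed bundling sets one price per non-empty subset of products and bundle-size pricing sets one price per bundle size; the function names are hypothetical and only the counting argument is shown, not the paper's profit analysis.

```python
from math import comb

def mixed_bundling_price_count(n_products):
    """One price for every non-empty subset of the products."""
    return 2 ** n_products - 1

def bundle_size_price_count(n_products):
    """One price for each possible bundle size 1..n."""
    return n_products

def bundles_sharing_a_price(n_products, size):
    """Number of distinct bundles that share the single price for a given size."""
    return comb(n_products, size)

# The 8-product case mentioned in the abstract.
n = 8
print(mixed_bundling_price_count(n))   # 255
print(bundle_size_price_count(n))      # 8
print(bundles_sharing_a_price(n, 4))   # 70
```

For the 8-product firm, then, bundle-size pricing replaces 255 separate prices with 8.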
A Bayesian Mixed Logit-Probit Model for Multinomial Choice, 2008
Cited by 4 (0 self)
In this paper we introduce a new flexible mixed model for multinomial discrete choice where the key individual- and alternative-specific parameters of interest are allowed to follow an assumption-free nonparametric density specification while other alternative-specific coefficients are assumed to be drawn from a multivariate normal distribution, which eliminates the independence of irrelevant alternatives assumption at the individual level. A hierarchical specification of our model allows us to break down a complex data structure into a set of submodels with the desired features that are naturally assembled in the original system. We estimate the model using a Bayesian Markov Chain Monte Carlo technique with a multivariate Dirichlet Process (DP) prior on the coefficients with nonparametrically estimated density. We bypass a problem of prior non-conjugacy by employing a "latent class" sampling algorithm for the DP prior. The model is applied to supermarket choices of a panel of Houston households whose shopping behavior was observed over a 24-month period in years 2004-2005. We estimate the nonparametric density of two key variables of interest: the price of a basket of goods based on scanner data, and driving distance to the supermarket based on their respective locations, calculated using GPS software. Supermarket indicator variables form the parametric part of our model.

