Probability Estimates for Multi-class Classification by Pairwise Coupling
- Journal of Machine Learning Research
, 2003
Cited by 115 (1 self)
Pairwise coupling is a popular multi-class classification method that combines together all pairwise comparisons for each pair of classes. This paper presents two approaches for obtaining class probabilities. Both methods can be reduced to linear systems and are easy to implement.
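To make the idea of combining pairwise comparisons concrete, the sketch below turns a matrix of pairwise estimates into class probabilities by simple averaging. This is an assumed baseline chosen for illustration, not either of the two approaches the paper presents; the function name and the example matrix are hypothetical.

```python
def couple_by_averaging(r):
    """Combine pairwise estimates into class probabilities.

    r[i][j] is the estimated probability of class i when only classes
    i and j are considered, so r[i][j] + r[j][i] == 1 for i != j.
    Diagonal entries are ignored.
    """
    k = len(r)
    # There are k*(k-1)/2 pairs, and each contributes total mass 1,
    # so this scale makes the outputs sum to 1.
    scale = 2.0 / (k * (k - 1))
    return [scale * sum(r[i][j] for j in range(k) if j != i) for i in range(k)]


# Hypothetical pairwise estimates for three classes.
r = [
    [0.0, 0.9, 0.8],
    [0.1, 0.0, 0.6],
    [0.2, 0.4, 0.0],
]
probs = couple_by_averaging(r)
```

The averaging estimate needs no optimization, which makes it a convenient sanity check against more careful coupling methods.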
A tutorial on MM algorithms
- Amer. Statist
, 2004
Cited by 36 (2 self)
Most problems in frequentist statistics involve optimization of a function such as a likelihood or a sum of squares. EM algorithms are among the most effective algorithms for maximum likelihood estimation because they consistently drive the likelihood uphill by maximizing a simple surrogate function for the loglikelihood. Iterative optimization of a surrogate function as exemplified by an EM algorithm does not necessarily require missing data. Indeed, every EM algorithm is a special case of the more general class of MM optimization algorithms, which typically exploit convexity rather than missing data in majorizing or minorizing an objective function. In our opinion, MM algorithms deserve to be part of the standard toolkit of professional statisticians. The current article explains the principle behind MM algorithms, suggests some methods for constructing them, and discusses some of their attractive features. We include numerous examples throughout the article to illustrate the concepts described. In addition to surveying previous work on MM algorithms, this article introduces some new material on constrained optimization and standard error estimation. Key words and phrases: constrained optimization, EM algorithm, majorization, minorization, Newton-Raphson
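As a small illustration of the majorize-then-minimize principle, the sketch below computes a sample median by minimizing the sum of absolute deviations. It is a generic example chosen here and is not claimed to be one of the article's own; the function name and the `eps` guard are assumptions. Each absolute value |x - m| is majorized at the current iterate m_k by the quadratic (x - m)^2 / (2|x - m_k|) + |x - m_k| / 2, so minimizing the surrogate reduces to a weighted mean.

```python
def mm_median(xs, iterations=200, eps=1e-12):
    """Minimize sum(|x - m|) over m with a majorize-minimize iteration."""
    m = sum(xs) / len(xs)  # start from the mean
    for _ in range(iterations):
        # Weights come from the quadratic majorizer at the current m;
        # eps (an assumed guard) avoids division by zero when m hits a data point.
        weights = [1.0 / max(abs(x - m), eps) for x in xs]
        # The surrogate is a weighted sum of squares, minimized by the weighted mean.
        m = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
    return m


data = [1.0, 2.0, 3.0, 4.0, 100.0]
estimate = mm_median(data)
```

Because each surrogate lies above the objective and touches it at the current iterate, every step can only decrease the sum of absolute deviations, which is the descent property that MM algorithms share with EM.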
Active exploration for learning rankings from clickthrough data
- In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
, 2007
Cited by 36 (2 self)
We address the task of learning rankings of documents from search engine logs of user behavior. Previous work on this problem has relied on passively collected clickthrough data. In contrast, we show that an active exploration strategy can provide data that leads to much faster learning. Specifically, we develop a Bayesian approach for selecting rankings to present users so that interactions result in more informative training data. Our results using the TREC-10 Web corpus, as well as synthetic data, demonstrate that a directed exploration strategy quickly leads to users being presented improved rankings in an online learning setting. We find that active exploration substantially outperforms passive observation and random exploration.
Computing Elo Ratings of Move Patterns in the Game of Go
Cited by 23 (0 self)
Move patterns are an essential method to incorporate domain knowledge into Go-playing programs. This paper presents a new Bayesian technique for supervised learning of such patterns from game records, based on a generalization of Elo ratings. Each sample move in the training data is considered as a victory of a team of pattern features. Elo ratings of individual pattern features are computed from these victories, and can be used in previously unseen positions to compute a probability distribution over legal moves. In this approach, several pattern features may be combined, without an exponential cost in the number of features. Despite a very small number of training games (652), this algorithm outperforms most previous pattern-learning algorithms, both in terms of mean log-evidence (−2.69), and prediction rate (34.9%). A 19 × 19 Monte-Carlo program improved with these patterns reached the level of the strongest classical programs.
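The "team of pattern features" idea can be sketched with a generalized Bradley-Terry model in which a team's strength is the product of its members' strengths and a team wins with probability proportional to its strength. This form is an assumption made for illustration; the abstract does not spell out the formula, and the feature names and strength values below are hypothetical.

```python
from math import prod


def team_win_probabilities(teams, gamma):
    """Return the win probability of each team among the competitors.

    teams is a list of lists of feature names; gamma maps each feature
    to a positive strength. A team's strength is the product of its
    members' strengths (assumed model form).
    """
    strengths = [prod(gamma[f] for f in team) for team in teams]
    total = sum(strengths)
    return [s / total for s in strengths]


# Hypothetical feature strengths and three candidate moves,
# each described by the features it matches.
gamma = {"atari": 4.0, "edge": 0.5, "pattern_17": 2.0}
candidate_moves = [["atari", "pattern_17"], ["edge"], ["pattern_17"]]
probs = team_win_probabilities(candidate_moves, gamma)
```

Because strengths multiply, adding a feature to the model adds one parameter rather than one parameter per feature combination, which is consistent with the abstract's remark about avoiding an exponential cost.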
Bayesian inference for Plackett-Luce ranking models
Cited by 7 (0 self)
This paper gives an efficient Bayesian method for inferring the parameters of a Plackett-Luce ranking model. Such models are parameterised distributions over rankings of a finite set of objects, and have typically been studied and applied within the psychometric, sociometric and econometric literature. The inference scheme is an application of Power EP (expectation propagation). The scheme is robust and can be readily applied to large scale data sets. The inference algorithm extends to variations of the basic Plackett-Luce model, including partial rankings. We show a number of advantages of the EP approach over the traditional maximum likelihood method. We apply the method to aggregate rankings of NASCAR racing drivers over the 2002 season, and also to rankings of movie genres.
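For readers unfamiliar with the model itself, the sketch below evaluates the standard Plackett-Luce probability of a complete ranking: items are chosen one at a time, each with probability proportional to its weight among the items not yet placed. It covers only the likelihood, not the paper's Power EP inference; the function name and the weights are hypothetical.

```python
from itertools import permutations


def plackett_luce_probability(ranking, weight):
    """Probability of a full ranking (best first) under Plackett-Luce."""
    remaining = sum(weight[item] for item in ranking)
    p = 1.0
    for item in ranking:
        # Choose the next item in proportion to its weight among those left.
        p *= weight[item] / remaining
        remaining -= weight[item]
    return p


# Hypothetical positive weights for three objects.
weight = {"a": 3.0, "b": 2.0, "c": 1.0}
p_abc = plackett_luce_probability(("a", "b", "c"), weight)
# Summing over every ordering checks that the model is a proper distribution.
total = sum(plackett_luce_probability(r, weight) for r in permutations(weight))
```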
Analysis of Irish third-level college applications data
, 2006
Cited by 5 (4 self)
The Irish college admissions system involves prospective students listing up to ten courses in order of preference on their application. Places in third level educational institutions are subsequently offered to the applicants on the basis of both their preferences and their final second level examination results. The college applications system is a large area of public debate in Ireland. Detractors suggest the process creates artificial demand for ‘high profile’ courses, causing applicants to ignore their vocational callings. Supporters argue that the system is impartial and transparent. The Irish college degree applications data from the year 2000 is analyzed using mixture models based on ranked data models to investigate the types of application behavior exhibited by college applicants. The results of this analysis show that applicants form groups according to both the discipline and geographical location of their course choices. In addition, there is evidence of the suggested ‘points race’ for high profile courses. Finally, gender emerges as an influential factor when studying course choice behavior.
Supplement to “A mixture of experts model for rank data with applications in election studies”
, 2008
Cited by 3 (0 self)
A voting bloc is defined to be a group of voters who have similar voting preferences. The cleavage of the Irish electorate into voting blocs is of interest. Irish elections employ a “single transferable vote” electoral system; under this system voters rank some or all of the electoral candidates in order of preference. These rank votes provide a rich source of preference information from which inferences about the composition of the electorate may be drawn. Additionally, the influence of social factors or covariates on the electorate composition is of interest. A mixture of experts model is a mixture model in which the model parameters are functions of covariates. A mixture of experts model for rank data is developed to provide a model-based method to cluster Irish voters into voting blocs, to examine the influence of social factors on this clustering and to examine the characteristic preferences of the voting blocs. The Benter model for rank data is employed as the
Whole-History Rating: A Bayesian Rating System for Players of Time-Varying Strength
Cited by 2 (0 self)
Whole-History Rating (WHR) is a new method to estimate the time-varying strengths of players involved in paired comparisons. Like many variations of the Elo rating system, the whole-history approach is based on the dynamic Bradley-Terry model. But, instead of using incremental approximations, WHR directly computes the exact maximum a posteriori over the whole rating history of all players. This additional accuracy comes at a higher computational cost than traditional methods, but computation is still fast enough to be easily applied in real time to large-scale game servers (a new game is added in less than 0.001 second). Experiments demonstrate that, in comparison to Elo, Glicko, TrueSkill, and decayed-history algorithms, WHR produces better predictions.
CMU Blizzard 2008: Optimally using a large database for unit selection synthesis.
Cited by 1 (1 self)
This paper describes CMU’s entry for the Blizzard Challenge 2008. Our eventual system consisted of a fairly conventional layered cluster based unit selection system using the most predictable subset of the whole UK speech databases. This paper describes the methods we used to find the most reliable subset and the techniques used to optimize the selection. An additional technique that was used was to automatically detect “hard” text and modify the phrasing algorithm accordingly. Although this technique was targeted at SUS utterances it was in place for all utterances. CMU’s entry is letter M in the results.
A Bradley-Terry Artificial Neural Network Model for Individual Ratings in Group Competitions
, 2006
Cited by 1 (0 self)
A common statistical model for paired comparisons is the Bradley-Terry model. This research re-parameterizes the Bradley-Terry model as a single-layer artificial neural network (ANN) and shows how it can be fitted using the delta rule. The ANN model is appealing because it makes using and extending the Bradley-Terry model accessible to a broader community. It also leads to natural incremental and iterative updating methods. Several extensions are presented that allow the ANN model to learn to predict the outcome of complex, uneven two-team group competitions by rating individual players; no other published model currently does this. An incremental-learning Bradley-Terry ANN yields a probability estimate within less than 5% of the actual value when training over 3,379 multiplayer online matches of a popular team- and objective-based first-person shooter. Keywords: Bradley-Terry model, paired comparisons, neural networks, delta rule, probability estimates
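The re-parameterization described above can be sketched for the basic two-player case: with log-strength ratings as weights, the Bradley-Terry win probability is a logistic function of the rating difference, and the delta rule nudges both ratings by the prediction error. This is a minimal sketch under those assumptions, not the paper's group-competition model; the function name, learning rate, epoch count, and match data are hypothetical.

```python
import math


def train_bradley_terry(matches, n_players, lr=0.1, epochs=200):
    """Fit ratings from (winner, loser) pairs with delta-rule updates."""
    ratings = [0.0] * n_players
    for _ in range(epochs):
        for winner, loser in matches:
            # Single-layer network: logistic output of the rating difference.
            p_win = 1.0 / (1.0 + math.exp(ratings[loser] - ratings[winner]))
            # Delta rule: the target is 1 because the listed winner won.
            error = 1.0 - p_win
            ratings[winner] += lr * error
            ratings[loser] -= lr * error
    return ratings


# Hypothetical results among three players, as (winner, loser) pairs.
matches = [(0, 1), (0, 1), (1, 0), (1, 2), (0, 2), (2, 1)]
ratings = train_bradley_terry(matches, 3)
```

Each update touches only the two players in a match, so the same loop can process new results as they arrive, which is the incremental updating the abstract refers to.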

