Probability Estimates for Multiclass Classification by Pairwise Coupling
Journal of Machine Learning Research, 2003
Cited by 187 (1 self)
Abstract:
Pairwise coupling is a popular multiclass classification method that combines the comparisons between every pair of classes. This paper presents two approaches for obtaining class probabilities. Both methods can be reduced to linear systems and are easy to implement.
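One linear-system formulation of this idea can be sketched as follows. This is an illustrative assumption, not necessarily the paper's exact method: given pairwise estimates r[i, j] ≈ P(class i | class i or class j), minimise the squared inconsistency between the pairwise estimates and the class probabilities, subject to the probabilities summing to one.

```python
import numpy as np

def pairwise_coupling(r):
    """Recover class probabilities p from pairwise estimates
    r[i, j] ~ P(class i | class i or class j).

    Hypothetical sketch: minimise
        sum_{i<j} (r[j, i] * p[i] - r[i, j] * p[j])**2
    subject to sum(p) = 1, solved exactly via its KKT system.
    """
    k = r.shape[0]
    Q = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i == j:
                Q[i, i] = sum(r[m, i] ** 2 for m in range(k) if m != i)
            else:
                Q[i, j] = -r[j, i] * r[i, j]
    # KKT system: [[Q, 1], [1^T, 0]] @ [p; b] = [0; 1]
    A = np.zeros((k + 1, k + 1))
    A[:k, :k] = Q
    A[:k, k] = 1.0
    A[k, :k] = 1.0
    rhs = np.zeros(k + 1)
    rhs[k] = 1.0
    return np.linalg.solve(A, rhs)[:k]

# Consistent pairwise estimates generated from a known p are recovered exactly.
p_true = np.array([0.5, 0.3, 0.2])
r = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        if i != j:
            r[i, j] = p_true[i] / (p_true[i] + p_true[j])
p_hat = pairwise_coupling(r)
```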
A tutorial on MM algorithms
Amer. Statist., 2004
Cited by 65 (3 self)
Abstract:
Most problems in frequentist statistics involve optimization of a function such as a likelihood or a sum of squares. EM algorithms are among the most effective algorithms for maximum likelihood estimation because they consistently drive the likelihood uphill by maximizing a simple surrogate function for the log-likelihood. Iterative optimization of a surrogate function, as exemplified by an EM algorithm, does not necessarily require missing data. Indeed, every EM algorithm is a special case of the more general class of MM optimization algorithms, which typically exploit convexity rather than missing data in majorizing or minorizing an objective function. In our opinion, MM algorithms deserve to be part of the standard toolkit of professional statisticians. The current article explains the principle behind MM algorithms, suggests some methods for constructing them, and discusses some of their attractive features. We include numerous examples throughout the article to illustrate the concepts described. In addition to surveying previous work on MM algorithms, this article introduces some new material on constrained optimization and standard error estimation. Key words and phrases: constrained optimization, EM algorithm, majorization, minorization, Newton-Raphson.
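The majorization idea can be illustrated with a classic toy problem (this sketch is illustrative and not taken from the paper): minimising the L1 loss sum |x_i - t| by repeatedly minimising a quadratic surrogate that majorizes it at the current iterate, which yields a weighted-mean update.

```python
def mm_median(xs, iters=50, eps=1e-12):
    """MM iteration for the L1 location problem: minimise f(t) = sum_i |x_i - t|.

    Each |r| is majorised at the current residual r_k by
    r**2 / (2*|r_k|) + |r_k| / 2 (equality at r = r_k), so minimising the
    quadratic surrogate gives a weighted-mean update that drives f downhill.
    """
    t = sum(xs) / len(xs)                      # start from the mean
    for _ in range(iters):
        w = [1.0 / max(abs(x - t), eps) for x in xs]
        t = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
    return t
```

The iterate converges to the sample median, the minimiser of the L1 loss, without ever differentiating the non-smooth objective.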
Active exploration for learning rankings from clickthrough data
In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007
Cited by 55 (3 self)
Abstract:
We address the task of learning rankings of documents from search engine logs of user behavior. Previous work on this problem has relied on passively collected clickthrough data. In contrast, we show that an active exploration strategy can provide data that leads to much faster learning. Specifically, we develop a Bayesian approach for selecting rankings to present to users so that interactions result in more informative training data. Our results using the TREC-10 Web corpus, as well as synthetic data, demonstrate that a directed exploration strategy quickly leads to users being presented improved rankings in an online learning setting. We find that active exploration substantially outperforms passive observation and random exploration.
Computing Elo Ratings of Move Patterns in the Game of Go
Cited by 38 (0 self)
Abstract:
Move patterns are an essential method to incorporate domain knowledge into Go-playing programs. This paper presents a new Bayesian technique for supervised learning of such patterns from game records, based on a generalization of Elo ratings. Each sample move in the training data is considered as a victory of a team of pattern features. Elo ratings of individual pattern features are computed from these victories, and can be used in previously unseen positions to compute a probability distribution over legal moves. In this approach, several pattern features may be combined, without an exponential cost in the number of features. Despite a very small number of training games (652), this algorithm outperforms most previous pattern-learning algorithms, both in terms of mean log-evidence (−2.69) and prediction rate (34.9%). A 19 × 19 Monte-Carlo program improved with these patterns reached the level of the strongest classical programs.
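The "team of features" idea can be sketched in a few lines (the feature names and rating values below are invented for illustration): a move's strength is the product of its features' gammas, equivalently exp of the sum of Elo-like ratings, and the predicted distribution over legal moves is the normalised strengths.

```python
import math

def move_distribution(legal_moves, ratings):
    """Probability distribution over legal moves under a generalised
    Bradley-Terry model: each move is a 'team' of pattern features, its
    strength is exp(sum of the features' Elo-like ratings), and the
    distribution is the strengths normalised over all legal moves."""
    strengths = {move: math.exp(sum(ratings[f] for f in feats))
                 for move, feats in legal_moves.items()}
    total = sum(strengths.values())
    return {move: s / total for move, s in strengths.items()}

# Toy position: feature names and ratings are hypothetical.
ratings = {"hane": 0.9, "edge": -0.4, "capture": 1.5}
dist = move_distribution({"A": ["hane", "edge"], "B": ["capture"]}, ratings)
```

Because strengths multiply, adding a feature adds one rating parameter rather than multiplying the pattern space, which is how the combination avoids an exponential cost in the number of features.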
Bayesian inference for Plackett-Luce ranking models
Cited by 11 (0 self)
Abstract:
This paper gives an efficient Bayesian method for inferring the parameters of a Plackett-Luce ranking model. Such models are parameterised distributions over rankings of a finite set of objects, and have typically been studied and applied within the psychometric, sociometric and econometric literature. The inference scheme is an application of Power EP (expectation propagation). The scheme is robust and can be readily applied to large scale data sets. The inference algorithm extends to variations of the basic Plackett-Luce model, including partial rankings. We show a number of advantages of the EP approach over the traditional maximum likelihood method. We apply the method to aggregate rankings of NASCAR racing drivers over the 2002 season, and also to rankings of movie genres.
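For reference, the Plackett-Luce likelihood itself is simple to write down (the worth parameters below are illustrative): a ranking is generated top-down, with each stage selecting the next item from those still unranked with probability proportional to its worth.

```python
import math

def plackett_luce_logprob(ranking, v):
    """Log-probability of a full ranking under the Plackett-Luce model.

    At each stage the next item is chosen from the remaining ones with
    probability v[item] / sum of worths of the items not yet ranked.
    Assumes `ranking` is a full ranking of all items with worths in `v`.
    """
    logp = 0.0
    remaining = list(ranking)
    for item in ranking:
        logp += math.log(v[item]) - math.log(sum(v[j] for j in remaining))
        remaining.remove(item)
    return logp
```

With two items of worths 2 and 1, the stronger item is ranked first with probability 2/3, recovering the Bradley-Terry pairwise model as the two-item special case.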
Analysis of Irish third-level college applications data, 2006
Cited by 9 (4 self)
Abstract:
The Irish college admissions system involves prospective students listing up to ten courses in order of preference on their application. Places in third-level educational institutions are subsequently offered to the applicants on the basis of both their preferences and their final second-level examination results. The college applications system is a large area of public debate in Ireland. Detractors suggest the process creates artificial demand for ‘high profile’ courses, causing applicants to ignore their vocational callings. Supporters argue that the system is impartial and transparent. The Irish college degree applications data from the year 2000 are analyzed using mixture models based on ranked-data models to investigate the types of application behavior exhibited by college applicants. The results of this analysis show that applicants form groups according to both the discipline and geographical location of their course choices. In addition, there is evidence of the suggested ‘points race’ for high-profile courses. Finally, gender emerges as an influential factor when studying course choice behavior.
Supplement to “A mixture of experts model for rank data with applications in election studies”, 2008
Cited by 7 (1 self)
Abstract:
A voting bloc is defined to be a group of voters who have similar voting preferences. The cleavage of the Irish electorate into voting blocs is of interest. Irish elections employ a “single transferable vote” electoral system; under this system voters rank some or all of the electoral candidates in order of preference. These rank votes provide a rich source of preference information from which inferences about the composition of the electorate may be drawn. Additionally, the influence of social factors or covariates on the electorate composition is of interest. A mixture of experts model is a mixture model in which the model parameters are functions of covariates. A mixture of experts model for rank data is developed to provide a model-based method to cluster Irish voters into voting blocs, to examine the influence of social factors on this clustering and to examine the characteristic preferences of the voting blocs. The Benter model for rank data is employed as the ...
Random Utility Theory for Social Choice
Cited by 6 (5 self)
Abstract:
Random utility theory models an agent’s preferences on alternatives by drawing a real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. A special case that has received significant attention is the Plackett-Luce model, for which fast inference methods for maximum likelihood estimators are available. This paper develops conditions on general random utility models that enable fast inference within a Bayesian framework through MC-EM, providing concave log-likelihood functions and bounded sets of global maxima solutions. Results on both real-world and simulated data provide support for the scalability of the approach and capability for model selection among general random utility models including Plackett-Luce.
Whole-History Rating: A Bayesian Rating System for Players of Time-Varying Strength
Cited by 4 (0 self)
Abstract:
Whole-History Rating (WHR) is a new method to estimate the time-varying strengths of players involved in paired comparisons. Like many variations of the Elo rating system, the whole-history approach is based on the dynamic Bradley-Terry model. But, instead of using incremental approximations, WHR directly computes the exact maximum a posteriori over the whole rating history of all players. This additional accuracy comes at a higher computational cost than traditional methods, but computation is still fast enough to be easily applied in real time to large-scale game servers (a new game is added in less than 0.001 second). Experiments demonstrate that, in comparison to Elo, Glicko, TrueSkill, and decayed-history algorithms, WHR produces better predictions.
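A minimal sketch of the objective WHR maximises, assuming the standard logistic Bradley-Terry likelihood and a Wiener-process prior on each player's rating trajectory (the variance parameter w2 and all data below are invented for illustration):

```python
import math

def whr_log_posterior(rating_history, games, w2=0.1):
    """Log-posterior of one player's whole rating history.

    rating_history: list of (day, rating) pairs in chronological order.
    games: list of (day_index, opponent_rating, won) tuples.
    w2 is an assumed Wiener-process variance per day.
    """
    lp = 0.0
    # Bradley-Terry likelihood of each game, using the rating on that day.
    for day_idx, opp_r, won in games:
        r = rating_history[day_idx][1]
        p_win = 1.0 / (1.0 + math.exp(opp_r - r))
        lp += math.log(p_win if won else 1.0 - p_win)
    # Wiener-process prior: Gaussian penalty on rating changes between days.
    for (d0, r0), (d1, r1) in zip(rating_history, rating_history[1:]):
        lp += -((r1 - r0) ** 2) / (2.0 * w2 * (d1 - d0))
    return lp

# A rating history that explains a win is more probable than one that does not.
better = whr_log_posterior([(0, 1.0)], [(0, 0.0, True)])
worse = whr_log_posterior([(0, -1.0)], [(0, 0.0, True)])
```

WHR itself maximises this posterior jointly over all players' histories; the sketch shows only the shape of the objective for a single player.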
A Bradley-Terry Artificial Neural Network Model for Individual Ratings in Group Competitions, 2006
Cited by 4 (0 self)
Abstract:
A common statistical model for paired comparisons is the Bradley-Terry model. This research reparameterizes the Bradley-Terry model as a single-layer artificial neural network (ANN) and shows how it can be fitted using the delta rule. The ANN model is appealing because it makes using and extending the Bradley-Terry model accessible to a broader community. It also leads to natural incremental and iterative updating methods. Several extensions are presented that allow the ANN model to learn to predict the outcome of complex, uneven two-team group competitions by rating individual players; no other published model currently does this. An incremental-learning Bradley-Terry ANN yields probability estimates within 5% of the actual values when trained on 3,379 multiplayer online matches of a popular team- and objective-based first-person shooter. Keywords: Bradley-Terry model, paired comparisons, neural networks, delta rule, probability estimates
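The delta-rule fit described above amounts to an incremental logistic-regression update on the rating difference; a minimal sketch (the learning rate and player names are assumptions, not values from the paper):

```python
import math

def delta_rule_update(ratings, winner, loser, lr=0.1):
    """One incremental Bradley-Terry update via the delta rule: the model is
    a single logistic unit whose input is the rating difference, and each
    observed outcome nudges both ratings by lr * (target - prediction)."""
    p_winner = 1.0 / (1.0 + math.exp(ratings[loser] - ratings[winner]))
    err = 1.0 - p_winner          # target is 1: the winner did win
    ratings[winner] += lr * err
    ratings[loser] -= lr * err

# Repeated wins by "a" push the ratings apart, as in an Elo-style system.
ratings = {"a": 0.0, "b": 0.0}
for _ in range(100):
    delta_rule_update(ratings, "a", "b")
```

Because each game updates the ratings immediately, the same rule supports the online and iterative training regimes the abstract mentions.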