Results 1 - 10 of 752
An extension on "Statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons
 Journal of Machine Learning Research
Cited by 75 (17 self)
In a recently published paper in JMLR, Demšar (2006) recommends a set of nonparametric statistical tests and procedures which can be safely used for comparing the performance of classifiers over multiple data sets. After studying the paper, we realize that the paper correctly introduces the basic procedures and some of the most advanced ones when comparing a control method. However, it does not deal with some advanced topics in depth. Regarding these topics, we focus on more powerful proposals of statistical procedures for comparing n×n classifiers. Moreover, we illustrate an easy way of obtaining adjusted and comparable p-values in multiple comparison procedures.
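The abstract does not say which procedures produce the adjusted p-values. As a hedged illustration only, Holm's step-down correction is one standard way to turn raw pairwise p-values into adjusted values that can be compared directly against a significance level. The function name and the sample values are hypothetical and are not taken from the paper.

```python
def holm_adjusted(p_values):
    """Return Holm step-down adjusted p-values, in the input order.

    Assumption: p_values holds one raw p-value per pairwise comparison.
    """
    m = len(p_values)
    # Indices of the comparisons, from smallest to largest raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for three pairwise classifier comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjusted(raw)
```

A comparison is then declared significant at level alpha when its adjusted p-value is at most alpha, which is what makes the adjusted values comparable across procedures.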
Trust region Newton method for large-scale logistic regression
 In Proceedings of the 24th International Conference on Machine Learning (ICML
, 2007
Cited by 69 (12 self)
Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in the beginning, but achieves fast convergence in the end. Experiments show that it is faster than the commonly used quasi-Newton approach for logistic regression. We also compare it with existing linear SVM implementations.
Sparse Online Learning via Truncated Gradient
Cited by 65 (1 self)
We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss. This method has several essential properties. First, the degree of sparsity is continuous: a parameter controls the rate of sparsification from no sparsification to total sparsification. Second, the approach is theoretically motivated, and an instance of it can be regarded as an online counterpart of the popular L1-regularization method in the batch setting. We prove that small rates of sparsification result in only small additional regret with respect to typical online-learning guarantees. Finally, the approach works well empirically. We apply it to several datasets and find that, for datasets with large numbers of features, substantial sparsity is discoverable.
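The abstract does not spell out the update rule. The following is a minimal sketch of the truncation idea as it is commonly described: take an ordinary gradient step, then pull small weights toward zero by a fixed amount and clip them at zero, leaving large weights untouched. All names here (`truncate`, `gravity`, `theta`) are hypothetical, and applying the truncation on every step is a simplifying assumption.

```python
def truncate(v, alpha, theta):
    """Shrink v toward zero by alpha if |v| <= theta; otherwise leave it alone."""
    if 0.0 <= v <= theta:
        return max(0.0, v - alpha)
    if -theta <= v < 0.0:
        return min(0.0, v + alpha)
    return v


def truncated_gradient_step(w, grad, lr, gravity, theta):
    """One online update: plain gradient step followed by truncation.

    gravity controls how aggressively weights are pushed to zero
    (0 means no sparsification); theta limits which weights are affected.
    """
    return [
        truncate(wi - lr * gi, lr * gravity, theta)
        for wi, gi in zip(w, grad)
    ]


# Hypothetical weights: the small one is zeroed out, the large one is kept.
w_next = truncated_gradient_step([0.05, 2.0], [0.0, 0.0], lr=1.0, gravity=0.1, theta=1.0)
```

With `gravity` set to zero the step reduces to plain online gradient descent, which matches the abstract's description of a single parameter moving continuously from no sparsification to total sparsification.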
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
, 2010
Cited by 63 (1 self)
Stochastic subgradient methods are widely used, well analyzed, and constitute effective tools for optimization and online learning. Stochastic gradient methods' popularity and appeal are largely due to their simplicity, as they largely follow predetermined procedural schemes. However, most common subgradient approaches are oblivious to the characteristics of the data being observed. We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. The adaptation, in essence, allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. In a companion paper, we validate experimentally our theoretical analysis and show that the adaptive subgradient approach outperforms state-of-the-art, but non-adaptive, subgradient algorithms.
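To make the "rarely seen features" point concrete, here is a minimal sketch of the widely used diagonal form of this kind of adaptive update, in which each coordinate's step is scaled by the inverse square root of its accumulated squared gradients. It omits the proximal and regularization machinery the abstract refers to; the function name, the default learning rate, and the `eps` stabilizer are assumptions made for illustration.

```python
import math


def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """One diagonal adaptive-subgradient update.

    accum holds the running sum of squared gradients per coordinate.
    Returns the new weights and the new accumulator.
    """
    new_accum = [a + g * g for a, g in zip(accum, grad)]
    new_w = [
        # Coordinates with little gradient history receive larger effective steps.
        wi - lr * g / (math.sqrt(a) + eps)
        for wi, g, a in zip(w, grad, new_accum)
    ]
    return new_w, new_accum


# Hypothetical gradient: one frequent, large-gradient feature and one rare, small one.
w, accum = adagrad_step([0.0, 0.0], [1.0, 0.01], [0.0, 0.0])
```

On the first step both coordinates move by roughly the same amount even though their raw gradients differ by a factor of 100, which is the per-feature normalization that lets infrequent but predictive features keep a usable step size.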
On smoothing and inference for topic models
 In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence
, 2009
Cited by 60 (7 self)
Latent Dirichlet analysis, or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling, variational inference, and maximum a posteriori estimation, and this variety motivates the need for careful empirical comparisons. In this paper, we highlight the close connections between these approaches. We find that the main differences are attributable to the amount of smoothing applied to the counts. When the hyperparameters are optimized, the differences in performance among the algorithms diminish significantly. The ability of these algorithms to achieve solutions of comparable accuracy gives us the freedom to select computationally efficient approaches. Using the insights gained from this comparative study, we show how accurate topic models can be learned in several seconds on text corpora with thousands of documents.
Fast Support Vector Machine Training and Classification on Graphics Processors
 In Proc. 25th Int. Conf. Machine Learning
, 2008
Cited by 52 (2 self)
Recent developments in programmable, highly parallel Graphics Processing Units (GPUs) have enabled high performance implementations of machine learning algorithms. We describe a solver for Support Vector Machine training running on a GPU, using the Sequential Minimal Optimization algorithm and an adaptive first and second order working set selection heuristic, which achieves speedups of 9-35× over LIBSVM running on a traditional processor. We also present a GPU-based system for SVM classification which achieves speedups of 81-138× over LIBSVM (5-24× over our own CPU-based SVM classifier).
Conditional random fields for activity recognition
 In Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2007
, 2007
Cited by 46 (0 self)
of any sponsoring institution, the U.S. government or any other entity.
New Ensemble Methods For Evolving Data Streams
Cited by 43 (12 self)
Advanced analysis of data streams is quickly becoming a key area of data mining research as the number of applications demanding such processing increases. Online mining when such data streams evolve over time, that is when concepts drift or change completely, is becoming one of the core issues. When tackling non-stationary concepts, ensembles of classifiers have several advantages over single classifier methods: they are easy to scale and parallelize, they can adapt to change quickly by pruning under-performing parts of the ensemble, and they therefore usually also generate more accurate concept descriptions. This paper proposes a new experimental data stream framework for studying concept drift, and two new variants of Bagging: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. Using the new experimental framework, an evaluation study on synthetic and real-world datasets comprising up to ten million examples shows that the new ensemble methods perform very well compared to several known methods.
Empirical Bernstein Stopping
Cited by 41 (9 self)
Sampling is a popular way of scaling up machine learning algorithms to large datasets. The question often is how many samples are needed. Adaptive stopping algorithms monitor the performance in an online fashion and they can stop early, saving valuable resources. We consider problems where probabilistic guarantees are desired and demonstrate how recently introduced empirical Bernstein bounds can be used to design stopping rules that are efficient. We provide upper bounds on the sample complexity of the new rules, as well as empirical results on model selection and boosting in the filtering setting.
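The abstract does not state the bound itself. As a rough sketch, the snippet below uses a commonly cited form of the empirical Bernstein confidence half-width for samples with range R, and compares it with the Hoeffding half-width. It does not reproduce the paper's stopping rule, which would also need to spread the failure probability over all possible stopping times; the function names and constants are assumptions for illustration.

```python
import math


def empirical_bernstein_halfwidth(samples, value_range, delta):
    """Half-width of a (1 - delta) confidence interval around the sample mean.

    Assumed form: sqrt(2 * var * ln(3/delta) / t) + 3 * R * ln(3/delta) / t,
    where var is the empirical variance with 1/t normalization.
    """
    t = len(samples)
    mean = sum(samples) / t
    var = sum((x - mean) ** 2 for x in samples) / t
    log_term = math.log(3.0 / delta)
    return math.sqrt(2.0 * var * log_term / t) + 3.0 * value_range * log_term / t


def hoeffding_halfwidth(t, value_range, delta):
    """Variance-independent Hoeffding half-width for comparison."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * t))


# Hypothetical low-variance sample: the variance-aware bound is much tighter.
low_variance = [0.5] * 1000
eb = empirical_bernstein_halfwidth(low_variance, value_range=1.0, delta=0.05)
hoeffding = hoeffding_halfwidth(1000, value_range=1.0, delta=0.05)
```

Because the leading term scales with the empirical standard deviation rather than the range, an adaptive rule built on such a bound can stop much earlier when the observed variance is small.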
Attacks on privacy and de Finetti's theorem
 In SIGMOD
, 2009
Cited by 39 (6 self)
In this paper we present a method for reasoning about privacy using the concepts of exchangeability and de Finetti's theorem. We illustrate the usefulness of this technique by using it to attack a popular data sanitization scheme known as Anatomy. We stress that Anatomy is not the only sanitization scheme that is vulnerable to this attack. In fact, any scheme that uses the random worlds model, i.i.d. model, or tuple-independent model needs to be re-evaluated. The difference between the attack presented here and others that have been proposed in the past is that we do not need extensive background knowledge. An attacker only needs to know the non-sensitive attributes of one individual in the data, and can carry out this attack just by building a machine learning model over the sanitized data. The reason this attack is successful is that it exploits a subtle flaw in the way prior work computed the probability of disclosure of a sensitive attribute. We demonstrate this theoretically, empirically, and with intuitive examples. We also discuss how this generalizes to many other privacy schemes.