The Interplay of Optimization and Machine Learning Research
- Journal of Machine Learning Research, 2006
Cited by 11 (1 self)
The fields of machine learning and mathematical programming are increasingly intertwined. Optimization problems lie at the heart of most machine learning approaches. The Special Topic on Machine Learning and Large Scale Optimization examines this interplay. Machine learning researchers have embraced the advances in mathematical programming allowing new types of models to be pursued. The special topic includes models using quadratic, linear, second-order cone, semidefinite, and semi-infinite programs. We observe that the qualities of good optimization algorithms from the machine learning and optimization perspectives can be quite different. Mathematical programming puts a premium on accuracy, speed, and robustness. Since generalization is the bottom line in machine learning and training is normally done off-line, accuracy and small speed improvements are of little concern in machine learning. Machine learning prefers simpler algorithms that work in reasonable computational time for specific classes of problems. Reducing machine learning problems to well-explored mathematical programming classes with robust general purpose optimization codes allows machine learning researchers to rapidly develop new techniques.
Theory and Applications of Robust Optimization, 2007
Cited by 9 (4 self)
In this paper we survey the primary research, both theoretical and applied, in the field of Robust Optimization (RO). Our focus will be on the computational attractiveness of RO approaches, as well as the modeling power and broad applicability of the methodology. In addition to surveying the most prominent theoretical results of RO over the past decade, we will also present some recent results linking RO to adaptable models for multi-stage decision-making problems. Finally, we will highlight successful applications of RO across a wide spectrum of domains, including, but not limited to, finance, statistics, learning, and engineering.
Robust Regression and Lasso
Cited by 4 (2 self)
We consider robust least-squares regression with feature-wise disturbance. We show that this formulation leads to tractable convex optimization problems, and we exhibit a particular uncertainty set for which the robust problem is equivalent to ℓ1 regularized regression (Lasso). This provides an interpretation of Lasso from a robust optimization perspective. We generalize this robust formulation to consider more general uncertainty sets, which all lead to tractable convex optimization problems. Therefore, we provide a new methodology for designing regression algorithms, which generalize known formulations. The advantage is that robustness to disturbance is a physical property that can be exploited: in addition to obtaining new formulations, we use it directly to show sparsity properties of Lasso, as well as to prove a general consistency result for robust regression problems, including Lasso, from a unified robustness perspective.
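The equivalence described in this abstract can be checked numerically for a fixed coefficient vector. The sketch below assumes an uncertainty set that bounds the perturbation of each feature column i in ℓ2 norm by c[i]; the abstract does not spell out the set, so this choice, the unsquared ℓ2 loss, and the function name worst_case_residual are assumptions made for illustration. Under that assumption, the worst-case residual norm equals the nominal residual norm plus the weighted ℓ1 penalty sum(c[i] * |x[i]|).

```python
import math


def worst_case_residual(A, b, x, c):
    """Compare the worst-case residual norm with its closed form.

    Assumed uncertainty set: column i of A may be shifted by any vector
    delta_i with ||delta_i||_2 <= c[i]. Returns (attained, closed_form).
    """
    n, m = len(A), len(x)

    def sign(t):
        return (t > 0) - (t < 0)

    # Nominal residual r = b - A x and its Euclidean norm.
    r = [b[k] - sum(A[k][i] * x[i] for i in range(m)) for k in range(n)]
    r_norm = math.sqrt(sum(v * v for v in r))
    if r_norm == 0.0:
        raise ValueError("this sketch assumes a nonzero nominal residual")
    u = [v / r_norm for v in r]  # unit vector along the residual

    # Adversarial shift: column i moves by -c[i] * sign(x[i]) * u,
    # which pushes every term of the residual in the same direction.
    A_adv = [[A[k][i] - c[i] * sign(x[i]) * u[k] for i in range(m)]
             for k in range(n)]
    r_adv = [b[k] - sum(A_adv[k][i] * x[i] for i in range(m))
             for k in range(n)]
    attained = math.sqrt(sum(v * v for v in r_adv))

    # Closed form: residual norm plus a weighted l1 penalty on x.
    closed_form = r_norm + sum(c[i] * abs(x[i]) for i in range(m))
    return attained, closed_form


# Small hypothetical example: 3 samples, 2 features.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [1.0, 0.0, 1.0]
attained, closed_form = worst_case_residual(A, b, [0.5, -1.0], [0.1, 0.2])
print(attained, closed_form)
```

The two printed values should agree up to rounding. No feasible perturbation can exceed the closed form, by the triangle inequality, so minimizing the worst case over x is the same as minimizing the ℓ1-penalized objective.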
Learning Algorithms using Chance-Constrained Programs, 2007
Cited by 2 (0 self)
I would like to express sincere gratitude and thanks to my adviser, Dr. Chiranjib Bhattacharyya. With his interesting thoughts and ideas, inspiring ideals and friendly nature, he made sure I was filled with enthusiasm and interest to do research all through my PhD. He was always approachable and spent ample time with me and all my lab members for discussions, though he had a busy schedule. I also thank Prof. M. N. Murty, Dr. Samy Bengio (Google Labs, USA) and Prof. Aharon Ben-Tal (Technion, Israel) for their help and co-operation. I am greatly indebted to my parents, wife and other family members for supporting and encouraging me all through the PhD years. I thank all my lab members and friends, especially Karthik Raman, Sourangshu, Rashmin, Krishnan and Sivaramakrishnan, for their useful discussions and comments. I thank the Department of Science and Technology, India, for supporting me financially during the PhD work. I would also like to take this opportunity to thank all the people who directly and indirectly helped in finishing my thesis.
Boosting with Incomplete Information
Cited by 2 (1 self)
In real-world machine learning problems, it is very common that part of the input feature vector is incomplete: either not available, missing, or corrupted. In this paper, we present a boosting approach that integrates features with incomplete information and those with complete information to form a strong classifier. By introducing hidden variables to model missing information, we form loss functions that combine fully labeled data with partially labeled data to effectively learn normalized and unnormalized models. The primal problems of the proposed optimization problems with these loss functions are provided to show their close relationship and the motivations behind them. We use auxiliary functions to bound the change of the loss functions and derive explicit parameter update rules for the learning algorithms. We demonstrate encouraging results on two real-world problems (visual object recognition in computer vision and named entity recognition in natural language processing) to show the effectiveness of the proposed boosting approach.
Maximum Relative Margin and Data-Dependent Regularization
- Journal of Machine Learning Research
Cited by 2 (1 self)
Leading classification methods such as support vector machines (SVMs) and their counterparts achieve strong generalization performance by maximizing the margin of separation between data classes. While the maximum margin approach has achieved promising performance, this article identifies its sensitivity to affine transformations of the data and to directions with large data spread. Maximum margin solutions may be misled by the spread of data and preferentially separate classes along large spread directions. This article corrects these weaknesses by measuring margin not in the absolute sense but rather only relative to the spread of data in any projection direction. Maximum relative margin corresponds to a data-dependent regularization on the classification function while maximum absolute margin corresponds to an ℓ2 norm constraint on the classification function. Interestingly, the proposed improvements only require simple extensions to existing maximum margin formulations and preserve the computational efficiency of SVMs. Through the maximization of relative margin, surprising performance gains are achieved on real-world problems such as digit, image histogram, and text classification. In addition, risk bounds are derived for the new formulation based on Rademacher averages.
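The contrast between absolute and relative margin can be sketched as an optimization problem. The following is one way to write such a formulation, assuming a soft-margin linear classifier $f(x) = w^\top x + b$; the spread bound $B$, the trade-off constant $C$, and the slack variables $\xi_i$ are symbols introduced here for illustration and are not taken from the abstract.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert_2^2 + C\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad & y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0, \\
& \lvert w^\top x_i + b\rvert \le B,\qquad i = 1,\dots,n .
\end{aligned}
```

Dropping the last constraint (letting $B \to \infty$) gives the standard soft-margin SVM. With a finite $B > 1$, the projections of the data are confined to a band, so the margin is measured relative to the spread of the data along $w$ rather than in absolute terms.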
Chance-Constrained Programs for Link Prediction
Cited by 2 (0 self)
In this paper, we consider the link prediction problem, where we are given a partial snapshot of a network at some time and the goal is to predict additional links at a later time. The accuracy of the current prediction methods is quite low due to the extreme class skew and the large number of potential links. In this paper, we describe learning algorithms based on chance constrained programs and show that they exhibit all the properties needed for a good link predictor, namely, allow preferential bias to positive or negative class; handle skewness in the data; and scale to large networks. Our experimental results on three real-world coauthorship networks show significant improvement in prediction accuracy over baseline algorithms.
Learning Algorithms for Link Prediction Based on Chance Constraints
Cited by 2 (0 self)
In this paper, we consider the link prediction problem, where we are given a partial snapshot of a network at some time and the goal is to predict the additional links formed at a later time. The accuracy of current prediction methods is quite low due to the extreme class skew and the large number of potential links. Here, we describe learning algorithms based on chance constrained programs and show that they exhibit all the properties needed for a good link predictor, namely, they allow preferential bias to positive or negative class; handle skewness in the data; and scale to large networks. Our experimental results on three real-world domains—co-authorship networks, biological networks and citation networks—show significant performance improvement over baseline algorithms. We conclude by briefly describing some promising future directions based on this work.
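A chance constraint of the kind mentioned in this abstract asks that a class be classified correctly with probability at least eta. One standard way to make such a constraint tractable, which may differ from the authors' exact formulation, is to assume only the class mean and covariance are known and apply the one-sided Chebyshev (Cantelli) inequality. The constraint then becomes the second-order cone condition w.mu + b >= kappa * sqrt(w' Sigma w) with kappa = sqrt(eta / (1 - eta)). The function names below are hypothetical, and choosing a different eta per class is one way to express the "preferential bias" the abstract refers to.

```python
import math


def kappa(eta):
    """Cantelli multiplier sqrt(eta / (1 - eta)) for 0 <= eta < 1."""
    if not 0.0 <= eta < 1.0:
        raise ValueError("eta must lie in [0, 1)")
    return math.sqrt(eta / (1.0 - eta))


def satisfies_chance_constraint(w, b, mu, sigma, eta):
    """Check the cone condition w.mu + b >= kappa(eta) * sqrt(w' Sigma w).

    When it holds, Pr(w.x + b >= 0) >= eta for every distribution of x
    with mean mu and covariance sigma (a sufficient condition that
    follows from Cantelli's inequality).
    """
    d = len(w)
    mean_score = sum(w[i] * mu[i] for i in range(d)) + b
    variance = sum(w[i] * sigma[i][j] * w[j]
                   for i in range(d) for j in range(d))
    return mean_score >= kappa(eta) * math.sqrt(max(variance, 0.0))


# Hypothetical example: unit covariance, class mean three units from the boundary.
identity = [[1.0, 0.0], [0.0, 1.0]]
print(satisfies_chance_constraint([1.0, 0.0], 0.0, [3.0, 0.0], identity, 0.8))
print(satisfies_chance_constraint([1.0, 0.0], 0.0, [1.0, 0.0], identity, 0.8))
```

Raising eta for the rare class increases kappa and therefore demands a larger standardized distance between that class mean and the decision boundary, which is how such a formulation can compensate for class skew.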
Learning from Incomplete Data with Infinite Imputations
Cited by 1 (0 self)
We address the problem of learning decision functions from training data in which some attribute values are unobserved. This problem can arise, for instance, when training data is aggregated from multiple sources, and some sources record only a subset of attributes. We derive a generic joint optimization problem in which the distribution governing the missing values is a free parameter. We show that the optimal solution concentrates the density mass on finitely many imputations, and provide a corresponding algorithm for learning from incomplete data. We report on empirical results on benchmark data, and on the email spam application that motivates our work.
Robustness, Risk, and Regularization in Support Vector Machines
We consider two new formulations for classification problems in the spirit of support vector machines based on robust optimization. Our formulations are designed to build in protection to noise and control overfitting, but without being overly conservative. Our first formulation allows the noise between different samples to be correlated. We show that the standard norm-regularized support vector machine classifier is a solution to a special case of our first formulation, thus providing an explicit link between regularization and robustness in pattern classification. Our second formulation is based on a softer version of robust optimization called comprehensive robustness. We show that this formulation is equivalent to regularization by any arbitrary convex regularizer, thus extending our first equivalence result. Moreover, we explain how the connection of comprehensive robustness to convex risk-measures can be used to design risk-measure constrained classifiers with robustness to the input distribution. Our formulations result in convex optimization problems that can be easily solved. Finally, we provide some empirical results that show the promise of comprehensive robust classifiers.

