Results 21 - 30 of 1,758
An interior-point method for large-scale l1-regularized logistic regression
- Journal of Machine Learning Research
, 2007
Cited by 77 (3 self)
Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium-sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, which uses a preconditioned conjugate gradient method to compute the search step, can solve very large problems, with a million features and examples (e.g., the 20 Newsgroups data set), in a few minutes on a PC. Using warm-start techniques, a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.
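To make the objective and the warm-start idea concrete, here is a minimal sketch. It does not implement the paper's interior-point method. It swaps in plain proximal gradient (ISTA), a different and much simpler technique, on the average logistic loss plus λ‖w‖₁. It assumes labels in {-1, +1}, no intercept, and a fixed step size. All function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, tau):
    """Proximal operator of tau * |v|: shrink v toward zero by tau."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def logistic_grad(X, y, w):
    """Gradient of (1/m) * sum_i log(1 + exp(-y_i * <w, x_i>)), labels y_i in {-1, +1}."""
    m = len(X)
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        coef = -yi * sigmoid(-margin) / m
        for j, xij in enumerate(xi):
            g[j] += coef * xij
    return g


def l1_logreg_ista(X, y, lam, w0=None, iters=200):
    """Proximal gradient (ISTA) for average logistic loss + lam * ||w||_1."""
    m = len(X)
    # Fixed step 1/L, using L = ||X||_F^2 / (4m) >= Lipschitz constant of the gradient
    step = 4.0 * m / sum(v * v for row in X for v in row)
    w = list(w0) if w0 is not None else [0.0] * len(X[0])
    for _ in range(iters):
        g = logistic_grad(X, y, w)
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, g)]
    return w


def regularization_path(X, y, lambdas, iters=200):
    """Solve for decreasing lam values, warm-starting each from the previous solution."""
    path, w = [], None
    for lam in sorted(lambdas, reverse=True):
        w = l1_logreg_ista(X, y, lam, w0=w, iters=iters)
        path.append((lam, w))
    return path
```

Going from large to small λ means each solve starts near its answer. For λ at or above max_j |∂f(0)/∂w_j|, the solution is exactly zero.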
Simultaneous Routing and Resource Allocation via Dual Decomposition
, 2004
Cited by 73 (4 self)
In wireless data networks the optimal routing of data depends on the link capacities which, in turn, are determined by the allocation of communications resources (such as transmit powers and bandwidths) to the links. The optimal performance of the network can only be achieved by simultaneous optimization of routing and resource allocation. In this paper, we formulate the simultaneous routing and resource allocation problem and exploit problem structure to derive efficient solution methods. We use a capacitated multicommodity flow model to describe the data flows in the network. We assume that the capacity of a wireless link is a concave and increasing function of the communications resources allocated to the link, and the communications resources for groups of links are limited. These assumptions allow us to formulate the simultaneous routing and resource allocation problem as a convex optimization problem over the network flow variables and the communications variables. These two sets of variables are coupled only through the link capacity constraints. We exploit this separable structure by dual decomposition. The resulting solution method attains the optimal coordination of data routing in the network layer and resource allocation in the radio control layer via pricing on the link capacities.
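Here is a toy instance of the pricing idea, offered as an illustration rather than the paper's formulation. Each flow follows a fixed path and has utility log x. Link l has capacity log(1 + p_l) in its transmit power p_l (concave and increasing, as assumed above). Total power is limited to P. Dualizing only the link-capacity constraints with prices λ_l splits the problem into two parts. The flow part has the closed form x = 1 / (path price). The power part is a water-filling problem. The prices are then updated by a subgradient step. The utility, capacity model, step size, and all names are assumptions of this sketch.

```python
import math


def waterfill(prices, total_power, iters=100):
    """Maximize sum_l prices[l] * log(1 + p_l) s.t. sum_l p_l <= total_power, p >= 0.

    Optimality gives prices[l] / (1 + p_l) = nu on active links, so
    p_l = max(0, prices[l] / nu - 1); nu is found by bisection.
    """
    def alloc(nu):
        return [max(0.0, lam / nu - 1.0) for lam in prices]

    lo, hi = 0.0, max(prices)  # alloc(hi) sums to 0 <= total_power
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(alloc(mid)) > total_power:
            lo = mid
        else:
            hi = mid
    return alloc(hi)


def dual_decomposition(paths, num_links, total_power, step=0.5, iters=500, x_max=1e3):
    """paths[s] lists the link indices used by flow s (log utility per flow)."""
    prices = [1.0] * num_links
    for _ in range(iters):
        # Network layer: each flow maximizes log(x) - x * (sum of its path prices)
        rates = [min(x_max, 1.0 / sum(prices[l] for l in path)) for path in paths]
        # Radio layer: allocate power given the same link prices
        power = waterfill(prices, total_power)
        capacity = [math.log1p(p) for p in power]
        load = [0.0] * num_links
        for x, path in zip(rates, paths):
            for l in path:
                load[l] += x
        # Price update: raise prices on overloaded links, lower them otherwise
        prices = [max(1e-6, lam + step * (ld - c))
                  for lam, ld, c in zip(prices, load, capacity)]
    return rates, power, prices
```

The two layers interact only through `prices`, which plays the role of the link-capacity pricing described in the abstract.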
Convexity, Classification, and Risk Bounds
- Journal of the American Statistical Association
, 2003
Cited by 73 (11 self)
Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. The convexity makes these algorithms computationally efficient. The use of a surrogate, however, has statistical consequences that must be balanced against the computational virtues of convexity. To study these issues, we provide a general quantitative relationship between the risk as assessed using the 0-1 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function: that it satisfy a pointwise form of Fisher consistency for classification. The relationship is based on a simple variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the case of low noise. Finally, we ...
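The variational transformation can be approximated numerically for margin-based losses φ(yf). The sketch uses the standard conditional-risk definitions H(η) = inf_α [η φ(α) + (1 − η) φ(−α)]. It also uses H⁻(η), the same infimum restricted to α(2η − 1) ≤ 0. It evaluates the difference H⁻ − H at η = (1 + θ)/2. For the convex losses used here, this difference is taken to be the transform itself; that is an assumption of the sketch. A grid search over α in [−5, 5] is a crude, arbitrary choice. It should reproduce ψ(θ) = |θ| for the hinge loss and ψ(θ) = 1 − √(1 − θ²) for the exponential loss.

```python
import math


def conditional_risk(phi, eta, alpha):
    """Expected surrogate loss eta * phi(alpha) + (1 - eta) * phi(-alpha)."""
    return eta * phi(alpha) + (1.0 - eta) * phi(-alpha)


def psi_tilde(phi, theta, grid=None):
    """Approximate H^-(eta) - H(eta) at eta = (1 + theta) / 2 by grid search over alpha."""
    if grid is None:
        grid = [i / 1000.0 for i in range(-5000, 5001)]  # alpha in [-5, 5], step 0.001
    eta = (1.0 + theta) / 2.0
    H = min(conditional_risk(phi, eta, a) for a in grid)
    # H^- restricts alpha to the sign that disagrees with the Bayes decision
    H_minus = min(conditional_risk(phi, eta, a) for a in grid if a * (2 * eta - 1) <= 0)
    return H_minus - H


def hinge(a):
    return max(0.0, 1.0 - a)


def exponential(a):
    return math.exp(-a)
```

A strictly positive value for θ > 0 is the pointwise signal that the surrogate penalizes wrong-sign predictions.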
Just relax: Convex programming methods for subset selection and sparse approximation
, 2004
Cited by 71 (2 self)
Abstract. Subset selection and sparse approximation problems seek a good approximation of an input signal using a linear combination of elementary signals, yet they stipulate that the approximation may only involve a few of the elementary signals. This class of problems arises throughout electrical engineering, applied mathematics and statistics, but little theoretical progress has been made over the last fifty years. Subset selection and sparse approximation both admit natural convex relaxations, but the literature contains few results on the behavior of these relaxations for general input signals. This report demonstrates that the solution of the convex program frequently coincides with the solution of the original approximation problem. The proofs depend essentially on geometric properties of the ensemble of elementary signals. The results are powerful because sparse approximation problems are combinatorial, while convex programs can be solved in polynomial time with standard software. Comparable new results for a greedy algorithm, Orthogonal Matching Pursuit, are also stated. This report should have a major practical impact because the theory applies immediately to many real-world signal processing problems.
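Orthogonal Matching Pursuit, the greedy algorithm mentioned above, fits in a few lines. At each step it picks the dictionary column most correlated with the current residual. It then refits all selected coefficients by least squares. This sketch assumes unit-norm columns and a known sparsity level `k`. It solves the normal equations by Gaussian elimination, which is fine for tiny examples but not numerically robust. The function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def omp(columns, signal, k):
    """Greedy k-sparse approximation of `signal` by unit-norm `columns`; returns {index: coef}."""
    support, coeffs = [], []
    residual = list(signal)
    for _ in range(k):
        # Select the atom most correlated with the current residual
        best = max(range(len(columns)), key=lambda j: abs(dot(columns[j], residual)))
        if best in support:
            break
        support.append(best)
        # Least-squares refit on all selected atoms via the normal equations
        G = [[dot(columns[i], columns[j]) for j in support] for i in support]
        rhs = [dot(columns[i], signal) for i in support]
        coeffs = solve(G, rhs)
        approx = [sum(c * columns[j][t] for c, j in zip(coeffs, support))
                  for t in range(len(signal))]
        residual = [s - a for s, a in zip(signal, approx)]
    return dict(zip(support, coeffs))
```

Because the refit makes the residual orthogonal to every chosen atom, OMP never selects the same column twice while the residual is nonzero.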
On the optimality of multiantenna broadcast scheduling using zero-forcing beamforming
- IEEE J. Select. Areas Commun.
, 2006
Cited by 64 (5 self)
Although the capacity of multiple-input/multiple-output (MIMO) broadcast channels (BCs) can be achieved by dirty paper coding (DPC), it is difficult to implement in practical systems. This paper investigates whether, for a large number of users, simpler schemes can achieve the same performance. Specifically, we show that a zero-forcing beamforming (ZFBF) strategy, while generally suboptimal, can achieve the same asymptotic sum capacity as that of DPC, as the number of users goes to infinity. In proving this asymptotic result, we provide an algorithm for determining which users should be active under ZFBF. These users are semiorthogonal to one another and can be grouped for simultaneous transmission to enhance the throughput of scheduling algorithms. Based on the user grouping, we propose and compare two fair scheduling schemes, round-robin ZFBF and proportional-fair ZFBF. We provide numerical results to confirm the optimality of ZFBF and to compare the performance of ZFBF and the proposed fair scheduling schemes with that of various MIMO BC strategies.
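Semiorthogonal user selection can be sketched as a greedy Gram-Schmidt procedure. The procedure repeatedly picks the remaining user whose channel has the largest component orthogonal to the users already chosen. It then discards users whose channels are not nearly orthogonal to that new direction. The threshold `alpha` and the exact pruning rule are assumptions for illustration and may differ in detail from the paper's algorithm. Channels are complex (or real) vectors with one entry per transmit antenna and are assumed nonzero.

```python
import math


def inner(u, v):
    """Complex inner product <u, v> = sum_i u_i * conj(v_i)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def norm(u):
    return math.sqrt(sum(abs(a) ** 2 for a in u))


def select_semiorthogonal(channels, alpha=0.3):
    """Greedy selection of nearly orthogonal users for zero-forcing beamforming."""
    num_antennas = len(channels[0])
    candidates = list(range(len(channels)))
    selected, basis = [], []  # basis holds orthonormalized directions of selected users
    while candidates and len(selected) < num_antennas:
        def residual(k):
            # Component of user k's channel orthogonal to the span of selected users
            g = list(channels[k])
            for q in basis:
                c = inner(g, q)
                g = [gi - c * qi for gi, qi in zip(g, q)]
            return g

        best = max(candidates, key=lambda k: norm(residual(k)))
        g = residual(best)
        if norm(g) < 1e-12:
            break
        selected.append(best)
        q = [gi / norm(g) for gi in g]
        basis.append(q)
        # Keep only users whose channel is nearly orthogonal to the new direction
        candidates = [k for k in candidates if k != best
                      and abs(inner(channels[k], q)) / norm(channels[k]) < alpha]
    return selected
```

At most one user per transmit antenna is kept, so the selected group can be served simultaneously by zero-forcing.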
Convex multi-task feature learning
- Machine Learning
, 2007
Cited by 63 (6 self)
Summary. We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the well-known single-task 1-norm regularization. It is based on a novel non-convex regularizer which controls the number of learned features common across the tasks. We prove that the method is equivalent to solving a convex optimization problem for which there is an iterative algorithm which converges to an optimal solution. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the former step it learns task-specific functions and in the latter step it learns common-across-tasks sparse representations for these functions. We also provide an extension of the algorithm which learns sparse nonlinear representations using kernels. We report experiments on simulated and real data sets which demonstrate that the proposed method can both improve the performance relative to learning each task independently and lead to a few learned features common across related tasks. Our algorithm can also be used, as a special case, to simply select – not learn – a few common variables across the tasks.
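The alternating scheme can be sketched for the special case mentioned last, where the method selects (rather than learns) common variables. The sketch models this by restricting the matrix D to be diagonal; that restriction, the square loss, and the smoothing term `eps` are all assumptions of this sketch. The penalty is γ·Σ_t w_tᵀ D⁻¹ w_t with D diagonal, positive, and of trace 1. The supervised step then solves one ridge-like system per task. The unsupervised step sets each diagonal entry proportional to the ℓ2 norm of that feature's coefficients across tasks. Function names are hypothetical.

```python
import math


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def multitask_select(tasks, gamma=0.1, iters=20, eps=1e-8):
    """tasks: list of (X, y) pairs sharing the same d features. Returns (W, D)."""
    d = len(tasks[0][0][0])
    D = [1.0 / d] * d  # start from the uniform diagonal with trace 1
    for _ in range(iters):
        # Supervised step: (X^T X + gamma * D^{-1}) w_t = X^T y_t for each task t
        W = []
        for X, y in tasks:
            A = [[sum(row[i] * row[j] for row in X) + (gamma / D[i] if i == j else 0.0)
                  for j in range(d)] for i in range(d)]
            b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
            W.append(solve(A, b))
        # Unsupervised step: D_ii proportional to the smoothed norm of feature i across tasks
        norms = [math.sqrt(sum(w[i] ** 2 for w in W) + eps) for i in range(d)]
        D = [n_i / sum(norms) for n_i in norms]
    return W, D
```

Features unused by every task receive a tiny D_ii, which in turn shrinks their coefficients in the next supervised step. That is how the shared sparsity pattern emerges.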
Consequences and Limits of Nonlocal Strategies
, 2010
Cited by 61 (15 self)
This paper investigates the powers and limitations of quantum entanglement in the context of cooperative games of incomplete information. We give several examples of such nonlocal games where strategies that make use of entanglement outperform all possible classical strategies. One implication of these examples is that entanglement can profoundly affect the soundness property of two-prover interactive proof systems. We then establish limits on the probability with which strategies making use of entanglement can win restricted types of nonlocal games. These upper bounds may be regarded as generalizations of Tsirelson-type inequalities, which place bounds on the extent to which quantum information can allow for the violation of Bell inequalities. We also investigate the amount of entanglement required by optimal and nearly optimal quantum strategies for some games.
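The CHSH game is a standard two-player nonlocal game, used here as a generic illustration rather than as a reproduction of the paper's examples. It shows the classical/quantum gap concretely. The referee sends uniformly random bits x and y, and the players win if their answers satisfy a ⊕ b = x ∧ y. The sketch brute-forces the best deterministic classical strategy. It also evaluates the textbook entangled strategy: a shared maximally entangled pair measured in real bases at the angles given as defaults below. That strategy's winning probability is cos²(π/8), which matches the Tsirelson bound.

```python
import itertools
import math


def classical_value():
    """Best winning probability over deterministic strategies a(x), b(y)."""
    best = 0.0
    # A deterministic strategy is a pair of answer tables (a(0), a(1)) and (b(0), b(1))
    for a in itertools.product([0, 1], repeat=2):
        for b in itertools.product([0, 1], repeat=2):
            wins = sum((a[x] ^ b[y]) == (x & y) for x in (0, 1) for y in (0, 1))
            best = max(best, wins / 4)
    return best


def quantum_value(alice=(0.0, math.pi / 4), bob=(math.pi / 8, -math.pi / 8)):
    """Winning probability when both players measure a shared |Phi+> in real bases.

    For |Phi+> = (|00> + |11>) / sqrt(2), measuring at angles ta and tb gives
    equal outcomes with probability cos^2(ta - tb).
    """
    total = 0.0
    for x in (0, 1):
        for y in (0, 1):
            p_equal = math.cos(alice[x] - bob[y]) ** 2
            # The players must answer differently only when x = y = 1
            total += (1 - p_equal) if (x & y) else p_equal
    return total / 4
```

Since shared randomness is a mixture of deterministic strategies, the brute force over 16 deterministic strategies already gives the full classical value of 3/4.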
Grouped and hierarchical model selection through composite absolute penalties
- Annals of Statistics
, 2006
Cited by 60 (2 self)
Extracting useful information from high-dimensional data is an important part of the focus of today’s statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the L1-penalized L2 minimization method Lasso has been popular in regression models. In this paper, we combine different norms including L1 to form an intelligent penalty in order to add side information to the fitting of a regression or classification model to obtain reasonable estimates. Specifically, we introduce the Composite Absolute Penalties (CAP) family, which allows the grouping and hierarchical relationships between the predictors to be expressed. CAP penalties are built by defining groups and combining the properties of norm penalties at the across-group and within-group levels. Grouped selection occurs for non-overlapping groups. In that case, we give a Bayesian interpretation for CAP penalties. Hierarchical variable selection is reached by defining groups with particular overlapping patterns. On the computational side, we propose using the BLASSO and cross-validation to obtain CAP estimates. For a subfamily of CAP estimates involving only the L1 and L∞ norms, we introduce the iCAP algorithm to trace the entire regularization path for the grouped selection problem. Within this subfamily, unbiased estimates of the degrees of freedom (df) are derived, allowing the regularization parameter to be selected without cross-validation. CAP is shown to improve on the predictive performance of the Lasso in a series of simulated experiments, including cases with p ≫ n and misspecified groupings. When the complexity of a model is properly calculated, iCAP is seen to be parsimonious in the experiments.
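The penalty itself is easy to evaluate. This minimal sketch assumes the form T(β) = Σ_k ‖β_{G_k}‖_{γ_k}^{γ_0}, using a single within-group norm γ_k for all groups and γ_0 = 1 by default (assumptions of the sketch). With singleton groups and γ_k = 1 it reduces to the Lasso penalty. With γ_k = 2 it is a group-lasso-style penalty. With γ_k = ∞ it is the L1/L∞ combination that the iCAP subfamily works with. Overlapping groups, the device used for hierarchical selection, are accepted as-is. Function names are hypothetical.

```python
import math


def group_norm(values, gamma):
    """l_gamma norm of a list of coefficients; gamma = math.inf gives the max-norm."""
    if gamma == math.inf:
        return max(abs(v) for v in values)
    return sum(abs(v) ** gamma for v in values) ** (1.0 / gamma)


def cap_penalty(beta, groups, gamma_k, gamma_0=1.0):
    """Sum over groups of ||beta_G||_{gamma_k} ** gamma_0 (groups may overlap)."""
    return sum(group_norm([beta[i] for i in g], gamma_k) ** gamma_0 for g in groups)
```

With γ_k = ∞ only the largest coefficient in each group is penalized, so once a group is active its other members can grow at no extra cost. That is why L∞ groups favor selecting or dropping variables together.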
Model selection and estimation in the Gaussian graphical model
- Biometrika (2007), pp. 1–17
, 2007
Joint congestion control and media access control design for ad hoc wireless networks
- in Proceedings of IEEE Infocom
, 2005
Cited by 55 (2 self)
Abstract — We present a model for the joint design of congestion control and media access control (MAC) for ad hoc wireless networks. Using a contention graph and a contention matrix, we formulate resource allocation in the network as a utility maximization problem with constraints that arise from contention for channel access. We present two algorithms that are not only spatially distributed but, more interestingly, also decompose vertically into two protocol layers, where TCP and MAC jointly solve the system problem. The first is a primal algorithm in which the MAC layer at the links generates congestion (contention) prices based on local aggregate source rates, and TCP sources adjust their rates based on the aggregate prices along their paths. The second is a dual subgradient algorithm in which the MAC sub-algorithm is implemented by scheduling link-layer flows according to the congestion prices of the links. Global convergence properties of these algorithms are proved. This is a preliminary step towards a systematic approach to jointly designing TCP congestion control algorithms and MAC algorithms, not only to improve performance but, more importantly, to make their interaction more transparent.
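Here is a toy version of the price-based dual approach. Each clique of the contention graph, meaning a set of mutually interfering links, is assumed to share one unit of airtime. This gives the constraint Σ_{l∈clique} y_l / c_l ≤ 1, and each clique carries a price. A link's price is the sum of its cliques' prices divided by its capacity. Each TCP source with log utility sets its rate to the inverse of its path price. Several choices are assumptions of this sketch: the clique-airtime constraint form, log utilities, the step size, and all names. The sketch also omits the scheduling-based MAC sub-algorithm described above. Instead it updates clique prices by a projected subgradient step.

```python
def congestion_mac_dual(paths, cliques, capacity, step=0.1, iters=2000, x_max=1e3):
    """paths[s]: links used by source s; cliques[j]: links that contend with each other."""
    mu = [1.0] * len(cliques)
    for _ in range(iters):
        # Link price: prices of all cliques containing the link, per unit of capacity
        link_price = [sum(mu[j] for j, q in enumerate(cliques) if l in q) / capacity[l]
                      for l in range(len(capacity))]
        # TCP source update for log utility: x_s = 1 / (path price), capped at x_max
        rates = []
        for path in paths:
            q_s = sum(link_price[l] for l in path)
            rates.append(x_max if q_s <= 0 else min(x_max, 1.0 / q_s))
        # Aggregate link flows and the airtime each clique would need
        flow = [0.0] * len(capacity)
        for x, path in zip(rates, paths):
            for l in path:
                flow[l] += x
        usage = [sum(flow[l] / capacity[l] for l in q) for q in cliques]
        # Projected subgradient step: raise prices of over-subscribed cliques
        mu = [max(0.0, m + step * (u - 1.0)) for m, u in zip(mu, usage)]
    return rates, mu
```

For example, two single-link sources whose links contend in one clique should each settle at half of the shared airtime.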

