Results 21 – 30 of 3,197
Learning the discriminative power-invariance trade-off
In ICCV, 2007
"... We investigate the problem of learning optimal descriptors for a given classification task. Many handcrafted descriptors have been proposed in the literature for measuring visual similarity. Looking past initial differences, what really distinguishes one descriptor from another is the tradeoff that ..."
Abstract

Cited by 149 (4 self)
We investigate the problem of learning optimal descriptors for a given classification task. Many hand-crafted descriptors have been proposed in the literature for measuring visual similarity. Looking past initial differences, what really distinguishes one descriptor from another is the trade-off that it achieves between discriminative power and invariance. Since this trade-off must vary from task to task, no single descriptor can be optimal in all situations. Our focus, in this paper, is on learning the optimal trade-off for classification given a particular training set and prior constraints. The problem is posed in the kernel learning framework. We learn the optimal, domain-specific kernel as a combination of base kernels corresponding to base features which achieve different levels of trade-off (such as no invariance, rotation invariance, scale invariance, affine invariance, etc.). This leads to a convex optimisation problem with a unique global optimum which can be solved efficiently. The method is shown to achieve state-of-the-art performance on the UIUC textures, Oxford flowers and Caltech 101 datasets.
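As a rough illustration of the combination the abstract describes, the numpy sketch below forms a simplex-weighted sum of base kernels and fits a kernel ridge classifier on it. The base kernels, bandwidths, fixed weights, and the ridge fit are stand-ins of mine; the paper itself learns the weights by solving a convex program with SVM machinery.

```python
# Minimal sketch: a convex combination of base kernels, evaluated with
# kernel ridge regression (illustrative; not the paper's actual solver).
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian RBF kernel matrix between rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def combined_kernel(kernels, weights):
    """Convex combination K = sum_k d_k K_k with d on the simplex."""
    return sum(w * K for w, K in zip(weights, kernels))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=40))

# Two bandwidths stand in for base features with different invariance levels
# (a hypothetical choice of base kernels).
base = [rbf_kernel(X, X, g) for g in (0.1, 1.0)]
d = np.array([0.5, 0.5])                                 # uniform simplex weights
K = combined_kernel(base, d)
alpha = np.linalg.solve(K + 1e-2 * np.eye(len(y)), y)    # kernel ridge fit
print("train accuracy:", np.mean(np.sign(K @ alpha) == y))
```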
Signal recovery from partial information via Orthogonal Matching Pursuit
Submitted to IEEE Trans. Inform. Theory, 2005
"... Abstract. This article demonstrates theoretically and empirically that a greedy algorithm called Orthogonal Matching Pursuit (OMP) can reliably recover a signal with m nonzero entries in dimension d given O(m ln d) random linear measurements of that signal. This is a massive improvement over previou ..."
Abstract

Cited by 149 (8 self)
This article demonstrates theoretically and empirically that a greedy algorithm called Orthogonal Matching Pursuit (OMP) can reliably recover a signal with m nonzero entries in dimension d given O(m ln d) random linear measurements of that signal. This is a massive improvement over previous results for OMP, which require O(m²) measurements. The new results for OMP are comparable with recent results for another algorithm called Basis Pursuit (BP). The OMP algorithm is much faster and much easier to implement, which makes it an attractive alternative to BP for signal recovery problems.
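The greedy loop is simple enough to sketch in a few lines of numpy: select the column most correlated with the residual, then re-solve least squares on the selected support. The problem sizes below are arbitrary choices of mine.

```python
# Compact orthogonal matching pursuit, following the standard greedy recipe.
import numpy as np

def omp(Phi, v, m):
    """Recover an m-sparse x from v = Phi @ x (Phi has one row per measurement)."""
    residual, support = v.copy(), []
    for _ in range(m):
        # pick the dictionary column most correlated with the current residual
        support.append(int(np.argmax(np.abs(Phi.T @ residual))))
        cols = Phi[:, support]
        coef, *_ = np.linalg.lstsq(cols, v, rcond=None)  # re-fit on the support
        residual = v - cols @ coef
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(1)
d, n, m = 256, 64, 4                    # n = O(m ln d) measurements, as in the abstract
Phi = rng.normal(size=(n, d)) / np.sqrt(n)
x_true = np.zeros(d)
x_true[rng.choice(d, m, replace=False)] = rng.normal(size=m)
x_hat = omp(Phi, Phi @ x_true, m)
print("max recovery error:", np.abs(x_hat - x_true).max())
```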
Convex multi-task feature learning
Machine Learning, 2007
"... Summary. We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the wellknown singletask 1norm regularization. It is based on a novel nonconvex regularizer which controls the number of learned features common across the tasks. We p ..."
Abstract

Cited by 139 (15 self)
We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the well-known single-task 1-norm regularization. It is based on a novel non-convex regularizer which controls the number of learned features common across the tasks. We prove that the method is equivalent to solving a convex optimization problem for which there is an iterative algorithm which converges to an optimal solution. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the former step it learns task-specific functions and in the latter step it learns common-across-tasks sparse representations for these functions. We also provide an extension of the algorithm which learns sparse nonlinear representations using kernels. We report experiments on simulated and real data sets which demonstrate that the proposed method can both improve the performance relative to learning each task independently and lead to a few learned features common across related tasks. Our algorithm can also be used, as a special case, to simply select – not learn – a few common variables across the tasks.
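A minimal sketch of the alternation the abstract outlines: the supervised step solves each task's ridge problem under a shared feature metric D, and the unsupervised step updates D from the stacked task weights W. The closed forms below are the standard ones for this kind of alternation; the data shapes, γ, and iteration count are illustrative choices of mine, not the paper's.

```python
import numpy as np

def matrix_sqrt(M, eps=1e-8):
    """Symmetric PSD square root via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, eps, None))) @ V.T

def multitask_features(Xs, ys, gamma=0.1, iters=25):
    p = Xs[0].shape[1]
    D = np.eye(p) / p                       # shared feature metric, trace 1
    for _ in range(iters):
        Dinv = np.linalg.inv(D + 1e-8 * np.eye(p))
        # supervised step: per-task ridge regression under the metric D
        W = np.column_stack([np.linalg.solve(X.T @ X + gamma * Dinv, X.T @ y)
                             for X, y in zip(Xs, ys)])
        # unsupervised step: D tracks the shared structure of the task weights
        S = matrix_sqrt(W @ W.T)
        D = S / np.trace(S)
    return W, D

rng = np.random.default_rng(0)
shared = rng.normal(size=(10, 2))           # two latent features shared by all tasks
Xs = [rng.normal(size=(30, 10)) for _ in range(4)]
ys = [X @ shared @ rng.normal(size=2) for X in Xs]
W, D = multitask_features(Xs, ys)
print("approx. rank of learned metric:", int(np.sum(np.linalg.eigvalsh(D) > 1e-3)))
```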
Metric Learning by Collapsing Classes
"... We present an algorithm for learning a quadratic Gaussian metric (Mahalanobis distance) for use in classification tasks. Our method relies on the simple geometric intuition that a good metric is one under which points in the same class are simultaneously near each other and far from points in th ..."
Abstract

Cited by 130 (2 self)
We present an algorithm for learning a quadratic Gaussian metric (Mahalanobis distance) for use in classification tasks. Our method relies on the simple geometric intuition that a good metric is one under which points in the same class are simultaneously near each other and far from points in the other classes. We construct a convex optimization problem whose solution generates such a metric by trying to collapse all examples in the same class to a single point and push examples in other classes infinitely far away. We show that when the metric we learn is used in simple classifiers, it yields substantial improvements over standard alternatives on a variety of problems. We also discuss how the learned metric may be used to obtain a compact low-dimensional feature representation of the original input space, allowing more efficient classification with very little reduction in performance.
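The "collapse" idea lends itself to a compact sketch: match the softmax distribution induced by Mahalanobis distances to an ideal distribution that puts all its mass on same-class neighbors, and project the metric back onto the PSD cone after each gradient step. This is my reading of the construction; the projected-gradient solver, step size, and iteration count are illustrative rather than the paper's.

```python
import numpy as np

def collapse_metric(X, y, steps=200, lr=1e-3):
    """Learn a PSD Mahalanobis matrix A by pulling same-class points together
    and pushing other classes apart (projected gradient on a KL objective)."""
    n, p = X.shape
    same = (y[:, None] == y[None, :]) & ~np.eye(n, dtype=bool)
    p0 = same / same.sum(1, keepdims=True)     # ideal "collapsed" distribution
    diff = X[:, None, :] - X[None, :, :]       # all pairwise differences
    A = np.eye(p)
    for _ in range(steps):
        d2 = np.einsum('ijp,pq,ijq->ij', diff, A, diff)
        logits = np.where(np.eye(n, dtype=bool), -np.inf, -d2)
        P = np.exp(logits - logits.max(1, keepdims=True))
        P /= P.sum(1, keepdims=True)           # model distribution over neighbors
        grad = np.einsum('ij,ijp,ijq->pq', p0 - P, diff, diff)
        A -= lr * grad
        w, V = np.linalg.eigh(A)               # project back onto the PSD cone
        A = (V * np.clip(w, 0.0, None)) @ V.T
    return A

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (15, 3)), rng.normal(2, 1, (15, 3))])
y = np.repeat([0, 1], 15)
A = collapse_metric(X, y)                      # metric usable in a kNN classifier
```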
Convexity, Classification, and Risk Bounds
Journal of the American Statistical Association, 2003
"... Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 01 loss function. The convexity makes these algorithms computationally efficien ..."
Abstract

Cited by 122 (14 self)
Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0–1 loss function. The convexity makes these algorithms computationally efficient. The use of a surrogate, however, has statistical consequences that must be balanced against the computational virtues of convexity. To study these issues, we provide a general quantitative relationship between the risk as assessed using the 0–1 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function: that it satisfy a pointwise form of Fisher consistency for classification. The relationship is based on a simple variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the case of low noise. Finally, we …
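For concreteness, the quantitative relationship can be written out as follows; the notation is the standard one for this result, supplied by me rather than quoted from the paper. For a margin loss φ, one compares the optimal conditional φ-risk with its restriction to predictions of the wrong sign:

```latex
% Excess-risk transform relating 0-1 risk to surrogate risk (a sketch).
\[
  H(\eta)     = \inf_{\alpha\in\mathbb{R}} \bigl(\eta\,\phi(\alpha) + (1-\eta)\,\phi(-\alpha)\bigr),
  \qquad
  H^{-}(\eta) = \inf_{\alpha(2\eta-1)\le 0} \bigl(\eta\,\phi(\alpha) + (1-\eta)\,\phi(-\alpha)\bigr),
\]
\[
  \psi(\theta) = H^{-}\!\Bigl(\tfrac{1+\theta}{2}\Bigr) - H\Bigl(\tfrac{1+\theta}{2}\Bigr),
  \qquad
  \psi\bigl(R(f)-R^{*}\bigr) \;\le\; R_{\phi}(f) - R_{\phi}^{*}.
\]
```

For the hinge loss φ(α) = max(0, 1 − α) this transform works out to ψ(θ) = |θ|, and for the exponential loss φ(α) = e^{−α} it gives ψ(θ) = 1 − √(1 − θ²), so in both cases driving the excess surrogate risk to zero controls the excess classification risk.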
Quantitative Robust Uncertainty Principles and Optimally Sparse Decompositions
2004
"... In this paper, we develop a robust uncertainty principle for finite signals in C N which states that for nearly all choices T, Ω ⊂ {0,..., N − 1} such that T  + Ω  ≍ (log N) −1/2 · N, there is no signal f supported on T whose discrete Fourier transform ˆ f is supported on Ω. In fact, we can mak ..."
Abstract

Cited by 119 (12 self)
In this paper, we develop a robust uncertainty principle for finite signals in ℂ^N which states that for nearly all choices T, Ω ⊂ {0, …, N − 1} such that |T| + |Ω| ≍ (log N)^{−1/2} · N, there is no signal f supported on T whose discrete Fourier transform f̂ is supported on Ω. In fact, we can make the above uncertainty principle quantitative in the sense that if f is supported on T, then only a small percentage of the energy (less than half, say) of f̂ is concentrated on Ω. As an application of this robust uncertainty principle (QRUP), we consider the problem of decomposing a signal into a sparse superposition of spikes and complex sinusoids

f(s) = Σ_{t∈T} α₁(t) δ(s − t) + Σ_{ω∈Ω} α₂(ω) e^{i2πωs/N} / √N.

We show that if a generic signal f has a decomposition (α₁, α₂) using spike and frequency locations in T and Ω respectively, and obeying |T| + |Ω| ≤ Const · (log N)^{−1/2} · N, then (α₁, α₂) is the unique sparsest possible decomposition (all other decompositions have more nonzero terms). In addition, if |T| + |Ω| ≤ Const · (log N)^{−1} · N, then the sparsest (α₁, α₂) can be found by solving a convex optimization problem. Underlying our results is a new probabilistic approach which insists on finding the correct uncertainty relation, or the optimally sparse solution, for nearly all subsets but not necessarily all of them, and which allows us to considerably sharpen previously known results [9, 10]. In fact, we show that the fraction of sets (T, Ω) for which the above properties do not hold can be upper bounded by quantities like N^{−α} for large values of α. The QRUP (and the application to finding sparse representations) can be extended to general pairs of orthogonal bases Φ₁, Φ₂ of ℂ^N. For nearly all choices Γ₁, Γ₂ ⊂ {0, …, N − 1} obeying |Γ₁| + |Γ₂| ≍ µ(Φ₁, Φ₂)^{−2} · (log N)^{−m}, where m ≤ 6, there is no signal f such that Φ₁f is supported on Γ₁ and Φ₂f is supported on Γ₂, where µ(Φ₁, Φ₂) is the mutual coherence between Φ₁ and Φ₂.
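The convex program in question is an ℓ1 minimization (basis pursuit) over the concatenated spike/sinusoid dictionary. Below is a small runnable sketch; I use a real cosine dictionary in place of the complex exponentials so the linear program stays real-valued, and the sizes, support, and LP reformulation are my choices rather than the paper's.

```python
# Recover the sparsest spike + cosine decomposition by l1 minimization,
# written as a linear program in a = u - v with u, v >= 0.
import numpy as np
from scipy.optimize import linprog

N = 32
spikes = np.eye(N)
k = np.arange(N)
cosines = np.cos(np.pi * np.outer(k + 0.5, k) / N) / np.sqrt(N / 2)  # DCT-like atoms
Phi = np.hstack([spikes, cosines])            # concatenated dictionary

alpha_true = np.zeros(2 * N)
alpha_true[[3, N + 7]] = [1.0, -2.0]          # one spike plus one cosine
f = Phi @ alpha_true

c = np.ones(4 * N)                            # min ||a||_1  s.t.  Phi a = f
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=f, bounds=(0, None))
alpha_hat = res.x[:2 * N] - res.x[2 * N:]
print("support recovered:", np.flatnonzero(np.abs(alpha_hat) > 1e-6))
```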
On the optimality of multi-antenna broadcast scheduling using zero-forcing beamforming
IEEE J. Select. Areas Commun., 2006
"... Although the capacity of multipleinput/multipleoutput (MIMO) broadcast channels (BCs) can be achieved by dirty paper coding (DPC), it is difficult to implement in practical systems. This paper investigates if, for a large number of users, simpler schemes can achieve the same performance. Specifica ..."
Abstract

Cited by 116 (5 self)
Although the capacity of multiple-input multiple-output (MIMO) broadcast channels (BCs) can be achieved by dirty paper coding (DPC), it is difficult to implement in practical systems. This paper investigates whether, for a large number of users, simpler schemes can achieve the same performance. Specifically, we show that a zero-forcing beamforming (ZFBF) strategy, while generally suboptimal, can achieve the same asymptotic sum capacity as that of DPC as the number of users goes to infinity. In proving this asymptotic result, we provide an algorithm for determining which users should be active under ZFBF. These users are semi-orthogonal to one another and can be grouped for simultaneous transmission to enhance the throughput of scheduling algorithms. Based on the user grouping, we propose and compare two fair scheduling schemes: round-robin ZFBF and proportional-fair ZFBF. We provide numerical results to confirm the optimality of ZFBF and to compare the performance of ZFBF and the proposed fair scheduling schemes with that of various MIMO BC strategies.
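A minimal sketch of the two ingredients the abstract combines: greedy semi-orthogonal user selection, then zero-forcing beamforming via the pseudoinverse of the selected users' channel matrix. This is illustrative only; the paper's selection algorithm also thresholds the orthogonality measure and allocates power by water-filling, which I omit here.

```python
import numpy as np

def select_semiorthogonal(H, n_streams):
    """Greedily pick users whose channel rows are nearly orthogonal."""
    chosen = [int(np.argmax(np.linalg.norm(H, axis=1)))]
    while len(chosen) < n_streams:
        B = H[chosen]
        # component of each user's channel orthogonal to the chosen subspace
        proj = H - H @ B.conj().T @ np.linalg.pinv(B @ B.conj().T) @ B
        proj[chosen] = 0
        chosen.append(int(np.argmax(np.linalg.norm(proj, axis=1))))
    return chosen

rng = np.random.default_rng(2)
n_users, n_tx = 20, 4
H = (rng.normal(size=(n_users, n_tx))
     + 1j * rng.normal(size=(n_users, n_tx))) / np.sqrt(2)   # Rayleigh channels
users = select_semiorthogonal(H, n_tx)
W = np.linalg.pinv(H[users])      # ZF beamformers: no inter-user interference
print("residual interference:", np.abs(H[users] @ W - np.eye(n_tx)).max())
```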
Model selection and estimation in the Gaussian graphical model
Biometrika (2007), pp. 1–17
"... ..."
Simultaneous Routing and Resource Allocation via Dual Decomposition
2004
"... In wireless data networks the optimal routing of data depends on the link capacities which, in turn, are determined by the allocation of communications resources (such as transmit powers and bandwidths) to the links. The optimal performance of the network can only be achieved by simultaneous optimi ..."
Abstract

Cited by 108 (4 self)
In wireless data networks the optimal routing of data depends on the link capacities which, in turn, are determined by the allocation of communications resources (such as transmit powers and bandwidths) to the links. The optimal performance of the network can only be achieved by simultaneous optimization of routing and resource allocation. In this paper, we formulate the simultaneous routing and resource allocation problem and exploit problem structure to derive efficient solution methods. We use a capacitated multicommodity flow model to describe the data flows in the network. We assume that the capacity of a wireless link is a concave and increasing function of the communications resources allocated to the link, and the communications resources for groups of links are limited. These assumptions allow us to formulate the simultaneous routing and resource allocation problem as a convex optimization problem over the network flow variables and the communications variables. These two sets of variables are coupled only through the link capacity constraints. We exploit this separable structure by dual decomposition. The resulting solution method attains the optimal coordination of data routing in the network layer and resource allocation in the radio control layer via pricing on the link capacities.
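A toy instance of the decomposition the abstract describes: the flow (network-layer) and resource (radio-layer) subproblems are coupled only by the constraint that each link's flow stay below its capacity, so link prices λ coordinate them through a subgradient update. The log utilities and log(1 + r) capacity model below are stand-ins of mine, not the paper's network model.

```python
import numpy as np

def waterfill(lam, R):
    """max sum_l lam_l*log(1+r_l)  s.t.  sum_l r_l <= R, by bisection on the level."""
    lo, hi = 1e-9, lam.max() * 1e6
    for _ in range(100):
        nu = 0.5 * (lo + hi)
        r = np.maximum(0.0, lam / nu - 1.0)
        lo, hi = (nu, hi) if r.sum() > R else (lo, nu)
    return r

L_links, R_total = 5, 10.0
lam = np.ones(L_links)                        # link prices (dual variables)
for it in range(500):
    f = 1.0 / lam                             # network subproblem: max sum log f - lam.f
    r = waterfill(lam, R_total)               # radio subproblem: price-weighted capacity
    c = np.log1p(r)
    lam = np.maximum(1e-6, lam + (1.0 / (1 + it)) * (f - c))   # subgradient step
print("flows:", np.round(f, 3), " capacities:", np.round(c, 3))
```

At convergence the prices equalize supply and demand on each link: flows meet capacities without either subproblem ever seeing the other's variables, which is exactly the layering the paper exploits.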
Structured variable selection with sparsity-inducing norms
2009
"... We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsityinducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1norm and the group ℓ1norm by allowing the subsets to ov ..."
Abstract

Cited by 97 (15 self)
We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsity-inducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1-norm and the group ℓ1-norm by allowing the subsets to overlap. This leads to a specific set of allowed nonzero patterns for the solutions of such problems. We first explore the relationship between the groups defining the norm and the resulting nonzero patterns, providing both forward and backward algorithms to go back and forth from groups to patterns. This allows the design of norms adapted to specific prior knowledge expressed in terms of nonzero patterns. We also present an efficient active set algorithm, and analyze the consistency of variable selection for least-squares linear regression in low- and high-dimensional settings.
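A small illustration of the norm the abstract defines: Ω(w) is a sum of Euclidean norms over groups of variables that may overlap. The solver below is a plain subgradient method rather than the paper's active-set algorithm, and the groups, data, and λ are hypothetical.

```python
import numpy as np

def structured_norm(w, groups):
    """Omega(w) = sum_g ||w_g||_2 over (possibly overlapping) index groups."""
    return sum(np.linalg.norm(w[g]) for g in groups)

def fit_subgradient(X, y, groups, lam=0.5, steps=5000):
    """Least squares + lam*Omega via subgradient descent (illustrative)."""
    w = np.zeros(X.shape[1])
    L = np.linalg.norm(X, 2) ** 2                  # Lipschitz constant of the fit term
    for t in range(1, steps + 1):
        g = X.T @ (X @ w - y)                      # least-squares gradient
        for idx in groups:                         # subgradient of each group norm
            nrm = np.linalg.norm(w[idx])
            if nrm > 1e-12:
                g[idx] += lam * w[idx] / nrm
        w -= g / (L * np.sqrt(t))                  # diminishing step size
    return w

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 6))
w_true = np.array([0.0, 1.5, -2.0, 1.0, 0.0, 0.0])
y = X @ w_true + 0.01 * rng.normal(size=60)
groups = [np.arange(0, 3), np.arange(2, 5), np.arange(4, 6)]   # overlapping groups
w = fit_subgradient(X, y, groups)
print(np.round(w, 2), " Omega(w) =", round(structured_norm(w, groups), 3))
```

Note that subgradient descent only drives the excluded coordinates toward zero rather than exactly to zero; the paper's active-set method exploits the group-to-pattern structure to obtain exactly sparse solutions.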