Results 1  10
of
1,068
Mean shift: A robust approach toward feature space analysis
 In PAMI
, 2002
"... A general nonparametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure, the mean shift. We prove for discrete data the convergence ..."
Abstract

Cited by 2230 (37 self)
 Add to MetaCart
A general nonparametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure, the mean shift. We prove for discrete data the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and thus its utility in detecting the modes of the density. The equivalence of the mean shift procedure to the Nadaraya–Watson estimator from kernel regression and the robust Mestimators of location is also established. Algorithms for two lowlevel vision tasks, discontinuity preserving smoothing and image segmentation are described as applications. In these algorithms the only user set parameter is the resolution of the analysis, and either gray level or color images are accepted as input. Extensive experimental results illustrate their excellent performance.
A tutorial on support vector regression
, 2004
"... In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing ..."
Abstract

Cited by 757 (3 self)
 Add to MetaCart
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.
Estimating the Support of a HighDimensional Distribution
, 1999
"... Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We propo ..."
Abstract

Cited by 717 (29 self)
 Add to MetaCart
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a preliminary theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled d...
Maxmargin Markov networks
, 2003
"... In typical classification tasks, we seek a function which assigns a label to a single object. Kernelbased approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ..."
Abstract

Cited by 574 (15 self)
 Add to MetaCart
In typical classification tasks, we seek a function which assigns a label to a single object. Kernelbased approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ability to use highdimensional feature spaces, and from their strong theoretical guarantees. However, many realworld tasks involve sequential, spatial, or structured data, where multiple labels must be assigned. Existing kernelbased methods ignore structure in the problem, assigning labels independently to each object, losing much useful information. Conversely, probabilistic graphical models, such as Markov networks, can represent correlations between labels, by exploiting problem structure, but cannot handle highdimensional feature spaces, and lack strong theoretical generalization guarantees. In this paper, we present a new framework that combines the advantages of both approaches: Maximum margin Markov (M 3) networks incorporate both kernels, which efficiently deal with highdimensional features, and the ability to capture correlations in structured data. We present an efficient algorithm for learning M 3 networks based on a compact quadratic program formulation. We provide a new theoretical bound for generalization in structured domains. Experiments on the task of handwritten character recognition and collective hypertext classification demonstrate very significant gains over previous approaches. 1
Guaranteed minimumrank solutions of linear matrix equations via nuclear norm minimization
, 2007
"... The affine rank minimization problem consists of finding a matrix of minimum rank that satisfies a given system of linear equality constraints. Such problems have appeared in the literature of a diverse set of fields including system identification and control, Euclidean embedding, and collaborative ..."
Abstract

Cited by 553 (23 self)
 Add to MetaCart
(Show Context)
The affine rank minimization problem consists of finding a matrix of minimum rank that satisfies a given system of linear equality constraints. Such problems have appeared in the literature of a diverse set of fields including system identification and control, Euclidean embedding, and collaborative filtering. Although specific instances can often be solved with specialized algorithms, the general affine rank minimization problem is NPhard, because it contains vector cardinality minimization as a special case. In this paper, we show that if a certain restricted isometry property holds for the linear transformation defining the constraints, the minimum rank solution can be recovered by solving a convex optimization problem, namely the minimization of the nuclear norm over the given affine space. We present several random ensembles of equations where the restricted isometry property holds with overwhelming probability, provided the codimension of the subspace is sufficiently large. The techniques used in our analysis have strong parallels in the compressed sensing framework. We discuss how affine rank minimization generalizes this preexisting concept and outline a dictionary relating concepts from cardinality minimization to those of rank minimization. We also discuss several algorithmic approaches to solving the norm minimization relaxations, and illustrate our results with numerical examples.
New Support Vector Algorithms
, 2000
"... this article with the regression case. To explain this, we will introduce a suitable definition of a margin that is maximized in both cases ..."
Abstract

Cited by 438 (42 self)
 Add to MetaCart
this article with the regression case. To explain this, we will introduce a suitable definition of a margin that is maximized in both cases
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
 Journal of Artificial Intelligence Research
, 2000
"... This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. Th ..."
Abstract

Cited by 430 (6 self)
 Add to MetaCart
(Show Context)
This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The decomposition, known as the MAXQ decomposition, has both a procedural semanticsas a subroutine hierarchyand a declarative semanticsas a representation of the value function of a hierarchical policy. MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling, and Dayan and Hinton. It is based on the assumption that the programmer can identify useful subgoals and define subtasks that achieve these subgoals. By defining such subgoals, the programmer constrains the set of policies that need to be considered during reinforcement learning. The MAXQ value function decomposition can represent the value function of any policy that is consisten...
AntNet: Distributed stigmergetic control for communications networks
 Journal of Artificial Intelligence Research
, 1998
"... This paper introduces AntNet, a novel approach to the adaptive learning of routing tables in communications networks. AntNet is a distributed, mobile agents based Monte Carlo system that was inspired by recent work on the ant colony metaphor for solving optimization problems. AntNet's agents co ..."
Abstract

Cited by 319 (29 self)
 Add to MetaCart
(Show Context)
This paper introduces AntNet, a novel approach to the adaptive learning of routing tables in communications networks. AntNet is a distributed, mobile agents based Monte Carlo system that was inspired by recent work on the ant colony metaphor for solving optimization problems. AntNet's agents concurrently explore the network and exchange collected information. The communication among the agents is indirect and asynchronous, mediated by the network itself. This form of communication is typical of social insects and is called stigmergy. We compare our algorithm with six stateoftheart routing algorithms coming from the telecommunications and machine learning elds. The algorithms' performance is evaluated over a set of realistic testbeds. We run many experiments over real and arti cial IP datagram networks with increasing number of nodes and under several paradigmatic spatial and temporal tra c distributions. Results are very encouraging. AntNet showed superior performance under all the experimental conditions with respect to its competitors. We analyze the main characteristics of the algorithm and try to explain the reasons for its superiority. 1.
Efficient power control via pricing in wireless data networks
 IEEE Trans. on Commun
, 2002
"... Abstract—A major challenge in the operation of wireless communications systems is the efficient use of radio resources. One important component of radio resource management is power control, which has been studied extensively in the context of voice communications. With the increasing demand for wir ..."
Abstract

Cited by 311 (8 self)
 Add to MetaCart
Abstract—A major challenge in the operation of wireless communications systems is the efficient use of radio resources. One important component of radio resource management is power control, which has been studied extensively in the context of voice communications. With the increasing demand for wireless data services, it is necessary to establish power control algorithms for information sources other than voice. We present a power control solution for wireless data in the analytical setting of a game theoretic framework. In this context, the quality of service (QoS) a wireless terminal receives is referred to as the utility and distributed power control is a noncooperative power control game where users maximize their utility. The outcome of the game results in a Nash equilibrium that is inefficient. We introduce pricing of transmit powers in order to obtain Pareto improvement of the noncooperative power control game, i.e., to obtain improvements in user utilities relative to the case with no pricing. Specifically, we consider a pricing function that is a linear function of the transmit power. The simplicity of the pricing function allows a distributed implementation where the price can be broadcast by the base station to all the terminals. We see that pricing is especially helpful in a heavily loaded system. Index Terms—Game theory, Pareto efficiency, power control, pricing, wireless data. I.
Pathwise coordinate optimization
, 2007
"... We consider “oneatatime ” coordinatewise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the L1penalized regression (lasso) in the lterature, but it seems to have been largely ignored. Indeed, it seems that coordinatewise algorith ..."
Abstract

Cited by 311 (22 self)
 Add to MetaCart
We consider “oneatatime ” coordinatewise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the L1penalized regression (lasso) in the lterature, but it seems to have been largely ignored. Indeed, it seems that coordinatewise algorithms are not often used in convex optimization. We show that this algorithm is very competitive with the well known LARS (or homotopy) procedure in large lasso problems, and that it can be applied to related methods such as the garotte and elastic net. It turns out that coordinatewise descent does not work in the “fused lasso ” however, so we derive a generalized algorithm that yields the solution in much less time that a standard convex optimizer. Finally we generalize the procedure to the twodimensional fused lasso, and demonstrate its performance on some image smoothing problems.