Results 1  10
of
1,109
Mean shift: A robust approach toward feature space analysis
 In PAMI
, 2002
"... A general nonparametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure, the mean shift. We prove for discrete data the convergence ..."
Abstract

Cited by 2357 (37 self)
 Add to MetaCart
A general nonparametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure, the mean shift. We prove for discrete data the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and thus its utility in detecting the modes of the density. The equivalence of the mean shift procedure to the Nadaraya–Watson estimator from kernel regression and the robust Mestimators of location is also established. Algorithms for two lowlevel vision tasks, discontinuity preserving smoothing and image segmentation are described as applications. In these algorithms the only user set parameter is the resolution of the analysis, and either gray level or color images are accepted as input. Extensive experimental results illustrate their excellent performance.
A tutorial on support vector regression
, 2004
"... In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing ..."
Abstract

Cited by 827 (3 self)
 Add to MetaCart
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.
Estimating the Support of a HighDimensional Distribution
, 1999
"... Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We propo ..."
Abstract

Cited by 765 (29 self)
 Add to MetaCart
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified between 0 and 1. We propose a method to approach this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a preliminary theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled d...
Maxmargin Markov networks
, 2003
"... In typical classification tasks, we seek a function which assigns a label to a single object. Kernelbased approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ..."
Abstract

Cited by 587 (15 self)
 Add to MetaCart
In typical classification tasks, we seek a function which assigns a label to a single object. Kernelbased approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ability to use highdimensional feature spaces, and from their strong theoretical guarantees. However, many realworld tasks involve sequential, spatial, or structured data, where multiple labels must be assigned. Existing kernelbased methods ignore structure in the problem, assigning labels independently to each object, losing much useful information. Conversely, probabilistic graphical models, such as Markov networks, can represent correlations between labels, by exploiting problem structure, but cannot handle highdimensional feature spaces, and lack strong theoretical generalization guarantees. In this paper, we present a new framework that combines the advantages of both approaches: Maximum margin Markov (M 3) networks incorporate both kernels, which efficiently deal with highdimensional features, and the ability to capture correlations in structured data. We present an efficient algorithm for learning M 3 networks based on a compact quadratic program formulation. We provide a new theoretical bound for generalization in structured domains. Experiments on the task of handwritten character recognition and collective hypertext classification demonstrate very significant gains over previous approaches. 1
Guaranteed minimumrank solutions of linear matrix equations via nuclear norm minimization
, 2007
"... The affine rank minimization problem consists of finding a matrix of minimum rank that satisfies a given system of linear equality constraints. Such problems have appeared in the literature of a diverse set of fields including system identification and control, Euclidean embedding, and collaborative ..."
Abstract

Cited by 548 (20 self)
 Add to MetaCart
(Show Context)
The affine rank minimization problem consists of finding a matrix of minimum rank that satisfies a given system of linear equality constraints. Such problems have appeared in the literature of a diverse set of fields including system identification and control, Euclidean embedding, and collaborative filtering. Although specific instances can often be solved with specialized algorithms, the general affine rank minimization problem is NPhard, because it contains vector cardinality minimization as a special case. In this paper, we show that if a certain restricted isometry property holds for the linear transformation defining the constraints, the minimum rank solution can be recovered by solving a convex optimization problem, namely the minimization of the nuclear norm over the given affine space. We present several random ensembles of equations where the restricted isometry property holds with overwhelming probability, provided the codimension of the subspace is sufficiently large. The techniques used in our analysis have strong parallels in the compressed sensing framework. We discuss how affine rank minimization generalizes this preexisting concept and outline a dictionary relating concepts from cardinality minimization to those of rank minimization. We also discuss several algorithmic approaches to solving the norm minimization relaxations, and illustrate our results with numerical examples.
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
 Journal of Artificial Intelligence Research
, 2000
"... This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. Th ..."
Abstract

Cited by 435 (6 self)
 Add to MetaCart
(Show Context)
This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The decomposition, known as the MAXQ decomposition, has both a procedural semanticsas a subroutine hierarchyand a declarative semanticsas a representation of the value function of a hierarchical policy. MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling, and Dayan and Hinton. It is based on the assumption that the programmer can identify useful subgoals and define subtasks that achieve these subgoals. By defining such subgoals, the programmer constrains the set of policies that need to be considered during reinforcement learning. The MAXQ value function decomposition can represent the value function of any policy that is consisten...
Fast linear iterations for distributed averaging.
 Systems & Control Letters,
, 2004
"... Abstract We consider the problem of finding a linear iteration that yields distributed averaging consensus over a network, i.e., that asymptotically computes the average of some initial values given at the nodes. When the iteration is assumed symmetric, the problem of finding the fastest converging ..."
Abstract

Cited by 421 (12 self)
 Add to MetaCart
(Show Context)
Abstract We consider the problem of finding a linear iteration that yields distributed averaging consensus over a network, i.e., that asymptotically computes the average of some initial values given at the nodes. When the iteration is assumed symmetric, the problem of finding the fastest converging linear iteration can be cast as a semidefinite program, and therefore efficiently and globally solved. These optimal linear iterations are often substantially faster than several common heuristics that are based on the Laplacian of the associated graph. We show how problem structure can be exploited to speed up interiorpoint methods for solving the fastest distributed linear iteration problem, for networks with up to a thousand or so edges. We also describe a simple subgradient method that handles far larger problems, with up to one hundred thousand edges. We give several extensions and variations on the basic problem.
AntNet: Distributed stigmergetic control for communications networks
 JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 1998
"... This paper introduces AntNet, a novel approach to the adaptive learning of routing tables in communications networks. AntNet is a distributed, mobile agents based Monte Carlo system that was inspired by recent work on the ant colony metaphor for solving optimization problems. AntNet's agents co ..."
Abstract

Cited by 333 (29 self)
 Add to MetaCart
(Show Context)
This paper introduces AntNet, a novel approach to the adaptive learning of routing tables in communications networks. AntNet is a distributed, mobile agents based Monte Carlo system that was inspired by recent work on the ant colony metaphor for solving optimization problems. AntNet's agents concurrently explore the network and exchange collected information. The communication among the agents is indirect and asynchronous, mediated by the network itself. This form of communication is typical of social insects and is called stigmergy. We compare our algorithm with six stateoftheart routing algorithms coming from the telecommunications and machine learning elds. The algorithms' performance is evaluated over a set of realistic testbeds. We run many experiments over real and artificial IP datagram networks with increasing number of nodes and under several paradigmatic spatial and temporal traffic distributions. Results are very encouraging. AntNet showed superior performance under all the experimental conditions with respect to its competitors. We analyze the main characteristics of the algorithm and try to explain the reasons for its superiority.
Efficient power control via pricing in wireless data networks
 IEEE Trans. on Commun
, 2002
"... Abstract—A major challenge in the operation of wireless communications systems is the efficient use of radio resources. One important component of radio resource management is power control, which has been studied extensively in the context of voice communications. With the increasing demand for wir ..."
Abstract

Cited by 329 (8 self)
 Add to MetaCart
Abstract—A major challenge in the operation of wireless communications systems is the efficient use of radio resources. One important component of radio resource management is power control, which has been studied extensively in the context of voice communications. With the increasing demand for wireless data services, it is necessary to establish power control algorithms for information sources other than voice. We present a power control solution for wireless data in the analytical setting of a game theoretic framework. In this context, the quality of service (QoS) a wireless terminal receives is referred to as the utility and distributed power control is a noncooperative power control game where users maximize their utility. The outcome of the game results in a Nash equilibrium that is inefficient. We introduce pricing of transmit powers in order to obtain Pareto improvement of the noncooperative power control game, i.e., to obtain improvements in user utilities relative to the case with no pricing. Specifically, we consider a pricing function that is a linear function of the transmit power. The simplicity of the pricing function allows a distributed implementation where the price can be broadcast by the base station to all the terminals. We see that pricing is especially helpful in a heavily loaded system. Index Terms—Game theory, Pareto efficiency, power control, pricing, wireless data. I.