Results 1–10 of 16
Clustering: Science or art
 NIPS 2009 Workshop on Clustering Theory, 2009
Abstract

Cited by 12 (1 self)
This paper deals with the question of whether the quality of different clustering algorithms can be compared by a general, scientifically sound procedure which is independent of particular clustering algorithms. In our opinion, the major obstacle is the difficulty of evaluating a clustering algorithm without taking into account the context: why does the user cluster his data in the first place, and what does he want to do with the clustering afterwards? We suggest that clustering should not be treated as an application-independent mathematical problem, but should always be studied in the context of its end-use. Different techniques to evaluate clustering algorithms have to be developed for different uses of clustering. To simplify this procedure it will be useful to build a “taxonomy of clustering problems” to identify clustering applications which can be treated in a unified way.
Preamble. Every year, dozens of papers on clustering algorithms get published. Researchers continuously invent new clustering algorithms and work on improving existing ones.
Composite Multiclass Losses
Abstract

Cited by 9 (5 self)
We consider loss functions for multiclass prediction problems. We show when a multiclass loss can be expressed as a “proper composite loss”, which is the composition of a proper loss and a link function. We extend existing results for binary losses to multiclass losses. We determine the stationarity condition, Bregman representation, order-sensitivity, and existence and uniqueness of the composite representation for multiclass losses. We subsume existing results on “classification calibration” by relating it to properness, and show that the simple integral representation for binary proper losses cannot be extended to multiclass losses.
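The composite representation described in this abstract can be made concrete with a small sketch: multiclass log loss written as a proper loss (negative log-likelihood on probability vectors) composed with the inverse of a softmax link. All names and choices below are illustrative, not from the paper.

```python
import numpy as np

def softmax_link_inverse(v):
    """Inverse link: map a real score vector v to a probability vector."""
    e = np.exp(v - v.max())
    return e / e.sum()

def proper_log_loss(y, p):
    """Proper loss on probability estimates: negative log-likelihood of class y."""
    return -np.log(p[y])

def composite_loss(y, v):
    """Composite loss = proper loss composed with the inverse link."""
    return proper_log_loss(y, softmax_link_inverse(v))

# Properness: the conditional risk E_{Y~p}[loss(Y, q)] is minimised at q = p.
p_true = np.array([0.2, 0.5, 0.3])
risk = lambda q: sum(p_true[y] * proper_log_loss(y, q) for y in range(3))
assert risk(p_true) <= risk(np.array([0.3, 0.4, 0.3])) + 1e-12
```

The final assertion checks the defining property of a proper loss on one example: reporting the true conditional distribution minimises the expected loss.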
MIXABILITY IS BAYES RISK CURVATURE RELATIVE TO LOG LOSS
Abstract

Cited by 4 (3 self)
Given K codes, a standard result from source coding tells us how to design a single universal code with codelengths within log(K) bits of the best code, on any data sequence. Translated to the online learning setting of prediction with expert advice, this result implies that for logarithmic loss one can guarantee constant regret, which does not grow with the number of outcomes that need to be predicted. In this setting, it is known for which other losses the same guarantee can be given: these are the losses that are mixable. We show that among the mixable losses, log loss is special: in fact, one may understand the class of mixable losses as those that behave like log loss in an essential way. More specifically, a loss is mixable if and only if the curvature of its Bayes risk is at least as large as the curvature of the Bayes risk for log loss (for which the Bayes risk equals the entropy).
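The constant-regret guarantee for log loss mentioned above can be checked numerically: predicting with a Bayes mixture of K experts keeps the cumulative log loss within log(K) of the best expert, for any number of outcomes M. The random experts and outcomes below are illustrative, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, M = 5, 200, 3          # experts, rounds, outcomes

# Each expert announces a probability vector over M outcomes every round.
expert_preds = rng.dirichlet(np.ones(M), size=(T, K))
outcomes = rng.integers(0, M, size=T)

weights = np.full(K, 1.0 / K)    # uniform prior over the experts
total_loss = 0.0
for t in range(T):
    mix = weights @ expert_preds[t]          # Bayes-mixture prediction
    total_loss += -np.log(mix[outcomes[t]])  # learner's log loss this round
    # Bayesian update: reweight each expert by its likelihood of the outcome.
    weights *= expert_preds[t][:, outcomes[t]]
    weights /= weights.sum()

expert_losses = -np.log(expert_preds[np.arange(T), :, outcomes]).sum(axis=0)
regret = total_loss - expert_losses.min()
assert regret <= np.log(K) + 1e-9   # constant regret, independent of T and M
```

The bound holds for every data sequence, not just random ones: the mixture's codelength is at most log(K) bits longer than the best expert's.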
Surrogate losses and regret bounds for cost-sensitive classification with example-dependent costs
, 2011
Proper losses for learning from partial labels
Abstract

Cited by 1 (0 self)
This paper discusses the problem of calibrating posterior class probabilities from partially labelled data. Each instance is assumed to be labelled as belonging to one of several candidate categories, at most one of them being true. We generalize the concept of proper loss to this scenario, establish a necessary and sufficient condition for a loss function to be proper, and show a direct procedure to construct a proper loss for partial labels from a conventional proper loss. The problem can be characterized by the mixing probability matrix relating the true class of the data and the observed labels. Full knowledge of this matrix is not required, and losses can be constructed that are proper for a wide set of mixing probability matrices.
Saliency Detection via Divergence Analysis: A Unified Perspective
Abstract

Cited by 1 (0 self)
A number of bottom-up saliency detection algorithms have been proposed in the literature. Since these have been developed from intuition and principles inspired by psychophysical studies of human vision, the theoretical relations among them are unclear. In this paper, we present a unifying perspective. Saliency of an image area is defined in terms of the divergence between certain feature distributions estimated from the central part and its surround. We show that various, seemingly different saliency estimation algorithms are in fact closely related. We also discuss some commonly used center-surround selection strategies. Experiments with two datasets are presented to quantify the relative advantages of these algorithms.
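The center-surround divergence definition of saliency can be sketched as follows: estimate an intensity histogram in a central disc and in a surrounding ring, and score the location by their KL divergence. The patch sizes, the feature (raw intensity), and the choice of KL are illustrative assumptions, not the paper's specific settings.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence between two (unnormalised) histograms, with smoothing."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def center_surround_saliency(image, cy, cx, r_in=4, r_out=8, bins=16):
    """Saliency at (cy, cx) = divergence between the intensity histograms
    of the central disc and its surrounding ring."""
    ys, xs = np.ogrid[:image.shape[0], :image.shape[1]]
    d2 = (ys - cy) ** 2 + (xs - cx) ** 2
    center = image[d2 <= r_in ** 2]
    surround = image[(d2 > r_in ** 2) & (d2 <= r_out ** 2)]
    hc, _ = np.histogram(center, bins=bins, range=(0.0, 1.0))
    hs, _ = np.histogram(surround, bins=bins, range=(0.0, 1.0))
    return kl_divergence(hc.astype(float), hs.astype(float))

# A bright blob on a dark background scores higher than a uniform region.
img = np.zeros((32, 32))
img[14:18, 14:18] = 1.0
assert center_surround_saliency(img, 16, 16) > center_surround_saliency(img, 4, 4)
```

Swapping the divergence (e.g. symmetrised KL, Bhattacharyya) or the feature recovers several of the algorithms the abstract says are related.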
Generalization Bounds for Domain Adaptation
Abstract

Cited by 1 (0 self)
In this paper, we provide a new framework to study the generalization bound of the learning process for domain adaptation. We consider two kinds of representative domain adaptation settings: one is domain adaptation with multiple sources and the other is domain adaptation combining source and target data. In particular, we use the integral probability metric to measure the difference between two domains. Then, we develop the specific Hoeffding-type deviation inequality and symmetrization inequality for each kind of domain adaptation to achieve the corresponding generalization bound based on the uniform entropy number. Using the resulting generalization bound, we analyze the asymptotic convergence and the rate of convergence of the learning process for domain adaptation. We also discuss the factors that affect the asymptotic behavior of the learning process. Numerical experiments support our results.
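As an illustration of the integral probability metric (IPM) used above to measure the difference between domains, the sketch below computes one common IPM instance, the (biased) empirical maximum mean discrepancy with an RBF kernel. The kernel, bandwidth, and toy domains are illustrative choices, not the paper's.

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Biased empirical MMD with an RBF kernel. MMD is the IPM obtained by
    taking the supremum over the unit ball of an RKHS."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(3)
src = rng.normal(0.0, 1.0, size=(200, 2))   # source domain samples
near = rng.normal(0.2, 1.0, size=(200, 2))  # slightly shifted target
far = rng.normal(3.0, 1.0, size=(200, 2))   # strongly shifted target

assert mmd_rbf(src, far) > mmd_rbf(src, near)  # larger domain gap, larger IPM
```

Intuitively, the generalization bounds degrade as this distance between source and target grows.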
The convexity and design of composite multiclass losses
 In Proceedings of the 29th International Conference on Machine Learning (ICML-12)
Abstract

Cited by 1 (0 self)
We consider composite loss functions for multiclass prediction comprising a proper (i.e., Fisher-consistent) loss over probability distributions and an inverse link function. We establish conditions for their (strong) convexity and explore the implications. We also show how the separation of concerns afforded by this composite representation allows for the design of families of losses with the same Bayes risk.
Risk-Based Generalizations of f-divergences
Abstract
We derive a generalized notion of f-divergences, called (f,l)-divergences. We show that this generalization enjoys many of the nice properties of f-divergences, although it is a richer family. It also provides alternative definitions of standard divergences in terms of surrogate risks. As a first practical application of this theory, we derive a new estimator for the Kullback-Leibler divergence that we use for clustering sets of vectors.
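A plug-in flavour of KL estimation between two sets of vectors (fit a diagonal Gaussian to each set, then evaluate the closed-form divergence) gives a feel for how such an estimator can drive clustering. This is a generic plug-in estimator, not the (f,l)-based one derived in the paper.

```python
import numpy as np

def kl_gaussian(mu0, var0, mu1, var1):
    """Closed-form KL divergence between two diagonal Gaussians."""
    return 0.5 * np.sum(np.log(var1 / var0)
                        + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def kl_between_sets(X, Y):
    """Plug-in KL estimate between two sets of vectors: fit a diagonal
    Gaussian to each set and evaluate the closed-form divergence."""
    mu_x, var_x = X.mean(axis=0), X.var(axis=0) + 1e-6
    mu_y, var_y = Y.mean(axis=0), Y.var(axis=0) + 1e-6
    return kl_gaussian(mu_x, var_x, mu_y, var_y)

rng = np.random.default_rng(1)
A = rng.normal(0.0, 1.0, size=(500, 2))
B = rng.normal(3.0, 1.0, size=(500, 2))
assert kl_between_sets(A, B) > kl_between_sets(A, A)  # far sets diverge more
```

Any clustering routine that accepts a pairwise dissimilarity can then group sets of vectors using this estimate.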
On Mixture Reduction for Multiple Target Tracking
Abstract
In multiple-hypothesis or probability-hypothesis-based multiple target tracking, the resulting mixtures, whose number of components grows without bound, must be approximated by a reduced mixture. Although cost-based and more rigorous mixture reduction algorithms exist, they are computationally expensive to apply in practical situations, especially in high-dimensional state spaces, so mixture reduction is generally done based on ad hoc criteria and procedures. In this paper we propose a sequential pairwise mixture reduction criterion and algorithm based on statistical decision theory. For this purpose, we choose the merging criterion for the mixture components based on a likelihood ratio test. The advantages and disadvantages of some of the previous reduction schemes and of the newly proposed algorithm are discussed in detail. The results are evaluated on a Gaussian mixture implementation of the PHD filter, where two different pruning and merging schemes are designed: one for computational feasibility, the other for state extraction.
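The sequential pairwise reduction idea can be sketched with 1-D Gaussian components: repeatedly merge the closest pair (a moment-preserving merge) until no pair passes a distance threshold. The distance criterion below is a simple Mahalanobis-style stand-in for the paper's likelihood ratio test, and the mixture is a made-up example.

```python
import numpy as np

def merge_components(w1, m1, v1, w2, m2, v2):
    """Moment-preserving merge of two 1-D Gaussian mixture components:
    keeps the pair's total weight, mean, and variance."""
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return w, m, v

def reduce_mixture(components, threshold=1.0):
    """Greedy pairwise reduction: merge the closest pair of (weight, mean,
    variance) components until every pair exceeds the threshold."""
    comps = list(components)
    while len(comps) > 1:
        best = None
        for i in range(len(comps)):
            for j in range(i + 1, len(comps)):
                _, mi, vi = comps[i]
                _, mj, vj = comps[j]
                d = (mi - mj) ** 2 / (vi + vj)   # stand-in merge criterion
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        comps[i] = merge_components(*comps[i], *comps[j])
        del comps[j]
    return comps

# Two near-identical components get merged; the distant one survives.
mix = [(0.4, 0.0, 1.0), (0.3, 0.1, 1.0), (0.3, 10.0, 1.0)]
reduced = reduce_mixture(mix)
assert len(reduced) == 2
```

The moment-preserving merge keeps the mixture's overall mass, mean, and spread unchanged, which is why it is a common building block in PHD-filter merging schemes.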