Results 1  10
of
541
An introduction to variable and feature selection
 Journal of Machine Learning Research
, 2003
"... Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available. ..."
Abstract

Cited by 1283 (16 self)
 Add to MetaCart
Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available.
Distance Metric Learning, With Application To Clustering With SideInformation
 ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 15
, 2003
"... Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many "plausible" ways, and if a clustering algorithm such as Kmeans initially fails to find one that is meaningful to a user, the only recourse may be for the us ..."
Abstract

Cited by 799 (14 self)
 Add to MetaCart
(Show Context)
Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many "plausible" ways, and if a clustering algorithm such as Kmeans initially fails to find one that is meaningful to a user, the only recourse may be for the user to manually tweak the metric until sufficiently good clusters are found. For these and other applications requiring good metrics, it is desirable that we provide a more systematic way for users to indicate what they consider "similar." For instance, we may ask them to provide examples. In this paper, we present an algorithm that, given examples of similar (and, if desired, dissimilar) pairs of points in R , learns a distance metric over R that respects these relationships. Our method is based on posing metric learning as a convex optimization problem, which allows us to give efficient, localoptimafree algorithms. We also demonstrate empirically that the learned metrics can be used to significantly improve clustering performance.
Clustering with Bregman Divergences
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2005
"... A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergence ..."
Abstract

Cited by 441 (59 self)
 Add to MetaCart
(Show Context)
A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroidbased parametric clustering approaches, such as classical kmeans and informationtheoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical kmeans algorithm, while generalizing the basic idea to a very large class of clustering loss functions. There are two main contributions in this paper. First, we pose the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by ratedistortion theory, and present an algorithm to minimize this loss. Secondly, we show an explicit bijection between Bregman divergences and exponential families. The bijection enables the development of an alternative interpretation of an ecient EM scheme for learning models involving mixtures of exponential distributions. This leads to a simple soft clustering algorithm for all Bregman divergences.
Survey of clustering data mining techniques
, 2002
"... Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in math ..."
Abstract

Cited by 400 (0 self)
 Add to MetaCart
(Show Context)
Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique
InformationTheoretic CoClustering
 In KDD
, 2003
"... Twodimensional contingency or cooccurrence tables arise frequently in important applications such as text, weblog and marketbasket data analysis. A basic problem in contingency table analysis is coclustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views ..."
Abstract

Cited by 342 (12 self)
 Add to MetaCart
(Show Context)
Twodimensional contingency or cooccurrence tables arise frequently in important applications such as text, weblog and marketbasket data analysis. A basic problem in contingency table analysis is coclustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the coclustering problem as an optimization problem in information theory  the optimal coclustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters.
Object categorization by learned universal visual dictionary
 In ICCV
, 2005
"... Figure 1: Exemplar snapshots of our interactive object categorization demo application. A user selects (sloppily) a region of interest and our algorithm associates an object class label with it. Despite large differences in pose, size, illumination and visual appearance the correct class label (e.g. ..."
Abstract

Cited by 303 (8 self)
 Add to MetaCart
(Show Context)
Figure 1: Exemplar snapshots of our interactive object categorization demo application. A user selects (sloppily) a region of interest and our algorithm associates an object class label with it. Despite large differences in pose, size, illumination and visual appearance the correct class label (e.g. cow, building, car...) is automatically associated with each selected object instance. Some of these test images were downloaded from the web and none were part of the training set. A video of the interactive demo may be found at the above web site. This paper presents a new algorithm for the automatic recognition of object classes from images (categorization). Compact and yet discriminative appearancebased object class models are automatically learned from a set of training images. The method is simple and extremely fast, making it suitable for many applications such as semantic image retrieval, web search, and interactive image editing. It classifies a region according to the proportions of different visual words (clusters in feature space). The specific visual words and the typical proportions in each object are learned from a segmented training set. The main contribution of this paper is two fold: i) an optimally compact visual dictionary is learned by pairwise merging of visual words from an initially large dictionary. The final visual words are described by GMMs. ii) A novel statistical measure of discrimination is proposed which is optimized by each merge operation. High classification accuracy is demonstrated for nine object classes on photographs of real objects viewed under general lighting conditions, poses and viewpoints. The set of test images used for validation comprise: i) photographs acquired by us, ii) images from the web and iii) images from the recently released Pascal dataset. The proposed algorithm performs well on both texturerich objects (e.g. grass, sky, trees) and structurerich ones (e.g. cars, bikes, planes). 1.
Learning with Labeled and Unlabeled Data
, 2001
"... In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as we ..."
Abstract

Cited by 197 (3 self)
 Add to MetaCart
(Show Context)
In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as well as numerous suggestions for potential future work. Therefore, this work contains more speculative and partly subjective material than the reader might expect from a literature review. We give a rigorous definition of the problem and relate it to supervised and unsupervised learning. The crucial role of prior knowledge is put forward, and we discuss the important notion of inputdependent regularization. We postulate a number of baseline methods, being algorithms or algorithmic schemes which can more or less straightforwardly be applied to the problem, without the need for genuinely new concepts. However, some of them might serve as basis for a genuine method. In the literature revi...
Learning Distance Functions Using Equivalence Relations
 In Proceedings of the Twentieth International Conference on Machine Learning
, 2003
"... We address the problem of learning distance metrics using sideinformation in the form of groups of "similar" points. We propose to use the RCA algorithm, which is a simple and e#cient algorithm for learning a full ranked Mahalanobis metric (Shental et al., 2002). ..."
Abstract

Cited by 173 (6 self)
 Add to MetaCart
We address the problem of learning distance metrics using sideinformation in the form of groups of "similar" points. We propose to use the RCA algorithm, which is a simple and e#cient algorithm for learning a full ranked Mahalanobis metric (Shental et al., 2002).
Learning human action via information maximization, CVPR
, 2008
"... In this paper, we present a novel approach for automatically learning a compact and yet discriminative appearancebased human action model. A video sequence is represented by a bag of spatiotemporal features called videowords by quantizing the extracted 3D interest points (cuboids) from the videos. ..."
Abstract

Cited by 149 (13 self)
 Add to MetaCart
(Show Context)
In this paper, we present a novel approach for automatically learning a compact and yet discriminative appearancebased human action model. A video sequence is represented by a bag of spatiotemporal features called videowords by quantizing the extracted 3D interest points (cuboids) from the videos. Our proposed approach is able to automatically discover the optimal number of videoword clusters by utilizing Maximization of Mutual Information(MMI). Unlike the kmeans algorithm, which is typically used to cluster spatiotemporal cuboids into video words based on their appearance similarity, MMI clustering further groups the videowords, which are highly correlated to some group of actions. To capture the structural information of the learnt optimal videoword clusters, we explore the correlation of the compact videoword clusters. We use the modified correlgoram, which is not only translation and rotation invariant, but also somewhat scale invariant. We extensively test our proposed approach on two publicly available challenging datasets: the KTH dataset and IXMAS multiview dataset. To the best of our knowledge, we are the first to try the bag of videowords related approach on the multiview dataset. We have obtained very impressive results on both datasets. 1.