Results 1-10 of 225
Clustering with Bregman Divergences
Journal of Machine Learning Research, 2005
Cited by 182 (31 self)
A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical k-means and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical k-means algorithm, while generalizing the basic idea to a very large class of clustering loss functions. There are two main contributions in this paper. First, we pose the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate-distortion theory, and present an algorithm to minimize this loss. Second, we show an explicit bijection between Bregman divergences and exponential families. The bijection enables the development of an alternative interpretation of an efficient EM scheme for learning models involving mixtures of exponential distributions. This leads to a simple soft clustering algorithm for all Bregman divergences.
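A minimal sketch of the hard-clustering idea, assuming the standard Bregman form d_phi(x, y) = phi(x) - phi(y) - <x - y, grad phi(y)>: the loop is ordinary k-means with the squared distance swapped for a Bregman divergence, while the centroid update stays the arithmetic mean, which minimizes the expected divergence d_phi(x, mu) for every Bregman divergence. The farthest-first initialization and all function names are illustrative choices, not taken from the paper.

```python
import math


def squared_euclidean(x, y):
    # Bregman divergence generated by phi(x) = ||x||^2.
    return sum((a - b) ** 2 for a, b in zip(x, y))


def generalized_kl(x, y):
    # Bregman divergence generated by phi(x) = sum_i x_i log x_i
    # (generalized I-divergence); requires strictly positive coordinates.
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, y))


def mean(points):
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def farthest_first(points, k, divergence):
    # Deterministic seeding: start at the first point, then repeatedly add
    # the point with the largest divergence to its nearest chosen centroid.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(divergence(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def bregman_hard_cluster(points, k, divergence, max_iters=100):
    centroids = farthest_first(points, k, divergence)
    labels = None
    for _ in range(max_iters):
        # Assignment step: divergence is measured from the point to the
        # centroid, d_phi(x, mu), the argument order the mean optimizes.
        new_labels = [
            min(range(k), key=lambda j: divergence(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: the arithmetic mean, whatever the divergence.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids
```

Passing squared_euclidean recovers classical k-means; passing generalized_kl clusters positive vectors under relative entropy with no other change to the loop.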
Integrating Constraints and Metric Learning in Semi-Supervised Clustering
In ICML, 2004
Cited by 124 (6 self)
Semi-supervised clustering employs a small amount of labeled data to aid unsupervised learning. Previous work in the area has utilized supervised data in one of two approaches: 1) constraint-based methods that guide the clustering algorithm towards a better grouping of the data, and 2) distance-function learning methods that adapt the underlying similarity metric used by the clustering algorithm. This paper provides new methods for the two approaches and presents a new semi-supervised clustering algorithm that integrates both of these techniques in a uniform, principled framework. Experimental results demonstrate that the unified approach produces better clusters than either individual approach and than previously proposed semi-supervised clustering algorithms.
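A sketch of the constraint-based half only, in the style of pairwise-constrained k-means: each point's cost is half its squared distance to a centroid plus a penalty w for every must-link or cannot-link constraint the assignment would violate. The metric-learning component that the paper integrates is deliberately omitted (the metric stays Euclidean), and the uniform weight w, the greedy point-by-point reassignment and all names are assumptions made for illustration.

```python
def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mean(points):
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def pck_means(points, init_centroids, must_link=(), cannot_link=(), w=1.0, max_iters=50):
    """Constraint-penalized k-means with a fixed Euclidean metric.

    must_link / cannot_link are lists of index pairs; w is a hypothetical
    uniform penalty charged for each violated constraint.
    """
    k = len(init_centroids)
    centroids = [list(c) for c in init_centroids]
    ml = {i: [] for i in range(len(points))}
    cl = {i: [] for i in range(len(points))}
    for a, b in must_link:
        ml[a].append(b)
        ml[b].append(a)
    for a, b in cannot_link:
        cl[a].append(b)
        cl[b].append(a)

    def cost(i, h, labels):
        # Distortion term plus penalties for the constraints that putting
        # point i in cluster h would violate, given the current labels.
        c = 0.5 * sq_dist(points[i], centroids[h])
        c += w * sum(1 for j in ml[i] if labels[j] != h)
        c += w * sum(1 for j in cl[i] if labels[j] == h)
        return c

    # Start from the unconstrained nearest-centroid assignment.
    labels = [min(range(k), key=lambda h: sq_dist(p, centroids[h])) for p in points]
    for _ in range(max_iters):
        old = list(labels)
        # Greedy (ICM-style) pass: reassign points one at a time.
        for i in range(len(points)):
            labels[i] = min(range(k), key=lambda h: cost(i, h, labels))
        for h in range(k):
            members = [p for p, lab in zip(points, labels) if lab == h]
            if members:
                centroids[h] = mean(members)
        if labels == old:
            break
    return labels, centroids
```

With points [[0], [1], [9], [10]] and initial centroids [[0], [10]], the unconstrained run splits them {0, 1} vs {9, 10}; a heavily weighted must-link between the second and third points pulls the second point into the right-hand cluster.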
Semi-supervised Clustering by Seeding
In Proceedings of the 19th International Conference on Machine Learning (ICML-2002), 2002
Cited by 98 (14 self)
Semi-supervised clustering uses a small amount of labeled data to aid and bias the clustering of unlabeled data. This paper explores the use of labeled data to generate initial seed clusters, as well as the use of constraints generated from labeled data to guide the clustering process. It introduces two semi-supervised variants of KMeans clustering that can be viewed as instances of the EM algorithm, where labeled data provides prior information about the conditional distributions of hidden category labels. Experimental results demonstrate the advantages of these methods over standard random seeding and COP-KMeans, a previously developed semi-supervised clustering algorithm.
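A minimal sketch of the two seeding ideas as read from the abstract: labeled points initialize one centroid per cluster, and in the constrained variant their labels are also held fixed during every assignment step. Names are hypothetical, and the sketch uses plain squared-Euclidean k-means rather than the paper's EM formulation.

```python
def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mean(points):
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def seeded_kmeans(points, seeds, k, constrained=False, max_iters=100):
    """seeds maps a point index to its known cluster label in range(k).

    constrained=False: seeds only initialize the centroids.
    constrained=True: seed labels are also kept fixed when assigning.
    Assumes every cluster has at least one seed.
    """
    centroids = [
        mean([points[i] for i, lab in seeds.items() if lab == h]) for h in range(k)
    ]
    labels = None
    for _ in range(max_iters):
        new_labels = []
        for i, p in enumerate(points):
            if constrained and i in seeds:
                new_labels.append(seeds[i])  # trust the labeled data
            else:
                new_labels.append(min(range(k), key=lambda h: sq_dist(p, centroids[h])))
        if new_labels == labels:
            break
        labels = new_labels
        for h in range(k):
            members = [p for p, lab in zip(points, labels) if lab == h]
            if members:
                centroids[h] = mean(members)
    return labels, centroids
```

The difference shows up when a seed label disagrees with geometry: seeding alone lets such a point drift to the nearer cluster, while the constrained variant keeps it where the label puts it.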
Natural Terrain Classification using 3-D Ladar Data
2004
Cited by 59 (6 self)
Because of the difficulty of interpreting laser data in a meaningful way, safe navigation in vegetated terrain is still a daunting challenge. In this paper, we focus on the segmentation of ladar data using local 3-D point statistics into three classes: clutter to capture grass and tree canopy, linear to capture thin objects like wires or tree branches, and finally surface to capture solid objects like ground terrain surface, rocks or tree trunks. We present the details of the proposed method and the modifications we made to implement it on board an autonomous ground vehicle. Finally, we present results from field tests using this rover and results produced from different stationary laser sensors.
Simultaneous Tracking & Activity Recognition (STAR) Using Many Anonymous, Binary Sensors
2004
Cited by 45 (1 self)
Automatic health monitoring helps enable independent living for the elderly by providing specific information to caregivers. This goal, called aging in place, is increasingly important as an unprecedented portion of the population enters old age. I introduce the simultaneous tracking and activity recognition (STAR) problem, whose solution provides this key information. I propose using data from many minimally invasive sensors commonly found in home security systems to provide simultaneous room-level tracking and recognition of many of the activities of daily living (ADLs). ADLs have been chosen by physicians to gauge the severity of cognitive and physical ailments. I describe a Rao-Blackwellised particle filter for room-level tracking, rudimentary activity recognition, and data association, as well as a Monte Carlo EM approach for online parameter learning. I demonstrate results from experiments in an instrumented home and on simulated data. Proposed extensions improve the approach and add more complex activity recognition. I discuss how to integrate a growing vocabulary of activities into the tracker.
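To make the room-level tracking idea concrete, here is a deliberately simplified sketch: a plain bootstrap particle filter over rooms for a single occupant, with one binary motion sensor per room. It is not the Rao-Blackwellised filter described above: it does no data association between several people and no activity recognition, and it uses fixed, hypothetical motion and sensor parameters instead of learning them with Monte Carlo EM.

```python
import random


def move(room, adjacency, p_stay, rng):
    # Motion model: stay put with probability p_stay, otherwise step to a
    # uniformly chosen adjacent room.
    if rng.random() < p_stay or not adjacency[room]:
        return room
    return rng.choice(adjacency[room])


def sensor_likelihood(room, fired, p_hit, p_false):
    # One binary motion sensor per room: it fires with probability p_hit
    # when the room is occupied and with probability p_false when empty.
    like = 1.0
    for r, f in enumerate(fired):
        p = p_hit if r == room else p_false
        like *= p if f else 1.0 - p
    return like


def track(observations, adjacency, n_particles=500, p_stay=0.7,
          p_hit=0.9, p_false=0.05, seed=0):
    """Bootstrap particle filter for one occupant; rooms are numbered 0..R-1."""
    rng = random.Random(seed)
    rooms = sorted(adjacency)
    particles = [rng.choice(rooms) for _ in range(n_particles)]
    estimates = []
    for fired in observations:
        # Predict, then weight each particle by the sensor readings.
        particles = [move(r, adjacency, p_stay, rng) for r in particles]
        weights = [sensor_likelihood(r, fired, p_hit, p_false) for r in particles]
        posterior = dict.fromkeys(rooms, 0.0)
        for r, wt in zip(particles, weights):
            posterior[r] += wt
        estimates.append(max(posterior, key=posterior.get))
        # Resample in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

For a three-room corridor {0: [1], 1: [0, 2], 2: [1]} and sensors firing in rooms 0, 1, 2 on successive steps, the most probable room follows the sensor that fired.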
Adapted vocabularies for generic visual categorization
In ECCV, 2006
Cited by 39 (3 self)
Several state-of-the-art Generic Visual Categorization (GVC) systems are built around a vocabulary of visual terms and characterize images with one histogram of visual word counts. We propose a novel and practical approach to GVC based on a universal vocabulary, which describes the content of all the considered classes of images, and class vocabularies obtained through the adaptation of the universal vocabulary using class-specific data. An image is characterized by a set of histograms, one per class, where each histogram describes whether the image content is best modeled by the universal vocabulary or the corresponding class vocabulary. It is shown experimentally on three very different databases that this novel representation outperforms approaches that characterize an image with a single histogram.
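One way to read the construction, sketched under stated assumptions: the universal vocabulary is a diagonal Gaussian mixture; a class vocabulary is obtained by MAP adaptation of its means to class data (a standard adaptation technique; the relevance factor and means-only adaptation are our assumptions); and the histogram for that class accumulates soft counts over the combined universal-plus-class words, so each pair of bins indicates whether descriptors are better explained by the universal or the class-specific version of a word. All names are hypothetical; one histogram would be built per class.

```python
import math


def log_gauss(x, mean, var):
    # Log density of a diagonal-covariance Gaussian.
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def posteriors(x, weights, means, variances):
    # Component occupancy probabilities, normalized with log-sum-exp.
    logs = [math.log(w) + log_gauss(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(lg - top) for lg in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(class_data, weights, means, variances, relevance=16.0):
    """MAP-adapt the universal means towards class data (weights/variances kept)."""
    k, dim = len(means), len(means[0])
    counts = [0.0] * k
    sums = [[0.0] * dim for _ in range(k)]
    for x in class_data:
        for i, g in enumerate(posteriors(x, weights, means, variances)):
            counts[i] += g
            for d in range(dim):
                sums[i][d] += g * x[d]
    adapted = []
    for i in range(k):
        if counts[i] == 0.0:
            adapted.append(list(means[i]))
            continue
        # Interpolate between the class-data mean and the universal mean.
        alpha = counts[i] / (counts[i] + relevance)
        adapted.append([alpha * s / counts[i] + (1 - alpha) * m
                        for s, m in zip(sums[i], means[i])])
    return adapted


def bipartite_histogram(descriptors, weights, means, variances, adapted_means):
    """Soft word counts over [universal words..., class words...]."""
    w2 = [w / 2 for w in weights] * 2
    m2 = list(means) + list(adapted_means)
    v2 = list(variances) * 2
    hist = [0.0] * len(w2)
    for x in descriptors:
        for i, g in enumerate(posteriors(x, w2, m2, v2)):
            hist[i] += g / len(descriptors)
    return hist
```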
Study of a bus-based disruption-tolerant network: mobility modeling and impact on routing
ACM MobiCom, 2007
Cited by 28 (0 self)
We study traces taken from UMass DieselNet, a Disruption-Tolerant Network consisting of WiFi nodes attached to buses. As buses travel their routes, they encounter other buses and in some cases are able to establish pairwise connections and transfer data between them. We analyze the bus-to-bus contact traces to characterize the contact process between buses and its impact on DTN routing performance. We find that the all-bus-pairs aggregated inter-contact times show no discernible pattern. However, the inter-contact times aggregated at a route level exhibit periodic behavior. Based on analysis of the deterministic inter-meeting times for bus pairs running on route pairs, and consideration of the variability in bus movement and the random failures to establish connections, we construct generative route-level models that capture the above behavior. Through trace-driven simulations of epidemic routing, we find that the epidemic performance predicted by traces generated with this finer-grained route-level model is much closer to the actual performance that would be realized in the operational system than that predicted by traces generated using the coarse-grained all-bus-pairs aggregated model. This suggests the importance of choosing the right level of model granularity when modeling mobility-related measures such as inter-contact times in DTNs.
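The difference between the two aggregation levels can be illustrated with a small sketch, assuming a trace of contacts given as (start time, bus, bus) tuples and a hypothetical bus-to-route map. As a simplification it measures gaps between successive contact start times of the same bus pair; an inter-contact time proper runs from the end of one contact to the start of the next.

```python
from collections import defaultdict


def contact_gaps(contacts, group_of=None):
    """Gaps between successive contact start times for each bus pair.

    contacts: iterable of (start_time, bus_a, bus_b) tuples.
    group_of: optional mapping bus -> route; when given, gaps are pooled
    per unordered route pair instead of over all bus pairs.
    """
    starts = defaultdict(list)
    for t, a, b in contacts:
        starts[tuple(sorted((a, b)))].append(t)
    pooled = defaultdict(list)
    for (a, b), ts in starts.items():
        ts.sort()
        key = "all" if group_of is None else tuple(sorted((group_of[a], group_of[b])))
        pooled[key].extend(t2 - t1 for t1, t2 in zip(ts, ts[1:]))
    return dict(pooled)
```

Pooling under the "all" key corresponds to the coarse all-bus-pairs view; keying by route pair keeps each route pair's gaps separate, which is the level at which the abstract reports periodic behavior.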
Automatic Person Verification Using Speech and Face Information
2003
Cited by 23 (7 self)
Identity verification systems are an important part of our everyday life. A typical example is the Automatic Teller Machine (ATM), which employs a simple identity verification scheme: the user is asked to enter their secret password after inserting their ATM card; if the password matches the one assigned to the card, the user is allowed access to their bank account. This scheme suffers from a major drawback: only the validity of the combination of a certain possession (the ATM card) and certain knowledge (the password) is verified. The ATM card can be lost or stolen, and the password can be compromised. Thus new verification methods have emerged, in which the password is either replaced by, or used in addition to, biometrics such as the person's speech, face image or fingerprints. Apart from the ATM example described above, biometrics can be applied to other areas, such as telephone and internet-based banking, airline reservations and check-in, as well as forensic work and law enforcement applications. Biometric systems ...
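The abstract is cut off before it describes the verification method itself. Purely as a generic illustration of using speech and face evidence together, here is a sketch of weighted-sum score fusion with a threshold chosen to minimize false acceptances plus false rejections on training scores; the fusion rule, the weight, the assumption of normalized scores and all function names are hypothetical, not the paper's method.

```python
def fuse(speech_score, face_score, w_speech=0.5):
    # Weighted-sum fusion; scores are assumed normalized to a common range.
    return w_speech * speech_score + (1.0 - w_speech) * face_score


def pick_threshold(genuine, impostor, w_speech=0.5):
    """Choose the threshold minimizing FAR + FRR on (speech, face) training pairs."""
    gen = [fuse(s, f, w_speech) for s, f in genuine]
    imp = [fuse(s, f, w_speech) for s, f in impostor]

    def total_error(t):
        far = sum(1 for x in imp if x >= t) / len(imp)  # impostors accepted
        frr = sum(1 for x in gen if x < t) / len(gen)   # true claimants rejected
        return far + frr

    # Only observed fused scores need to be tried as candidate thresholds.
    return min(sorted(gen + imp), key=total_error)


def verify(speech_score, face_score, threshold, w_speech=0.5):
    # Accept the identity claim when the fused score reaches the threshold.
    return fuse(speech_score, face_score, w_speech) >= threshold
```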
Natural terrain classification using three-dimensional ladar data for ground robot mobility
Journal of Field Robotics, 2006
Cited by 22 (5 self)
In recent years, much progress has been made in outdoor autonomous navigation. However, safe navigation is still a daunting challenge in terrain containing vegetation. In this paper, we focus on the segmentation of ladar data into three classes using local three-dimensional point cloud statistics. The classes are: "scatter" to represent porous volumes such as grass and tree canopy, "linear" to capture thin objects like wires or tree branches, and finally "surface" to capture solid objects like ground surface, rocks or large trunks. We present the details of the proposed method, and the modifications we made to implement it on board an autonomous ground vehicle for real-time data processing. Finally, we present results produced from different stationary laser sensors and from field tests using an unmanned ground vehicle.
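A sketch of the local point statistics, assuming the commonly used eigenvalue saliencies: with covariance eigenvalues λ1 ≥ λ2 ≥ λ3 of a point's neighborhood, scatter saliency is λ3, linear saliency is λ1 - λ2, and surface saliency is λ2 - λ3. The neighborhood is passed in directly (no spatial search), the eigenvalues use the closed form for symmetric 3x3 matrices, and labeling by the largest saliency stands in for the trained classifier a real system would use; all names are hypothetical.

```python
import math


def covariance(points):
    # Population covariance of a list of (x, y, z) points.
    n = len(points)
    mu = [sum(c) / n for c in zip(*points)]
    return [[sum((p[a] - mu[a]) * (p[b] - mu[b]) for p in points) / n
             for b in range(3)] for a in range(3)]


def sym3_eigenvalues(m):
    """Eigenvalues of a symmetric 3x3 matrix, largest first (closed form)."""
    p1 = m[0][1] ** 2 + m[0][2] ** 2 + m[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return sorted((m[0][0], m[1][1], m[2][2]), reverse=True)
    q = (m[0][0] + m[1][1] + m[2][2]) / 3.0
    p2 = sum((m[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(m[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp rounding error
    phi = math.acos(r) / 3.0
    l1 = q + 2.0 * p * math.cos(phi)
    l3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [l1, 3.0 * q - l1 - l3, l3]


def saliencies(neighborhood):
    l1, l2, l3 = sym3_eigenvalues(covariance(neighborhood))
    return {"scatter": l3, "linear": l1 - l2, "surface": l2 - l3}


def classify(neighborhood):
    s = saliencies(neighborhood)
    return max(s, key=s.get)
```

A wire-like neighborhood spreads along one axis, a ground patch along two, and foliage along all three, which is what the three saliencies measure.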
What HMMs can do
2002
Cited by 21 (3 self)
Since their inception over thirty years ago, hidden Markov models (HMMs) have become the predominant methodology for automatic speech recognition (ASR) systems; today, most state-of-the-art speech systems are HMM-based. There have been a number of ways to explain HMMs and to list their capabilities, each of these ways having both advantages and disadvantages. In an effort to better understand what HMMs can do, this tutorial analyzes HMMs by exploring a novel way in which an HMM can be defined, namely in terms of random variables and conditional independence assumptions. We prefer this definition as it allows us to reason more thoroughly about the capabilities of HMMs. In particular, it is possible to deduce that there are, in theory at least, no limitations to the class of probability distributions representable by HMMs. This paper concludes that, in the search for a model to supersede the HMM for ASR, rather than trying to correct for HMM limitations in the general case, we should find new models based on their potential for better parsimony, computational requirements, and noise insensitivity.
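To connect the random-variable definition to computation: under the HMM's conditional independence assumptions, the joint probability of a hidden state sequence and an observation sequence factorizes into initial, transition and emission terms, and the forward recursion sums that product over all state sequences efficiently. A minimal sketch with hypothetical names, checked against brute-force enumeration:

```python
from itertools import product


def joint_probability(states, obs, pi, A, B):
    # Factorization implied by the HMM conditional independence assumptions:
    # each state depends only on the previous state, and each observation
    # depends only on the current state.
    p = pi[states[0]] * B[states[0]][obs[0]]
    for prev, cur, o in zip(states, states[1:], obs[1:]):
        p *= A[prev][cur] * B[cur][o]
    return p


def brute_force_likelihood(obs, pi, A, B):
    # Sum the joint over every hidden state sequence: O(N^T) terms.
    n = len(pi)
    return sum(joint_probability(s, obs, pi, A, B)
               for s in product(range(n), repeat=len(obs)))


def forward_likelihood(obs, pi, A, B):
    # Forward recursion computing the same sum in O(T * N^2).
    n = len(pi)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)
```

With pi = [0.6, 0.4], A = [[0.7, 0.3], [0.4, 0.6]] and B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]], both functions give about 0.03628 for the observation sequence [0, 1, 2].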

