Results 1 – 7 of 7
Generating a Diverse Set of High-Quality Clusterings
Abstract

Cited by 1 (0 self)
Abstract. We provide a new framework for generating multiple good-quality partitions (clusterings) of a single data set. Our approach decomposes this problem into two components: generating many high-quality partitions, and then grouping these partitions to obtain k representatives. The decomposition makes the approach extremely modular and allows us to optimize various criteria that control the choice of representative partitions.
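The abstract does not specify which base clusterer or which grouping criterion the framework uses, so the sketch below is only a minimal illustration of the two-stage idea: stage 1 generates many candidate partitions (here, plain k-means from random initializations), and stage 2 picks k representatives by greedy farthest-point selection under a simple pairwise-disagreement distance between partitions. All function names and parameter choices are mine, not the paper's.

```python
import numpy as np

def kmeans_labels(X, k, rng, iters=20):
    """Plain Lloyd's k-means, returning only the label vector."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def partition_distance(a, b):
    """Disagreement distance between two labelings: the fraction of
    point pairs the partitions treat differently (together in one,
    apart in the other)."""
    same_a = (a[:, None] == a[None, :])
    same_b = (b[:, None] == b[None, :])
    iu = np.triu_indices(len(a), k=1)
    return np.mean(same_a[iu] != same_b[iu])

def diverse_clusterings(X, n_candidates=20, n_reps=3, k=2, seed=0):
    """Stage 1: generate many candidate partitions.
    Stage 2: greedily select n_reps mutually distant representatives."""
    rng = np.random.default_rng(seed)
    candidates = [kmeans_labels(X, k, rng) for _ in range(n_candidates)]
    reps = [0]
    while len(reps) < n_reps:
        # Pick the candidate farthest from every chosen representative.
        gaps = [min(partition_distance(candidates[i], candidates[r])
                    for r in reps) for i in range(n_candidates)]
        reps.append(int(np.argmax(gaps)))
    return [candidates[i] for i in reps]
```

Because the two stages only communicate through the candidate list and the distance function, either piece can be swapped out independently, which is the modularity the abstract emphasizes.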
EXPLORING THE LANDSCAPE OF CLUSTERINGS: TOWARDS BETTER INTEGRATION OF DATA, CLUSTERINGS AND THE USER
Abstract
Data size explosion is a huge challenge in data analysis. These days, there are serious limitations on the kinds of questions one can ask about data, since we lack the processing power and storage required for analyzing these humongous datasets. The past decade has seen tremendous growth in the internet and, thus, in data creation and consumption. With the rapid increase in the number of researchers and businesses
Sensor Network Localization for Moving Sensors
Abstract
Abstract—Sensor network localization (SNL) is the problem of determining the locations of sensors given sparse and usually noisy intercommunication distances among them. In this work, we propose an iterative algorithm named PLACEMENT to solve the SNL problem. This iterative algorithm requires an initial estimate of the locations and, in each iteration, is guaranteed to reduce the cost function. The proposed algorithm is able to take advantage of a good initial estimate of sensor locations, making it suitable both for localizing moving sensors and for refining the results produced by other algorithms. Our algorithm is very scalable. We have experimented with a variety of sensor networks and have shown that the proposed algorithm outperforms existing algorithms in both speed and accuracy in almost all experiments. Our algorithm can embed 120,000 sensors in less than 20 minutes. Keywords—embedding; sensor network localization
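The abstract describes PLACEMENT only at the level of its guarantee (each iteration reduces the cost), not its update rule. As a generic illustration of that iterate-and-reduce scheme, the sketch below refines an initial position estimate by gradient descent on a weighted stress cost over the observed pairwise distances, accepting a step only when it lowers the cost; this is my stand-in, not the paper's actual update.

```python
import numpy as np

def stress(P, D, W):
    """Weighted squared error between estimated and measured distances.
    W masks (and weights) the sparse set of observed sensor pairs."""
    diff = np.linalg.norm(P[:, None] - P[None, :], axis=-1) - D
    return 0.5 * np.sum(W * diff ** 2)

def refine_locations(P0, D, W, step=0.01, iters=500):
    """Iteratively refine sensor positions P0 (n x 2) given noisy
    distances D and observation mask W. A step is kept only if it
    lowers the cost, so the cost is non-increasing, mirroring the
    per-iteration guarantee stated in the abstract."""
    P = P0.copy()
    cost = stress(P, D, W)
    for _ in range(iters):
        delta = P[:, None] - P[None, :]                  # n x n x 2
        dist = np.linalg.norm(delta, axis=-1) + 1e-12    # avoid /0
        coef = W * (dist - D) / dist                     # n x n
        grad = (coef[:, :, None] * delta).sum(axis=1)    # d(stress)/dP
        P_new = P - step * grad
        c_new = stress(P_new, D, W)
        if c_new < cost:
            P, cost = P_new, c_new
        else:
            step *= 0.5                                  # backtrack
    return P
```

For a moving sensor, the previous time step's solution serves as P0, which is exactly the good-initial-estimate setting the abstract highlights.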
Geometric Methods in Machine Learning
Abstract
The standard goal of machine learning is to take a finite set of data and induce from it a model that is able to generalize beyond that finite set. In particular, a learning problem finds an appropriate statistical model from a model space based on the training data from a data space. For many such problems, these spaces carry geometric structures that can be exploited using geometric methods, or the problems themselves can be formulated in a way that naturally appeals to geometry-based methods. In such cases, studying these geometric structures and then using appropriate geometry-driven methods not only gives insight into existing algorithms, but also helps build new and better algorithms. In my research, I apply geometric methods to a variety of learning problems, and provide strong theoretical and empirical evidence in favor of using them. The first part of my proposal is devoted to the study of the geometry of the space of probabilistic models associated with the statistical process that generated the data. This study – based on theory well grounded in information geometry – allows me to reason about the appropriateness of conjugate priors from a geometric perspective, and hence gain insight into the large number of existing models that rely on these priors. Furthermore, I use this study to build a family of kernels called generative kernels that can be used as an off-the-shelf tool in any kernel learning method such as support vector machines. Preliminary experiments with generative kernels based on simple statistical processes show promising results, and in the future I propose to extend this work to more complex statistical
Methods of Artificial Intelligence
, 2013
Abstract
Visualizing the effects of a changing distance using continuous embeddings
JOHNSON-LINDENSTRAUSS DIMENSIONALITY REDUCTION ON THE SIMPLEX
Abstract
Abstract. We propose an algorithm for dimensionality reduction on the simplex, mapping a set of high-dimensional distributions to a space of lower-dimensional distributions, whilst approximately preserving the pairwise Hellinger distance between distributions. By introducing a restriction on the input data to distributions that are, in some sense, quite smooth, we can map n points on the d-simplex to the simplex of O(ε⁻² log n) dimensions with ε-distortion with high probability. The techniques used rely on a classical result by Johnson and Lindenstrauss on dimensionality reduction for Euclidean point sets and require the same number of random bits as the non-sparse methods proposed by Achlioptas for database-friendly dimensionality reduction.
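The key fact connecting Hellinger distance to the Euclidean Johnson-Lindenstrauss lemma is that H(p, q) = (1/√2)·‖√p − √q‖₂, so the elementwise square-root map turns Hellinger distances into Euclidean ones, which a random projection then approximately preserves. The sketch below shows only this reduction; the paper's actual algorithm additionally maps the result back onto a lower-dimensional simplex (using the smoothness restriction), which is not reproduced here, and all names are mine.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

def jl_project_sqrt(dists, target_dim, seed=0):
    """Map each distribution (row of `dists`) to its elementwise
    square root, so Euclidean distance equals sqrt(2) * Hellinger
    distance, then apply a dense Gaussian Johnson-Lindenstrauss
    projection to `target_dim` dimensions."""
    rng = np.random.default_rng(seed)
    X = np.sqrt(dists)                    # rows now lie on the unit sphere
    d = X.shape[1]
    R = rng.normal(0.0, 1.0, (d, target_dim)) / np.sqrt(target_dim)
    return X @ R                          # pairwise norms ~ preserved
```

Note that the projected rows are generic Euclidean points, not distributions; recovering points on a lower-dimensional simplex is exactly where the paper's smoothness restriction comes in.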