Results 1–10 of 17
Universal Multi-Dimensional Scaling
KDD '10, 2010
Abstract

Cited by 7 (3 self)
In this paper, we propose a unified algorithmic framework for solving many known variants of MDS. Our algorithm is a simple iterative scheme with guaranteed convergence, and is modular; by changing the internals of a single subroutine in the algorithm, we can switch cost functions and target spaces easily. In addition to the formal guarantees of convergence, our algorithms are accurate; in most cases, they converge to better-quality solutions than existing methods in comparable time. Moreover, they have a small memory footprint and scale effectively for large data sets. We expect that this framework will be useful for a number of MDS variants that have not yet been studied. Our framework extends to embedding high-dimensional points lying on a sphere to points on a lower-dimensional sphere, preserving geodesic distances. As a complement to this result, we also extend the Johnson-Lindenstrauss Lemma to this spherical setting, by showing that projecting to a random O((1/ε²) log n)-dimensional sphere causes only an ε-distortion in the geodesic distances.
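The "simple iterative scheme" with a swappable subroutine that the abstract describes is in the spirit of majorization-based MDS (SMACOF). As an illustrative sketch only, assuming the classical squared-stress cost and a Euclidean target space (the paper's actual framework and subroutine interface are not reproduced here):

```python
import numpy as np

def smacof_mds(D, dim=2, iters=300, seed=0):
    """SMACOF-style majorization for metric MDS under raw stress.

    D: (n, n) symmetric matrix of target dissimilarities.
    Returns an (n, dim) embedding whose pairwise distances
    approximate D; stress is non-increasing across iterations.
    """
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    X = rng.standard_normal((n, dim))
    for _ in range(iters):
        diff = X[:, None, :] - X[None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, 1.0)          # avoid division by zero
        B = -D / dist
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))  # row sums of B are zero
        X = B @ X / n                        # Guttman transform
    return X

# Usage: recover 4 equally spaced points on a line from their distances.
pts = np.arange(4, dtype=float)[:, None]
D = np.abs(pts - pts.T)
X = smacof_mds(D, dim=1)
```

Swapping the per-iteration update (here, the Guttman transform) plays the role of the framework's single replaceable subroutine for other cost functions or target spaces.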
A unified algorithmic framework for multidimensional scaling
Abstract

Cited by 2 (1 self)
In this paper, we propose a unified algorithmic framework for solving many known variants of MDS. Our algorithm is a simple iterative scheme with guaranteed convergence, and is modular; by changing the internals of a single subroutine in the algorithm, we can switch cost functions and target spaces easily. In addition to the formal guarantees of convergence, our algorithms are accurate; in most cases, they converge to better-quality solutions than existing methods in comparable time. We expect that this framework will be useful for a number of MDS variants that have not yet been studied. Our framework extends to embedding high-dimensional points lying on a sphere to points on a lower-dimensional sphere, preserving geodesic distances. As a complement to this result, we also extend the Johnson-Lindenstrauss Lemma to this spherical setting, where projecting to a random O((1/ε²) log n)-dimensional sphere causes ε-distortion.
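The spherical Johnson-Lindenstrauss claim extends the familiar Euclidean one. A minimal sketch of the Euclidean version (Gaussian random projection) may help fix ideas; the constant 8 in the target dimension and the unit-sphere usage below are illustrative assumptions, not the paper's construction:

```python
import numpy as np

def jl_project(X, eps=0.5, seed=0):
    """Gaussian Johnson-Lindenstrauss projection.

    Maps n points in R^d down to k = O(log(n) / eps^2) dimensions,
    distorting pairwise Euclidean distances by roughly (1 +/- eps)
    with high probability.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = int(np.ceil(8 * np.log(n) / eps ** 2))   # illustrative constant
    R = rng.standard_normal((d, k)) / np.sqrt(k)  # scaled random matrix
    return X @ R

# Usage: points on a high-dimensional unit sphere. (The paper's spherical
# variant would also map the projected points back onto a sphere.)
rng = np.random.default_rng(1)
P = rng.standard_normal((50, 1000))
P /= np.linalg.norm(P, axis=1, keepdims=True)
Q = jl_project(P, eps=0.5)
```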
Visualizing progressions for education and game design, 2014
Abstract

Cited by 1 (1 self)
Progression design is a critical part of designing games or educational content. Currently, systems to visualize the content of a progression are limited and do not help designers answer questions important to the design process. These questions include comparing two progressions to understand the relative order in which concepts are introduced or how complexity changes throughout the progression. We present an interactive visualization system that allows designers to compare two different progressions, using multiple views and interaction techniques that aim to help designers answer these questions. We evaluate our tool through informal anecdotes, discussing insights that were found on progression data for actively developed games.
Inverse Covariance Estimation from Data with Missing Values using the Concave-Convex Procedure
Abstract
We study the problem of estimating sparse precision matrices from data with missing values. We show that the corresponding maximum likelihood problem is a Difference of Convex (DC) program by proving some new concavity results on the Schur complements. We propose a new algorithm to solve this problem based on the Concave-Convex Procedure (CCCP), and we show that the standard EM procedure is a weaker CCCP for this problem. Numerical experiments show that our new algorithm, called mCCCP, converges much faster than EM on both synthetic and biological datasets.
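CCCP itself is easy to state: write the objective as f = g − h with g and h convex, then repeatedly minimize the convex surrogate obtained by linearizing h at the current iterate. A toy one-dimensional illustration (not the paper's mCCCP for precision matrices, whose surrogate involves Schur complements):

```python
import numpy as np

def cccp(grad_g_inv, grad_h, x0, iters=50):
    """Concave-Convex Procedure for f(x) = g(x) - h(x), g and h convex.

    Each step minimizes the convex surrogate formed by linearizing h
    at the current iterate, i.e. it solves grad_g(x_next) = grad_h(x_curr).
    The objective value is non-increasing along the iterates.
    """
    x = x0
    for _ in range(iters):
        x = grad_g_inv(grad_h(x))
    return x

# Toy DC program: f(x) = x**4 - 2*x**2, so g(x) = x**4 (convex) and
# h(x) = 2*x**2 (convex). grad_g(x) = 4*x**3, so grad_g_inv(y) = cbrt(y/4),
# and grad_h(x) = 4*x; the update reduces to x -> cbrt(x).
x_star = cccp(lambda y: np.cbrt(y / 4.0), lambda x: 4.0 * x, x0=0.5)
# Starting from 0.5, the iterates converge to the local minimum at x = 1.
```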
Inverse Covariance Estimation from Data with Missing Values using the Concave-Convex procedure
Abstract
We study the problem of estimating sparse precision matrices from data with missing values. Direct statistical inference with likelihoods of observed values is a minimization program that uses the Schur complement. The objective function is not convex but rather a "difference of convex" program (DC program), which can be solved with the Concave-Convex procedure (CCP). The technique presented uses a concave-convex decomposition that is more natural than the one used in Expectation-Maximization (EM) algorithms, and simulation studies also show that the CCP compares favorably to EM. … covariance matrices and their inverses, precision matrices. The most common probability model for studying correlations in continuous data is the multivariate Gaussian distribution with mean µ and covariance matrix Σ. In the context of Gaussian distributions defined over undirected graphs, also known as Gaussian Markov Random Fields (GMRFs), the nonzero entries S_ij of the precision matrix S = Σ⁻¹ of the GMRF correspond precisely to the conditional dependencies between the variables. Promoting sparsity has compelling advantages, such as producing more robust models that generalize well to unseen data. … where |S| is the determinant of matrix S, Tr the trace operator, and ‖S‖₁ = Σ_ij |S_ij|. In practice, datasets often suffer from missing values due to mistakes in data collection, dropouts, or limitations of experimental design. Instead of using the full likelihood of the samples, we need to consider the marginal likelihood of the observed values, or observed log-likelihood for short. Inference for µ and S can be based on the observed log-likelihood if we assume that the underlying missing-data mechanism is ignorable. With an arbitrary pattern of missing values, no explicit maximization of the likelihood is possible, even for the mean values and covariance matrices …
FragViz: visualization of fragmented networks
Abstract
Background: Researchers in systems biology use network visualization to summarize the results of their analysis. Such networks often include unconnected components, which popular network alignment algorithms place arbitrarily with respect to the rest of the network. This can lead to misinterpretations due to the proximity of otherwise unrelated elements. Results: We propose a new network layout optimization technique called FragViz which can incorporate additional information on relations between unconnected network components. It uses a two-step approach by first arranging the nodes within each of the components and then placing the components so that their proximity in the network corresponds to their relatedness. In the experimental study with the leukemia gene networks we demonstrate that FragViz can obtain network layouts which are more interpretable and hold additional information that could not be exposed using classical network layout optimization algorithms. Conclusions: Network visualization relies on computational techniques for proper placement of objects under consideration. These algorithms need to be fast so that they can be incorporated in responsive interfaces required by the explorative data analysis environments. Our layout optimization technique FragViz meets these requirements and specifically addresses the visualization of fragmented networks, for which standard algorithms do …
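The two-step idea, lay out each component on its own and then place whole components according to their relatedness, can be sketched independently of FragViz's actual optimization. In this hypothetical sketch, step 1 is assumed already done by any standard layout algorithm, and step 2 places component centers by classical MDS on a given relatedness-distance matrix:

```python
import numpy as np

def place_components(local_layouts, relatedness_dist):
    """Two-step layout sketch for a fragmented network.

    local_layouts: list of (n_i, 2) arrays, one per connected component,
        each laid out independently (step 1).
    relatedness_dist: (k, k) distances between components; closely
        related components should end up near each other (step 2).
    Returns the translated per-component layouts.
    """
    D2 = relatedness_dist ** 2
    k = D2.shape[0]
    J = np.eye(k) - np.ones((k, k)) / k       # centering matrix
    B = -0.5 * J @ D2 @ J                     # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    centers = vecs[:, -2:] * np.sqrt(np.maximum(vals[-2:], 0.0))
    placed = []
    for layout, center in zip(local_layouts, centers):
        local = layout - layout.mean(axis=0)  # center each component
        placed.append(local + center)
    return placed

# Three toy components; components 0 and 1 are closely related (distance 1),
# component 2 is distant (distance 5) from both.
layouts = [np.array([[0., 0.], [1., 0.]]),
           np.array([[0., 0.], [0., 1.]]),
           np.array([[0., 0.], [1., 1.]])]
rel = np.array([[0., 1., 5.],
                [1., 0., 5.],
                [5., 5., 0.]])
placed = place_components(layouts, rel)
```

After placement, the centroid distances between components match the relatedness distances, so unrelated components no longer sit arbitrarily close.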
The dissertation Testing and Estimating Ordered Parameters of Probability Distributions
Abstract
This introduction to the R package isotone is a (slightly) modified version of de Leeuw et al. (2009), published in the Journal of Statistical Software. In this paper we give a general framework for isotone optimization. First we discuss a generalized version of the pool-adjacent-violators algorithm (PAVA) to minimize a separable convex function with simple chain constraints. Besides general convex functions, we extend existing PAVA implementations in terms of observation weights, approaches for tie handling, and responses from repeated measurement designs. Since isotone optimization problems can be formulated as convex programming problems with linear constraints, we then develop a primal active set method to solve such problems. This methodology is applied to specific loss functions relevant in statistics. Both approaches are implemented in the R package isotone. Keywords: isotone optimization, PAVA, monotone regression, active set, R.
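For reference, the core pool-adjacent-violators algorithm that the package generalizes fits a nondecreasing sequence by repeatedly pooling adjacent violating blocks into their combined mean. A minimal unweighted least-squares sketch (the isotone package additionally handles weights, ties, repeated measurements, and other convex losses):

```python
def pava(y):
    """Pool-adjacent-violators: least-squares isotone regression.

    Returns the nondecreasing fit to y minimizing sum((y_i - x_i)**2).
    Whenever a block mean violates monotonicity, adjacent blocks are
    pooled and replaced by their combined mean.
    """
    blocks = []                 # each block holds [mean, count]
    for v in y:
        blocks.append([float(v), 1])
        # Merge while the last block's mean is below its predecessor's.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            c = c1 + c2
            blocks.append([(m1 * c1 + m2 * c2) / c, c])
    fit = []
    for m, c in blocks:
        fit.extend([m] * c)
    return fit

print(pava([1, 3, 2, 4]))   # the violating pair 3, 2 is pooled: [1.0, 2.5, 2.5, 4.0]
```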
STUDIES, 2013
Abstract
© Stephen Ingram, 2013. … To address the case of costly distances, we develop an algorithm framework, Glint, which efficiently manages the number of distance function calculations for the Multidimensional Scaling class of DR algorithms. We then show that Glint implementations of Multidimensional Scaling algorithms achieve substantial speed improvements or remove the need for human monitoring. Preface: Parts of this thesis have appeared in publications and journal submissions. Most of Chapter 3 is based on the following published conference paper: …
Geometric Methods in Machine Learning
Abstract
The standard goal of machine learning is to take a finite set of data and induce a model using that data that is able to generalize beyond that finite set. In particular, a learning problem finds an appropriate statistical model from a model space based on the training data from a data space. For many such problems, these spaces carry geometric structures that can be exploited using geometric methods, or the problems themselves can be formulated in a way that naturally appeals to geometry-based methods. In such cases, studying these geometric structures and then using appropriate geometry-driven methods not only gives insight into existing algorithms, but also helps build new and better algorithms. In my research, I apply geometric methods to a variety of learning problems, and provide strong theoretical and empirical evidence in favor of using them. The first part of my proposal is devoted to the study of the geometry of the space of probabilistic models associated with the statistical process that generated the data. This study – based on theory well grounded in information geometry – allows me to reason about the appropriateness of conjugate priors from a geometric perspective, and hence gain insight into the large number of existing models that rely on these priors. Furthermore, I use this study to build a family of kernels called generative kernels that can be used as an off-the-shelf tool in any kernel learning method such as support vector machines. Preliminary experiments with generative kernels based on simple statistical processes show promising results, and in the future I propose to extend this work to more complex statistical …