Results 1–10 of 20
Learning mixtures of product distributions over discrete domains
 SIAM J. Comput.
"... Abstract. We consider the problem of learning mixtures of product distributions over discrete domains in the distribution learning framework introduced by Kearns et al. [Proceedings of the 26th Annual Symposium on Theory of Computing (STOC), Montréal, QC, 1994, ACM, New York, pp. 273–282]. We give a ..."
Abstract

Cited by 28 (3 self)
We consider the problem of learning mixtures of product distributions over discrete domains in the distribution learning framework introduced by Kearns et al. [Proceedings of the 26th Annual Symposium on Theory of Computing (STOC), Montréal, QC, 1994, ACM, New York, pp. 273–282]. We give a poly(n/ɛ)-time algorithm for learning a mixture of k arbitrary product distributions over the n-dimensional Boolean cube {0,1}^n to accuracy ɛ, for any constant k. Previous polynomial-time algorithms could achieve this only for k = 2 product distributions; our result answers an open question stated independently in [M. Cryan, Learning and Approximation Algorithms for Problems Motivated by Evolutionary Trees, Ph.D. thesis, University of Warwick ...]
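The sampling process this abstract assumes (pick a component, then draw each of the n Boolean coordinates independently) can be sketched as follows; the mixing weights and bit probabilities below are illustrative values, not parameters from the paper:

```python
import random

def sample_mixture(weights, bit_probs, rng):
    """Draw one sample from a mixture of product distributions over {0,1}^n.

    weights:   mixing weights w_1..w_k, summing to 1 (illustrative values).
    bit_probs: bit_probs[i][j] = Pr[bit j = 1] under the i-th component.
    """
    # Choose a component i with probability weights[i] ...
    r = rng.random()
    i, acc = 0, weights[0]
    while r > acc:
        i += 1
        acc += weights[i]
    # ... then draw every coordinate independently: the "product" structure.
    return [1 if rng.random() < p else 0 for p in bit_probs[i]]

# Two product distributions over {0,1}^3 with equal mixing weights.
rng = random.Random(0)
sample = sample_mixture([0.5, 0.5], [[0.9, 0.9, 0.9], [0.1, 0.1, 0.1]], rng)
```

The learning problem in the abstract is the inverse task: recover the weights and bit probabilities given only such samples.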
Linear Concepts and Hidden Variables
, 2000
"... We study a learning problem which allows for a \fair" comparison between unsupervised learning methodsprobabilistic model construction, and more traditional algorithms that directly learn a classication. The merits of each approach are intuitively clear: inducing a model is more expensive c ..."
Abstract

Cited by 21 (15 self)
We study a learning problem which allows for a "fair" comparison between unsupervised learning methods (probabilistic model construction) and more traditional algorithms that directly learn a classification. The merits of each approach are intuitively clear: inducing a model is more expensive computationally, but may support a wider range of predictions. Its performance, however, will depend on how well the postulated probabilistic model fits the data. To compare the paradigms we consider a model which postulates a single binary-valued hidden variable on which all other attributes depend. In this model, finding the most likely value of any one variable (given known values for the others) reduces to testing a linear function of the observed values. We learn the model with two techniques: the standard EM algorithm, and a new algorithm we develop based on covariances. We compare these, in a controlled fashion, against an algorithm (a version of Winnow) that attempts to find a good l...
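The reduction to "testing a linear function" mentioned in this abstract can be illustrated with a naive-Bayes-style calculation: under a single binary hidden variable, the posterior log-odds of that variable is an affine function of the observed bits. The function and parameter names below are hypothetical, not the paper's notation:

```python
import math

def hidden_bit_log_odds(x, prior, p, q):
    """Posterior log-odds that the hidden bit H = 1 given observed bits x,
    when each attribute x_j depends only on H:
    prior = Pr[H=1], p[j] = Pr[x_j=1 | H=1], q[j] = Pr[x_j=1 | H=0].

    The result is affine in x: a bias term plus one additive weight per
    observed bit, i.e. a linear test on the observed values.
    """
    s = math.log(prior / (1.0 - prior))
    for xj, pj, qj in zip(x, p, q):
        s += math.log(pj / qj) if xj else math.log((1.0 - pj) / (1.0 - qj))
    return s  # predict H = 1 exactly when s > 0

# With p[j] > q[j], all-ones evidence favours H = 1, all-zeros favours H = 0.
pos = hidden_bit_log_odds([1, 1, 1], 0.5, [0.8, 0.8, 0.8], [0.2, 0.2, 0.2])
neg = hidden_bit_log_odds([0, 0, 0], 0.5, [0.8, 0.8, 0.8], [0.2, 0.2, 0.2])
```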
Learning Mixtures of Product Distributions using Correlations and Independence
"... We study the problem of learning mixtures of distributions, a natural formalization of clustering. A mixture of distributions is a collection of distributions D = {D1,...DT}, and � mixing weights, {w1,..., wT} such that i wi = 1. A sample from a mixture is generated by choosing i with probability wi ..."
Abstract

Cited by 13 (1 self)
We study the problem of learning mixtures of distributions, a natural formalization of clustering. A mixture of distributions is a collection of distributions D = {D1, ..., DT} and mixing weights {w1, ..., wT} such that Σi wi = 1. A sample from a mixture is generated by choosing i with probability wi and then choosing a sample from distribution Di. The problem of learning the mixture is that of finding the parameters of the distributions comprising D, given only the ability to sample from the mixture. In this paper, we restrict ourselves to learning mixtures of product distributions. The key to learning the mixtures is to find a few vectors such that points from different distributions are sharply separated upon projection onto these vectors. Previous techniques use the vectors corresponding to the top few directions of highest variance of the mixture. Unfortunately, these directions may be directions of high noise and not directions along which the distributions are separated. Further, skewed mixing weights amplify the effects of noise, and as a result, previous techniques only work when the separation between the input distributions is large relative to the imbalance in the mixing weights. In this paper, we show an algorithm which successfully learns mixtures of distributions with a separation condition that depends only logarithmically on the skewed mixing weights. In particular, it succeeds for a separation between the centers that is Θ(σ√(T log Λ)), where σ is the maximum directional standard deviation of any distribution in the mixture, T is the number of distributions, and Λ is polynomial in T, σ, log n and the imbalance in the mixing ...
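A minimal illustration of the projection idea, using two toy Gaussian clusters in place of product distributions, and a hand-picked direction rather than the paper's correlation-based choice:

```python
import random

def project(points, direction):
    """Orthogonally project points onto the line spanned by `direction`."""
    norm = sum(d * d for d in direction) ** 0.5
    unit = [d / norm for d in direction]
    return [sum(pi * ui for pi, ui in zip(p, unit)) for p in points]

rng = random.Random(1)
# Two tight clusters in R^2, separated along the x-axis.
left = [[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(50)]
right = [[rng.gauss(4.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(50)]
# Projecting onto the separating direction splits the samples sharply;
# a direction of pure noise (here, the y-axis) would mix them instead.
proj_left = project(left, [1.0, 0.0])
proj_right = project(right, [1.0, 0.0])
```

The hard part, which the paper addresses, is finding such separating directions from samples alone when variance-based choices fail.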
Learning Mixtures of Separated Nonspherical Gaussians
, 2005
"... Mixtures of Gaussian (or normal) distributions arise in a variety of application areas. Many heuristics have been proposed for the task of finding the component Gaussians given samples from the mixture, such as the EM algorithm, a localsearch heuristic from Dempster, Laird and Rubin [J. Roy. Statis ..."
Abstract

Cited by 10 (1 self)
Mixtures of Gaussian (or normal) distributions arise in a variety of application areas. Many heuristics have been proposed for the task of finding the component Gaussians given samples from the mixture, such as the EM algorithm, a local-search heuristic from Dempster, Laird and Rubin [J. Roy. Statist. Soc. Ser. B 39 (1977) 1–38]. These do not provably run in polynomial time. We present the first algorithm that provably learns the component Gaussians in time that is polynomial in the dimension. The Gaussians may have arbitrary shape, but they must satisfy a "separation condition" which places a lower bound on the distance between the centers of any two component Gaussians. The mathematical results at the heart of our proof are "distance concentration" results, proved using isoperimetric inequalities, which establish bounds on the probability distribution of the distance between a pair of points generated according to the mixture. We also formalize the more general problem of max-likelihood fit of a Gaussian mixture to unstructured data.
Some Discriminant-Based PAC Algorithms
 Journal of Machine Learning Research
, 2006
"... A classical approach in multiclass pattern classification is the following. Estimate the probability distributions that generated the observations for each label class, and then label new instances by applying the Bayes classifier to the estimated distributions. That approach provides more useful ..."
Abstract

Cited by 7 (3 self)
A classical approach in multiclass pattern classification is the following: estimate the probability distributions that generated the observations for each label class, and then label new instances by applying the Bayes classifier to the estimated distributions. That approach provides more useful information than just a class label; it also provides estimates of the conditional distribution of class labels, in situations where there is class overlap. We would ...
When Can Two Unsupervised Learners Achieve PAC Separation?
 Proc. of COLT/EuroCOLT, LNAI 2111
, 2001
"... . In this paper we study a new restriction of the PAC learning framework, in which each label class is handled by an unsupervised learner that aims to t an appropriate probability distribution to its own data. A hypothesis is derived by choosing, for any unlabeled instance, the label whose distr ..."
Abstract

Cited by 5 (1 self)
In this paper we study a new restriction of the PAC learning framework, in which each label class is handled by an unsupervised learner that aims to fit an appropriate probability distribution to its own data. A hypothesis is derived by choosing, for any unlabeled instance, the label whose distribution assigns it the higher likelihood. The motivation for the new learning setting is that the general approach of fitting separate distributions to each label class is often used in practice for classification problems. The set of probability distributions that is obtained is more useful than a collection of decision boundaries. A question that arises, however, is whether it is ever more tractable (in terms of computational complexity or sample size required) to find a simple decision boundary than to divide the problem up into separate unsupervised learning problems and find appropriate distributions. Within the framework, we give algorithms for learning various simple geometric concept classes. In the Boolean domain we show how to learn parity functions, and functions having a constant upper bound on the number of relevant attributes. These results distinguish the new setting from various other well-known restrictions of PAC learning. We give an algorithm for learning monomials over input vectors generated by an unknown product distribution. The main open problem is whether monomials (or any other concept class) distinguish learnability in this framework from standard PAC-learnability.
Robust PCA and clustering on noisy mixtures
 in Proc. of SODA
, 2009
"... This paper presents a polynomial algorithm for learning mixtures of logconcave distributions in R n in the presence of malicious noise. That is, each sample is corrupted with some small probability, being replaced by a point about which we can make no assumptions. A key element of the algorithm is R ..."
Abstract

Cited by 4 (1 self)
This paper presents a polynomial algorithm for learning mixtures of log-concave distributions in R^n in the presence of malicious noise. That is, each sample is corrupted with some small probability, being replaced by a point about which we can make no assumptions. A key element of the algorithm is Robust Principal Component Analysis (PCA), which is less susceptible to corruption by noisy points. While noise may cause standard PCA to collapse well-separated mixture components so that they are indistinguishable, Robust PCA preserves the distance between some of the components, making a partition possible. It then recurses on each half of the mixture until every component is isolated. The success of this algorithm requires only an O*(log n) factor increase in the required separation between components of the mixture compared to the noiseless case.
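A rough sketch of the robustification idea: plain PCA on data containing a few malicious points can pick a noise direction, while trimming the farthest points first recovers the direction along which the components are separated. The trimming rule below is a crude stand-in for the paper's Robust PCA, not its actual algorithm:

```python
import random

def top_direction(points, iters=100):
    """Top principal direction of a point set, via power iteration on the
    (unnormalised) sample covariance."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    v = [1.0] * d
    for _ in range(iters):
        w = [0.0] * d
        for c in centered:
            dot = sum(c[j] * v[j] for j in range(d))
            for j in range(d):
                w[j] += dot * c[j]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def trimmed(points, frac=0.1):
    """Keep the (1 - frac) fraction of points closest to the mean; a crude
    stand-in for the paper's Robust PCA, not its actual algorithm."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    by_dist = sorted(points,
                     key=lambda p: sum((p[j] - mean[j]) ** 2 for j in range(d)))
    return by_dist[:int(n * (1 - frac))]

rng = random.Random(2)
# Two components separated along the x-axis, plus a few malicious points
# placed far out along the y-axis.
pts = ([[rng.gauss(-3.0, 0.2), rng.gauss(0.0, 0.2)] for _ in range(50)]
       + [[rng.gauss(3.0, 0.2), rng.gauss(0.0, 0.2)] for _ in range(50)]
       + [[0.0, 50.0], [0.0, -50.0], [0.0, 55.0], [0.0, -55.0], [0.0, 60.0]])
noisy_dir = top_direction(pts)            # dragged toward the noise (y-axis)
robust_dir = top_direction(trimmed(pts))  # recovers the separating x-axis
```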
Learning and Approximation Algorithms for Problems Motivated by Evolutionary Trees
, 1999
"... vi Chapter 1 Introduction 1 1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Biological Background . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2.2 Models and Methods . . . . . . ..."
Abstract

Cited by 2 (0 self)
Chapter 1: Introduction
  1.1 Overview
  1.2 Biological Background
    1.2.1 Data
    1.2.2 Models and Methods
  1.3 Learning in the General Markov Model
    1.3.1 The Model
    1.3.2 Learning Problems for Evolutionary Trees
  1.4 Layout of the thesis
Chapter 2: Learning Two-State Markov Evolutionary Trees
  2.1 Previous research
    2.1.1 The General Idea
    2.1.2 Previous work on learning the distribution
    2.1.3 Previous work on finding the topology
    2.1.4 ...
Learning Mixtures of Arbitrary Distributions over Large Discrete Domains
"... We give an algorithm for learning a mixture of unstructured distributions. This problem arises in various unsupervised learning scenarios, for example in learning topic models from a corpus of documents spanning several topics. We show how to learn the constituents (the topic distributions and the m ..."
Abstract

Cited by 1 (0 self)
We give an algorithm for learning a mixture of unstructured distributions. This problem arises in various unsupervised learning scenarios, for example in learning topic models from a corpus of documents spanning several topics. We show how to learn the constituents (the topic distributions and the mixture weights) of a mixture of k (constant) arbitrary distributions over a large discrete domain [n] = {1, 2, ..., n}, using O(n polylog n) samples. This task is information-theoretically impossible for k > 1 under the usual sampling process from a mixture distribution. However, there are situations (such as the above-mentioned topic model case) in which each sample point consists of several observations from the same mixture constituent. This number of observations, which we call the "sampling aperture", is a crucial parameter of the problem. We show that efficient learning is possible exactly at the information-theoretically least possible aperture of 2k − 1. (Independent work by others places certain restrictions on the model, which enables learning with smaller aperture, albeit using, in general, a significantly larger sample size.) A sequence of tools contributes to the algorithm, such as concentration results for random matrices, dimension reduction, moment estimation, and sensitivity analysis.
Incomplete statistical information fusion and its application to clinical trials data
 In Scalable Uncertainty Management (SUM’07), volume 4772 of LNCS
, 2007
"... Abstract. In medical clinical trials, overall trial results are highlighted in the abstracts of papers/reports. These results are summaries of underlying statistical analysis where most of the time normal distributions are assumed in the analysis. It is common for clinicians to focus on the informat ..."
Abstract

Cited by 1 (0 self)
In medical clinical trials, overall trial results are highlighted in the abstracts of papers/reports. These results are summaries of underlying statistical analysis, where most of the time normal distributions are assumed in the analysis. It is common for clinicians to focus on the information in the abstracts in order to review or integrate several clinical trial results that address the same or similar medical question(s). Therefore, developing techniques to merge results from clinical trials based on information in the abstracts is useful and important. In reality, the information in an abstract can either provide sufficient details about a normal distribution or just partial information about a distribution. In this paper, we first propose approaches to constructing normal distributions from both complete and incomplete statistical information in the abstracts. We then provide methods to merge these normal distributions (or sampling distributions). Following this, we investigate the conditions under which two normal distributions can be merged. Finally, we design an algorithm to sequence the merging of trial results to ensure that the most reliable trials are considered first.
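One standard way to merge normal summaries of the kind described is inverse-variance (fixed-effect) pooling; the paper's own merging operators may differ in detail:

```python
def merge_normals(trials):
    """Pool normal summaries (mean, variance) by inverse-variance weighting,
    the standard fixed-effect rule.

    trials: list of (mean, variance) pairs, one per clinical trial.
    Returns the pooled (mean, variance).
    """
    precisions = [1.0 / v for _, v in trials]
    total = sum(precisions)
    pooled_mean = sum(m / v for m, v in trials) / total
    return pooled_mean, 1.0 / total

# Two trials measuring the same effect: the pooled mean sits between the
# trial means, weighted toward the more precise trial, and the pooled
# variance is smaller than either input variance.
merged = merge_normals([(1.0, 0.5), (1.4, 0.25)])
```

This rule gives the most reliable (lowest-variance) trials the largest say, which matches the abstract's concern with ordering merges by reliability.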