Text Classification from Labeled and Unlabeled Documents using EM
 Machine Learning
, 1999
Cited by 804 (17 self)
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. We introduce an algorithm for learning from labeled and unlabeled documents based on the combination of Expectation-Maximization (EM) and a naive Bayes classifier. The algorithm first trains a classifier using the available labeled documents, and probabilistically labels the unlabeled documents. It then trains a new classifier using the labels for all the documents, and iterates to convergence. This basic EM procedure works well when the data conform to the generative assumptions of the model. However these assumptions are often violated in practice, and poor performance can result. We present two extensions to the algorithm that improve ...
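The basic loop described in this abstract (train on labeled documents, probabilistically label the unlabeled ones, retrain on everything, repeat) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a multinomial naive Bayes model over word-count vectors, Laplace smoothing with a hypothetical `alpha` parameter, and a stopping rule based on the change in the soft labels. All function names are made up for this sketch, and the two extensions mentioned in the abstract are not included.

```python
import numpy as np

def train_nb(X, R, alpha=1.0):
    """M-step: fit multinomial naive Bayes from (possibly soft) labels.

    X: (n_docs, n_words) word counts.
    R: (n_docs, n_classes) class membership weights, rows sum to 1.
    alpha: Laplace smoothing constant (an assumption of this sketch).
    """
    class_mass = R.sum(axis=0) + alpha
    log_prior = np.log(class_mass / class_mass.sum())
    word_counts = R.T @ X + alpha  # (n_classes, n_words)
    log_word = np.log(word_counts / word_counts.sum(axis=1, keepdims=True))
    return log_prior, log_word

def posterior(X, log_prior, log_word):
    """E-step: P(class | document) under the current classifier."""
    log_joint = X @ log_word.T + log_prior
    log_joint = log_joint - log_joint.max(axis=1, keepdims=True)  # stability
    p = np.exp(log_joint)
    return p / p.sum(axis=1, keepdims=True)

def em_naive_bayes(X_lab, y_lab, X_unl, n_classes, n_iter=20, tol=1e-6):
    """Semi-supervised EM: labeled documents keep their given labels."""
    R_lab = np.eye(n_classes)[y_lab]      # fixed one-hot labels
    params = train_nb(X_lab, R_lab)       # initial classifier, labeled data only
    R_unl = posterior(X_unl, *params)     # probabilistic labels for unlabeled docs
    X_all = np.vstack([X_lab, X_unl])
    for _ in range(n_iter):
        params = train_nb(X_all, np.vstack([R_lab, R_unl]))  # retrain on all docs
        R_new = posterior(X_unl, *params)
        converged = np.abs(R_new - R_unl).max() < tol
        R_unl = R_new
        if converged:
            break
    return params, R_unl
```

Calling `em_naive_bayes(X_lab, y_lab, X_unl, n_classes)` returns the fitted log-parameters and the final soft labels for the unlabeled documents; taking `argmax` over each row gives hard predictions.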
Using Unlabeled Data to Improve Text Classification
, 2001
Cited by 50 (0 self)
One key difficulty with text classification learning algorithms is that they require many hand-labeled examples to learn accurately. This dissertation demonstrates that supervised learning algorithms that use a small number of labeled examples and many inexpensive unlabeled examples can create high-accuracy text classifiers. By assuming that documents are created by a parametric generative model, Expectation-Maximization (EM) finds local maximum a posteriori models and classifiers from all the data, labeled and unlabeled. These generative models do not capture all the intricacies of text; however on some domains this technique substantially improves classification accuracy, especially when labeled data are sparse. Two problems arise from this basic approach. First, unlabeled data can hurt performance in domains where the generative modeling assumptions are too strongly violated. In this case the assumptions can be made more representative in two ways: by modeling subtopic class structure, and by modeling supertopic hierarchical class relationships. By doing so, model probability and classification accuracy come into correspondence, allowing unlabeled data to improve classification performance. The second problem is that even with a representative model, the improvements given by unlabeled data do not sufficiently compensate for a paucity of labeled data. Here, limited labeled data provide EM initializations that lead to low-probability models. Performance can be significantly improved by using active learning to select high-quality initializations, and by using alternatives to EM that avoid low-probability local maxima.
Dirichlet Prior Sieves in Finite Normal Mixtures
 Statistica Sinica
, 2002
Cited by 40 (1 self)
Abstract: The use of a finite dimensional Dirichlet prior in the finite normal mixture model has the effect of acting like a Bayesian method of sieves. Posterior consistency is directly related to the dimension of the sieve and the choice of the Dirichlet parameters in the prior. We find that naive use of the popular uniform Dirichlet prior leads to an inconsistent posterior. However, a simple adjustment to the parameters in the prior induces a random probability measure that approximates the Dirichlet process and yields a posterior that is strongly consistent for the density and weakly consistent for the unknown mixing distribution. The dimension of the resulting sieve can be selected easily in practice and a simple and efficient Gibbs sampler can be used to sample the posterior of the mixing distribution.
Key words and phrases: Bose-Einstein distribution, Dirichlet process, identification, method of sieves, random probability measure, relative entropy, weak convergence.
Approximate Dirichlet Process Computing in Finite Normal Mixtures: Smoothing and Prior Information
 JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
, 2000
The Likelihood Ratio Test for Homogeneity in the Finite Mixture Models
, 2001
Cited by 23 (4 self)
The authors study the asymptotic behaviour of the likelihood ratio statistic for testing homogeneity in the finite mixture models of a general parametric distribution family. They prove that the limiting distribution of this statistic is the squared supremum of a truncated standard Gaussian process. The autocorrelation function of the Gaussian process is explicitly presented. A resampling procedure is recommended to obtain the asymptotic p-value. Three kernel functions, normal, binomial and Poisson, are used in a simulation study which illustrates the procedure.
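As a rough aid to reading this abstract, the statistic and its stated limit can be written as below. The notation is an assumption of this note and does not come from the listing: ℓ_n is the log-likelihood of the sample, the first supremum runs over mixing distributions G, the second over a single parameter θ (the homogeneous model), and Z is a standard Gaussian process. Reading "truncated" as taking the positive part of Z is also an assumption.

```latex
R_n = 2\Bigl\{ \sup_{G} \ell_n(G) - \sup_{\theta} \ell_n(\theta) \Bigr\},
\qquad
R_n \xrightarrow{d} \Bigl\{ \sup_{t} Z^{+}(t) \Bigr\}^{2},
\quad
Z^{+}(t) = \max\{ Z(t), 0 \}.
```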
Bayesian Model Selection in Finite Mixtures by Marginal Density Decompositions
 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2001
Rates Of Convergence For The Gaussian Mixture Sieve
 The Annals of Statistics
, 2000
Cited by 20 (0 self)
Gaussian mixtures provide a convenient method of density estimation that lies somewhere between parametric models and kernel...
Semiparametric estimation of a two-component mixture model
 Annals of Statistics
, 2006
Cited by 15 (3 self)
Suppose that univariate data are drawn from a mixture of two distributions that are equal up to a shift parameter. Such a model is known to be nonidentifiable from a nonparametric viewpoint. However, if we assume that the unknown mixed distribution is symmetric, we obtain the identifiability of this model, which is then defined by four unknown parameters: the mixing proportion, two location parameters and the cumulative distribution function of the symmetric mixed distribution. We propose estimators for these four parameters when no training data is available. Our estimators are shown to be strongly consistent under mild regularity assumptions and their convergence rates are studied. Their finite-sample properties are illustrated by a Monte Carlo study and our method is applied to real data.
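The model described in this abstract can be written out as follows. The symbols are chosen here for illustration and are not taken from the listing: λ is the mixing proportion, μ₁ and μ₂ are the two location parameters, and F is the cumulative distribution function of the symmetric component (assumed continuous and symmetric about zero in this sketch).

```latex
G(x) = \lambda \, F(x - \mu_1) + (1 - \lambda) \, F(x - \mu_2),
\qquad
F(-x) = 1 - F(x) \quad \text{for all } x .
```

The four unknowns to be estimated are then (λ, μ₁, μ₂, F), with F left nonparametric apart from the symmetry constraint.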
Testing for a Finite Mixture Model With Two Components
 Journal of the Royal Statistical Society, Ser. B
, 2004
Cited by 13 (3 self)
We consider a finite mixture model with k components and a kernel distribution from a general parametric family. We consider the problem of testing the hypothesis k = 2 against k ≥ 3. In this problem, the likelihood ratio test has a very complicated large sample theory and is difficult to use in practice. We propose a test based on the likelihood ratio statistic where the estimates of the parameters (under the null and the alternative) are obtained from a penalized likelihood which guarantees consistent estimation of the support points. The asymptotic null distribution of the corresponding modified likelihood ratio test is derived and found to be relatively simple in nature and easily applied. Simulations based on a mixture model with normal kernel are encouraging that the modified test performs well, and its use is illustrated in an example involving data from a medical study where the hypothesis arises as a consequence of a potential genetic mechanism.
Key words and phrases. Asymptotic distribution, finite mixture models, likelihood ratio tests, penalty terms, nonregular estimation, strong identifiability.
AMS 1980 subject classifications. Primary 62F03; secondary 62F05.
Identifiability of Finite Linear Regression Mixtures
, 1996
Cited by 2 (2 self)
Identifiability is a necessary condition for the existence of consistent estimates for the parameters of mixture models. In this paper the identifiability of finite mixtures of linear regression models with Normal errors is investigated. Three different models are treated: Mixture models with random and fixed independent variables and a model with fixed partition of the data to the mixture components. Sometimes only parts of the unknown parameter values are of interest. "Partial identifiability" is introduced for this purpose. It turns out that identifiability of finite linear regression mixtures depends on the number of (p − 1)-dimensional hyperplanes which one needs to cover the independent variables. Counterexamples and sufficient conditions for identifiability are given for all models.
1 Introduction
In general, a stochastic identifiability problem can be explained as follows: Definition 1.1 (Identifiability) Let Ω be an arbitrary parameter space, P be some space of distr...