Results 1-10 of 32
Text Classification from Labeled and Unlabeled Documents using EM
 MACHINE LEARNING
, 1999
Abstract

Cited by 859 (17 self)
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. We introduce an algorithm for learning from labeled and unlabeled documents based on the combination of Expectation-Maximization (EM) and a naive Bayes classifier. The algorithm first trains a classifier using the available labeled documents, and probabilistically labels the unlabeled documents. It then trains a new classifier using the labels for all the documents, and iterates to convergence. This basic EM procedure works well when the data conform to the generative assumptions of the model. However, these assumptions are often violated in practice, and poor performance can result. We present two extensions to the algorithm that improve ...
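The loop described above (train naive Bayes on the labeled documents, probabilistically label the unlabeled pool, retrain on everything, iterate) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the toy bag-of-words data, function name, and parameter choices are assumptions made for the example:

```python
import numpy as np

def nb_em(X_l, y_l, X_u, n_classes=2, n_iter=20, smooth=1.0):
    """Semi-supervised multinomial naive Bayes via EM.

    X_l, X_u: word-count matrices for labeled/unlabeled documents.
    Labeled documents keep fixed one-hot responsibilities; unlabeled
    documents start uniform and are re-labeled probabilistically.
    """
    R_l = np.eye(n_classes)[y_l]                      # fixed labels
    R_u = np.full((X_u.shape[0], n_classes), 1.0 / n_classes)
    X = np.vstack([X_l, X_u])
    for _ in range(n_iter):
        R = np.vstack([R_l, R_u])
        # M-step: class priors and Laplace-smoothed word probabilities
        prior = R.sum(axis=0) / R.shape[0]
        word = R.T @ X + smooth
        word /= word.sum(axis=1, keepdims=True)
        # E-step: posterior class probabilities for the unlabeled documents
        log_post = np.log(prior) + X_u @ np.log(word).T
        log_post -= log_post.max(axis=1, keepdims=True)
        R_u = np.exp(log_post)
        R_u /= R_u.sum(axis=1, keepdims=True)
    return prior, word, R_u

# Toy corpus: vocabulary of 4 words; class 0 favors words 0-1, class 1 words 2-3.
X_l = np.array([[3, 2, 0, 0], [0, 0, 2, 3]])
X_u = np.array([[4, 1, 0, 0], [0, 1, 3, 2]])
prior, word, R_u = nb_em(X_l, np.array([0, 1]), X_u)
```

The paper's two extensions modify this basic loop (for instance by down-weighting the unlabeled documents); the sketch shows only the unmodified procedure.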
Using Unlabeled Data to Improve Text Classification
, 2001
Abstract

Cited by 54 (0 self)
One key difficulty with text classification learning algorithms is that they require many hand-labeled examples to learn accurately. This dissertation demonstrates that supervised learning algorithms that use a small number of labeled examples and many inexpensive unlabeled examples can create high-accuracy text classifiers. By assuming that documents are created by a parametric generative model, Expectation-Maximization (EM) finds local maximum a posteriori models and classifiers from all the data, labeled and unlabeled. These generative models do not capture all the intricacies of text; however, on some domains this technique substantially improves classification accuracy, especially when labeled data are sparse. Two problems arise from this basic approach. First, unlabeled data can hurt performance in domains where the generative modeling assumptions are too strongly violated. In this case the assumptions can be made more representative in two ways: by modeling subtopic class structure, and by modeling supertopic hierarchical class relationships. By doing so, model probability and classification accuracy come into correspondence, allowing unlabeled data to improve classification performance. The second problem is that even with a representative model, the improvements given by unlabeled data do not sufficiently compensate for a paucity of labeled data. Here, limited labeled data provide EM initializations that lead to low-probability models. Performance can be significantly improved by using active learning to select high-quality initializations, and by using alternatives to EM that avoid low-probability local maxima.
Dirichlet Prior Sieves in Finite Normal Mixtures
 Statistica Sinica
, 2002
Abstract

Cited by 40 (1 self)
Abstract: The use of a finite-dimensional Dirichlet prior in the finite normal mixture model acts like a Bayesian method of sieves. Posterior consistency is directly related to the dimension of the sieve and the choice of the Dirichlet parameters in the prior. We find that naive use of the popular uniform Dirichlet prior leads to an inconsistent posterior. However, a simple adjustment to the parameters in the prior induces a random probability measure that approximates the Dirichlet process and yields a posterior that is strongly consistent for the density and weakly consistent for the unknown mixing distribution. The dimension of the resulting sieve can be selected easily in practice, and a simple and efficient Gibbs sampler can be used to sample the posterior of the mixing distribution. Key words and phrases: Bose-Einstein distribution, Dirichlet process, identification, method of sieves, random probability measure, relative entropy, weak convergence.
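The adjustment alluded to above is in the spirit of the standard finite-dimensional approximation to the Dirichlet process: a symmetric Dirichlet(α/k, ..., α/k) prior on k mixture weights. The sketch below uses that generic construction with a N(0, 1) base measure; the paper's exact parameter choice may differ, and all constants here are illustrative:

```python
import numpy as np

def finite_dp_draw(alpha, k, rng):
    """One draw from the finite-dimensional Dirichlet approximation
    to a Dirichlet process: k weights ~ Dirichlet(alpha/k, ..., alpha/k)
    attached to k i.i.d. atoms from a N(0, 1) base measure."""
    weights = rng.dirichlet(np.full(k, alpha / k))
    atoms = rng.normal(0.0, 1.0, k)
    return weights, atoms

rng = np.random.default_rng(0)
w, atoms = finite_dp_draw(alpha=1.0, k=200, rng=rng)
# With the alpha/k scaling, almost all mass concentrates on a handful of
# atoms as k grows, mimicking the discrete, clustering behavior of the
# Dirichlet process; a uniform Dirichlet(1, ..., 1) spreads mass evenly.
```

The contrast in the final comment is the intuition behind the consistency result: the uniform prior keeps too many components active as the sieve dimension grows.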
The Likelihood Ratio Test for Homogeneity in the Finite Mixture Models
, 2001
Abstract

Cited by 34 (5 self)
The authors study the asymptotic behaviour of the likelihood ratio statistic for testing homogeneity in finite mixture models of a general parametric distribution family. They prove that the limiting distribution of this statistic is the squared supremum of a truncated standard Gaussian process. The autocorrelation function of the Gaussian process is explicitly presented. A resampling procedure is recommended to obtain the asymptotic p-value. Three kernel functions, normal, binomial and Poisson, are used in a simulation study that illustrates the procedure.
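The recommended resampling procedure is essentially a parametric bootstrap: fit the homogeneous model, simulate from it, and compare the observed likelihood-ratio statistic with its simulated null distribution. The sketch below does this for a normal kernel with equal component variances; the short fixed-iteration EM, the initializations, and the constants are simplifications for illustration, not the authors' exact recipe:

```python
import numpy as np

def norm_pdf(x, mu, sd):
    return np.exp(-(x - mu) ** 2 / (2 * sd ** 2)) / (sd * np.sqrt(2 * np.pi))

def loglik_one(x):
    """Maximized log-likelihood under the homogeneous N(mu, sd^2) null."""
    mu, sd = x.mean(), x.std()
    return np.log(norm_pdf(x, mu, sd)).sum()

def loglik_two(x, n_iter=50):
    """Approximate maximized log-likelihood under a two-component
    equal-variance normal mixture, fitted by a short EM run."""
    p, mu1, mu2, sd = 0.5, np.percentile(x, 25), np.percentile(x, 75), x.std()
    for _ in range(n_iter):
        d1, d2 = p * norm_pdf(x, mu1, sd), (1 - p) * norm_pdf(x, mu2, sd)
        r = d1 / (d1 + d2)                       # E-step responsibilities
        p = r.mean()                             # M-step updates
        mu1, mu2 = (r * x).sum() / r.sum(), ((1 - r) * x).sum() / (1 - r).sum()
        sd = np.sqrt((r * (x - mu1) ** 2 + (1 - r) * (x - mu2) ** 2).mean())
    return np.log(p * norm_pdf(x, mu1, sd) + (1 - p) * norm_pdf(x, mu2, sd)).sum()

def lrt_bootstrap_pvalue(x, n_boot=99, rng=None):
    """Parametric-bootstrap p-value for H0: one component vs H1: two."""
    rng = rng or np.random.default_rng(0)
    stat = 2 * (loglik_two(x) - loglik_one(x))
    mu, sd = x.mean(), x.std()
    boots = [2 * (loglik_two(z) - loglik_one(z))
             for z in (rng.normal(mu, sd, len(x)) for _ in range(n_boot))]
    return (1 + sum(b >= stat for b in boots)) / (n_boot + 1)
```

On clearly bimodal data the observed statistic dwarfs the bootstrap replicates and the p-value sits at its floor of 1/(n_boot + 1).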
Approximate Dirichlet Process Computing in Finite Normal Mixtures: Smoothing and Prior Information
 JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
, 2000
Bayesian Model Selection in Finite Mixtures by Marginal Density Decompositions
 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 2001
Semiparametric estimation of a two-component mixture model
 Annals of Statistics
, 2006
Abstract

Cited by 18 (5 self)
Suppose that univariate data are drawn from a mixture of two distributions that are equal up to a shift parameter. Such a model is known to be non-identifiable from a nonparametric viewpoint. However, if we assume that the unknown mixed distribution is symmetric, we obtain identifiability of the model, which is then defined by four unknown parameters: the mixing proportion, two location parameters and the cumulative distribution function of the symmetric mixed distribution. We propose estimators for these four parameters when no training data are available. Our estimators are shown to be strongly consistent under mild regularity assumptions, and their convergence rates are studied. Their finite-sample properties are illustrated by a Monte Carlo study, and our method is applied to real data.
Testing for a Finite Mixture Model With Two Components
 Journal of the Royal Statistical Society, Ser. B
, 2004
Abstract

Cited by 17 (4 self)
We consider a finite mixture model with k components and a kernel distribution from a general parametric family. We consider the problem of testing the hypothesis k = 2 against k ≥ 3. In this problem, the likelihood ratio test has a very complicated large-sample theory and is difficult to use in practice. We propose a test based on the likelihood ratio statistic where the estimates of the parameters (under the null and the alternative) are obtained from a penalized likelihood which guarantees consistent estimation of the support points. The asymptotic null distribution of the corresponding modified likelihood ratio test is derived and found to be relatively simple in nature and easily applied. Simulations based on a mixture model with a normal kernel indicate that the modified test performs well, and its use is illustrated in an example involving data from a medical study where the hypothesis arises as a consequence of a potential genetic mechanism. Key words and phrases: asymptotic distribution, finite mixture models, likelihood ratio tests, penalty terms, nonregular estimation, strong identifiability. AMS 1980 subject classifications: primary 62F03; secondary 62F05.
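Schematically, the penalty keeps the fitted mixing proportions away from the boundary of the parameter space, which is what yields consistent estimation of the support points. The penalty form and constant below are hypothetical stand-ins chosen for illustration, not the paper's exact construction:

```python
import numpy as np

def penalized_loglik(loglik, props, C=1.0):
    """Schematic penalized log-likelihood for a finite mixture.

    Adds C * sum_j log(k * pi_j), which equals 0 at uniform proportions
    and tends to -infinity as any pi_j -> 0, so maximizers stay interior.
    The exact penalty used in the paper may differ.
    """
    props = np.asarray(props, dtype=float)
    return loglik + C * np.log(len(props) * props).sum()

# The penalty vanishes at equal proportions and punishes degenerate ones:
balanced = penalized_loglik(0.0, [1/3, 1/3, 1/3])
skewed = penalized_loglik(0.0, [0.98, 0.01, 0.01])
```

Because the penalty is bounded above and diverges only at the boundary, it leaves the location of interior maximizers essentially unchanged while ruling out degenerate fits.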
Hypothesis testing in mixture regression models
 Department of Mathematics and Statistics, York University
, 2004
Abstract

Cited by 5 (1 self)
As a technical supplement to Zhu and Zhang (2004), we give detailed information on how to establish asymptotic theory for both the maximum likelihood estimate and the maximum modified likelihood estimate in mixture regression models. Under specific and reasonable conditions, we show that the optimal convergence rate of n^(-1/4) for estimating the mixing distribution is achievable for both the maximum likelihood and maximum modified likelihood estimates. We also derive the asymptotic distributions of the two log-likelihood ratio test statistics for testing homogeneity.

1 Notation and Assumptions

We consider a random sample of n independent observations {y_i, X_i}_{i=1}^n with density function

p_i(y_i, x_i; ω) = [(1 − α) f_i(y_i, x_i; β, µ_1) + α f_i(y_i, x_i; β, µ_2)] g_i(x_i),  (1)

where g_i(x_i) is the distribution function of X_i and ω = (α, β, µ_1, µ_2) is the unknown parameter vector, in which β (q_1 × 1) measures the strength of association contributed by the covariate terms and the two q_2 × 1 vectors µ_1 and µ_2 represent the contributions from the two different groups. The log-likelihood function L_n(ω) is given by

L_n(ω) = Σ_{i=1}^n log[(1 − α) f_i(β, µ_1)/f_i* + α f_i(β, µ_2)/f_i*],  (2)

where f_i* = f_i(y_i, x_i; β*, µ*) and f_i(β, µ) is shorthand for f_i(y_i, x_i; β, µ). In light of the symmetry in α, without loss of generality we consider only α ∈ [0, 0.5]. Define the parameter space Ω as

Ω = {ω : α ∈ [0, 0.5], β ∈ B, ‖µ_1‖ ≤ M, ‖µ_2‖ ≤ M} = [0, 0.5] × B × B(0, M) × B(0, M),  (3)

where M is a large positive scalar such that ‖µ*‖ < M, B(0, M) is the ball in R^{q_2} centered at 0 with radius M, and B is a subset of R^{q_1}. One of the key hypotheses involving mixture models is whether the mixture regression is warranted. In family studies, it asks whether or not the trait of interest is familial. This hypothesis can be stated as follows: