Results 1-10 of 258
Latent Dirichlet allocation
 Journal of Machine Learning Research
, 2003
Cited by 2392 (62 self)
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
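The generative story summarized in this abstract can be illustrated with a minimal sketch. It assumes the standard LDA formulation (per-document topic proportions drawn from a Dirichlet, then a topic and a word drawn for each position). The function names, the toy vocabulary, the two toy topics, and the fixed document length are all hypothetical choices made for illustration; the sketch covers sampling only, not the variational inference or EM estimation the abstract mentions.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Return an index drawn with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(alpha, beta, vocab, n_words, rng):
    """Generate one document under the LDA generative story.

    alpha: Dirichlet parameter over topics (length K).
    beta:  K rows, each a word distribution over vocab.
    n_words is fixed here for simplicity (an assumption of this sketch).
    """
    theta = sample_dirichlet(alpha, rng)      # per-document topic proportions
    words = []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)    # topic assignment for this word
        w = sample_categorical(beta[z], rng)  # word drawn from that topic
        words.append(vocab[w])
    return theta, words

rng = random.Random(0)
vocab = ["gene", "cell", "ball", "team"]      # hypothetical toy vocabulary
beta = [[0.5, 0.5, 0.0, 0.0],                 # hypothetical "biology" topic
        [0.0, 0.0, 0.5, 0.5]]                 # hypothetical "sports" topic
theta, doc = generate_document([0.5, 0.5], beta, vocab, 8, rng)
```

Here `theta` plays the role of the "explicit representation of a document" mentioned in the abstract: a point on the simplex over topics.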
Thumbs up? Sentiment Classification using Machine Learning Techniques
 In Proceedings of EMNLP
, 2002
Cited by 620 (6 self)
We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
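As a rough illustration of the first of the three methods named above, the following is a generic multinomial Naive Bayes text classifier with add-one smoothing. It is a sketch only: the tokenized toy reviews, the labels, and the function names are hypothetical, and it does not reproduce the feature sets or data used in the paper.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """Count words per class; docs are lists of tokens."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for c in classes for w in word_counts[c]}
    return classes, word_counts, doc_counts, vocab

def classify(model, tokens):
    """Return the class with the highest smoothed log-probability."""
    classes, word_counts, doc_counts, vocab = model
    n_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for c in classes:
        total = sum(word_counts[c].values())
        score = math.log(doc_counts[c] / n_docs)       # class prior
        for t in tokens:
            if t in vocab:                             # skip unseen words
                score += math.log((word_counts[c][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical toy training set, for illustration only.
docs = [["great", "fun", "film"], ["loved", "great", "acting"],
        ["dull", "boring", "film"], ["awful", "boring", "plot"]]
labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(docs, labels)
```

Calling `classify(model, ["great", "acting"])` scores each class and returns the more probable label under the word counts above.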
Local features and kernels for classification of texture and object categories: a comprehensive study
 International Journal of Computer Vision
, 2007
Cited by 380 (24 self)
Recently, methods based on local image features have shown promise for texture and object recognition tasks. This paper presents a large-scale evaluation of an approach that represents images as distributions (signatures or histograms) of features extracted from a sparse set of keypoint locations and learns a Support Vector Machine classifier with kernels based on two effective measures for comparing distributions, the Earth Mover’s Distance and the χ² distance. We first evaluate the performance of our approach with different keypoint detectors and descriptors, as well as different kernels and classifiers. We then conduct a comparative evaluation with several state-of-the-art recognition methods on four texture and five object databases. On most of these databases, our implementation exceeds the best reported results and achieves comparable performance on the rest. Finally, we investigate the influence of background correlations on recognition performance via extensive tests on the PASCAL database, for which ground-truth object localization information is available. Our experiments demonstrate that image representations based on distributions of local features are surprisingly effective for classification of texture and object images under challenging real-world conditions, including significant intra-class variations and substantial background clutter.
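To make the χ² distance concrete, here is a small sketch using one common definition for histograms, half the sum of squared bin differences divided by bin sums, together with an exponential kernel built on it. The exact normalization and the `scale` parameter are assumptions of this sketch and are not taken from the abstract.

```python
import math

def chi2_distance(u, w):
    """Chi-square distance between two histograms of equal length."""
    if len(u) != len(w):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(u, w):
        if a + b > 0:                     # skip empty bins to avoid 0/0
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

def chi2_kernel(u, w, scale=1.0):
    """Exponential kernel exp(-D / scale); `scale` is a hypothetical knob."""
    return math.exp(-chi2_distance(u, w) / scale)
```

Identical histograms give distance 0 and kernel value 1, and the kernel decays toward 0 as the histograms diverge, which is what lets it serve as a similarity measure inside an SVM.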
Unsupervised Named-Entity Extraction from the Web: An Experimental Study
 Artificial Intelligence
, 2005
Cited by 277 (38 self)
The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, and scalable manner. The paper presents an overview of KNOWITALL’s novel architecture and design principles, emphasizing its distinctive ability to extract information without any hand-labeled training examples. In its first major run, KNOWITALL extracted over 50,000 facts, but suggested a challenge: How can we improve KNOWITALL’s recall and extraction rate without sacrificing precision? This paper presents three distinct ways to address this challenge and evaluates their performance. Pattern Learning learns domain-specific extraction rules, which enable additional extractions. Subclass Extraction automatically identifies subclasses in order to boost recall. List Extraction locates lists of class instances, learns a “wrapper” for each list, and extracts elements of each list. Since each method bootstraps from KNOWITALL’s domain-independent methods, the methods also obviate hand-labeled training examples. The paper reports on experiments, focused on named-entity extraction, that measure the relative efficacy of each method and demonstrate their synergy. In concert, our methods gave KNOWITALL a 4-fold to 8-fold increase in recall, while maintaining high precision, and discovered over 10,000 cities missing from the Tipster Gazetteer.
Toward Optimal Active Learning through Sampling Estimation of Error Reduction
 In Proc. 18th International Conf. on Machine Learning
, 2001
Cited by 252 (2 self)
This paper presents an active learning method that directly optimizes expected future error. This is in contrast to many other popular techniques that instead aim to reduce version space size. These other methods are popular because for many learning models, closed form calculation of the expected future error is intractable. Our approach is made feasible by taking a sampling approach to estimating the expected reduction in error due to the labeling of a query. In experimental results on two real-world data sets we reach high accuracy very quickly, sometimes with four times fewer labeled examples than competing methods.
Efficiently Inducing Features of Conditional Random Fields
, 2003
Cited by 182 (10 self)
Conditional Random Fields (CRFs) are undirected graphical models, a special case of which correspond to conditionally-trained finite state machines. A key advantage of CRFs is their great flexibility to include a wide variety of arbitrary, non-independent features of the input. Faced with ...
Learning to Match and Cluster Large High-Dimensional Data Sets For Data Integration
, 2002
Cited by 131 (6 self)
Part of the process of data integration is determining which sets of identifiers refer to the same real-world entities. In integrating databases found on the Web or obtained by using information extraction methods, it is often possible to solve this problem by exploiting similarities in the textual names used for objects in different databases. In this paper we describe techniques for clustering and matching identifier names that are both scalable and adaptive, in the sense that they can be trained to obtain better performance in a particular domain. An experimental evaluation on a number of sample datasets shows that the adaptive method sometimes performs much better than either of two non-adaptive baseline systems, and is nearly always competitive with the best baseline system.
Constructing informative priors using transfer learning
 In Proceedings of the 23rd International Conference on Machine Learning
, 2006
Cited by 86 (0 self)
Many applications of supervised learning require good generalization from limited labeled data. In the Bayesian setting, we can try to achieve this goal by using an informative prior over the parameters, one that encodes useful domain knowledge. Focusing on logistic regression, we present an algorithm for automatically constructing a multivariate Gaussian prior with a full covariance matrix for a given supervised learning task. This prior relaxes a commonly used but overly simplistic independence assumption, and allows parameters to be dependent. The algorithm uses other “similar” learning problems to estimate the covariance of pairs of individual parameters. We then use a semidefinite program to combine these estimates and learn a good prior for the current learning task. We apply our methods to binary text classification, and demonstrate a 20 to 40% test error reduction over a commonly used prior.
A comparison of numerical optimizers for logistic regression
, 2003
Cited by 85 (0 self)
Logistic regression is a workhorse of statistics and is closely related to methods used in Machine Learning, including the Perceptron and the Support Vector Machine. This note compares eight different algorithms for computing the maximum a-posteriori parameter estimate. A full derivation of each algorithm is given. In particular, a new derivation of Iterative Scaling is given which applies more generally than the conventional one. A new derivation is also given for the Modified Iterative Scaling algorithm of Collins et al. (2002). Most of the algorithms operate in the primal space, but can also work in dual space. All algorithms are compared in terms of computational complexity by experiments on large data sets. The fastest algorithms turn out to be conjugate gradient ascent and quasi-Newton algorithms, which far outstrip Iterative Scaling and its variants.
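For readers unfamiliar with the objective being optimized, the sketch below computes a maximum a-posteriori estimate for logistic regression with a zero-mean Gaussian prior. It deliberately uses plain fixed-step gradient ascent, which is a simple stand-in and not one of the faster methods the abstract highlights. The labels in {-1, +1}, the prior strength `lam`, the step size, the iteration count, and the toy data are all assumptions of this sketch.

```python
import math

def log_sigmoid(m):
    """Numerically stable log(1 / (1 + exp(-m)))."""
    if m >= 0:
        return -math.log1p(math.exp(-m))
    return m - math.log1p(math.exp(m))

def sigmoid(m):
    return math.exp(log_sigmoid(m))

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def log_posterior(w, X, y, lam):
    """Log-likelihood plus Gaussian log-prior, up to an additive constant."""
    fit = sum(log_sigmoid(label * dot(w, x)) for x, label in zip(X, y))
    return fit - 0.5 * lam * dot(w, w)

def fit_map(X, y, lam=1.0, step=0.1, iters=500):
    """Fixed-step gradient ascent on the log-posterior."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        grad = [-lam * wi for wi in w]                 # gradient of the prior
        for x, label in zip(X, y):
            coeff = (1.0 - sigmoid(label * dot(w, x))) * label
            for j, xj in enumerate(x):
                grad[j] += coeff * xj                  # gradient of the likelihood
        w = [wi + step * gj for wi, gj in zip(w, grad)]
    return w

# Hypothetical toy data: first feature is a constant bias term.
X = [[1.0, 2.0], [1.0, 1.0], [1.0, -1.0], [1.0, -2.0]]
y = [1, 1, -1, -1]
w_map = fit_map(X, y)
```

Because the log-posterior is concave, any of the optimizers compared in the note would converge to the same estimate; they differ only in how quickly they get there.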