Spoken

by Grzegorz Chrupała

BibTeX

@MISC{Chrupała_spoken,
    author = {Grzegorz Chrupała},
    title = {Spoken},
    year = {}
}

Abstract

Word classes automatically induced from distributional evidence have proved useful for many NLP tasks, including Named Entity Recognition, parsing and sentence retrieval. The Brown hard clustering algorithm is commonly used in this scenario. Here we propose to use Latent Dirichlet Allocation (LDA) to induce soft, probabilistic word classes. We compare our approach against Brown in terms of efficiency. We also compare the usefulness of the induced Brown and LDA word classes for the semi-supervised learning of three NLP tasks: fine-grained Named Entity Recognition, Morphological Analysis and semantic Relation Classification. We show that using LDA for word class induction scales better with the number of classes than the Brown algorithm, and that the resulting classes outperform Brown on the three tasks.

Citations

2168 Indexing by latent semantic analysis - Deerwester, Dumais, et al. - 1990
575 CYC: a large-scale investment in knowledge infrastructure - Lenat - 1995
540 Class-based n-gram models of natural language - Brown, Pietra, et al. - 1992
372 Finding scientific topics - Griffiths, Steyvers - 2004
340 Discriminative training methods for hidden Markov models - Collins
206 Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines - Platt - 1998
177 Mallet: A machine learning for language toolkit. http://mallet.cs.umass.edu - McCallum - 2002
175 The Weka Data Mining Software: An Update - Hall, Frank, et al. - 2009
158 Europarl: A parallel corpus for statistical machine translation - Koehn - 2005
89 Integrating topics and syntax - Griffiths, Steyvers, et al. - 2005
84 A fully bayesian approach to unsupervised part-of-speech tagging - Goldwater, Griffiths - 2007
75 Distributional Part-of-Speech Tagging - Schütze - 1994
52 A unified architecture for natural language processing: deep neural networks with multitask learning - Collobert, Weston - 2008
39 Name tagging with word clusters and discriminative training - Miller, Guinness, et al. - 2004
26 A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers. EMNLP ’08 - Gao, Johnson - 2008
26 A Bayesian LDA-based model for semi-supervised part-of-speech tagging - Toutanova, Johnson - 2007
23 Building a treebank for French - Abeillé, Clément, et al. - 2000
23 Design challenges and misconceptions in named entity recognition - Ratinov, Roth - 2009
21 BBN pronoun coreference and entity type corpus. Linguistic Data Consortium - Weischedel, Brunstein - 2005
16 Bayesian Word Sense Induction - Brody, Lapata - 2009
16 A scalable hierarchical distributed language model - Mnih, Hinton - 2008
14 Phrase clustering for discriminative learning - Lin, Wu - 2009
10 Improving generative statistical parsing with semi-supervised word clustering - Candito, Crabbé - 2009
10 Semi-supervised learning for natural language - Liang - 2005
10 An empirical study of semi-supervised structured conditional models for dependency parsing - Suzuki, Isozaki, et al. - 2009
8 SVD and clustering for unsupervised POS tagging - Lamar, Maron, et al. - 2010
8 Word representations: A simple and general method for semi-supervised learning
7 A word clustering approach for language model-based sentence retrieval in question answering systems - Momtazi, Klakow - 2009
4 Measuring distributional similarity in context - Dinu, Lapata - 2010
2 A Named Entity Labeler for German: exploiting Wikipedia and distributional clusters - Chrupała, Klakow - 2010
2 UTD: Classifying semantic relations by combining lexical and semantic resources - Rink, Harabagiu - 2010
2 FBK-IRST: Semantic relation extraction using Cyc - Tymoshenko, Giuliano - 2010
1 Latent Dirichlet allocation - Blei, Ng, et al. - 2003