• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

A Gaussian Prior for Smoothing Maximum Entropy Models (1999)

by Stanley F. Chen, Ronald Rosenfeld
Add To MetaCart

Tools

Sorted by:
Results 11 - 20 of 139
Next 10 →

Conditional Structure versus Conditional Estimation in NLP Models

by Dan Klein, Christopher D. Manning - In EMNLP 2002 , 2002
"... This paper separates conditional parameter estimation, which consistently raises test set accuracy on statistical NLP tasks, from conditional model structures, such as the conditional Markov model used for maximum-entropy tagging, which tend to lower accuracy. Error analysis on part-of-speech t ..."
Abstract - Cited by 43 (2 self) - Add to MetaCart
This paper separates conditional parameter estimation, which consistently raises test set accuracy on statistical NLP tasks, from conditional model structures, such as the conditional Markov model used for maximum-entropy tagging, which tend to lower accuracy. Error analysis on part-of-speech tagging shows that the actual tagging errors made by the conditionally structured model derive not only from label bias, but also from other ways in which the independence assumptions of the conditional model structure are unsuited to linguistic sequences. The paper presents new word-sense disambiguation and POS tagging experiments, and integrates apparently conflicting reports from other recent work.

Collective multilabel classification

by Nadia Ghamrawi, Andrew Mccallum - In CIKM , 2005
"... Common approaches to multi-label classification learn independent classifiers for each category, and employ ranking or thresholding schemes for classification. Because they do not exploit dependencies between labels, such techniques are only well-suited to problems in which categories are independen ..."
Abstract - Cited by 43 (0 self) - Add to MetaCart
Common approaches to multi-label classification learn independent classifiers for each category, and employ ranking or thresholding schemes for classification. Because they do not exploit dependencies between labels, such techniques are only well-suited to problems in which categories are independent. However, in many domains labels are highly interdependent. This paper explores multilabel conditional random field (CRF) classification models that directly parameterize label co-occurrences in multi-label classification. Experiments show that the models outperform their singlelabel counterparts on standard text corpora. Even when multilabels are sparse, the models improve subset classification error by as much as 40%.

Investigating GIS and smoothing for maximum entropy taggers

by James R. Curran, Stephen Clark - In Proceedings of the 10th Meeting of the EACL , 2003
"... This paper investigates two elements of Maximum Entropy tagging: the use of a correction feature in the Generalised Iterative Scaling (GIS) estimation algorithm, and techniques for model smoothing. We show analytically and empirically that the correction feature, assumed to be required for the corre ..."
Abstract - Cited by 38 (8 self) - Add to MetaCart
This paper investigates two elements of Maximum Entropy tagging: the use of a correction feature in the Generalised Iterative Scaling (GIS) estimation algorithm, and techniques for model smoothing. We show analytically and empirically that the correction feature, assumed to be required for the correctness of GIS, is unnecessary. We also explore the use of a Gaussian prior and a simple cutoff for smoothing. The experiments are performed with two tagsets: the standard Penn Treebank POS tagset and the larger set of lexical types from Combinatory Categorial Grammar. 1

Finding advertising keywords on web pages

by Wen-tau Yih - In Proceedings of WWW , 2006
"... A large and growing number of web pages display contextual advertising based on keywords automatically extracted from the text of the page, and this is a substantial source of revenue supporting the web today. Despite the importance of this area, little formal, published research exists. We describe ..."
Abstract - Cited by 37 (2 self) - Add to MetaCart
A large and growing number of web pages display contextual advertising based on keywords automatically extracted from the text of the page, and this is a substantial source of revenue supporting the web today. Despite the importance of this area, little formal, published research exists. We describe a system that learns how to extract keywords from web pages for advertisement targeting. The system uses a number of features, such as term frequency of each

Probabilistic disambiguation models for wide-coverage HPSG parsing

by Yusuke Miyao - University of Michigan, Ann Arbor , 2005
"... This paper reports the development of loglinear models for the disambiguation in wide-coverage HPSG parsing. The estimation of log-linear models requires high computational cost, especially with widecoverage grammars. Using techniques to reduce the estimation cost, we trained the models using 20 sec ..."
Abstract - Cited by 34 (11 self) - Add to MetaCart
This paper reports the development of loglinear models for the disambiguation in wide-coverage HPSG parsing. The estimation of log-linear models requires high computational cost, especially with widecoverage grammars. Using techniques to reduce the estimation cost, we trained the models using 20 sections of Penn Treebank. A series of experiments empirically evaluated the estimation techniques, and also examined the performance of the disambiguation models on the parsing of real-world sentences.

Learning for Semantic Parsing with Statistical Machine Translation

by Yuk Wah Wong, Raymond J. Mooney , 2006
"... We present a novel statistical approach to semantic parsing, WASP, for constructing a complete, formal meaning representation of a sentence. A semantic parser is learned given a set of sentences annotated with their correct meaning representations. The main innovation of WASP is its use of state-of- ..."
Abstract - Cited by 34 (8 self) - Add to MetaCart
We present a novel statistical approach to semantic parsing, WASP, for constructing a complete, formal meaning representation of a sentence. A semantic parser is learned given a set of sentences annotated with their correct meaning representations. The main innovation of WASP is its use of state-of-the-art statistical machine translation techniques. A word alignment model is used for lexical acquisition, and the parsing model itself can be seen as a syntax-based translation model. We show that WASP performs favorably in terms of both accuracy and coverage compared to existing learning methods requiring similar amount of supervision, and shows better robustness to variations in task complexity and word order.

Parameter Estimation for Probabilistic Finite-State Transducers

by Jason Eisner - Proc. of the Annual Meeting of the Association for Computational Linguistics , 2002
"... Weighted finite-state transducers suffer from the lack of a training algorithm. Training is even harder for transducers that have been assembled via finite-state operations such as composition, minimization, union, concatenation, and closure, as this yields tricky parameter tying. We formulate a "pa ..."
Abstract - Cited by 33 (3 self) - Add to MetaCart
Weighted finite-state transducers suffer from the lack of a training algorithm. Training is even harder for transducers that have been assembled via finite-state operations such as composition, minimization, union, concatenation, and closure, as this yields tricky parameter tying. We formulate a "parameterized FST" paradigm and give training algorithms for it, including a general bookkeeping trick ("expectation semirings") that cleanly and efficiently computes expectations and gradients.

Improving Trigram Language Modeling with The World Wide Web

by Xiaojin Zhu, Ronald Rosenfeld - Acoustics, Speech, and Signal Processing, 2001. Proceedings.(ICASSP’01 , 2001
"... We propose a novel method for using the World Wide Web to acquire trigram estimates for statistical language modeling. We submit an N-gram as a phrase query to web search engines. The search engines return the number of web pages containing the phrase, from which the N-gram count is estimated. The N ..."
Abstract - Cited by 28 (0 self) - Add to MetaCart
We propose a novel method for using the World Wide Web to acquire trigram estimates for statistical language modeling. We submit an N-gram as a phrase query to web search engines. The search engines return the number of web pages containing the phrase, from which the N-gram count is estimated. The N-gram counts are then used to form web-based trigram probability estimates. We discuss the properties of such estimates, and methods to interpolate them with traditional corpus based trigram estimates. We show that the interpolated models improve speech recognition word error rate significantly over a small test set. 1.

Maximum entropy models for FrameNet classification

by Michael Fleischman, Namhee Kwon, Eduard Hovy - In Proc. of the Conf. on Empirical Methods in Natural Language Processing , 2003
"... The development of FrameNet, a large database of semantically annotated sentences, has primed research into statistical methods for semantic tagging. We advance previous work by adopting a Maximum Entropy approach and by using previous tag information to find the highest probability tag sequence for ..."
Abstract - Cited by 27 (5 self) - Add to MetaCart
The development of FrameNet, a large database of semantically annotated sentences, has primed research into statistical methods for semantic tagging. We advance previous work by adopting a Maximum Entropy approach and by using previous tag information to find the highest probability tag sequence for a given sentence. Further we examine the use of sentence level syntactic pattern features to increase performance. We analyze our strategy on both human annotated and automatically identified frame elements, and compare performance to previous work on identical test data. Experiments indicate a statistically significant improvement (p<0.01) of over 6%. 1

Unifying Divergence Minimization and Statistical Inference via Convex Duality

by Yasemin Altun, Alex Smola - Proc. of Conf. on Learning Theory (COLT , 2006
"... Abstract. In this paper we unify divergence minimization and statistical inference by means of convex duality. In the process of doing so, we prove that the dual of approximate maximum entropy estimation is maximum a posteriori estimation. Moreover, our treatment leads to stability and convergence b ..."
Abstract - Cited by 26 (9 self) - Add to MetaCart
Abstract. In this paper we unify divergence minimization and statistical inference by means of convex duality. In the process of doing so, we prove that the dual of approximate maximum entropy estimation is maximum a posteriori estimation. Moreover, our treatment leads to stability and convergence bounds for many statistical learning problems. Finally, we show how an algorithm by Zhang can be used to solve this class of optimization problems efficiently. 1
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University