Probabilistic Tagging with feature structures (1994)

by André Kempe
Results 1 - 9 of 9

Supertagging: An Approach to Almost Parsing

by Srinivas Bangalore, Aravind K. Joshi - Computational Linguistics, 1999
Cited by 109 (17 self)
In this paper, we have proposed novel methods for robust parsing that integrate the flexibility of linguistically motivated lexical descriptions with the robustness of statistical techniques. Our thesis is that the computation of linguistic structure can be localized if lexical items are associated with rich descriptions (supertags) that impose complex constraints in a local context. The supertags are designed such that only those elements on which the lexical item imposes constraints appear within a given supertag. Further, each lexical item is associated with as many supertags as the number of different syntactic contexts in which the lexical item can appear. This makes the number of different descriptions for each lexical item much larger than when the descriptions are less complex, thus increasing the local ambiguity for a parser. But this local ambiguity can be resolved by using statistical distributions of supertag co-occurrences collected from a corpus of parses. We have explored these ideas in the context of the Lexicalized Tree-Adjoining Grammar (LTAG) framework. The supertags in LTAG combine both phrase structure information and dependency information in a single representation. Supertag disambiguation results in a representation that is effectively a parse (an almost parse), and the parser needs "only" combine the individual supertags. This method of parsing can also be used to parse sentence fragments, such as in spoken utterances, where the disambiguated supertag sequence may not combine into a single structure.

1 Introduction. In this paper, we present a robust parsing approach called supertagging that integrates the flexibility of linguistically motivated lexical descriptions with the robustness of statistical techniques. The idea underlying the approach is that the ...
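
The abstract says that local supertag ambiguity can be resolved with co-occurrence statistics. As a rough illustration only, the sketch below picks the most probable supertag sequence with Viterbi search over a bigram model. The bigram order, the smoothing floor, the supertag names, and every probability are assumptions made up for this example; they are not the authors' model or data.

```python
import math

def viterbi(words, candidates, emit, trans, start="<s>"):
    """Return the most probable supertag sequence under a bigram model.

    candidates: word -> list of possible supertags (hypothetical names)
    emit:  (word, tag) -> P(word | tag)
    trans: (prev_tag, tag) -> P(tag | prev_tag)
    """
    floor = 1e-6  # assumed floor for unseen events, not a tuned value
    # best[tag] = (log-probability of the best path ending in tag, that path)
    best = {start: (0.0, [])}
    for word in words:
        new_best = {}
        for tag in candidates[word]:
            scored = []
            for prev, (logp, path) in best.items():
                p = trans.get((prev, tag), floor) * emit.get((word, tag), floor)
                scored.append((logp + math.log(p), path + [tag]))
            new_best[tag] = max(scored, key=lambda s: s[0])
        best = new_best
    return max(best.values(), key=lambda s: s[0])[1]

# Toy lexicon and probabilities, invented for illustration only.
candidates = {
    "the": ["Det"],
    "price": ["N_head", "N_mod"],
    "rose": ["V_intrans", "N_head"],
}
emit = {
    ("the", "Det"): 1.0,
    ("price", "N_head"): 0.6, ("price", "N_mod"): 0.4,
    ("rose", "V_intrans"): 0.7, ("rose", "N_head"): 0.3,
}
trans = {
    ("<s>", "Det"): 0.9,
    ("Det", "N_head"): 0.7, ("Det", "N_mod"): 0.3,
    ("N_head", "V_intrans"): 0.8, ("N_head", "N_head"): 0.05,
    ("N_mod", "N_head"): 0.9, ("N_mod", "V_intrans"): 0.05,
}

result = viterbi(["the", "price", "rose"], candidates, emit, trans)
print(result)
```

With these toy numbers the search prefers reading "price" as a head noun and "rose" as an intransitive verb. In a real system the probabilities would be estimated from a corpus of parses, as the abstract describes.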

Estimating Membership Functions in a Fuzzy Network Model for Part-of-Speech Tagging

by Jae-hoon Kim, Jungyun Seo, Gil Chang Kim
Cited by 5 (0 self)
Part-of-speech (POS) tagging is the process of assigning a POS to each word in a sentence. Since many words are ambiguous in their POS, a tagger must be able to select the best POS sequence for a given sentence. Recently, probabilistic approaches have shown very promising results in resolving such ambiguity. Probabilistic approaches, however, usually require large amounts of training data to obtain reliable probabilities. To alleviate this restriction, we use fuzzy membership functions instead of probability distributions. Such a POS tagging model is called a fuzzy network POS tagging model. The membership functions are automatically estimated by using probabilities and neural networks with a learning algorithm. Experiments show that the fuzzy network POS tagging model performs much better than a hidden Markov model when training data is limited.

Keywords: Fuzzy networks, membership function estimation, part-of-speech tagging

1 Introduction. Words are...

Estimation of Conditional Probabilities with Decision Trees and an Application to Fine-Grained POS Tagging

by Helmut Schmid, Florian Laws - COLING 2008
Cited by 4 (1 self)
We present an HMM part-of-speech tagging method which is particularly suited for POS tagsets with a large number of fine-grained tags. It is based on three ideas: (1) splitting of the POS tags into attribute vectors and decomposition of the contextual POS probabilities of the HMM into a product of attribute probabilities, (2) estimation of the contextual probabilities with decision trees, and (3) use of high-order HMMs. In experiments on German and Czech data, our tagger outperformed state-of-the-art POS taggers.
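
Idea (1) in this abstract, decomposing a tag's contextual probability into a product of attribute probabilities, can be sketched with the chain rule. The sketch below is only an illustration under stated assumptions: it uses plain relative-frequency counts where the paper uses decision trees, conditions on a single previous tag rather than a high-order context, and the attribute values and training pairs are invented.

```python
from collections import Counter

def train(samples):
    """Count attribute events.

    samples: list of (previous_tag, tag); each tag is a tuple of attributes,
    e.g. ("N", "Nom") for a nominative noun (hypothetical encoding).
    """
    joint, context = Counter(), Counter()
    for prev, tag in samples:
        for i, value in enumerate(tag):
            # Attribute i is conditioned on the previous tag and on the
            # attributes of the current tag that were already predicted.
            history = (prev, i, tag[:i])
            context[history] += 1
            joint[(history, value)] += 1
    return joint, context

def tag_probability(prev, tag, joint, context):
    """P(tag | prev) as a product of per-attribute conditional probabilities."""
    prob = 1.0
    for i, value in enumerate(tag):
        history = (prev, i, tag[:i])
        if context[history] == 0:
            return 0.0  # unseen history; a real tagger would smooth here
        prob *= joint[(history, value)] / context[history]
    return prob

# Invented training pairs, for illustration only.
samples = [
    (("ART", "Nom"), ("N", "Nom")),
    (("ART", "Nom"), ("N", "Nom")),
    (("ART", "Nom"), ("ADJ", "Nom")),
    (("ART", "Nom"), ("N", "Acc")),
]
joint, context = train(samples)
p = tag_probability(("ART", "Nom"), ("N", "Nom"), joint, context)
print(p)
```

The benefit of the decomposition is that each factor is estimated over a small set of attribute values rather than over the full fine-grained tagset, which eases the sparse-data problem for large tagsets.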

Comparative State-of-the-Art Survey and Assessment Study of . . .

by Bruno Maximilian Schulze, 1994
Cited by 3 (2 self)
Contents
1 Introduction
1.1 Rationale
1.2 Method of the survey
1.3 Structure of this document
2 Tokenization
2.1 Motivation
2.2 Tokenizer Survey
2.2.1 Unix lex Tokenizer
2.2.1.1 Sample input/output
2.2.2 RXRC Finite-state tokenizer
2.2.2.1 Sample input/output
2.3 Tokeniz

A neural network approach to part-of-speech tagging

by Nuno C. Marques, Gabriel Pereira Lopes - In Proceedings of the Second Workshop on Computational Processing of Written and Spoken Portuguese, 1996
Cited by 3 (1 self)
Neural networks are one of the most efficient techniques for learning from scarce data. This property is very useful when trying to build a part-of-speech tagger. Available part-of-speech taggers need huge amounts of hand-tagged text, but for Portuguese there are no such corpora available. In this paper we propose a neural network that, apparently, is capable of overcoming the huge training corpus problem. Distinct network topologies are applied to the problem of learning the parameters of a part-of-speech tagger from a very small Portuguese training corpus and from a subset of the Susanne Corpus. The experiments carried out are discussed. The results obtained point to a correction rate above 97% when starting with a hand-tagged training corpus of approximately 15,000 words.

Neural Networks, Part-of-Speech Tagging and Lexicons

by Nuno C. Marques, Gabriel Pereira Lopes
Cited by 2 (0 self)
Neural networks are one of the most efficient techniques for learning from scarce data. This property is very useful when trying to build a part-of-speech tagger. Available part-of-speech taggers need huge amounts of hand-tagged text, but for Portuguese, as for many other languages, no such hand-tagged corpora are available. In this paper we propose the cooperation of a lexical system and a neural network in such a way that the huge training corpus problem is overcome. The network topology we used was applied to the problem of learning the parameters of a part-of-speech tagger from a very small Portuguese training corpus and from a subset of the Susanne Corpus. The experiments carried out are discussed. The results obtained point to a correction rate above 97% when we start from a hand-tagged training corpus of approximately 15,000 words. The application of our system to real texts is also described.

1. Introduction. The application potential of textual corpora i...

A Contribution to the Question of Authenticity of Rhesus Using Part-of-Speech Tagging

by Bernd Ludwig
This paper presents the results of an experiment to decide the question of authenticity of the supposedly spurious Rhesus, an Attic tragedy sometimes credited to Euripides. The experiment uses statistics to test whether significant deviations in the distribution of word categories between Rhesus and the other works of Euripides can or cannot be found. To count frequencies of word categories in the corpus, a part-of-speech tagger for Greek has been implemented. Some special techniques for reducing the problem of sparse data are used, resulting in an accuracy of ca. 96.6%.

1 Introduction. 1.1 The Philological Problem. In the tradition of ancient Greek texts it sometimes happens that, for a number of different reasons, a text is credited incorrectly to a certain author. It is the (sometimes very difficult) task of classical philology to detect these erroneous assignments and, if possible, to correct them. The methods used for this task and the results achieve...
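
The abstract does not say which statistical test is applied to the word-category distributions. One common choice for comparing category counts across texts is Pearson's chi-square statistic, sketched below purely as an assumption; the counts are invented and are not taken from the paper.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a table of counts.

    observed: list of rows, one row per text, one column per word category.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if both texts shared one category distribution.
            expected = row_totals[i] * col_totals[j] / total
            stat += (count - expected) ** 2 / expected
    return stat

# Invented counts: rows are two texts, columns are two word categories.
counts = [[10, 20], [20, 10]]
stat = chi_square(counts)
print(stat)
```

For a 2 x 2 table there is one degree of freedom, and the 5% critical value is about 3.84, so a statistic above that would count as a significant deviation under this assumed test.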

A Spanish POS tagger with variable memory

by J. L. Triviño-Rodríguez, R. Morales-Bueno
An implementation of a Spanish POS tagger is described in this paper. This implementation combines three basic approaches: a single-word tagger based on decision trees, a POS tagger based on variable memory Markov models, and a tag set built from feature structures. Using decision trees for single-word tagging allows the tagger to work without a lexicon that lists only possible tags. Moreover, it decreases the error rate because there are no unknown words. The feature-structure tag set is advantageous when the available training corpus is small and the tag set large, which can be the case with morphologically rich languages like Spanish. Finally, training variable memory Markov models is more efficient than training traditional full-order Markov models and achieves better accuracy. In this implementation, 98.58% of tokens are correctly classified.

unknown title

by unknown authors, 1996
1.1 The philological problem. In the tradition of ancient Greek texts it sometimes happens that, for a number of different reasons, a text is credited incorrectly to a certain author.