Tagging English Text with a Probabilistic Model
, 1994
Cited by 212 (0 self)
In this paper we present some experiments on the use of a probabilistic model to tag English text, i.e. to assign to each word the correct tag (part of speech) in the context of the sentence. The main novelty of these experiments is the use of untagged text in the training of the model. We have used a simple triclass Markov model and are looking for the best way to estimate the parameters of this model, depending on the kind and amount of training data provided. Two approaches in particular are compared and combined: (1) using text that has been tagged by hand and computing relative frequency counts, and (2) using text without tags and training the model as a hidden Markov process, according to a Maximum Likelihood principle.
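The first approach named in the abstract, relative frequency counts over hand-tagged text, can be illustrated with a minimal sketch. It is not the paper's implementation: the function name, the `<s>` padding symbol, the tag labels, and the three-sentence toy corpus are all assumptions made for illustration, and no smoothing is applied.

```python
from collections import Counter

def train_relative_frequency(tagged_sentences):
    """Estimate trigram tag transitions and word emissions by counting."""
    trigram_counts = Counter()   # (tag_i-2, tag_i-1, tag_i) -> count
    context_counts = Counter()   # (tag_i-2, tag_i-1) -> count
    emission_counts = Counter()  # (tag, word) -> count
    tag_counts = Counter()       # tag -> count

    for sentence in tagged_sentences:
        # Pad with two start symbols so the first word has a full context.
        tags = ["<s>", "<s>"] + [tag for _, tag in sentence]
        for i in range(2, len(tags)):
            context = (tags[i - 2], tags[i - 1])
            trigram_counts[context + (tags[i],)] += 1
            context_counts[context] += 1
        for word, tag in sentence:
            emission_counts[(tag, word)] += 1
            tag_counts[tag] += 1

    # Relative frequencies: count of the event divided by count of its context.
    transitions = {tri: c / context_counts[tri[:2]]
                   for tri, c in trigram_counts.items()}
    emissions = {(tag, word): c / tag_counts[tag]
                 for (tag, word), c in emission_counts.items()}
    return transitions, emissions

# Hypothetical hand-tagged toy corpus.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
transitions, emissions = train_relative_frequency(corpus)
print(emissions[("DET", "the")])             # share of DET tokens that are "the"
print(transitions[("DET", "NOUN", "VERB")])  # P(VERB | DET, NOUN) in the toy data
```

The second approach, training on untagged text as a hidden Markov process, would replace these observed counts with expected counts re-estimated iteratively; that step is not shown here.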
Introduction to the special issue on word sense disambiguation
- Computational Linguistics J
, 1998
Introduction to the Special Issue on Computational Linguistics using Large Corpora
- Computational Linguistics
, 1993
Word sense disambiguation: The state of the art
- Computational Linguistics
, 1998
Cited by 92 (3 self)
The automatic disambiguation of word senses has been an interest and concern since the earliest days of computer treatment of language in the 1950s. Sense disambiguation is an “intermediate task” (Wilks and Stevenson, 1996) which is not an end in itself, but rather is necessary at one level or another to accomplish most natural language processing tasks. It is
Analysis, statistical transfer, and synthesis in machine translation
- In Proceedings of the Fourth International Conference on Theoretical and Methodological Issues in Machine Translation
, 1992
Cited by 19 (1 self)
We reinterpret the system described by Brown et al. [1] in terms of the analysis-transfer-synthesis paradigm common in machine translation. We describe enhanced analysis and synthesis components that apply a number of simple linguistic transformations so the transfer component operates from a string of French morphemes to a string of English morphemes. We report the results of a comparison of the new system with the old system on 100 short test sentences. The new system correctly translates 60% of these sentences while the old system correctly translates only 39% of them.
Language-independent Induction of Part of Speech Class Labels Using Only Language Universals
- In Proc. IJCAI-2001 Workshop ‘Text Learning: Beyond Supervision’
, 2001
Cited by 6 (0 self)
We introduce a language-independent strategy for inducing part of speech tags from corpora. Unlike other techniques that use language-specific lexicons, rulesets, and so forth to tag, our algorithm bootstraps only from cluster properties and language universals.
A Stochastic Model Of Intonation For Text-To-Speech Synthesis
- Proceedings Eurospeech '97 (Rhodes
, 1998
Cited by 5 (2 self)
This paper presents a stochastic model of intonation contours for use in text-to-speech synthesis. The model has two modules: a linguistic module that generates abstract prosodic labels from text, and a phonetic module that generates an F0 curve from the abstract prosodic labels. This model differs from previous work in the abstract prosodic labels used, which can be automatically derived from the training corpus. This feature makes it possible to use large corpora or several corpora of different speech styles, in addition to making it easy to adapt to new languages. The present paper focuses on the linguistic module, which does not require full syntactic analysis of the text but simply relies on part-of-speech tagging. The results were validated on French by means of a perception test. Listeners did not perceive a signif...
Footnote 1: This paper is based on a communication presented at Eurospeech'97 (Véronis et al. 1997) and has been recommended by the Editorial Board of Speech Communication.
Finite State Transducers Approximating Hidden Markov Models
, 1997
"... This paper describes the conversion of a ..."
Tagging Urdu Text with Parts of Speech: A Tagger Comparison
Cited by 1 (0 self)
In this paper, four state-of-the-art probabilistic taggers, i.e. the TnT tagger, TreeTagger, the RF tagger and the SVM tool, are applied to the Urdu language. For the purpose of the experiment, a syntactic tagset is proposed. A training corpus of 100,000 tokens is used to train the models. Using the lexicon extracted from the training corpus, the SVM tool shows the best accuracy of 94.15%. After providing a separate lexicon of 70,568 types, the SVM tool again shows the best accuracy of 95.66%.

1 Urdu Language

Urdu belongs to the Indo-Aryan language family. It is the national language of Pakistan and is one of the official languages of India. The majority of Urdu speakers are spread over South Asia, South Africa and the United Kingdom. Urdu is a free word order language whose general word order is SOV. It shares its phonological, morphological and syntactic structures with Hindi. Some linguists consider them two different dialects of one language (Bhatia and Koul, 2000). However, Urdu is written in Perso-Arabic script and inherits most of its vocabulary from Arabic and Persian. Hindi, on the other hand, is written in Devanagari script and inherits its vocabulary from Sanskrit. Urdu is a morphologically rich language. Forms of the verb, as well as case, gender, and number, are expressed by the morphology. Urdu represents case with a separate character after the head noun of the noun phrase. Because these characters occur separately and follow the noun, they are sometimes considered postpositions. Considering them as case markers, Urdu has nominative, ergative, accusative, dative, instrumental, genitive and locative cases (Butt, 1995: p. 10). The Urdu verb phrase contains a main verb, a light verb describing the aspect, and a tense verb describing the tense of the phrase (Hardie,
Look-Back and Look-Ahead in the Conversion of Hidden Markov Models into Finite State Transducers
, 1998
This paper describes the conversion of a Hidden Markov Model into a finite state transducer that closely approximates the behavior of the stochastic model. In some cases the transducer is equivalent to the HMM. This conversion is especially advantageous for part-of-speech tagging because the resulting transducer can be composed with other transducers that encode correction rules for the most frequent tagging errors. The speed of tagging is also improved. The described methods have been implemented and successfully tested.
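To make the idea of a transducer that approximates an HMM concrete, here is a deliberately simple sketch. It is not the construction described in the paper, which this listing gives only in abstract form. It assumes a first-order (bigram) HMM and builds a deterministic transducer whose state is the previously emitted tag; each arc reads a word and outputs the locally most probable tag. All names, tags, and probabilities below are hypothetical.

```python
def build_transducer(tags, vocab, trans, emit, start="<s>"):
    """Precompute arcs[(state, word)] = (output_tag, next_state).

    States are previous tags. Each arc greedily picks the tag that
    maximises P(tag | previous tag) * P(word | tag), so the result
    only approximates full HMM (Viterbi) decoding.
    """
    arcs = {}
    for prev in [start] + list(tags):
        for word in vocab:
            best = max(
                tags,
                key=lambda t: trans.get((prev, t), 0.0) * emit.get((t, word), 0.0),
            )
            arcs[(prev, word)] = (best, best)  # next state is the tag just output
    return arcs

def transduce(arcs, words, start="<s>"):
    """Run the transducer: one table lookup per word, no probabilities at run time."""
    state, output = start, []
    for word in words:
        tag, state = arcs[(state, word)]
        output.append(tag)
    return output

# Hypothetical toy HMM parameters (partial distributions, for illustration only).
tags = ["DET", "NOUN", "VERB"]
vocab = ["the", "can", "rusts"]
trans = {("<s>", "DET"): 0.8, ("<s>", "NOUN"): 0.2,
         ("DET", "NOUN"): 1.0,
         ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.3,
         ("VERB", "DET"): 0.6, ("VERB", "NOUN"): 0.4}
emit = {("DET", "the"): 1.0,
        ("NOUN", "can"): 0.6, ("NOUN", "rusts"): 0.1,
        ("VERB", "can"): 0.3, ("VERB", "rusts"): 0.5}

arcs = build_transducer(tags, vocab, trans, emit)
print(transduce(arcs, ["the", "can", "rusts"]))
```

Because all probability arithmetic happens when the arcs are built, tagging reduces to table lookups, which is one plausible reading of why the abstract reports improved tagging speed. The look-back and look-ahead of the title suggest that the paper's transducers use more context than this one-tag state.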

