A practical part-of-speech tagger
In Proceedings of the Third Conference on Applied Natural Language Processing, 1992
Cited by 325 (5 self)
We present an implementation of a part-of-speech tagger based on a hidden Markov model. The methodology enables robust and accurate tagging with few resource requirements. Only a lexicon and some unlabeled training text are required. Accuracy exceeds 96%. We describe implementation strategies and optimizations which result in high-speed operation. Three applications for tagging are described: phrase recognition; word sense disambiguation; and grammatical function assignment.
Stylistic Experiments For Information Retrieval
2000
Cited by 47 (8 self)
Information retrieval systems are built to handle texts as topical items: texts are tabulated by occurrence frequencies of content words in them, under the assumption that text topic is reasonably well modeled by content word occurrence. But texts have several interesting characteristics beyond topic. The experiments described in this text investigate stylistic variation. Roughly put, style is the difference between two ways of saying the same thing -- and systematic stylistic variation can be used to characterize the genre of documents. These experiments investigate if stylistic information is distinguishable using simple language engineering methods, and if in that case this type of information can be used to improve information retrieval systems.
The specific-word frequency effect: Implications for the representation of homophones in speech production
Cited by 5 (1 self)
A series of experiments investigated whether naming latencies for homophones (e.g., /nʌn/) are a function of specific-word frequency (i.e., the frequency of nun) or a function of cumulative-homophone frequency (i.e., the sum of the frequencies of nun and none). Specific-word but not cumulative-homophone frequency affected picture-naming latencies. This result was obtained in two languages (English and Chinese). An analogous finding was obtained in a translation task, where bilingual speakers produced the English names of visually presented Spanish words. Control experiments ruled out that these results are an artifact of orthographic or articulatory factors, or of visual recognition. The results argue against the hypothesis that homophones share a common word-form representation, and support instead a model in which homophones have fully independent representations.

Homophones are words that have the same pronunciation but differ in meaning, spelling, or grammatical class. How are homophones represented and accessed in speech production? Two hypotheses have been proposed. One view holds that homophones share a common lexical-phonological representation, but because they have different meanings and often also different grammatical properties (e.g., sun/son; the watch/to watch; him/hymn), they have different semantic and lexical-grammatical representations (Cutting & Ferreira, 1999; Dell, 1990; Jescheniak & Levelt, 1994; Levelt, Roelofs, & Meyer, 1999). We will call models of this type shared representation (SR) models. There are four levels of representation in these models: semantic/conceptual nodes, lemma nodes, lexeme nodes, and phonological nodes. Lemmas specify the words' grammatical properties, while lexemes specify their phonological contents. Figure 1a sche...
An Unsupervised Method for Multilingual Word Sense Tagging Using Parallel Corpora: A Preliminary Investigation
With an increasing number of languages making their way to our desktops every day via the Internet, researchers have come to realize the lack of linguistic knowledge resources for scarcely represented/studied languages. In an attempt to bootstrap some of the required linguistic resources for some of those languages, this paper presents an unsupervised method for automatic multilingual word sense tagging using parallel corpora. The method is evaluated on the English Brown corpus and its translation into three different languages: French, German and Spanish. A preliminary evaluation of the proposed method yielded results of up to 79% accuracy rate for the English data on 81.8% of the SemCor manually tagged data.
An Unsupervised Method for Multilingual Word Sense Tagging Using
With an increasing number of languages making their way to our desktops every day via the Internet, researchers have come to realize the lack of linguistic knowledge resources for scarcely represented/studied languages. In an attempt to bootstrap some of the required linguistic resources for some of those languages, this paper presents an unsupervised method for automatic multilingual word sense tagging using parallel corpora. The method is evaluated on the English Brown corpus and its translation into three different languages: French, German and Spanish. A preliminary evaluation of the proposed method yielded results of up to 79% accuracy rate for the English data on 81.8% of the SemCor manually tagged data.

