Results 1 - 10 of 81
Introduction to the special issue on word sense disambiguation
Computational Linguistics J, 1998
Part-of-Speech Tagging and Partial Parsing
Corpus-Based Methods in Language and Speech, 1996
Cited by 85 (0 self)
m we can carve off next. 'Partial parsing' is a cover term for a range of different techniques for recovering some but not all of the information contained in a traditional syntactic analysis. Partial parsing techniques, like tagging techniques, aim for reliability and robustness in the face of the vagaries of natural text, by sacrificing completeness of analysis and accepting a low but non-zero error rate. 1 Tagging The earliest taggers [35, 51] had large sets of hand-constructed rules for assigning tags on the basis of words' character patterns and on the basis of the tags assigned to preceding or following words, but they had only small lexica, primarily for exceptions to the rules. TAGGIT [35] was used to generate an initial tagging of the Brown corpus, which was then hand-edited. (Thus it provided the data that has since been used to train other taggers [20].) The tagger described by Garside [56, 34], CLAWS, was a probabilistic version of TAGGIT, and the DeRose tagger improved on
Computation of conditional probability statistics by 8-month-old infants
PSYCHOLOGICAL SCIENCE, 1998
Cited by 62 (14 self)
A recent report demonstrated that 8-month-olds can segment a continuous stream of speech syllables, containing no acoustic or prosodic cues to word boundaries, into wordlike units after only 2 min of listening experience (Saffran, Aslin, & Newport, 1996). Thus, a powerful learning mechanism capable of extracting statistical information from fluent speech is available early in development. The present study extends these results by documenting the particular type of statistical computation—transitional (conditional) probability—used by infants to solve this word-segmentation task. An artificial language corpus, consisting of a continuous stream of trisyllabic nonsense words, was presented to 8-month-olds for 3 min. A postfamiliarization test compared the infants’ responses to words versus part-words (trisyllabic sequences spanning word boundaries). The corpus was constructed so that test words and part-words were matched in frequency, but differed in their transitional probabilities. Infants showed reliable
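The transitional (conditional) probability named in this abstract can be illustrated with a short computation. The following is only a minimal sketch under stated assumptions: the syllables, the two nonsense "words", and the function name `transitional_probabilities` are hypothetical and are not taken from the study, and the toy stream does not reproduce the study's frequency-matched design.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Estimate P(next | current) for each adjacent syllable pair in a stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Count a syllable only where it has a successor, so that the
    # estimates for each first syllable sum to 1.
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


# Hypothetical toy stream built from two made-up trisyllabic "words".
words = {"A": ["bi", "da", "ku"], "B": ["pa", "do", "ti"]}
stream = [syllable for w in "AABABBAB" for syllable in words[w]]

tp = transitional_probabilities(stream)
print(tp[("bi", "da")])  # transition inside a word
print(tp[("ku", "pa")])  # transition across a word boundary
```

In this toy stream the within-word transition is perfectly predictable while the boundary transition is not, which is the kind of contrast a learner tracking conditional probabilities could use to find word edges.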
Knowledge-Free Induction of Morphology Using Latent Semantic Analysis
2000
Cited by 48 (3 self)
Morphology induction is a subproblem of important tasks like automatic learning of machine-readable dictionaries and grammar induction. Previous morphology induction approaches have relied solely on statistics of hypothesized stems and affixes to choose which affixes to consider legitimate. Relying on stem-and-affix statistics rather than semantic knowledge leads to a number of problems, such as the inappropriate use of valid affixes ("ally" stemming to "all"). We introduce a semantic-based algorithm for learning morphology which only proposes affixes when the stem and stem-plus-affix are sufficiently similar semantically. We implement our approach using Latent Semantic Analysis and show that our semantics-only approach provides morphology induction results that rival a current state-of-the-art system.
Knowledge-free induction of inflectional morphologies
IN PROCEEDINGS OF THE NORTH AMERICAN CHAPTER OF THE ACL, 2001
Cited by 40 (1 self)
We propose an algorithm to automatically induce the morphology of inflectional languages using only text corpora and no human input. Our algorithm combines cues from orthography, semantics, and syntactic distributions to induce morphological relationships in German, Dutch, and English. Using CELEX as a gold standard for evaluation, we show our algorithm to be an improvement over any knowledge-free algorithm yet proposed.
Morphemes as Necessary Concept for Structures Discovery from Untagged Corpora
1998
Cited by 40 (0 self)
This paper gives an overview of a method for discovering syntactic structures from untagged corpora. It is composed of three main steps: the discovery of the grammatical morphemes of the language; then the construction of the chunks, which are a multilingual conceptual level allowing the bypass of the limping notion of words; and finally the discovery of the relations between chunks. We give an overview of the different procedures realized and we especially describe the discovery of morphemes. This operation is divided into three steps: the discovery of the most frequent morphemes of the language, then the discovery of the other morphemes, and finally the segmentation of the words of the corpus. We conclude with the correction procedure, which requires the chunk level. The concepts and algorithms were tested on twenty natural languages, including English, German, Turkish, Vietnamese, Swahili, Finnish, Latin, and Indonesian.
Category Structures
COMPUTATIONAL LINGUISTICS, 1988
Cited by 31 (2 self)
This paper outlines a simple and general notion of syntactic category on a metatheoretical level, independent of the notations and substantive claims of any particular grammatical framework. We define a class of formal objects called "category structures" where each such object provides a constructive definition for a space of syntactic categories. A unification operation and subsumption and identity relations are defined for arbitrary syntactic categories. In addition, a formal language for the statement of constraints on categories is provided. By combining a category structure with a set of constraints, we show that one can define the category systems of several well-known grammatical frameworks: phrase structure grammar, tagmemics, augmented phrase structure grammar, relational grammar, transformational grammar, generalized phrase structure grammar, systemic grammar, categorial grammar, and indexed grammar. The problem of checking a category for conformity to constraints is shown to be solvable in linear time. This work provides in effect a unitary class of data structures for the representation of syntactic categories in a range of diverse grammatical frameworks. Using such data structures should make it possible for various pseudo-issues in natural language processing research to be avoided. We conclude by examining the questions posed by set-valued features and sharing of values between distinct feature specifications, both of which fall outside the scope of the formal system developed in this paper.
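To give a rough feel for the unification and subsumption relations mentioned in this abstract, here is a minimal sketch that treats a category as a nested Python dict of feature specifications. The representation, the function names `unify` and `subsumes`, and the feature names are illustrative assumptions; this is not the paper's formal definition, and, like the system the abstract describes, it does not handle set-valued features or value sharing.

```python
def unify(cat_a, cat_b):
    """Unify two categories given as nested dicts; return None on a clash."""
    result = dict(cat_a)
    for feature, value in cat_b.items():
        if feature not in result:
            result[feature] = value
        elif isinstance(result[feature], dict) and isinstance(value, dict):
            merged = unify(result[feature], value)
            if merged is None:
                return None
            result[feature] = merged
        elif result[feature] != value:
            return None  # conflicting atomic values: unification fails
    return result


def subsumes(general, specific):
    """True if every specification in `general` also holds in `specific`."""
    for feature, value in general.items():
        if feature not in specific:
            return False
        if isinstance(value, dict) and isinstance(specific[feature], dict):
            if not subsumes(value, specific[feature]):
                return False
        elif value != specific[feature]:
            return False
    return True


# Hypothetical feature names, chosen only for illustration.
noun_phrase = {"cat": "N", "bar": 2}
third_person = {"agr": {"per": 3}}
combined = unify(noun_phrase, third_person)
print(combined)
print(unify({"cat": "N"}, {"cat": "V"}))
print(subsumes({"cat": "N"}, combined))
```

In this sketch the unifier, when it exists, is subsumed by both inputs, and a clash on any atomic value makes the whole unification fail.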
ABL: Alignment-Based Learning
2000
Cited by 29 (1 self)
This paper introduces a new type of grammar learning algorithm, inspired by string edit distance (Wagner and Fischer, 1974). The algorithm takes a corpus of flat sentences as input and returns a corpus of labelled, bracketed sentences. The method works on pairs of unstructured sentences that have one or more words in common. When two sentences are divided into parts that are the same in both sentences and parts that are different, this information is used to find parts that are interchangeable. These parts are taken as possible constituents of the same type. After this alignment learning step, the selection learning step selects the most probable constituents from all possible constituents. This method was used to bootstrap structure on the ATIS corpus (Marcus et al., 1993) and on the OVIS corpus (Bonnema et al., 1997). While the results are encouraging (we obtained up to 89.25 % non-crossing brackets precision), this paper will point out some of the shortcomings of our approach and will suggest possible solutions.
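The alignment idea in this abstract, splitting two sentences into shared and differing parts and treating the differing parts as interchangeable, can be sketched in a few lines. Note the assumptions: this sketch uses the matching algorithm in Python's standard `difflib` module rather than the edit-distance alignment the abstract cites, the function name `interchangeable_parts` and the example sentences are hypothetical, and the selection learning step is not covered.

```python
from difflib import SequenceMatcher


def interchangeable_parts(sentence_a, sentence_b):
    """Return pairs of word spans that differ between two aligned sentences."""
    a, b = sentence_a.split(), sentence_b.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    hypotheses = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" opcode marks unequal spans that fill the same slot
        # between shared words; pure insertions and deletions are ignored.
        if tag == "replace":
            hypotheses.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return hypotheses


# Made-up sentences in the style of a flight-information corpus.
pairs = interchangeable_parts(
    "show me flights from Boston to Dallas",
    "show me flights from New York to Dallas",
)
print(pairs)
```

Each returned pair is only a hypothesis that the two spans are constituents of the same type; a full system would still need a selection step to choose among overlapping hypotheses.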

