Functional morphology
- Proceedings of the Ninth ACM SIGPLAN International Conference on Functional Programming, Snowbird, 2004
Cited by 21 (9 self)
This paper presents a methodology for implementing natural language morphology in the functional language Haskell. The main idea is simple: instead of working with untyped regular expressions, which is the state of the art of morphology in computational linguistics, we use finite functions over hereditarily finite algebraic datatypes. The definitions of these datatypes and functions are the language-dependent part of the morphology. The language-independent part consists of an untyped dictionary format which is used for synthesis of word forms, and a decorated trie, which is used for analysis. Functional Morphology builds on ideas introduced by Huet in his computational linguistics toolkit Zen, which he has used to implement the morphology of Sanskrit. The goal has been to make it easy for linguists, who are not trained as functional programmers, to apply the ideas to new languages. As a proof of the productivity of the
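The central idea in this abstract, an inflection paradigm expressed as a function over a finite type of grammatical features, can be illustrated with a short sketch. This is a Python analogue only (the paper itself works in Haskell), and every name in it (`Number`, `Case`, `regular_noun`, `inflection_table`, `build_analysis_index`) is hypothetical rather than taken from the paper. The English noun paradigm is a toy, and a plain dictionary stands in for the decorated trie.

```python
from enum import Enum
from itertools import product


class Number(Enum):
    SINGULAR = "Sg"
    PLURAL = "Pl"


class Case(Enum):
    NOMINATIVE = "Nom"
    GENITIVE = "Gen"


def regular_noun(stem, number, case):
    """Toy paradigm: a function from a finite feature space to a word form."""
    if number is Number.SINGULAR:
        return stem if case is Case.NOMINATIVE else stem + "'s"
    return stem + "s" if case is Case.NOMINATIVE else stem + "s'"


def inflection_table(paradigm, stem):
    """Synthesis: enumerate the finite feature domain to list every form."""
    return {(n, c): paradigm(stem, n, c) for n, c in product(Number, Case)}


def build_analysis_index(paradigm, stems):
    """Analysis: invert the tables (a dict stands in for the decorated trie)."""
    index = {}
    for stem in stems:
        for features, form in inflection_table(paradigm, stem).items():
            index.setdefault(form, []).append((stem, features))
    return index
```

Because the feature types are finite, the full table of forms can be enumerated mechanically; under these assumptions, looking up `"cats"` in `build_analysis_index(regular_noun, ["cat", "dog"])` returns the lemma `"cat"` with the plural nominative features.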
Context-based morphological disambiguation with random fields
- In Proc. of HLT-EMNLP, 2005
Cited by 12 (2 self)
Finite-state approaches have been highly successful at describing the morphological processes of many languages. Such approaches have largely focused on modeling the phone- or character-level processes that generate candidate lexical types, rather than tokens in context. For the full analysis of words in context, disambiguation is also required (Hakkani-Tür et al., 2000; Hajič et al., 2001). In this paper, we apply a novel source-channel model to the problem of morphological disambiguation (segmentation into morphemes, lemmatization, and POS tagging) for concatenative, templatic, and inflectional languages. The channel model exploits an existing morphological dictionary, constraining each word’s analysis to be linguistically valid. The source model is a factored, conditionally-estimated random field (Lafferty et al., 2001) that learns to disambiguate the full sentence by modeling local contexts. Compared with baseline state-of-the-art methods, our method achieves statistically significant error rate reductions on Korean, Arabic, and Czech, for various training set sizes and accuracy measures.
Joint Morphological and Syntactic Disambiguation
- 2007
Cited by 12 (2 self)
In morphologically rich languages, should morphological and syntactic disambiguation be treated sequentially or as a single problem? We describe several efficient, probabilistically interpretable ways to apply joint inference to morphological and syntactic disambiguation using lattice parsing. Joint inference is shown to compare favorably to pipeline parsing methods across a variety of component models. State-of-the-art performance on Hebrew Treebank parsing is demonstrated using the new method. The benefits of joint inference are modest with the current component models, but appear to increase as components themselves improve.
Semitic morphological analysis and generation using finite state transducers with feature structures
- Conference of the European Chapter of the Association for Computational Linguistics, 12, 2009
Cited by 12 (2 self)
This paper presents an application of finite state transducers weighted with feature structure descriptions, following Amtrup (2003), to the morphology of the Semitic language Tigrinya. It is shown that feature-structure weights provide an efficient way of handling the templatic morphology that characterizes Semitic verb stems as well as the long-distance dependencies characterizing the complex Tigrinya verb morphotactics. A relatively complete computational implementation of Tigrinya verb morphology is described.
Feature-Based Tagger of Approximations of Functional Arabic Morphology
- In Proceedings of the Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005), 2005
Integrating finite-state technology with deep LFG grammars
- In Proceedings of the ESSLLI’04 Workshop on Combining Shallow and Deep Processing for NLP, 2004
Cited by 8 (1 self)
Researchers at PARC were pioneers in developing finite-state methods for applications in computational linguistics, and one of the original motivations was to provide a coherent architecture for the integration of lower-level lexical processing with higher-level syntactic analysis (Kaplan and Kay, 1981;
The GF Resource grammar library
- August 2002
Cited by 7 (2 self)
The GF Resource Grammar Library is a set of natural language grammars implemented in GF (Grammatical Framework). These grammars are in a strong sense parallel: they are built upon a common abstract syntax, i.e. a common tree structure. Individual languages are obtained via compositional mappings from abstract syntax trees to feature structures specific to each language. The grammar defines, for each language, a complete set of morphological paradigms and a syntax fragment comparable to CLE (Core Language Engine). It is available as open-source software under the GNU LGPL License.
Computing with realizational morphology
- Computational Linguistics and Intelligent Text Processing, 2003
Cited by 6 (2 self)
The theory of realizational morphology presented by Stump in his influential book Inflectional Morphology (2001) describes the derivation of inflected surface forms from underlying lexical forms by means of ordered blocks of realization rules. The theory presents a rich formalism for expressing generalizations about phenomena commonly found in the morphological systems of natural languages. This paper demonstrates that, in spite of the apparent complexity of Stump’s formalism, the system as a whole is no more powerful than a collection of regular relations. Consequently, a Stump-style description of the morphology of a particular language such as Lingala or Bulgarian can be compiled into a finite-state transducer that maps the underlying lexical representations directly into the corresponding surface form or forms, and vice versa, yielding a single lexical transducer. For illustration we will present an explicit finite-state implementation of an analysis of Lingala based on Stump’s description and other sources.
STRUCTURES AND DISTRIBUTIONS IN MORPHOLOGY LEARNING
- 2008
Cited by 4 (3 self)
One of the great challenges in linguistics and cognitive science is to understand the nature of the mental representation of language. The precise mechanisms of the mind are unknown, but can be modeled through observation and experimentation. By viewing the mind as a computational device that receives input (primary linguistic data) and produces output (the development of grammatical speech) during language acquisition, one can reason about what representations and algorithms must be internal to the learner. In this thesis, I investigate the acquisition of morphology. The principal challenges are how to learn a theory in the presence of sparse data, and in a manner that can provide explanations for the developmental processes in child language acquisition. The main idea underlying this work is that a consideration of the different aspects of language acquisition places strong constraints on cognitively plausible representations and algorithms that are internal to the learner. To develop a model of morphology acquisition, I pursue three lines of work: First, I formulate a cognitively-oriented computational framework for studying language acquisition that consists of four components: the linguistic representation, the

