Design of a Multi-lingual, Parallel-processing Statistical Parsing Engine
- In Human Language Technology Conference (HLT
, 2002
Abstract - Cited by 36 (1 self)
INTRODUCTION Ever since the widespread availability of the Penn Treebank [9], there have been numerous, statistical parsers developed for English, e.g. [8, 5, 3]. To varying degrees, these parsers and others---while very successful at the tasks for which they were designed---had the following limitations:
* they had a fairly fixed probabilistic structure, which could only be changed by re-coding some significant portion of the program
* they had hard-coded features specific to English
* they had hard-coded features specific to the Penn Treebank
* they were designed only for a uniprocessor environment
Building on our work in [1] and [2], we have developed a design for a head-driven, chart parsing engine that addresses all of the above limitations, and we present this design here. In particular, our design provides
* appropriate layers of abstraction and encapsulation for quickly porting to different languages and/or Treebank annotation styles,
* has "plug-'n'-play" probabil
Word-Level Alignment For Multilingual Resource Acquisition
- In Proceedings of the Workshop on Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Language Data
, 2002
Abstract - Cited by 16 (6 self)
We present a simple, one-pass word alignment algorithm for parallel text. Our algorithm utilizes synchronous parsing and takes advantage of existing syntactic annotations. In our experiments the performance of this model is comparable to more complicated iterative methods. We discuss the challenges and potential benefits of using this model to train syntactic parsers for new languages.
Multidimensional Transformation-Based Learning
- Conference on Natural Language Learning
, 2001
Abstract - Cited by 13 (1 self)
This paper presents a novel method that allows a machine learning algorithm following the transformation-based learning paradigm (Brill, 1995) to be applied to multiple classification tasks by training jointly and simultaneously on all fields. The motivation for constructing such a system stems from the observation that many tasks in natural language processing are naturally composed of multiple subtasks which need to be resolved simultaneously; also tasks usually learned in isolation can possibly benefit from being learned in a joint framework, as the signals for the extra tasks usually constitute inductive bias.
Maximum Entropy Chinese Character-Based Parser
- In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing - Volume 10
, 2003
Abstract - Cited by 11 (0 self)
being’s understanding of a sentence. Low agreement between humans directly affects the evaluation of machines’ performance (Wu and Fung, 1994) as it is hard to define a gold standard. It does not necessarily imply that machines cannot do better than humans. Indeed, if we train a model with consistently segmented data, a machine may do a better job in “remembering” word segmentations. As will be shown shortly, it is straightforward to encode word-segmentation information in a character-
Automatically Extracting and Comparing Lexicalized Grammars for Different Languages
- In Proc. of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-2001
, 2001
Abstract - Cited by 10 (1 self)
In this paper, we present a quantitative comparison between the syntactic structures of three languages: English, Chinese and Korean. This is made possible by first extracting Lexicalized Tree Adjoining Grammars from annotated corpora for each language and then performing the comparison on the extracted grammars. We found that the majority of the core grammar structures for these three languages are easily inter-mappable.
Facilitating Treebank Annotation Using a Statistical Parser
- In Proceedings of HLT 2001
, 2001
DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-level Alignment
- Proceedings of AMTA-02
, 2002
Abstract - Cited by 8 (2 self)
The frequent occurrence of divergences---structural differences between languages---presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate alignment and projection of dependency trees in another language without requiring any training on dependency-tree data in that language. We present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that our approach facilitates word-level alignment, particularly for sentence pairs containing divergences.
Breaking the Resource Bottleneck for Multilingual Parsing
- In Proceedings of LREC
, 2002
Abstract - Cited by 7 (0 self)
We propose a framework that enables the acquisition of annotation-heavy resources such as syntactic dependency tree corpora for low-resource languages by importing linguistic annotations from high-quality English resources. We present a large-scale experiment showing that Chinese dependency trees can be induced by using an English parser, a word alignment package, and a large corpus of sentence-aligned bilingual text. As a part of the experiment, we evaluate the quality of a Chinese parser trained on the induced dependency treebank. We find that a parser trained in this manner out-performs some simple baselines in spite of the noise in the induced treebank. The results suggest that projecting syntactic structures from English is a viable option for acquiring annotated syntactic structures quickly and cheaply. We expect the quality of the induced treebank to improve when more sophisticated filtering and error-correction techniques are applied.
Evaluating Grammar Formalisms For Applications To Natural Language Processing And Biological Sequence Analysis
, 2004
Abstract - Cited by 7 (0 self)
David Chiang. Supervisor: Aravind K. Joshi.
Grammars are gaining importance in statistical natural language processing and computational biology as a means of encoding theories and structuring algorithms. But one serious obstacle to applications of grammars is that formal language theory traditionally classifies grammars according to their weak generative capacity (WGC)---what sets of strings they generate---and tends to ignore strong generative capacity (SGC)---what sets of structural descriptions they generate---even though the latter is more relevant to applications.
The part-of-speech tagging guidelines for the Penn Chinese Treebank (3.0
- Linguistic Data Consortium
, 2000
Abstract - Cited by 5 (0 self)
1.1 Tagging criteria
1.2 POS tagset
1.3 Size of the POS tagset

