Multilingual Dependency Parsing Using Bayes Point Machines
- In Proceedings HLT/NAACL, 2006
Cited by 6 (0 self)
We develop dependency parsers for Arabic, English, Chinese, and Czech using Bayes Point Machines, a training algorithm that is as easy to implement as the perceptron yet competitive with large-margin methods. We achieve results comparable to the state of the art in English and Czech, and report the first directed dependency parsing accuracies for Arabic and Chinese. Given the multilingual nature of our experiments, we discuss some issues regarding the comparison of dependency parsers for different languages.
Induction of the Morphology of Natural Language: Unsupervised Morpheme Segmentation with Application to Automatic Speech Recognition
- 2006
Cited by 5 (2 self)
ISBN 951-22-8210-0 (printed version); ISBN 951-22-8211-9 (electronic version)
Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study
Cited by 5 (3 self)
Manually annotated corpora are valuable but scarce resources, yet for many annotation tasks, such as treebanking and sequence labeling, there exist multiple corpora with different and incompatible annotation guidelines or standards. This seems to be a great waste of human effort, and it would be nice to automatically adapt one annotation standard to another. We present a simple yet effective strategy that transfers knowledge from a differently annotated corpus to the corpus with the desired annotation. We test the efficacy of this method in the context of Chinese word segmentation and part-of-speech tagging, where no segmentation and POS tagging standards are widely accepted due to the lack of morphology in Chinese. Experiments show that adaptation from the much larger People’s Daily corpus to the smaller but more popular Penn Chinese Treebank results in significant improvements in both segmentation and tagging accuracies (with error reductions of 30.2% and 14%, respectively), which in turn helps improve Chinese parsing accuracy.
Pronominal Anaphora Resolution in Chinese
- 2006
Cited by 4 (0 self)
Acknowledgements. First of all, my heartfelt gratitude to and deepest respect for the members of my committee: Aravind Joshi, Ellen Prince, Candy Sidner, and Mitch Marcus. Without their help this thesis would not exist.
Proposition Bank II: Delving Deeper
- HLT-NAACL 2004 Workshop: Frontiers in Corpus Annotation, 2004
Cited by 4 (0 self)
The PropBank project is creating a corpus of text annotated with information about basic semantic propositions. PropBank I (Kingsbury & Palmer, 2002) added a layer of predicate-argument information, or semantic roles, to the syntactic structures of the English Penn Treebank. This paper presents an overview of the second phase of PropBank Annotation, PropBank II, which is being applied to English and Chinese, and includes (Neodavidsonian) eventuality variables, nominal references, sense tagging, and connections to the Penn Discourse Treebank (PDTB), a project for annotating discourse connectives and their arguments.
Covariance in Unsupervised Learning of Probabilistic Grammars
Cited by 4 (2 self)
Probabilistic grammars offer great flexibility in modeling discrete sequential data like natural language text. Their symbolic component is amenable to inspection by humans, while their probabilistic component helps resolve ambiguity. They also permit the use of well-understood, general-purpose learning algorithms. There has been an increased interest in using probabilistic grammars in the Bayesian setting. To date, most of the literature has focused on using a Dirichlet prior. The Dirichlet prior has several limitations, including that it cannot directly model covariance between the probabilistic grammar’s parameters. Yet, various grammar parameters are expected to be correlated because the elements in language they represent share linguistic properties. In this paper, we suggest an alternative to the Dirichlet prior, a family of logistic normal distributions. We derive an inference algorithm for this family of distributions and experiment with the task of dependency grammar induction, demonstrating performance improvements with our priors on a set of six treebanks in different natural languages. Our covariance framework permits soft parameter tying within grammars and across grammars for text in different languages, and we show empirical gains in a novel learning setting using bilingual, non-parallel data.
Forthcoming. Chinese Statistical Parsing
Cited by 3 (0 self)
This chapter describes several issues that are fundamental to achieving accurate Chinese parsing given available Chinese resources and the challenges of the GALE processing pipeline. For GALE, our parsing algorithm is expected to accurately parse a variety of materials, ranging from newswire text, which tends to be grammatically well formed, to n-best ASR outputs, many of which are poorly formed sentences. To address this challenge, we have re-implemented and enhanced the Berkeley parser to handle unknown Chinese words efficiently, parse difficult sentences robustly, and operate more efficiently. We also address issues related to training the parser for several different genres given a limited number of available training trees, the importance of matching word segmentation to the treebank segmentation standard to support accurate parsing, and the need for standardized tokenization to manage the kinds of input the parser will receive. Understanding and handling these issues is a prerequisite for achieving adequate parsing performance levels. We also investigate self-training with automatically labeled in-domain data to enhance parsing performance given the limited number of trees in the Chinese treebanks.
A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
Cited by 3 (0 self)
The large combined search space of joint word segmentation and Part-of-Speech (POS) tagging makes efficient decoding very hard. As a result, effective high-order features representing rich contexts are inconvenient to use. In this work, we propose a novel stacked sub-word model for this task, addressing both efficiency and effectiveness. Our solution is a two-step process. First, one word-based segmenter, one character-based segmenter, and one local character classifier are trained to produce coarse segmentation and POS information. Second, the outputs of the three predictors are merged into sub-word sequences, which are further bracketed and labeled with POS tags by a fine-grained sub-word tagger. The coarse-to-fine search scheme is efficient, while in the sub-word tagging step rich contextual features can be approximately derived. Evaluation on the Penn Chinese Treebank shows that our model yields improvements over the best system reported in the literature.
Joint Models for Chinese POS Tagging and Dependency Parsing
Cited by 3 (1 self)
Part-of-speech (POS) is an indispensable feature in dependency parsing. Current research usually models POS tagging and dependency parsing independently, an approach that may suffer from error propagation. Our experiments show that parsing accuracy drops by about 6% when using automatic POS tags instead of gold ones. To solve this issue, this paper proposes jointly optimizing POS tagging and dependency parsing in a single model. We design several joint models and their corresponding decoding algorithms to incorporate different feature sets. We further present an effective pruning strategy to reduce the search space of candidate POS tags, leading to a significant improvement in parsing speed. Experimental results on Chinese Penn Treebank 5 show that our joint models significantly improve the state-of-the-art parsing accuracy by about 1.5%. Detailed analysis shows that the joint method is able to choose POS tags that are more helpful and discriminative from the parsing viewpoint. This is the fundamental reason for the improvement in parsing accuracy.
Why is German dependency parsing more reliable than constituent parsing?
- In Proceedings of the Fifth Workshop on Treebanks and Linguistic Theories (TLT), 2006
Cited by 2 (0 self)
In recent years, research in parsing has extended in several new directions. One of

