Results 1 - 10 of 19
CoNLL-X shared task on multilingual dependency parsing
In Proc. of CoNLL, 2006
Cited by 161 (2 self)
Each year the Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their systems on exactly the same data sets, in order to better compare systems. The tenth CoNLL (CoNLL-X) saw a shared task on Multilingual Dependency Parsing. In this paper, we describe how treebanks for 13 languages were converted into the same dependency format and how parsing performance was measured. We also give an overview of the parsing approaches that participants took and the results that they achieved. Finally, we try to draw general conclusions about multilingual parsing: What makes a particular language, treebank or annotation scheme easier or harder to parse and which phenomena are challenging for any dependency parser? Acknowledgement: Many thanks to Amit Dubey and Yuval Krymolowski, the other two organizers of the shared task, for discussions, converting treebanks, writing software and helping with the papers.
Japanese Dependency Analysis using Cascaded Chunking
2002
Cited by 53 (5 self)
In this paper, we propose a new statistical Japanese dependency parser using a cascaded chunking model. Conventional Japanese statistical dependency parsers are mainly based on a probabilistic model, which is not always efficient or scalable. We propose a new method that is simple and efficient, since it parses a sentence deterministically, deciding only whether the current segment modifies the segment on its immediate right-hand side. Experiments using the Kyoto University Corpus show that the method outperforms previous systems and also improves parsing and training efficiency.
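The deterministic procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name, the `modifies_right` callable (a stand-in for the trained classifier), and the exact rule for removing resolved segments are assumptions made here and should be checked against the paper. The sketch also assumes the usual head-final convention for Japanese, so the segment just before the last remaining one always attaches to it.

```python
from typing import Callable, List, Sequence


def cascaded_chunking_parse(
    segments: Sequence[str],
    modifies_right: Callable[[str, str], bool],
) -> List[int]:
    """Return heads[i] = index of the segment that segment i modifies (-1 for the root).

    modifies_right(left, right) is a hypothetical stand-in for a trained
    classifier that decides whether `left` modifies its immediate right neighbor.
    """
    heads = [-1] * len(segments)
    active = list(range(len(segments)))  # indices of segments still unresolved

    while len(active) > 1:
        penultimate = len(active) - 2
        # Tag every active segment except the last one: True means
        # "modifies its immediate right neighbor". The penultimate segment is
        # forced to attach (assumed head-final structure).
        tags = [
            pos == penultimate
            or modifies_right(segments[active[pos]], segments[active[pos + 1]])
            for pos in range(len(active) - 1)
        ]

        survivors = []
        for pos, is_dependent in enumerate(tags):
            # Assumed removal rule: resolve a tagged segment only if it is
            # leftmost or its left neighbor was not tagged in this pass.
            if is_dependent and (pos == 0 or not tags[pos - 1]):
                heads[active[pos]] = active[pos + 1]
            else:
                survivors.append(active[pos])
        survivors.append(active[-1])
        active = survivors  # at least one segment is removed per pass

    return heads


if __name__ == "__main__":
    # Toy oracle standing in for the classifier.
    gold = {("A", "C"), ("B", "C"), ("C", "D")}
    print(cascaded_chunking_parse(["A", "B", "C", "D"], lambda l, r: (l, r) in gold))
```

Each pass only asks a local yes/no question about adjacent segments, which is what makes the approach deterministic; long-distance attachments emerge over later passes once intervening segments have been removed.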
Fast Methods for Kernel-based Text Analysis
2003
Cited by 44 (1 self)
Kernel-based learning (e.g., Support Vector Machines) has been successfully applied to many hard problems in Natural Language Processing (NLP). In NLP, although feature combinations are crucial to improving performance, they are heuristically selected. Kernel methods change this situation. The merit of the kernel methods is that effective feature combinations are implicitly expanded without loss of generality and without increasing the computational costs. Kernel-based text analysis shows excellent performance in terms of accuracy; however, these methods are usually too slow to apply to large-scale text analysis. In this paper, we extend a Basket Mining algorithm to convert a kernel-based classifier into a simple and fast linear classifier. Experimental results on English BaseNP Chunking, Japanese Word Segmentation and Japanese Dependency Parsing show that our new classifiers are about 30 to 300 times faster than the standard kernel-based classifiers.
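The idea of turning a kernel classifier into a linear one can be illustrated for one simple case. The sketch below assumes binary features and a degree-2 polynomial kernel K(s, x) = (1 + |s ∩ x|)^2, which expands exactly into a constant, a weight of 3 per shared feature, and a weight of 2 per shared feature pair. It enumerates every subset exhaustively and omits the bias term; the mining step that the abstract describes for selecting subsets is not reproduced here. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def expand_poly2(support_vectors):
    """Precompute weights over feature subsets of size 0, 1 and 2.

    support_vectors: list of (coef, frozenset_of_feature_names), where coef
    stands for alpha_i * y_i of one support vector (assumed representation).
    """
    weights = defaultdict(float)
    for coef, feats in support_vectors:
        weights[()] += coef                      # constant term
        for f in feats:
            weights[(f,)] += 3 * coef            # single features
        for pair in combinations(sorted(feats), 2):
            weights[pair] += 2 * coef            # feature pairs
    return dict(weights)


def kernel_score(support_vectors, x):
    """Direct evaluation: one kernel computation per support vector."""
    return sum(coef * (1 + len(feats & x)) ** 2 for coef, feats in support_vectors)


def linear_score(weights, x):
    """Evaluation with the expanded weights: cost depends only on |x|."""
    xs = sorted(x)
    score = weights.get((), 0.0)
    score += sum(weights.get((f,), 0.0) for f in xs)
    score += sum(weights.get(pair, 0.0) for pair in combinations(xs, 2))
    return score


if __name__ == "__main__":
    svs = [(0.5, frozenset({"a", "b", "c"})), (-1.0, frozenset({"b", "d"}))]
    x = {"a", "b", "d"}
    print(kernel_score(svs, x), linear_score(expand_poly2(svs), x))
```

The two scores agree because (1 + k)^2 = 1 + 3k + 2·C(k, 2) for any overlap size k. The linear form no longer loops over support vectors at test time, which is where the speed-up comes from; the price is a weight table that grows with the number of distinct feature pairs.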
Japanese Dependency Structure Analysis Based on Support Vector Machines
Cited by 42 (6 self)
This paper presents a method of Japanese dependency structure analysis based on Support Vector Machines (SVMs). Conventional parsing techniques based on Machine Learning framework, such as Decision Trees and Maximum Entropy Models, have difficulty in selecting useful features as well as finding appropriate combination of selected features. On the other hand, it is well-known that SVMs achieve high generalization performance even with input data of very high dimensional feature space. Furthermore, by introducing the Kernel principle, SVMs can carry out the training in high-dimensional spaces with a smaller computational cost independent of their dimensionality. We apply SVMs to Japanese dependency structure identification problem. Experimental results on Kyoto University corpus show that our system achieves the accuracy of 89.09% even with small training data (7958 sentences).
Japanese Dependency Structure Analysis Based on Maximum Entropy Models
1999
Cited by 17 (1 self)
This paper describes a dependency structure analysis of Japanese sentences based on maximum entropy models.
A universal part-of-speech tagset
arXiv:1104.2086, 2011
Cited by 11 (4 self)
To facilitate future research in unsupervised induction of syntactic structure and to standardize best practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts-of-speech for 22 different languages. We highlight the use of this resource via three experiments that (1) compare tagging accuracies across languages, (2) present an unsupervised grammar induction approach that does not use gold standard part-of-speech tags, and (3) use the universal tags to transfer dependency parsers between languages, achieving state-of-the-art results.
A maximum entropy tagger with unsupervised hidden Markov models
In Proc. of the 6th NLPRS, 2001
Cited by 8 (1 self)
We describe a new tagging model where the states of a hidden Markov model (HMM) estimated by unsupervised learning are incorporated as the features in a maximum entropy model. Our method for exploiting unsupervised learning of a probabilistic model can reduce the cost of building taggers with no dictionary and a small annotated corpus. Experimental results on English POS tagging and Japanese word segmentation show that in both tasks our method greatly improves the tagging accuracy when the model is trained with a small annotated corpus. Furthermore, our English POS tagger achieved better-than-state-of-the-art POS tagging accuracy (96.84%) when a large annotated corpus is available.
Learning to Predict Case Markers in Japanese
In ACL-COLING, 2006
Cited by 5 (2 self)
Japanese case markers, which indicate the grammatical relation of the complement NP to the predicate, often pose challenges to the generation of Japanese text, be it done by a foreign language learner, or by a machine translation (MT) system. In this paper, we describe the task of predicting Japanese case markers and propose machine learning methods for solving it in two settings: (i) monolingual, when given information only from the Japanese sentence; and (ii) bilingual, when also given information from a corresponding English source sentence in an MT context. We formulate the task after the well-studied task of English semantic role labelling, and explore features from a syntactic dependency structure of the sentence. For the monolingual task, we evaluated our models on the Kyoto Corpus and achieved over 84% accuracy in assigning correct case markers for each phrase. For the bilingual task, we achieved an accuracy of 92% per phrase using a bilingual dataset from a technical domain. We show that in both settings, features that exploit dependency information, whether derived from gold-standard annotations or automatically assigned, contribute significantly to the prediction of case markers.
A Deterministic Dependency Parser for Japanese
In Proceedings of the MT Summit VII, 1999
Cited by 1 (0 self)
We present a rule-based, deterministic dependency parser for Japanese. It was implemented in C++, using object classes that reflect linguistic concepts and thus facilitate the transfer of linguistic intuitions into code. The parser first chunks morphemes into one-word phrases and then parses from the right to the left. The average parsing accuracy is 83.6%.
A Fast Japanese Sentence Analyzer
In Proceedings of the First International Workshop on MultiMedia Annotation, 2001
Cited by 1 (0 self)
A deterministic finite state transducer is a fast device for analyzing strings. It takes O(n) time to analyze a string of length n. In this paper, we describe an application of this technique to Japanese sentence analysis. The Japanese analysis includes a morphological analyzer (Keitaiso-Kaiseki), a bunsetsu analyzer and a dependency analyzer (Kakariuke-Kaiseki). We achieved this speed at a small cost in accuracy. The morphological analysis is based on Saichou-Icchi-Hou (the longest matching method), a traditional method in morphological analysis, extended by registering compound words as single words. The bunsetsu analyzer is a simple N-gram method, although we noticed that some improvement can be obtained by introducing lexical information. The dependency analysis is the crucial part, as it normally takes a long time, typically cubic in the sentence length. However, our system takes about 0.17 milliseconds to analyze one sentence (average length 10 bunsetsu; Pentium III 650 MHz PC, Linux), and we observed the analysis time to be proportional to the sentence length. The accuracy is about 81% even though very little lexical information is used. This is about 17% and 9% better than the default and a simple system, respectively. We believe the gap between our performance and the best current performance on the same task, about 7%, can be filled by introducing lexical or semantic information.
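The longest matching method mentioned in this abstract can be sketched with a plain greedy loop. This is only an illustration of the matching rule, not the finite state transducer implementation the paper describes; the function name and the toy lexicon are assumptions made here. Unknown characters fall back to single-character tokens.

```python
from typing import Iterable, List


def longest_match_segment(text: str, lexicon: Iterable[str]) -> List[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    lexicon entry that matches, or a single character if nothing matches."""
    entries = set(lexicon)
    max_len = max([1] + [len(word) for word in entries])
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first and shrink until something matches.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if length == 1 or candidate in entries:
                tokens.append(candidate)
                pos += length
                break
    return tokens


if __name__ == "__main__":
    # Toy lexicon; the compound "東京都" is registered as a single word.
    toy_lexicon = {"東京", "東京都", "都", "に", "住む"}
    print(longest_match_segment("東京都に住む", toy_lexicon))
```

Because the compound is in the lexicon, the greedy rule prefers it over the shorter entry followed by a one-character word, which is the effect of registering compounds that the abstract mentions.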

