Enhancing unlexicalized parsing performance using a wide coverage lexicon, fuzzy tag-set mapping, and em-hmm-based lexical probabilities
- In Proc. of EACL, 2009
- Cited by 8 (3 self)
We present a framework for interfacing a PCFG parser with lexical information from an external resource following a different tagging scheme than the treebank. This is achieved by defining a stochastic mapping layer between the two resources. Lexical probabilities for rare events are estimated in a semi-supervised manner from a lexicon and large unannotated corpora. We show that this solution greatly enhances the performance of an unlexicalized Hebrew PCFG parser, resulting in state-of-the-art Hebrew parsing results both when a segmentation oracle is assumed, and in a real-world parsing scenario of parsing unsegmented tokens.
Using wikipedia links to construct word segmentation corpora
- In Proceedings of AAAI Workshops. Vasileios Hatzivassiloglou, Luis Gravano, and Ankineedu Maganti, 2008
- Cited by 1 (0 self)
Tagged corpora are essential for evaluating and training natural language processing tools. The cost of constructing large enough manually tagged corpora is high, even when the annotation level is shallow. This article describes a simple method to automatically create a partially tagged corpus, using Wikipedia hyperlinks. The resulting corpus contains information about the correct segmentation of 523,599 non-consecutive words in 363,090 sentences. We used our method to construct a corpus of Modern Hebrew (which we have made available at
Tagging a Hebrew Corpus: The Case of Participles
- Cited by 1 (1 self)
We report on an effort to build a corpus of Modern Hebrew tagged with parts of speech and morphology. We designed a tagset specific to Hebrew while focusing on four aspects: the tagset should be consistent with common linguistic knowledge; there should be maximal agreement among taggers as to the tags assigned, to maintain consistency; the tagset should be useful for machine taggers and learning algorithms; and the tagset should be effective for applications relying on the tags as input features. In this paper, we illustrate these issues by explaining our decision to introduce a tag for participles in Hebrew. We explain how this tag is defined, and how it helped us improve manual tagging accuracy to a high level, while improving automatic tagging and helping in the task of syntactic chunking.
Hebrew Morphological Tagging Guidelines
- BGU Computational Linguistics Group, 2008

