Learning Morpho-Lexical Probabilities from an Untagged Corpus with an Application to Hebrew
Computational Linguistics, 1995
Cited by 20 (0 self)
Abstract
This paper proposes a new approach for acquiring morpho-lexical probabilities from an untagged corpus. This approach demonstrates a way to extract very useful and non-trivial information from an untagged corpus, which otherwise would require laborious tagging of large corpora. The paper describes the use of these morpho-lexical probabilities as an information source for morphological disambiguation in Hebrew. The suggested method depends primarily on the following property: a lexical entry in Hebrew may have many different word forms, some of which are ambiguous while the others are not. Thus, the disambiguation of a given word can be achieved using other word forms of the same lexical entry. Even though it was originally devised and implemented for dealing with the morphological ambiguity problem in Hebrew, the basic idea can be extended and used to handle similar problems in other languages with rich morphology.
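The core idea in the abstract above (using unambiguous word forms of the same lexical entry to score the competing analyses of an ambiguous word) can be illustrated with a minimal sketch. This is a simplified illustration, not the paper's actual algorithm: the function name `estimate_analysis_probs`, the `similar_forms` mapping, the additive `smoothing` parameter, and the placeholder tokens are all assumptions introduced here.

```python
from collections import Counter


def estimate_analysis_probs(corpus_tokens, similar_forms, smoothing=1.0):
    """Approximate P(analysis | ambiguous word) from an untagged corpus.

    corpus_tokens: iterable of word forms from an untagged corpus.
    similar_forms: hypothetical mapping from each candidate analysis of the
        ambiguous word to other word forms of the same lexical entry that
        are assumed to be unambiguous.
    smoothing: additive constant so an unseen analysis keeps nonzero mass
        (an assumption of this sketch, not taken from the paper).
    """
    counts = Counter(corpus_tokens)

    # Score each analysis by how often its unambiguous sibling forms occur.
    scores = {
        analysis: sum(counts[form] for form in forms) + smoothing
        for analysis, forms in similar_forms.items()
    }

    total = sum(scores.values())
    if total == 0:
        # No evidence at all: fall back to a uniform distribution.
        return {analysis: 1.0 / len(scores) for analysis in scores}
    return {analysis: score / total for analysis, score in scores.items()}


# Placeholder tokens stand in for real word forms.
corpus = ["x1", "x1", "x2", "y1", "z"]
siblings = {"noun": ["x1", "x2"], "verb": ["y1"]}
probs = estimate_analysis_probs(corpus, siblings, smoothing=0.0)
```

In this toy run the sibling forms of the "noun" reading occur three times and those of the "verb" reading once, so the sketch assigns the ambiguous word probabilities of 0.75 and 0.25 respectively. No tagged corpus is needed, only a morphological resource that lists the sibling forms for each analysis.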
Morphological Disambiguation for Hebrew Search Systems
In Proceedings of NGITS-99, 1999
Cited by 8 (0 self)
Abstract
In this work we describe a new approach for morphological disambiguation to enable linguistic indexing for Hebrew search systems. We describe a Hebrew Morphological Disambiguator (HMD or Hemed for short) based on statistical data gathered from large Hebrew corpora. We show how to integrate HMD with a search engine to enable linguistic search for Hebrew. We report some experimental results demonstrating the superiority of linguistic search over string-matching search, and the contribution of morphological disambiguation to the quality of search results.
1 Background and Motivation
With the advent of the Web, more and more textual information is being made available on line, and Information Retrieval (IR) systems are becoming of crucial importance to search through the vast amount of information. Most state-of-the-art IR systems operate on a canonical representation of documents called a profile that consists of a list (or a vector in the commonly used vector space model [...

