Learning Morpho-Lexical Probabilities from an Untagged Corpus with an Application to Hebrew
- Computational Linguistics, 1995
Cited by 20 (0 self)
This paper proposes a new approach for acquiring morpho-lexical probabilities from an untagged corpus. This approach demonstrates a way to extract very useful and non-trivial information from an untagged corpus, which otherwise would require laborious tagging of large corpora. The paper describes the use of these morpho-lexical probabilities as an information source for morphological disambiguation in Hebrew. The suggested method depends primarily on the following property: a lexical entry in Hebrew may have many different word forms, some of which are ambiguous while the others are not. Thus, the disambiguation of a given word can be achieved using other word forms of the same lexical entry. Even though it was originally devised and implemented for dealing with the morphological ambiguity problem in Hebrew, the basic idea can be extended and used to handle similar problems in other languages with rich morphology.
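The core idea in the abstract above (scoring the analyses of an ambiguous word by looking at other word forms of the same lexical entry) can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `estimate_analysis_probs`, the averaging step, and the placeholder tokens (`w_amb`, `a1`, `a2`, `b1`) are assumptions made for illustration, and the listed related forms are assumed to be unambiguous.

```python
from collections import Counter


def estimate_analysis_probs(analyses, similar_forms, corpus_counts):
    """Approximate the probability of each analysis of one ambiguous word.

    analyses:       candidate analyses of the ambiguous word
    similar_forms:  analysis -> other word forms of the same lexical entry
                    (assumed here to be unambiguous)
    corpus_counts:  word-form frequencies from an untagged corpus
    """
    scores = {}
    for analysis in analyses:
        forms = similar_forms.get(analysis, [])
        total = sum(corpus_counts[form] for form in forms)
        # Average over the forms so that an entry with more listed forms
        # is not favoured merely for having a longer list.
        scores[analysis] = total / len(forms) if forms else 0.0
    norm = sum(scores.values())
    if norm == 0:
        # No evidence at all: fall back to a uniform distribution.
        return {analysis: 1.0 / len(analyses) for analysis in analyses}
    return {analysis: score / norm for analysis, score in scores.items()}


# Hypothetical untagged corpus; "w_amb" is the ambiguous word form.
corpus = "a1 a1 a1 a2 b1 w_amb w_amb".split()
corpus_counts = Counter(corpus)
similar_forms = {"noun_A": ["a1", "a2"], "verb_B": ["b1"]}
probs = estimate_analysis_probs(["noun_A", "verb_B"], similar_forms, corpus_counts)
```

In this toy corpus the forms related to `noun_A` occur more often than those related to `verb_B`, so the noun reading receives the larger share of the probability mass, without any tagged data.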
Hebrew computational linguistics: Past and future
- Artificial Intelligence Review, 2004
Cited by 8 (2 self)
This paper reviews the current state of the art in Natural Language Processing for Hebrew, both theoretical and practical. The Hebrew language, like other Semitic languages, poses special challenges for developers of programs for natural language processing: the writing system, rich morphology, unique word formation process of roots and patterns, lack of linguistic corpora that document language usage, all contribute to making computational approaches to Hebrew challenging. The paper briefly reviews the field of computational linguistics and the problems it addresses, describes the special difficulties inherent to Hebrew (as well as to other Semitic languages), surveys a wide variety of past and ongoing works and attempts to characterize future needs and possible solutions.
Morphological Disambiguation for Hebrew Search Systems
- In Proceedings of NGITS-99, 1999
Cited by 8 (0 self)
In this work we describe a new approach for morphological disambiguation to enable linguistic indexing for Hebrew search systems. We describe a Hebrew Morphological Disambiguator (HMD, or Hemed for short) based on statistical data gathered from large Hebrew corpora. We show how to integrate HMD with a search engine to enable linguistic search for Hebrew. We report some experimental results demonstrating the superiority of linguistic search over string-matching search, and the contribution of morphological disambiguation to the quality of search results.
1 Background and Motivation
With the advent of the Web, more and more textual information is being made available online, and Information Retrieval (IR) systems are becoming crucially important for searching through the vast amount of information. Most state-of-the-art IR systems operate on a canonical representation of documents called a profile that consists of a list (or a vector in the commonly used vector space model [...
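The contrast between string-matching search and linguistic search drawn in this abstract can be shown with a small sketch. It does not reproduce HMD: the `TOY_LEMMAS` table is a hypothetical stand-in for a morphological disambiguator, English words are used only for readability, and `lemma_of` and `build_index` are illustrative names.

```python
from collections import defaultdict

# Hypothetical stand-in for a morphological disambiguator: it maps each
# surface form to a single lemma. A real system would choose the lemma
# from context rather than from a fixed table.
TOY_LEMMAS = {
    "books": "book",
    "book": "book",
    "booked": "book",
    "houses": "house",
    "house": "house",
}


def lemma_of(token):
    """Return the toy lemma for a token, or the lowercased token itself."""
    return TOY_LEMMAS.get(token.lower(), token.lower())


def build_index(docs, key):
    """Build an inverted index from key(token) to the set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[key(token)].add(doc_id)
    return index


docs = {1: "two books", 2: "a house", 3: "book a room"}

surface_index = build_index(docs, str.lower)  # string-matching index
lemma_index = build_index(docs, lemma_of)     # linguistic (lemma) index

surface_hits = sorted(surface_index.get("book", set()))
lemma_hits = sorted(lemma_index.get(lemma_of("books"), set()))
```

The surface index finds only the document containing the exact string, while the lemma index also retrieves the inflected form. The toy table conflates the noun and verb readings of "book", which is the kind of case where context-sensitive disambiguation would matter.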
A New Program For Hebrew Index Based On The Phonemic Script
- In Cohen W.W. and Hirsh H. (eds.), Proceedings of the 11th International Machine Learning Conference (ICML'94), Rutgers University, 1995
Cited by 6 (1 self)
Introduction
Preparing an index or concordance for an analytic language in which each word has only one semantic factor is a simple task, easily automated in a computer system. However, most natural languages are rather synthetic, i.e., they also have words which contain more than one semantic factor. The English word `DOGS', for example, has two factors. We may refer to these words as complex. Usually one of the factors is lexical while the rest are grammatical. A linguist may use lists of `complex words' grouped according to their grammatical factor, for example according to the plural indication. He or she may even be interested in derivative words that have a common constituent, such as ject in subject, project or inject, where each word has an independent lexical entry. However, in most cases we would like to have lists of words, including `complex words', which are gathered in groups according to th
Syntactic Analysis of Hebrew Sentences
- In Proceedings of the 8th Israeli Symposium on Artificial Intelligence and Computer Vision, Information Processing Association of Israel, 1995
Cited by 3 (0 self)
Due to recent developments in the area of computational formalisms for linguistic representation, the task of designing a parser for a specified natural language has shifted to the problem of designing its grammar in certain formal ways. This paper describes the results of a project whose aim was to design a formal grammar for modern Hebrew. Such a formal grammar has never been developed before. Since most of the work on grammatical formalisms was done without regard to Hebrew (or to other Semitic languages), we had to choose a formalism that would best fit the specific needs of the language. This part of the project has been described elsewhere. In this paper we describe the details of the grammar we developed. The grammar deals with simple, subordinate and coordinate sentences as well as interrogative sentences. Some structures were thoroughly dealt with, among which are noun phrases, verb phrases, adjectival phrases, relative clauses, object and adjunct clauses; many types o...
A Morphologically-Analyzed CHILDES Corpus of Hebrew
We present a corpus of transcribed spoken Hebrew that forms an integral part of a comprehensive data system that has been developed to suit the specific needs and interests of child language researchers: CHILDES (Child Language Data Exchange System). We introduce a dedicated transcription scheme for the spoken Hebrew data that is aware both of the phonology and of the standard orthography of the language. We also introduce a morphological analyzer that was specifically developed for this corpus.
A finite-state based morphological analyzer for Hebrew
- Shlomo Yona, 2004
Morphological analysis is an important component in many natural language processing tasks. Existing morphological analyzers for Hebrew are either limited or proprietary. We developed a morphological analyzer for undotted Hebrew words that is based on finite-state
An Abstract Machine for Unification Grammars
- 1997
This work could never have been what it is without the support I received from my advisor, Nissim Francez. I am grateful to Nissim for introducing me to this subject, leading and advising me throughout the project, bearing with me when I messed things up and encouraging me all along the way. Many thanks are due to the members of my Thesis Committee: Bob Carpenter, Michael Elhadad, Alon Itai, Uzzi Ornan and Mori Rimon. I am especially indebted to Bob for his major help, mostly in the preliminary stages of this project. I also want to thank Uzzi for replacing Nissim when he was on sabbatical. While working on this thesis I spent a fruitful summer in the University of Tübingen, Seminar für Sprachwissenschaft, during which I learned a lot from many discussions. I wish to thank Paul King for his generosity and his wisdom. Thanks are also due to Erhard Hinrichs, Dale Gerdemann, Thilo Götz, Detmar Meurers, John Griffith and Frank Morawietz. I am grateful to Evgeniy Gabrilovich for many fruitful discussions and for going over my code. Special thanks to Holger Maier and Katrine Kirk for their hospitality and their company. Finally, I want to thank Yifat and Galia for being there when I needed it.

