Multilingual Lexical Database Generation from parallel texts with endogenous resources
- PAPILLON-2005 Workshop on Multilingual Lexical Databases, Chiang Rai, 2005
Cited by 2 (0 self)
This paper deals with multilingual database generation from parallel corpora. The idea is to contribute to the enrichment of lexical databases for languages with few linguistic resources. Our approach is endogenous: it relies on the raw texts only, it does not require external linguistic resources such as stemmers or taggers. The system produces alignments for the 20 European languages of the ‘Acquis Communautaire’ Corpus.
Bracketing and aligning words and constituents in parallel text using stochastic inversion transduction grammars
- in Parallel Text Processing: Alignment and Use of Translation Corpora, 2000
Cited by 2 (0 self)
We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with a variety of parallel corpus analysis applications. Aside from the bilingual orientation, three major features distinguish the formalism from the finite-state transducers more traditionally found in computational linguistics: it skips directly to a context-free rather than finite-state base, it permits a minimal extra degree of ordering flexibility, and its probabilistic formulation admits an efficient maximum-likelihood bilingual parsing algorithm. A convenient normal form is shown to exist. Analysis of the formalism's expressiveness suggests that it is particularly well-suited to model ordering shifts between languages, balancing needed flexibility against complexity constraints. We discuss a number of examples of how stochastic inversion transduction grammars bring bilingual constraints to bear upon problematic corpus analysis tasks such as segmentation, bracketing, phrasal alignment, and parsing.
Segmenting a Sentence into Morphemes Using Statistic Information between Words
- Proceedings of the 15th International Conference on Computational Linguistics (Coling’94), 1994
This paper is on dividing non-separated language sentences (whose words are not separated from each other with a space or other separators) into morphemes using statistical information, not grammatical information which is often used in NLP. In this paper we describe our method and experimental results on Japanese and Chinese sentences. As will be seen in the body of this paper, the results show that this system is efficient for most of the sentences.
Towards the Automatic Acquisition of Lexical Selection Rules
- MT Summit VII, Sept. 1999
This paper is a study of a certain type of collocations and implication and application to acquisition of lexical selection rules in transfer-approach MT systems. Collocations reveal the co-occurrence possibilities of linguistic units in one language, which often require lexical selection rules to enhance the natural flow and clarity of MT output. The study presents an automatic acquisition and human verification process to acquire collocations and suggest possible candidates for lexical selection rules. The mechanism has been used in the development and enhancement of the Chinese-English and Japanese-English MT systems, and can be easily adapted to other language pairs. Future work includes expanding its usage to more language pairs and furthering its application to MT customers.

