Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora, 1997
Bracketing and aligning words and constituents in parallel text using stochastic inversion transduction grammars. In Parallel Text Processing: Alignment and Use of Translation Corpora, 2000
Cited by 2 (0 self)
Abstract: We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with a variety of parallel corpus analysis applications. Aside from the bilingual orientation, three major features distinguish the formalism from the finite-state transducers more traditionally found in computational linguistics: it skips directly to a context-free rather than finite-state base, it permits a minimal extra degree of ordering flexibility, and its probabilistic formulation admits an efficient maximum-likelihood bilingual parsing algorithm. A convenient normal form is shown to exist. Analysis of the formalism's expressiveness suggests that it is particularly well-suited to model ordering shifts between languages, balancing needed flexibility against complexity constraints. We discuss a number of examples of how stochastic inversion transduction grammars bring bilingual constraints to bear upon problematic corpus analysis tasks such as segmentation, bracketing, phrasal alignment, and parsing.
Inducing Domain Theories
Cited by 1 (0 self)
This thesis presents a method for learning a domain theory automatically from a corpus of parsed sentences. What is meant by a ‘domain theory’ is a collection of facts and generalisations or rules which capture what commonly happens (or does not happen) in some domain of interest. As language users we implicitly draw on such theories in various disambiguation tasks, such as anaphora resolution and prepositional phrase attachment, and formal encodings of domain theories can be used for this purpose in natural language processing. Domain theories may also be objects of interest in their own right, that is, as the output of a knowledge discovery process, providing previously unobserved information to aid with the understanding of the domain. The learning paradigm employed is Inductive Logic Programming (ILP), which generalises over examples from the domain to obtain more general patterns covering the majority of the input instances. ILP was preferred over other machine learning techniques due to the expressive power of the language specifications guiding the search for general patterns and the fact that it allows the inclusion
A Formalism for Universal Segmentation of Text
Sumo is a formalism for universal segmentation of text. Its purpose is to provide a framework for the creation of segmentation applications. It is called universal as the formalism itself is independent of the language of the documents to process and independent of the levels of segmentation (e.g. words, sentences, paragraphs, morphemes...) considered by the target application. This framework relies on a layered structure representing the possible segmentations of the document. This structure and the tools to manipulate it are described, followed by detailed examples highlighting some features of Sumo.

