Machine Translation with Inferred Stochastic Finite-State Transducers
 COMPUTATIONAL LINGUISTICS
, 2004
Cited by 58 (14 self)
Finite-state transducers are models that are being used in different areas of pattern recognition and computational linguistics. One of these areas is machine translation, in which the approaches that are based on building models automatically from training examples are becoming more and more attractive. Finite-state transducers are very adequate for use in constrained tasks in which training samples of pairs of sentences are available. A technique for inferring finite-state transducers is proposed in this article. This technique is based on formal relations between finite-state transducers and rational grammars. Given a training corpus of source-target pairs of sentences, the proposed approach uses statistical alignment methods to produce a set of conventional strings from which a stochastic rational grammar (e.g., an n-gram) is inferred. This grammar is finally converted into a finite-state transducer. The proposed methods are assessed through a series of machine translation experiments within the framework of the EuTrans project.
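The pipeline described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `to_extended_string` and `train_bigram`, the monotone attachment rule, the unsmoothed bigram, and the toy Spanish-English pair with its alignment are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def to_extended_string(source, target, alignment):
    """Merge one sentence pair into a string of (source word, target segment) symbols.

    alignment[j] is the index of the source word aligned to target word j
    (assumed to come from some statistical aligner). To preserve target word
    order, a target word is never attached to an earlier source position than
    the target word before it.
    """
    attached = [[] for _ in source]
    position = 0
    for j, word in enumerate(target):
        position = max(position, alignment[j])
        attached[position].append(word)
    return [(s, tuple(t)) for s, t in zip(source, attached)]


def train_bigram(strings):
    """Estimate unsmoothed bigram probabilities over extended symbols."""
    counts = defaultdict(Counter)
    for string in strings:
        previous = "<s>"
        for symbol in list(string) + ["</s>"]:
            counts[previous][symbol] += 1
            previous = symbol
    return {
        history: {symbol: n / sum(following.values())
                  for symbol, n in following.items()}
        for history, following in counts.items()
    }


# Toy training pair; the alignment is a hypothetical aligner output.
source = ["una", "habitación", "doble"]
target = ["a", "double", "room"]
alignment = [0, 2, 1]

extended = to_extended_string(source, target, alignment)
model = train_bigram([extended])
```

Under these assumptions, each bigram transition from a history `h` to a symbol `(s, t)` can be read as a transducer edge from state `h` to state `(s, t)` that reads the source word `s` and emits the target segment `t` with the bigram probability.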
Unsupervised Language Acquisition: Theory and Practice
, 2001
Cited by 40 (0 self)
In this thesis I present various algorithms for the unsupervised machine learning of aspects of natural languages using a variety of statistical models. The scientific object of the work is to examine the validity of the so-called Argument from the Poverty of the Stimulus advanced in favour of the proposition that humans have language-specific innate knowledge. I start by examining an a priori argument based on Gold's theorem, that purports to prove that natural languages cannot be learned, and some formal issues related to the choice of statistical grammars rather than symbolic grammars. I present three novel algorithms for learning various parts of natural languages: first, an algorithm for the induction of syntactic categories from unlabelled text using distributional information, that can deal with ambiguous and rare words; secondly, a set of algorithms for learning morphological processes in a variety of languages, including languages such as Arabic with non-concatenative morphology; thirdly, an algorithm for the unsupervised induction of a context-free grammar from tagged text. I carefully examine the interaction between the various components, and show how these algorithms can form the basis for an empiricist model of language acquisition. I therefore conclude that the Argument from the Poverty of the Stimulus is unsupported by the evidence.
The EuTRANS-I Speech Translation System
, 1999
Cited by 22 (13 self)
The EuTRANS project aims at using Example-Based approaches for the automatic development of Machine Translation systems accepting text and speech input for limited-domain applications. During the first phase of the project, a speech translation system that is based on the use of automatically learnt Subsequential Transducers has been built. This paper contains a detailed and to a large extent self-contained overview of the transducer learning algorithms and system architecture, along with a new approach for using categories representing words or short phrases in both input and output languages. Experimental results using this approach are reported for a task involving the recognition and translation of sentences in the hotel reception communication domain, with a vocabulary of 683 words in Spanish. A translation word error rate of 1.97% is achieved at a real-time factor of 2.7 on a personal computer.
Computational Complexity of Problems on Probabilistic Grammars and Transducers.
 In Proc. ICGI
, 2000
Cited by 19 (3 self)
Determinism plays an important role in grammatical inference.
Probabilistic Finite-State Machines - Part I
Cited by 15 (1 self)
Probabilistic finite-state machines are used today in a variety of areas in pattern recognition, or in fields to which pattern recognition is linked: computational linguistics, machine learning, time series analysis, circuit testing, computational biology, speech recognition and machine translation are some of them. In part I of this paper we survey these generative objects and study their definitions and properties. In part II, we will study the relation of probabilistic finite-state automata with other well-known devices that generate strings, such as hidden Markov models and n-grams, and provide theorems, algorithms and properties that represent a current state of the art of these objects.
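To make the notion of a probabilistic finite-state automaton as a generative object concrete, the sketch below computes the probability that such an automaton assigns to a string with the standard forward recursion. The dictionary layout (`initial`, `transitions`, `final`) and the one-state example are assumptions chosen for illustration, not notation from the paper.

```python
def string_probability(pfa, string):
    """Sum the probabilities of all accepting paths that generate `string`."""
    # forward[state] = probability of having generated the prefix so far
    # and being in `state`.
    forward = dict(pfa["initial"])
    for symbol in string:
        following = {}
        for state, prob in forward.items():
            for target, step in pfa["transitions"].get((state, symbol), []):
                following[target] = following.get(target, 0.0) + prob * step
        forward = following
    return sum(prob * pfa["final"].get(state, 0.0)
               for state, prob in forward.items())


# Hypothetical one-state automaton: in q0, emit "a" and stay with
# probability 0.5, or stop with probability 0.5.
pfa = {
    "initial": {"q0": 1.0},
    "transitions": {("q0", "a"): [("q0", 0.5)]},
    "final": {"q0": 0.5},
}

p_empty = string_probability(pfa, "")
p_aa = string_probability(pfa, "aa")
```

In this toy automaton the outgoing transition and stopping probabilities of `q0` sum to one, so the probabilities of all strings `a^n` form a geometric distribution.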
A Statistical-Estimation Method for Stochastic Finite-State Transducers Based on Entropy Measures
 Machine Learning
, 2000
Cited by 13 (9 self)
The stochastic extension of formal translations constitutes a suitable framework for dealing with many problems in Syntactic Pattern Recognition. Some estimation criteria have already been proposed and developed for the parameter estimation of Regular Syntax-Directed Translation Schemata. Here, a new criterion is proposed for dealing with situations when training data is sparse. This criterion is based on entropy measurements, inspired in part by the Maximum Mutual Information criterion, and it takes into account the possibility of ambiguity in translations (i.e., the translation model may yield different output strings for a single input string). The goal in the stochastic framework is to find the most probable translation of a given input string. Experiments were performed on a translation task which has a high degree of ambiguity.
Finite-State Transducers for Speech-Input Translation
 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU'01
, 2001
Cited by 11 (3 self)
Nowadays, hidden Markov models (HMMs) and n-grams are the basic components of the most successful speech recognition systems. In such systems, HMMs (the acoustic models) are integrated into an n-gram or a stochastic finite-state grammar (the language model). Similar models can be used for speech translation, and HMMs (the acoustic models) can be integrated into a finite-state transducer (the translation model). Moreover, the translation process can be performed by searching for an optimal path of states in the integrated network. The output of this search process is a target word sequence associated with the optimal path. In speech translation, HMMs can be trained from a source speech corpus, and the translation model can be learned automatically from a parallel training corpus.
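The search for an optimal path can be illustrated with a simplified Viterbi search. The sketch below is an assumption-laden reduction of the idea: it takes text input rather than speech, so the acoustic HMMs are left out entirely, and the edge table, state numbers, and probabilities are invented for the example.

```python
import math


def viterbi_translate(edges, initial, finals, source):
    """Return the output words along the most probable accepting path.

    edges maps (state, source word) to a list of
    (next state, emitted target words, probability) triples.
    finals maps each final state to its stopping probability.
    """
    # best[state] = (log probability, output so far) of the best path
    # that reads the source prefix and ends in `state`.
    best = {initial: (0.0, [])}
    for word in source:
        following = {}
        for state, (logp, output) in best.items():
            for target, emitted, prob in edges.get((state, word), []):
                candidate = (logp + math.log(prob), output + list(emitted))
                if target not in following or candidate[0] > following[target][0]:
                    following[target] = candidate
        best = following
    accepting = [(logp + math.log(finals[state]), output)
                 for state, (logp, output) in best.items()
                 if finals.get(state, 0.0) > 0.0]
    if not accepting:
        return None
    return max(accepting, key=lambda item: item[0])[1]


# Hypothetical transducer with two competing paths for the same input.
edges = {
    (0, "una"): [(1, ["a"], 1.0)],
    (1, "habitación"): [(2, [], 0.6), (3, ["room"], 0.4)],
    (2, "doble"): [(4, ["double", "room"], 1.0)],
    (3, "doble"): [(4, ["with", "two", "beds"], 1.0)],
}
finals = {4: 1.0}

translation = viterbi_translate(edges, 0, finals, ["una", "habitación", "doble"])
```

Here the path through state 2 has probability 0.6 and wins over the path through state 3, so the target word sequence is read off the higher-scoring path. In a speech-input system, the per-word step would instead be driven by acoustic scores from the integrated HMMs.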
Inference of Finite-State Transducers By Using Regular Grammars and Morphisms
, 2000
Cited by 11 (7 self)
A technique to infer finite-state transducers is proposed in this work. This technique is based on the formal relations between finite-state transducers and regular grammars. The technique consists of: 1) building a corpus of training strings from the corpus of training pairs; 2) inferring a regular grammar; and 3) transforming the grammar into a finite-state transducer.
Adapting Finite-State Translation to the TransType2 Project
Cited by 2 (1 self)
Machine translation can play an important role nowadays, helping communication between people. One of the projects in this field is TransType2. Its purpose is to develop an innovative, interactive machine translation system. TransType2 aims at facilitating the task of producing high-quality translations, and at making the translation task more cost-effective for human translators. To achieve this goal, stochastic finite-state transducers are being used. Stochastic finite-state transducers are generated by means of hybrid finite-state and statistical alignment techniques. The Viterbi parsing procedure with stochastic finite-state transducers has been adapted to take into account the source sentence to be translated and the target prefix given by the human translator. Experiments have been carried out with a corpus of printer manuals. The first results showed that with this preliminary prototype, users need to type only 15% of the words instead of the complete translated text.
Transducer-Learning Experiments on Language Understanding
, 1998
The interest in using Finite-State Models in a large variety of applications has been growing recently as more powerful techniques for learning them from examples have been developed. Language Understanding can be approached this way as a problem of language translation in which the target language is a formal language rather than a natural one. Finite-state transducers are used to model the translation process, and are automatically learned from training data consisting of pairs of natural-language/formal-language sentences. The need for training data is dramatically reduced by performing a two-level learning process based on lexical/phrase categorization. Successful experiments are presented on a task consisting of the "understanding" of Spanish natural-language sentences describing dates and times, where the target formal language is the one used in the popular Unix command "at".