## Stochastic K-TSS bi-languages for Machine Translation

### BibTeX

@MISC{Torres_stochastick-tss,
    author = {M. Inés Torres and Francisco Casacuberta},
    title = {Stochastic K-TSS bi-languages for Machine Translation},
    year = {}
}

### Abstract

One approach to statistical machine translation is based on joint probability distributions over source and target languages. In this work we propose to model the joint probability distribution by stochastic regular bi-languages. Specifically, we introduce stochastic k-testable in the strict sense (k-TSS) bi-languages to represent the joint probability distribution of source and target languages. On this basis we present a reformulation of the GIATI methodology to infer stochastic regular bi-languages for machine translation purposes.
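To make the idea of a stochastic k-TSS bi-language more concrete, the following minimal sketch estimates a joint model over bi-strings by maximum likelihood. It is an illustration under stated assumptions, not the authors' algorithm: the k-TSS model is approximated as a k-gram model with start padding over bilingual symbols, each bilingual symbol is assumed to be a (source word, target segment) pair already produced by some alignment step, no smoothing is applied, and the names `train_ktss`, `joint_prob`, and the toy corpus are hypothetical.

```python
from collections import defaultdict

# Hypothetical boundary markers; they are not taken from the paper.
START, END = "<s>", "</s>"


def train_ktss(bistrings, k):
    """Count k-length windows over sequences of bilingual symbols.

    Each bistring is a list of (source_word, target_segment) pairs.
    Returns counts[history][symbol], where history has length k - 1.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for z in bistrings:
        seq = [START] * (k - 1) + list(z) + [END]
        for i in range(k - 1, len(seq)):
            history = tuple(seq[i - k + 1:i])
            counts[history][seq[i]] += 1
    return counts


def joint_prob(counts, z, k):
    """Unsmoothed maximum-likelihood probability of a bistring z."""
    seq = [START] * (k - 1) + list(z) + [END]
    p = 1.0
    for i in range(k - 1, len(seq)):
        history = tuple(seq[i - k + 1:i])
        if history not in counts:
            return 0.0  # unseen history: zero probability without smoothing
        total = sum(counts[history].values())
        p *= counts[history].get(seq[i], 0) / total
    return p


# Toy bilingual corpus (assumed to be already segmented and aligned).
corpus = [
    [("la", "the"), ("casa", "house")],
    [("la", "the"), ("mesa", "table")],
]
model = train_ktss(corpus, k=2)
print(joint_prob(model, corpus[0], k=2))        # 0.5 on this toy corpus
print(joint_prob(model, [("la", "the")], k=2))  # 0.0: never seen as a full bistring
```

Because the estimate is unsmoothed, any bi-string containing an unseen window gets probability zero, which is why a practical system would add a back-off or similar smoothing scheme.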

### Citations

6080 | A mathematical theory of communication - Shannon - 1948 |
Citation Context: ...e, i.e. a string of symbols s = s1 ... s|s|, si ∈ Σ, into a sentence in the target language t = t1 ... t|t|, ti ∈ ∆. Statistical machine translation (SMT) is based on the noisy channel approach (Shannon, 1948), where t is considered to be a noisy version of s (Brown et al., 1993). Thus, the translation of a given string s ∈ Σ∗ in the source language is a string t̂ ∈ ∆∗ in the target language such that: ˆ...

1255 | A Systematic Comparison of Various Statistical Alignment Models - Och, Ney - 2003 |
Citation Context: ...e of a suitable alignment is a difficult problem. One way to deal with this problem in the machine translation framework is the use of statistical alignment models (Brown et al., 1993) (Och and Ney, 2003). The choice of an adequate alignment/segmentation procedure is also related to the parsing procedure based on the bi-automaton. In the translation procedure, the target sentence t̂ is obtained as ...

1177 | The mathematics of statistical machine translation: Parameter estimation - Brown, Pietra, et al. |
Citation Context: ...guage. The translation models in SMT are automatically learned from bilingual samples. In the early nineties, machine translation was tackled as a purely probabilistic process by the IBM research group (Brown et al., 1993). Within the SMT framework, stochastic finite-state transducers (SFSTs) have also been proposed for machine translation purposes (Bangalore and Riccardi, 2002) (Shankar et al., 2005) (Casacuberta and...

99 | Learning subsequential transducers for pattern recognition interpretation tasks - Oncina, García, et al. - 1993 |
Citation Context: ... al., 2006) proposed n-gram models of bilingual units. However, only a few techniques to learn finite-state transducers for machine translation purposes can be found (Bangalore and Riccardi, 2002) (Oncina et al., 1993) (Knight and Al-Onaizan, 1998) (Casacuberta and Vidal, 2007). On the other hand, a method of inference of SFST based on the inference of stochastic finite-state automata (Casacuberta and Vidal, 2004)...

65 | Inference of k-testable languages in the strict sense and application to syntactic pattern recognition - García, Vidal - 1990 |
Citation Context: ...es that can be inferred from a set of positive training data (Torres and Varona, 2001) (Vidal et al., 2005a) (Torres and Casacuberta, 2011) by some stochastic extension of the inference algorithm in (García and Vidal, 1990). Thus, they belong to the subset of regular languages that can be used to characterize some pattern recog- [page footer: 98, Proceedings of the 9th International Workshop on Finite State Methods and Natural Language...]

58 | Machine translation with inferred stochastic finite-state transducers - Casacuberta, Vidal - 2004 |
Citation Context: ...wn et al., 1993). Within the SMT framework, stochastic finite-state transducers (SFSTs) have also been proposed for machine translation purposes (Bangalore and Riccardi, 2002) (Shankar et al., 2005) (Casacuberta and Vidal, 2004) (Casacuberta and Vidal, 2007) (Blackwood et al., 2009). In such a context, SMT can be viewed as the problem of computing the joint probability distribution of some source and target languages, i.e. ...

44 | Translation with finite-state devices - Knight, Al-Onaizan - 1998 |
Citation Context: ...-gram models of bilingual units. However, only a few techniques to learn finite-state transducers for machine translation purposes can be found (Bangalore and Riccardi, 2002) (Oncina et al., 1993) (Knight and Al-Onaizan, 1998) (Casacuberta and Vidal, 2007). On the other hand, a method of inference of SFST based on the inference of stochastic finite-state automata (Casacuberta and Vidal, 2004) was proposed and then used in...

33 | A Weighted Finite State Transducer Translation Template Model for Statistical Machine Translation - Kumar, Deng, et al. - 2005 |
Citation Context: ...IBM research group (Brown et al., 1993). Within the SMT framework, stochastic finite-state transducers (SFSTs) have also been proposed for machine translation purposes (Bangalore and Riccardi, 2002) (Shankar et al., 2005) (Casacuberta and Vidal, 2004) (Casacuberta and Vidal, 2007) (Blackwood et al., 2009). In such a context, SMT can be viewed as the problem of computing the joint probability distribution of some sour...

27 | Stochastic Finite-State Models for Spoken Language - Bangalore, Riccardi - 2000 |
Citation Context: ...e probabilistic process by the IBM research group (Brown et al., 1993). Within the SMT framework, stochastic finite-state transducers (SFSTs) have also been proposed for machine translation purposes (Bangalore and Riccardi, 2002) (Shankar et al., 2005) (Casacuberta and Vidal, 2004) (Casacuberta and Vidal, 2007) (Blackwood et al., 2009). In such a context, SMT can be viewed as the problem of computing the joint probability di...

23 | Définition et étude des bilangages réguliers - Pair, Quéré - 1968 |
Citation Context: ...lows for pairs of symbols, in contrast with Definition 2.1, where pairs of finite-length strings are considered. Finally, let us note that regular tree languages have also been referred to as bilanguages (Pair and Quéré, 1968) (Berger and Pair, 1978). We are now referring to the work by (Vidal et al., 2005a). This work is a survey of probabilistic finite-state machines and related definitions and properties. In this survey...

19 | Computational complexity of problems on probabilistic grammars and transducers - Casacuberta, Higuera, et al. - 2000 |

15 | Probabilistic finite-state machines, part I - Vidal, Tollard, et al. - 2005 |

9 | k-TSS language models in speech recognition - Torres, Varona |
Citation Context: ... for bistrings not in the stochastic bi-language generated by the inferred bi-automaton. Specific smoothing schemes have been proposed for stochastic k-TSS automata for speech recognition purposes in (Torres and Varona, 2001) and in (Llorens et al., 2002). Under a back-off scheme, these techniques adjust the maximum likelihood estimation of transition probabilities to recursively obtain probabilities to be assigned to un...

7 | Learning finite-state models for machine translation - Casacuberta, Vidal - 2007 |

6 | Large-scale statistical machine translation with weighted finite state transducers - Blackwood, Gispert, et al. - 2009 |

5 | Local languages, the successor method, and a step towards a general methodology for the inference of regular grammars - García, Vidal, et al. - 1987 |
Citation Context: ... the class of 2-TSS languages, which are known as local languages. There is an important generative property which relates local languages and general regular languages given by the morphism theorem (García et al., 1987), which establishes that any regular language can be generated by a local language. A stochastic extension of the morphism theorem was introduced in (Vidal et al., 2005b). A stochastic regular bi-langu...

5 | Una Aproximación Inductiva a la Comprensión del Discurso Continuo. PhD thesis, Universidad Politécnica de Valencia - Segarra - 1993 |
Citation Context: ...nition tasks. In particular, stochastic k-TSS has been used in many natural language processing tasks such as phone recognition (Galiano and Segarra, 1993), speech recognition (Torres and Varona, 2001), language identification (Guijarrubia and Torres, 2010), language modeling (Justo and Torres, 2009) or machine translation (Pérez et al., 2008). In this...

4 | Finite state language models smoothed using n-grams - Llorens, Vilar, et al. - 2002 |
Citation Context: ...tic bi-language generated by the inferred bi-automaton. Specific smoothing schemes have been proposed for stochastic k-TSS automata for speech recognition purposes in (Torres and Varona, 2001) and in (Llorens et al., 2002). Under a back-off scheme, these techniques adjust the maximum likelihood estimation of transition probabilities to recursively obtain probabilities to be assigned to unseen combinations of strings f...

3 | Phrase classes in two-level language models for ASR. Pattern Analysis and Applications - Justo, Torres - 2009 |
Citation Context: ...nguage processing tasks such as phone recognition (Galiano and Segarra, 1993), speech recognition (Torres and Varona, 2001), language identification (Guijarrubia and Torres, 2010), language modeling (Justo and Torres, 2009) or machine translation (Pérez et al., 2008). In this work we propose to model the joint probability distribution P(t, s) by stochastic regular bilanguages. A first contribution of our work is the r...

2 | Transductions and context-free languages (B.G - Berstel - 1979 |
Citation Context: ...nts for transducer inference (GIATI) and is based on some important properties relating regular translations generated by finite-state transducers and regular languages over some bilingual alphabet (Berstel, 1979). On the other hand, different stochastic regular bilanguages can be introduced to model the P(s, t) distribution. Turning to stochastic regular languages, let us note that the class of stochastic k-tes...

2 | The application of k-testable languages in the strict sense to phone recognition in automatic speech recognition - Galiano, Segarra - 1993 |
Citation Context: [page footer: ...e), July 12-15, 2011. ©2011 Association for Computational Linguistics] ...nition tasks. In particular, stochastic k-TSS has been used in many natural language processing tasks such as phone recognition (Galiano and Segarra, 1993), speech recognition (Torres and Varona, 2001), language identification (Guijarrubia and Torres, 2010), language modeling (Justo and Torres, 2009) or machine translation (Pérez et al., 2008). In this...

2 | Formal Phonology, Outstanding Dissertations in Linguistics - Kornai - 1995 |
Citation Context: ...t-theoretic operations of intersection, union and complementation. Concatenation of such bi-strings is also defined in (Kornai, 2008). In this context, regular bi-languages were previously defined in (Kornai, 1995). In the context of machine translation, (Mariño et al., 2006) defines a bi-language as composed of bilingual units, referred to as tuples, extracted from alignments of a bilingual corpus. ...

2 | Joining linguistic and statistical methods for Spanish-to-Basque speech translation - Pérez, Torres, et al. - 2008 |
Citation Context: ...nference of SFST based on the inference of stochastic finite-state automata (Casacuberta and Vidal, 2004) was proposed and then used in machine translation applications (Casacuberta and Vidal, 2007) (Pérez et al., 2008) (González and Casacuberta, 2009). This method was called grammatical inference and alignments for transducer inference (GIATI) and is based on some important properties relating regular translations...

1 | Inference for regular bilanguages - Berger, Pair - 1978 |
Citation Context: ...ls, in contrast with Definition 2.1, where pairs of finite-length strings are considered. Finally, let us note that regular tree languages have also been referred to as bilanguages (Pair and Quéré, 1968) (Berger and Pair, 1978). We are now referring to the work by (Vidal et al., 2005a). This work is a survey of probabilistic finite-state machines and related definitions and properties. In this survey, the authors provide a...

1 | Text and speech based phonotactic models for spoken language identification of Basque and Spanish - Guijarrubia, Torres - 2010 |
Citation Context: ... stochastic k-TSS has been used in many natural language processing tasks such as phone recognition (Galiano and Segarra, 1993), speech recognition (Torres and Varona, 2001), language identification (Guijarrubia and Torres, 2010), language modeling (Justo and Torres, 2009) or machine translation (Pérez et al., 2008). In this work we propose to model the joint probability distribution P(t, s) by stochastic regular bilanguage...

1 | Stochastic k-TSS languages - Torres, Casacuberta - 2011 |
Citation Context: ...testable in the strict sense (k-TSS) languages is a subclass of stochastic regular languages that can be inferred from a set of positive training data (Torres and Varona, 2001) (Vidal et al., 2005a) (Torres and Casacuberta, 2011) by some stochastic extension of the inference algorithm in (García and Vidal, 1990). Thus, they belong to the subset of regular languages that can be used to characterize some pattern recog...