## A general technique to train language models on language models (2005)

Venue: | Computational Linguistics |

Citations: | 6 - 1 self |

### BibTeX

@ARTICLE{Nederhof05ageneral,

author = {Mark-Jan Nederhof},

title = {A general technique to train language models on language models},

journal = {Computational Linguistics},

year = {2005},

volume = {31}

}

### Abstract

models

### Citations

303 | Finite-State Transducers in Language and Speech Processing
- Mohri
- 1997
Citation Context: ...am model accepts any input string over the alphabet, which does not hold for general (unambiguous) FAs. Another application of our work involves determinization and minimization of PFAs. As shown by (Mohri, 1997), PFAs cannot always be determinized, and no practical algorithms are known to minimize arbitrary nondeterministic (P)FAs. This can be a problem when deterministic or small PFAs are required. We can ...

160 | Introduction to probabilistic automata
- Paz
- 1971
Citation Context: ... Preliminaries. Many of the definitions on probabilistic context-free grammars are based on (Santos, 1972; Booth and Thompson, 1973) and the definitions on probabilistic finite automata are based on (Paz, 1971; Starke, 1972). A context-free grammar G is a 4-tuple (Σ, N, S, R), where Σ and N are two finite disjoint sets of terminals and nonterminals, respectively, S ∈ N is the start symbol, and R is a finit...

117 | On formal properties of simple phrase structure grammars
- Bar-Hillel, Perles, et al.
- 1964
Citation Context: ...olutions to arbitrary precision by means of fixed-point iteration. 4. Intersection of context-free and regular languages. We recall a construction from (Bar-Hillel, Perles, and Shamir, 1964) that computes the intersection of a context-free language and a regular language. The input consists of a CFG G = (Σ, N, S, R) and a FA M = (Σ, Q, q0, qf, T); note that we assume, without loss of ...

90 | On multiple context-free grammars
- Seki, Matsumura, et al.
- 1991
Citation Context: ...Shanker and Weir, 1993) and range concatenation grammars (Boullier, 2000; Bertsch and Nederhof, 2001). The construction for the latter also has implications for linear context-free rewriting systems (Seki et al., 1991). The construction has been extended by (Nederhof and Satta, 2003) to apply to a PCFG G = (Σ, N, S, R, pG) and a PFA M = (Σ, Q, q0, qf, T, pM). The output is a PCFG G∩ = (Σ, N∩, S∩, R∩, p∩), where N...

39 | Recognition can be harder than parsing
- Lang
- 1994
Citation Context: ...q0, w) ⊢^c (qf, ε). Conversely, if for some w, d and c we have S ⇒^d w and (q0, w) ⊢^c (qf, ε), then there is precisely one derivation d∩ such that h(d∩) = (d, c) and S∩ ⇒^{d∩} w. It was observed by (Lang, 1994) that G∩ can be seen as a parse forest, i.e., a compact representation of all parse trees according to G that derive strings recognized by M. The construction can be generalized to e.g. tree-adjoinin...

39 | Precise N-gram probabilities from stochastic context-free grammars
- Stolcke, Segal
- 1994
Citation Context: ...ent is that the FA to be trained is unambiguous, by which we mean that each input string can be recognized by at most one computation of the FA. The special case of n-grams was already formulated by (Stolcke and Segal, 1994), realizing an idea envisioned before by (Rimon and Herz, 1991). An n-gram model is here seen as a (P)FA that contains exactly one state for each possible history of the n−1 previously read symbols. ...

36 | Regular approximation of context-free grammars through transformation
- Mohri, Nederhof
- 2001
Citation Context: ...t that in practice one is more interested in the probabilities of sentences than in a purely Boolean distinction between grammatical and ungrammatical sentences. Several approaches were discussed by (Mohri and Nederhof, 2001) to extend this work to approximation of PCFGs by means of PFAs. A first approach is to directly map rules with attached probabilities to transitions with attached probabilities. Although this is com...

33 | The Berkeley Restaurant Project
- Jurafsky, Wooters, et al.
- 1994
Citation Context: ... the PCFG by means of a (pseudo-)random generator of sentences, such that sentences that are more likely according to the PCFG are generated with greater likelihood. This has been proposed before by (Jurafsky et al., 1994), for the special case of bigrams, extending a nonprobabilistic technique by (Zue and others, 1991). It is not clear however whether this idea is feasible for training of finite-state models that are...

29 | Practical Experiments with Regular Approximation of Context-Free Languages - Nederhof - 2000 |

24 | Range Concatenation Grammars - Boullier - 2000 |

19 | Probabilistic parsing as intersection
- Nederhof, Satta
- 2003
Citation Context: ...oullier, 2000; Bertsch and Nederhof, 2001). The construction for the latter also has implications for linear context-free rewriting systems (Seki et al., 1991). The construction has been extended by (Nederhof and Satta, 2003) to apply to a PCFG G = (Σ, N, S, R, pG) and a PFA M = (Σ, Q, q0, qf, T, pM). The output is a PCFG G∩ = (Σ, N∩, S∩, R∩, p∩), where N∩, S∩ and R∩ are as before, and p∩ is defined by: p∩((r0, A, rm) →...

19 | Abstract Automata
- Starke
- 1972
Citation Context: ...ries. Many of the definitions on probabilistic context-free grammars are based on (Santos, 1972; Booth and Thompson, 1973) and the definitions on probabilistic finite automata are based on (Paz, 1971; Starke, 1972). A context-free grammar G is a 4-tuple (Σ, N, S, R), where Σ and N are two finite disjoint sets of terminals and nonterminals, respectively, S ∈ N is the start symbol, and R is a finite set of rules...

12 | Parsing, volume 1 of The Theory of Parsing, Translation and Compiling
- Aho, Ullman
- 1972
Citation Context: ...rovided ∑_w pG(w) > 0. This reduction consists in removing from the grammar any nonterminal A for which the above conditions do not hold, together with any rule that contains such a nonterminal; see (Aho and Ullman, 1972) for reduction of CFGs, which is very similar. A finite automaton M is a 5-tuple (Σ, Q, q0, qf, T), where Σ and Q are two finite sets of terminals and states, respectively, q0, qf ∈ Q are the initi...

10 | Applying probabilistic measures to abstract languages
- Booth, Thompson
- 1973
Citation Context: ...a PFA (Section 5.2), and training of an unambiguous FA on the basis of a PFA (Section 5.3). 2. Preliminaries. Many of the definitions on probabilistic context-free grammars are based on (Santos, 1972; Booth and Thompson, 1973) and the definitions on probabilistic finite automata are based on (Paz, 1971; Starke, 1972). A context-free grammar G is a 4-tuple (Σ, N, S, R), where Σ and N are two finite disjoint sets of termina...

8 | The recognition capacity of local syntactic constraints
- Rimon, Herz
- 1991
Citation Context: ...that each input string can be recognized by at most one computation of the FA. The special case of n-grams was already formulated by (Stolcke and Segal, 1994), realizing an idea envisioned before by (Rimon and Herz, 1991). An n-gram model is here seen as a (P)FA that contains exactly one state for each possible history of the n−1 previously read symbols. It is clear that such a FA is unambiguous (even deterministic),...

5 | Probabilistic grammars and automata
- Santos
- 1972
Citation Context: ... the basis of a PFA (Section 5.2), and training of an unambiguous FA on the basis of a PFA (Section 5.3). 2. Preliminaries. Many of the definitions on probabilistic context-free grammars are based on (Santos, 1972; Booth and Thompson, 1973) and the definitions on probabilistic finite automata are based on (Paz, 1971; Starke, 1972). A context-free grammar G is a 4-tuple (Σ, N, S, R), where Σ and N are two finit...

1 |
Training models on models Nederhof
- Vijay-Shanker, Weir
- 1993
Citation Context: ...n be seen as a parse forest, i.e., a compact representation of all parse trees according to G that derive strings recognized by M. The construction can be generalized to e.g. tree-adjoining grammars (Vijay-Shanker and Weir, 1993) and range concatenation grammars (Boullier, 2000; Bertsch and Nederhof, 2001). The construction for the latter also has implications for linear context-free rewriting systems (Seki et al., 1991). Th...

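
The citation contexts for Stolcke and Segal (1994) and Rimon and Herz (1991) above describe an n-gram model as a (P)FA with one state for each history of the n−1 previously read symbols, and note that such an automaton is deterministic and therefore unambiguous. The following is a minimal sketch of that view, not the paper's own construction: the function names `ngram_automaton` and `computation` are hypothetical, probabilities are left out, and the extra states for shorter histories at the start of a string are an assumption made here so that the automaton has a single initial state.

```python
from itertools import product


def ngram_automaton(alphabet, n):
    """Return (start_state, transitions) for an n-gram automaton.

    Each state is a tuple holding the last (up to) n-1 symbols read.
    transitions maps (state, symbol) to exactly one successor state,
    so the automaton is deterministic.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Assumption: histories shorter than n-1 serve as states near the start.
    states = [h for k in range(n) for h in product(alphabet, repeat=k)]
    transitions = {}
    for state in states:
        for symbol in alphabet:
            history = state + (symbol,)
            # Keep only the most recent n-1 symbols as the new history.
            transitions[(state, symbol)] = history[-(n - 1):] if n > 1 else ()
    return (), transitions


def computation(start, transitions, string):
    """Return the sequence of states visited while reading string."""
    state = start
    path = [state]
    for symbol in string:
        state = transitions[(state, symbol)]
        path.append(state)
    return path


start, transitions = ngram_automaton("ab", 2)
path = computation(start, transitions, "abba")
```

Because every (state, symbol) pair has a single successor, each input string has at most one computation, which is the unambiguity property the contexts refer to.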