Learning Stochastic Logic Programs
, 2000
Stochastic Logic Programs (SLPs) have been shown to be a generalisation of Hidden Markov Models (HMMs), stochastic context-free grammars, and directed Bayes' nets. A stochastic logic program consists of a set of labelled clauses p:C where p is in the interval [0,1] and C is a first-order range-restricted definite clause. This paper summarises the syntax, distributional semantics and proof techniques for SLPs and then discusses how a standard Inductive Logic Programming (ILP) system, Progol, has been modified to support learning of SLPs. The resulting system 1) finds an SLP with uniform probability labels on each definition and near-maximal Bayes posterior probability and then 2) alters the probability labels to further increase the posterior probability. Stage 1) is implemented within CProgol4.5, which differs from previous versions of Progol by allowing user-defined evaluation functions written in Prolog. It is shown that maximising the Bayesian posterior function involves finding SLPs with short derivations of the examples. Search pruning with the Bayesian evaluation function is carried out in the same way as in previous versions of CProgol. The system is demonstrated with worked examples involving the learning of probability distributions over sequences as well as the learning of simple forms of uncertain knowledge.
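To make the labelled-clause notation p:C concrete, here is a minimal Python sketch. It is not taken from the paper: the toy program, the helper names, and the assumption that the labels of each predicate sum to 1 are all illustrative. It treats the probability of a derivation as the product of the labels of the clauses used, which also suggests why shorter derivations tend to score higher.

```python
from math import prod

# Hypothetical toy program: predicate name -> list of (label, clause text).
# Assumption: the labels of each predicate's clauses sum to 1.
PROGRAM = {
    "coin": [(0.5, "coin(head)."), (0.5, "coin(tail).")],
    "s": [(0.4, "s([])."), (0.6, "s([X|T]) :- coin(X), s(T).")],
}

def normalised(program, tol=1e-9):
    """Check that the labels of every predicate sum to 1 (within tol)."""
    return all(
        abs(sum(label for label, _ in clauses) - 1.0) <= tol
        for clauses in program.values()
    )

def derivation_probability(program, choices):
    """Multiply the labels of the clauses chosen along one derivation.

    `choices` is a list of (predicate, clause_index) pairs, one per
    resolution step.
    """
    return prod(program[pred][index][0] for pred, index in choices)

# Derivation of s([head]): recursive s clause, coin(head), then base s clause.
p_one = derivation_probability(PROGRAM, [("s", 1), ("coin", 0), ("s", 0)])
# Derivation of s([]): base clause only.
p_empty = derivation_probability(PROGRAM, [("s", 0)])
```

Under these assumed labels the one-step derivation of s([]) is more probable than the three-step derivation of s([head]), since each extra step multiplies in another factor of at most 1.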
Generation and Synchronous Tree-Adjoining Grammars
, 1990
Tree-adjoining grammars (TAG) have been proposed as a formalism for generation based on the intuition that the extended domain of syntactic locality that TAGs provide should aid in localizing semantic dependencies as well, in turn serving as an aid to generation from semantic representations. We demonstrate that this intuition can be made concrete by using the formalism of synchronous tree-adjoining grammars. The use of synchronous TAGs for generation provides solutions to several problems with previous approaches to TAG generation. Furthermore, the semantic monotonicity requirement previously advocated for generation grammars as a computational aid is seen to be an inherent property of synchronous TAGs.
Learnability in Optimality Theory
, 1995
In this article we show how Optimality Theory yields a highly general Constraint Demotion principle for grammar learning. The resulting learning procedure specifically exploits the grammatical structure of Optimality Theory, independent of the content of substantive constraints defining any given grammatical module. We decompose the learning problem and present formal results for a central subproblem, deducing the constraint ranking particular to a target language, given structural descriptions of positive examples. The structure imposed on the space of possible grammars by Optimality Theory allows efficient convergence to a correct grammar. We discuss implications for learning from overt data only, as well as other learning issues. We argue that Optimality Theory promotes confluence of the demands of more effective learnability and deeper linguistic explanation.
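The abstract does not spell out the Constraint Demotion procedure, so the following Python sketch rests on a commonly described form of it and should be read as an assumption rather than the authors' exact algorithm. Constraints start in one top stratum; for each winner/loser pair, every constraint that prefers the loser is demoted to just below the highest-ranked constraint that prefers the winner. The function name, the data layout, and the toy constraints A, B, C are hypothetical.

```python
def constraint_demotion(constraints, pairs, max_passes=100):
    """Return a dict constraint -> stratum index (0 is the highest stratum).

    `pairs` is a list of (winner_marks, loser_marks); each is a dict
    mapping a constraint name to its number of violations.
    """
    stratum = {c: 0 for c in constraints}
    for _ in range(max_passes):
        changed = False
        for winner_marks, loser_marks in pairs:
            # Mark cancellation: only differences in violations matter.
            prefers_winner = [c for c in constraints
                              if loser_marks.get(c, 0) > winner_marks.get(c, 0)]
            prefers_loser = [c for c in constraints
                             if winner_marks.get(c, 0) > loser_marks.get(c, 0)]
            if not prefers_winner:
                raise ValueError("no constraint prefers the winner")
            top = min(stratum[c] for c in prefers_winner)
            for c in prefers_loser:
                # Demote only constraints not already dominated by `top`.
                if stratum[c] <= top:
                    stratum[c] = top + 1
                    changed = True
        if not changed:
            return stratum
    raise RuntimeError("did not converge within max_passes")

# Toy data: pair 1 requires B >> A, pair 2 requires C >> B.
ranking = constraint_demotion(
    ["A", "B", "C"],
    [({"A": 1}, {"B": 1}), ({"B": 1}, {"C": 1})],
)
```

On this toy input the procedure settles on the stratified ranking C above B above A after three passes over the data.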
Inside-outside reestimation from partially bracketed corpora
 In Proceedings of the 30th Annual Meeting of the ACL
, 1992
The inside-outside algorithm for inferring the parameters of a stochastic context-free grammar is extended to take advantage of constituent information (constituent bracketing) in a partially parsed corpus. Experiments on formal and natural language parsed corpora show that the new algorithm can achieve faster convergence and better modeling of hierarchical structure than the original one. In particular, over 90% test set bracketing accuracy was achieved for grammars inferred by our algorithm from a training set of hand-parsed part-of-speech strings for sentences in the Air Travel Information System spoken language corpus. Finally, the new algorithm has better time complexity than the original one when sufficient bracketing is provided.
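As a rough illustration of how bracketing can constrain the computation, the sketch below implements only the inside pass for a grammar in Chomsky normal form and skips any span that crosses a given bracket. It is not the paper's full reestimation procedure; the function names, the half-open span convention, and the toy grammar are assumptions made for the example.

```python
from collections import defaultdict

def crosses(span, brackets):
    """True if the half-open span (i, j) overlaps a bracket without nesting."""
    i, j = span
    return any(i < a < j < b or a < i < b < j for a, b in brackets)

def inside_probability(words, lexical, binary, brackets=(), start="S"):
    """Inside probability of `words`, restricted to bracket-compatible spans.

    lexical: dict (A, word) -> probability of the rule A -> word
    binary:  dict (A, B, C) -> probability of the rule A -> B C
    """
    n = len(words)
    beta = defaultdict(float)  # (i, j, A) -> probability that A derives words[i:j]
    for i, word in enumerate(words):
        for (lhs, terminal), p in lexical.items():
            if terminal == word:
                beta[i, i + 1, lhs] += p
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            if crosses((i, j), brackets):
                continue  # this span is incompatible with the bracketing
            for k in range(i + 1, j):
                for (lhs, left, right), p in binary.items():
                    beta[i, j, lhs] += p * beta[i, k, left] * beta[k, j, right]
    return beta[0, n, start]

# Hypothetical toy grammar: S -> S S (0.4), S -> a (0.6).
LEXICAL = {("S", "a"): 0.6}
BINARY = {("S", "S", "S"): 0.4}
unconstrained = inside_probability(["a", "a", "a"], LEXICAL, BINARY)
bracketed = inside_probability(["a", "a", "a"], LEXICAL, BINARY, brackets=[(0, 2)])
```

With the bracket (0, 2) supplied, only the left-branching analysis survives, so the bracketed value is half the unconstrained one in this toy case. Skipping incompatible spans is also where a saving in work would come from when many brackets are given.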
Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars
 Computational Linguistics
, 1993
An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities
 Computational Linguistics
, 2002
... this article can compute solutions to all four of these problems in a single framework, with a number of additional advantages over previously presented isolated solutions ...
Wide-coverage efficient statistical parsing with CCG and log-linear models
 Computational Linguistics
, 2007
This paper describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are "full" parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminative training is used to estimate the models, which requires incorrect parses for each sentence in the training data as well as the correct parse. The lexicalized grammar formalism used is Combinatory Categorial Grammar (CCG), and the grammar is automatically extracted from CCGbank, a CCG version of the Penn Treebank. The combination of discriminative training and an automatically extracted grammar leads to a significant memory requirement (over 20 GB), which is satisfied using a parallel implementation of the BFGS optimisation algorithm running on a Beowulf cluster. Dynamic programming over a packed chart, in combination with the parallel implementation, allows us to solve one of the largest-scale estimation problems in the statistical parsing literature in under three hours. A key component of the parsing system, for both training and testing, is a Maximum Entropy supertagger which assigns CCG lexical categories to words in a sentence. The supertagger makes the discriminative training feasible, and also leads to a highly efficient parser. Surprisingly,
Hidden Markov Model Induction by Bayesian Model Merging
 Advances in Neural Information Processing Systems 5
, 1993
This paper describes a technique for learning both the number of states and the topology of Hidden Markov Models from examples. The induction process starts with the most specific model consistent with the training data and generalizes by successively merging states. Both the choice of states to merge and the stopping criterion are guided by the Bayesian posterior probability. We compare our algorithm with the Baum-Welch method of estimating fixed-size models, and find that it can induce minimal HMMs from data in cases where fixed estimation does not converge or requires redundant parameters to converge.
Pfold: RNA secondary structure prediction using stochastic context-free grammars
 Nucleic Acids Res
, 2003
RNA secondary structures are important in many biological processes and efficient structure prediction can give vital directions for experimental investigations. Many available programs for RNA secondary structure prediction only use a single sequence at a time. This may be sufficient in some applications, but often it is possible to obtain related RNA sequences with conserved secondary structure. These should be included in structural analyses to give improved results. This work presents a practical way of predicting RNA secondary structure that is especially useful when related sequences can be obtained. The method improves a previous algorithm based on an explicit evolutionary model and a probabilistic model of structures. Predictions can be done on a web server at