## Logical Hidden Markov Models (2006)

### Download Links

- [www.cs.kuleuven.be]
- [www.informatik.uni-freiburg.de]
- [people.csail.mit.edu]
- [www.jair.org]
- DBLP

### Other Repositories/Bibliography

Venue: Journal of Artificial Intelligence Research

Citations: 46 (13 self)

### BibTeX

```bibtex
@ARTICLE{Kersting06logicalhidden,
  author  = {Kristian Kersting and Luc De Raedt and Tapani Raiko},
  title   = {Logical Hidden Markov Models},
  journal = {Journal of Artificial Intelligence Research},
  year    = {2006},
  volume  = {25},
  pages   = {425--456}
}
```

### Abstract

Logical hidden Markov models (LOHMMs) upgrade traditional hidden Markov models to deal with sequences of structured symbols in the form of logical atoms, rather than flat characters. This note formally introduces LOHMMs and presents solutions to the three central inference problems for LOHMMs: evaluation, computing the most likely hidden state sequence, and parameter estimation. The resulting representation and algorithms are experimentally evaluated on problems from the domain of bioinformatics.
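The evaluation problem named in the abstract is, for flat HMMs, solved by the forward algorithm; LOHMMs upgrade this recursion to abstract states with an additional selection distribution over the ground atoms each abstract state covers. As a point of reference, here is a minimal sketch of the flat-HMM case (the 2-state model and all numbers are illustrative, not taken from the paper):

```python
import numpy as np

# Toy 2-state, 2-symbol HMM; all numbers are illustrative.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])   # A[i, j] = P(next state j | state i)
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # B[i, o] = P(observe symbol o | state i)
pi = np.array([0.5, 0.5])    # initial state distribution

def forward(obs):
    """Evaluation problem: return P(obs) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]          # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # alpha_{t+1} = (alpha_t A) .* b(o)
    return alpha.sum()
```

Here `forward([0, 1])` returns the probability that the toy model emits symbol 0 followed by symbol 1, summed over all hidden state sequences.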

### Citations

4096 |
Introduction to Automata Theory, Languages and Computation
- Hopcroft, Ullman
- 1979
Citation Context: ...h PL(q0 = S), i.e., PL(q0 = S) : S ← start, constitutes the prior distribution of L′. The argumentation basically followed the approach to transform a Mealy machine into a Moore machine (see e.g., Hopcroft and Ullman, 1979). Furthermore, the mapping of a Moore-LOHMM – as introduced in the present section – into a Mealy-LOHMM is straightforward. Appendix C. Proof of Theorem 2 Let T be a terminal alphabet and N a nonterm...

1332 |
Head-driven Phrase Structure Grammar
- Pollard, Sag
- 1994
Citation Context: ...ensively studied in computational linguistics. Examples are (stochastic) attribute-value grammars (Abney, 1997), probabilistic feature grammars (Goodman, 1997), head-driven phrase structure grammars (Pollard & Sag, 1994), and lexical-functional grammars (Bresnan, 2001). For learning within such frameworks, methods from undirected graphical models are used; see the work of Johnson (2003) for a description of some rec...

1103 | SCOP: a structural classification of proteins database for the investigation of sequences and structures
- Murzin, Brenner, et al.
- 1995
Citation Context: ...ion schemes of proteins have been developed that group the current set of known protein structures according to the similarity of their folds. For instance, the structural classification of proteins (Hubbard, Murzin, Brenner, & Chothia, 1997) (SCOP) database hierarchically organizes proteins according to their structures and evolutionary origin. From a machine learning perspective, SCOP induces a classification problem: given a protein o...

1089 | Inductive Logic Programming
- Muggleton, editor
- 1992
Citation Context: .... . . levels down the tree. In the second type of approaches, most attention has been devoted to developing highly expressive formalisms, such as e.g. PCUP (Eisele, 1994), PCLP (Riezler, 1998), SLPs (Muggleton, 1996), PLPs (Ngo & Haddawy, 1997), RBNs (Jaeger, 1997), PRMs (Friedman, Getoor, Koller, & Pfeffer, 1999), PRISM (Sato & Kameya, 2001), BLPs (Kersting & De Raedt, 2001b, 2001a), and DPRMs (Sanghai, Domingo...

1059 | The EM Algorithm and Extensions - McLachlan, Krishnan - 1997

909 | Biological sequence analysis: Probabilistic Models of Proteins and Nucleic Acids
- Durbin
- 1998
Citation Context: ...gh sequence similarity. 6.3 mRNA Signal Structure Detection mRNA sequences consist of bases (guanine, adenine, uracil, cytosine) and fold intramolecularly to form a number of short base-paired stems (Durbin, Eddy, Krogh, & Mitchison, 1998). This base-paired structure is called the secondary structure, cf. Figures 5 and 6. The secondary structure contains special subsequences called signal structures that are responsible for special bi...

900 | An introduction to hidden Markov models
- Rabiner, Juang
- 1986
Citation Context: ...sequence and parameter estimation. The resulting representation and algorithms are experimentally evaluated on problems from the domain of bioinformatics. 1. Introduction Hidden Markov models (HMMs) (Rabiner & Juang, 1986) are extremely popular for analyzing sequential data. Application areas include computational biology, user modelling, speech recognition, empirical natural language processing, and robotics. Despite...

548 |
An inequality and associated maximization technique in statistical estimation for probabilistic functions of a markov process
- Baum
- 1972
Citation Context: ...ems. For parameter estimation, we have to estimate the maximum likelihood transition probabilities and selection distributions. To estimate the former, we upgrade the well-known Baum-Welch algorithm (Baum, 1972) for estimating the maximum likelihood parameters of HMMs and probabilistic context-free grammars. For HMMs, the Baum-Welch algorithm computes the improved estimate p of the transition probability of...
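The Baum-Welch update referred to in this context re-estimates each transition probability as a ratio of expected transition counts, computed from forward and backward passes. A hedged sketch of one re-estimation step for a flat HMM (toy model; the paper's LOHMM version additionally re-estimates selection distributions):

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One Baum-Welch re-estimation of the transition matrix A
    from a single observation sequence (indices into B's columns)."""
    T, N = len(obs), len(pi)
    # forward pass: alpha[t, i] = P(o_1..o_t, q_t = i)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    # backward pass: beta[t, i] = P(o_{t+1}..o_T | q_t = i)
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    # expected transition counts, summed over time:
    # xi[i, j] ∝ sum_t alpha[t, i] * A[i, j] * B[j, o_{t+1}] * beta[t+1, j]
    xi = np.zeros((N, N))
    for t in range(T - 1):
        xi += np.outer(alpha[t], B[:, obs[t + 1]] * beta[t + 1]) * A
    # improved estimate: expected count of i -> j / expected count of leaving i
    return xi / xi.sum(axis=1, keepdims=True)
```

The row normalization at the end is exactly the "expected transitions i → j over expected transitions out of i" estimate; the likelihood P(obs) cancels, so the counts need not be normalized first.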

532 | Learning probabilistic relational models, in
- Getoor, Friedman, et al.
Citation Context: ...ion has been devoted to developing highly expressive formalisms, such as e.g. PCUP (Eisele, 1994), PCLP (Riezler, 1998), SLPs (Muggleton, 1996), PLPs (Ngo & Haddawy, 1997), RBNs (Jaeger, 1997), PRMs (Friedman, Getoor, Koller, & Pfeffer, 1999), PRISM (Sato & Kameya, 2001), BLPs (Kersting & De Raedt, 2001b, 2001a), and DPRMs (Sanghai, Domingos, & Weld, 2003). LOHMMs can be seen as an attempt towards downgrading such highly expressive frame...

510 | Factorial hidden Markov models
- Ghahramani, Jordan
- 1997
Citation Context: ...eglected and variables are treated independently. Adapting more expressive approaches is an interesting future line of research. For instance, Bayesian networks allow one to represent factorial HMMs (Ghahramani & Jordan, 1997). Factorial HMMs can be viewed as LOHMMs, where the hidden states are summarized by a 2 · k-ary abstract state. The first k arguments encode the k state variables, and the last k arguments serve as a...

478 | Inductive logic programming: Theory and methods - Muggleton, Raedt - 1994

381 |
The estimation of stochastic context-free grammars using the Inside-Outside algorithm (Computer Speech and Language)
- Lari, Young
- 1990
Citation Context: ...tences. For grammars in GNF, pushdown automata are common for parsing. In contrast, the actual computations of the Baum-Welch algorithm for PCFGs, the so-called Inside-Outside algorithm (Baker, 1979; Lari & Young, 1990), are usually formulated for grammars in Chomsky normal form. The Inside-Outside algorithm can make use of the efficient CYK algorithm (Hopcroft & Ullman, 1979) for parsing strings. An alternative ...
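The CYK algorithm mentioned in this context decides membership for grammars in Chomsky normal form by dynamic programming over substring spans. A small sketch on a hypothetical CNF grammar recognizing aⁿbⁿ (the grammar is an illustration, not one from the paper):

```python
# Hypothetical CNF grammar for a^n b^n (n >= 1):
#   S -> A B | A X,   X -> S B,   A -> 'a',   B -> 'b'
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}

def cyk(word, unary=UNARY, binary=BINARY, start="S"):
    """CYK membership test for a grammar in Chomsky normal form."""
    n = len(word)
    if n == 0:
        return False
    # table[span-1][i] = nonterminals deriving word[i : i+span]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, c in enumerate(word):
        table[0][i] = set(unary.get(c, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for k in range(1, span):  # left part has length k
                for left in table[k - 1][i]:
                    for right in table[span - k - 1][i + k]:
                        table[span - 1][i] |= binary.get((left, right), set())
    return start in table[n - 1][0]
```

The probabilistic Inside-Outside algorithm runs over the same span table, replacing set union with sums of rule probabilities.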

331 |
Lexical-Functional Syntax
- Bresnan
- 2001
Citation Context: ... are (stochastic) attribute-value grammars (Abney, 1997), probabilistic feature grammars (Goodman, 1997), head-driven phrase structure grammars (Pollard & Sag, 1994), and lexical-functional grammars (Bresnan, 2001). For learning within such frameworks, methods from undirected graphical models are used; see the work of Johnson (2003) for a description of some recent work. The key difference to LOHMMs is that on...

271 | RNA sequence analysis using covariance models
- Eddy, Durbin
- 1994
Citation Context: ...n terms of the application domain, although this was not the primary goal of our experiments. There exist also alternative parameter estimation techniques and other models, such as covariance models (Eddy & Durbin, 1994) or pair hidden Markov models (Sakakibara, 2003), that might have been used as well as a basis for comparison. However, as LOHMMs employ (inductive) logic programming principles, it is appropriate to...

261 | The hierarchical hidden Markov model: Analysis and applications
- Fine, Singer, et al.
- 1998
Citation Context: ...ory (De Raedt & Kersting, 2003, 2004). In the first type of approaches, the underlying idea is to upgrade HMMs and probabilistic grammars to represent more structured state spaces. Hierarchical HMMs (Fine, Singer, & Tishby, 1998), factorial HMMs (Ghahramani & Jordan, 1997), and HMMs based on tree automata (Frasconi, Soda, & Vullo, 2002) decompose the state variables into smaller units. In hierarchical HMMs states themselves ...

163 | A generalized hidden Markov model for the recognition of human genes
- Kulp, Haussler, et al.
- 1996
Citation Context: ... is conditioned on the i + k-th argument. Markov chains allow one to sample compound terms of finite depth such as s(s(s(0))) and to model e.g. misspelled filenames. This is akin to generalized HMMs (Kulp, Haussler, Reese, & Eeckman, 1996), in which each node may output a finite sequence of symbols rather than a single symbol. Finally, LOHMMs – as introduced in the present paper – specify a probability distribution over all sequences ...

132 | Stochastic context-free grammars for tRNA modeling
- Sakakibara, Brown, et al.
- 1994
Citation Context: ... of derivation trees, the learning problem of (P)CFGs can be reduced to the problem of an STA. In particular, STA techniques have been adapted to learning tree grammars and (P)CFGs (Sakakibara, 1992; Sakakibara et al., 1994) efficiently. PCFGs have been extended in several ways. Most closely related to LOHMMs are unification-based grammars which have been extensively studied in computational lin...

115 | Relational Data Mining - Dzeroski, Lavrac - 2001

108 | Relational bayesian networks
- Jaeger
- 1997
Citation Context: ...proaches, most attention has been devoted to developing highly expressive formalisms, such as e.g. PCUP (Eisele, 1994), PCLP (Riezler, 1998), SLPs (Muggleton, 1996), PLPs (Ngo & Haddawy, 1997), RBNs (Jaeger, 1997), PRMs (Friedman, Getoor, Koller, & Pfeffer, 1999), PRISM (Sato & Kameya, 2001), BLPs (Kersting & De Raedt, 2001b, 2001a), and DPRMs (Sanghai, Domingos, & Weld, 2003). LOHMMs can be seen as an attemp...

98 | Parameter learning of logic programs for symbolicstatistical modeling
- Sato, Kameya
Citation Context: ...e formalisms, such as e.g. PCUP (Eisele, 1994), PCLP (Riezler, 1998), SLPs (Muggleton, 1996), PLPs (Ngo & Haddawy, 1997), RBNs (Jaeger, 1997), PRMs (Friedman, Getoor, Koller, & Pfeffer, 1999), PRISM (Sato & Kameya, 2001), BLPs (Kersting & De Raedt, 2001b, 2001a), and DPRMs (Sanghai, Domingos, & Weld, 2003). LOHMMs can be seen as an attempt towards downgrading such highly expressive frameworks. Indeed, applying the m...

95 | Answering queries from contextsensitive probabilistic knowledge bases
- Ngo, Haddawy
- 1996
Citation Context: ...ee. In the second type of approaches, most attention has been devoted to developing highly expressive formalisms, such as e.g. PCUP (Eisele, 1994), PCLP (Riezler, 1998), SLPs (Muggleton, 1996), PLPs (Ngo & Haddawy, 1997), RBNs (Jaeger, 1997), PRMs (Friedman, Getoor, Koller, & Pfeffer, 1999), PRISM (Sato & Kameya, 2001), BLPs (Kersting & De Raedt, 2001b, 2001a), and DPRMs (Sanghai, Domingos, & Weld, 2003). LOHMMs can...

94 | Predicting sequences of user actions
- Davison, Hirsh
- 1998
Citation Context: ...lohmms.tex, ls, latex lohmms.tex, . . . Thus, commands are essentially structured. Tasks that have been considered for UNIX command sequences include the prediction of the next command in the sequence (Davison & Hirsh, 1998), the classification of a command sequence in a user category (Korvemaker & Greiner, 2000; Jacobs & Blockeel, 2001), and anomaly detection (Lane, 1999). Traditional HMMs cannot easily deal with this ...

86 | Relational Markov Models and their Application to Adaptive Web Navigation
- Anderson, Domingos, et al.
Citation Context: .... As our experimental evidence shows, sharing information among abstract states by means of unification can lead to more accurate model estimation. The same holds for relational Markov models (RMMs) (Anderson, Domingos, & Weld, 2002) to which LOHMMs are most closely related. In RMMs, states can be of different types, with each type described by a different set of variables. The domain of each variable can be hierarchically struc...

78 |
Efficient learning of context-free grammars from positive structural examples. Information and Computation 97:23–60
- Sakakibara
- 1992
Citation Context: ...n of the skeletons of derivation trees, the learning problem of (P)CFGs can be reduced to the problem of an STA. In particular, STA techniques have been adapted to learning tree grammars and (P)CFGs (Sakakibara, 1992; Sakakibara et al., 1994) efficiently. PCFGs have been extended in several ways. Most closely related to LOHMMs are unification-based grammars which have been extensively studied in computational lin...

77 | Towards combining inductive logic programming with Bayesian networks - Kersting, Raedt

56 | Probabilistic inductive logic programming - Raedt, Kersting - 2004

54 | The ASTRAL Compendium in 2004 - Chandonia - 2004

46 | Dynamic probabilistic relational models
- Sanghai, Domingos, et al.
- 2003
Citation Context: ...(Muggleton, 1996), PLPs (Ngo & Haddawy, 1997), RBNs (Jaeger, 1997), PRMs (Friedman, Getoor, Koller, & Pfeffer, 1999), PRISM (Sato & Kameya, 2001), BLPs (Kersting & De Raedt, 2001b, 2001a), and DPRMs (Sanghai, Domingos, & Weld, 2003). LOHMMs can be seen as an attempt towards downgrading such highly expressive frameworks. Indeed, applying the main idea underlying LOHMMs to non-regular probabilistic grammar, i.e., replacing flat s...

41 | Stochastic inference of regular tree languages
- Carrasco, Oncina, et al.
- 1998
Citation Context: ...to be estimated becomes easily very large, data sparsity is a serious problem. Goodman applied smoothing to overcome the problem. LOHMMs are generally related to (stochastic) tree automata (see e.g., Carrasco, Oncina, and Calera-Rubio, 2001). Reconsider the Unix command sequence mkdir(vt100x), mv(new∗,vt100x), ls(vt100x), cd(vt100x). Each atom forms a tree, see Figure 7 (a), and, indeed, the whole sequence of atoms also forms a (degenerat...

40 |
Trainable Grammars for Speech Recognition. Speech Communication Papers for the 97th Meeting of the Acoustical
- Baker
- 1979
Citation Context: ... to parse sentences. For grammars in GNF, pushdown automata are common for parsing. In contrast, the actual computations of the Baum-Welch algorithm for PCFGs, the so-called Inside-Outside algorithm (Baker, 1979; Lari & Young, 1990), are usually formulated for grammars in Chomsky normal form. The Inside-Outside algorithm can make use of the efficient CYK algorithm (Hopcroft & Ullman, 1979) for parsing stri...

36 | Probabilistic feature grammars
- Goodman
- 1997
Citation Context: ...HMMs are unification-based grammars which have been extensively studied in computational linguistics. Examples are (stochastic) attribute-value grammars (Abney, 1997), probabilistic feature grammars (Goodman, 1997), head-driven phrase structure grammars (Pollard & Sag, 1994), and lexical-functional grammars (Bresnan, 2001). For learning within such frameworks, methods from undirected graphical models are used;...

34 | Probabilistic Logic Learning - Raedt, Kersting - 2003

34 |
Relational instance-based learning with lists and terms
- Horváth, Wrobel, et al.
Citation Context: ...ar to PCFGs. To this aim, we conducted experiments on two bioinformatics application domains: protein fold recognition (Kersting, Raiko, Kramer, & De Raedt, 2003) and mRNA signal structure detection (Horváth, Wrobel, & Bohnebeck, 2001). Both application domains are multiclass problems with five different classes each. 1. The sum of probabilities is not the same (0.15 + 0.08 = 0.23 ≠ 0.25) because of the use of pseudo counts and b...

33 | Using Unix: Collected traces of 168 users - Greenberg - 1988

30 | Hidden Markov Models for human/computer interface modeling
- Lane
- 1999
Citation Context: ... of the next command in the sequence (Davison & Hirsh, 1998), the classification of a command sequence in a user category (Korvemaker & Greiner, 2000; Jacobs & Blockeel, 2001), and anomaly detection (Lane, 1999). Traditional HMMs cannot easily deal with this type of structured sequences. Indeed, applying HMMs requires either 1) ignoring the structure of the commands (i.e., the parameters), or 2) taking all ...

26 | An MDL method for finding haplotype blocks and for estimating the strength of haplotype block boundaries
- Koivisto, Perola, et al.
- 2003
Citation Context: ... blocks of consecutive helices (resp. strands). Being in a Block of some size s, say 3, the model will remain in the same block for s = 3 time steps. A similar idea has been used to model haplotypes (Koivisto, Perola, Varilo, Hennah, Ekelund, Lukk, Peltonen, Ukkonen, & Mannila, 2002; Koivisto, Kivioja, Mannila, Rastas, & Ukkonen, 2004). In contrast to common HMM block models (Won, Prügel-Bennett, & Krogh, 2004), the transition parameters are share...

25 | Adaptive Bayesian logic programs - Kersting, Raedt - 2001

18 | Pair hidden Markov models on tree structures
- Sakakibara
- 2003
Citation Context: ...s not the primary goal of our experiments. There exist also alternative parameter estimation techniques and other models, such as covariance models (Eddy & Durbin, 1994) or pair hidden Markov models (Sakakibara, 2003), that might have been used as well as a basis for comparison. However, as LOHMMs employ (inductive) logic programming principles, it is appropriate to compare with other systems within this paradigm...

16 | ’Say EM’ for Selecting Probabilistic Models for Logical Sequences - Kersting, Raiko

13 | Term Comparisons in First-Order Similarity Measures - Bohnebeck, Horváth, et al. - 1998

13 |
Skeletal structural descriptions
- Levy, Joshi
- 1978
Citation Context: ..., 1979) for parsing strings. An alternative to learning PCFGs from strings only is to learn from more structured data such as skeletons, which are derivation trees with the nonterminal nodes removed (Levy & Joshi, 1978). Skeletons are exactly the set of trees accepted by skeletal tree automata (STA). Informally, an STA, when given a tree as input, processes the tree bottom up, assigning a state to each node based o...

12 |
A Modern Approach to Probability Theory. Probability and its Applications. Birkhäuser Boston Inc
- Fristedt, Gray
- 1997
Citation Context: ...i) = ∑ui P(Zi+1, ui | Zi) and P(Zi+1 | Zi) = ∑ui P(Zi+1, ui | Zi), where the probability distributions are due to equation (1), it is easy to show that Kolmogorov's extension theorem (see Bauer, 1991; Fristedt and Gray, 1997) holds. Thus, M specifies a unique probability distribution over ∏_{i=1}^{t} (Zi × Ui) for each t > 0 and in the limit t → ∞. □ Appendix B. Moore Representations of LOHMMs For HMMs, Moore representations,...

10 | The effect of relational background knowledge on learning of protein three-dimensional fold signatures - Turcotte, Muggleton, et al.

8 | Hidden Markov models for text categorization in multi-page documents
- Frasconi, Soda, et al.
- 2002
Citation Context: ...Ms and probabilistic grammars to represent more structured state spaces. Hierarchical HMMs (Fine, Singer, & Tishby, 1998), factorial HMMs (Ghahramani & Jordan, 1997), and HMMs based on tree automata (Frasconi, Soda, & Vullo, 2002) decompose the state variables into smaller units. In hierarchical HMMs states themselves can be HMMs, in factorial HMMs they can be factored into k state variables which depend on one another only t...

8 | Towards discovering structural signatures of protein folds based on logical hidden Markov models - Kersting, Raiko, et al. - 2003

6 | The learning shell : Automated macro construction
- Jacobs, Blockeel
- 2001
Citation Context: ...for UNIX command sequences include the prediction of the next command in the sequence (Davison & Hirsh, 1998), the classification of a command sequence in a user category (Korvemaker & Greiner, 2000; Jacobs & Blockeel, 2001), and anomaly detection (Lane, 1999). Traditional HMMs cannot easily deal with this type of structured sequences. Indeed, applying HMMs requires either 1) ignoring the structure of the commands (i.e....

6 | Statistical inference and probabilistic modeling for constraint-based NLP
- Riezler
- 1998
Citation Context: ...to predecessors 1, 2, . . . levels down the tree. In the second type of approaches, most attention has been devoted to developing highly expressive formalisms, such as e.g. PCUP (Eisele, 1994), PCLP (Riezler, 1998), SLPs (Muggleton, 1996), PLPs (Ngo & Haddawy, 1997), RBNs (Jaeger, 1997), PRMs (Friedman, Getoor, Koller, & Pfeffer, 1999), PRISM (Sato & Kameya, 2001), BLPs (Kersting & De Raedt, 2001b, 2001a), and...

4 | Towards Probabilistic Extensions of Constraint-based Grammars - Eisele - 1994

4 | Estimation of probabilities from sparse data for the language model component of a speech recognizer - Katz - 1987

4 | Hidden Markov modelling techniques for haplotype analysis
- Koivisto, Kivioja, et al.
- 2004
Citation Context: ...3, the model will remain in the same block for s = 3 time steps. A similar idea has been used to model haplotypes (Koivisto, Perola, Varilo, Hennah, Ekelund, Lukk, Peltonen, Ukkonen, & Mannila, 2002; Koivisto, Kivioja, Mannila, Rastas, & Ukkonen, 2004). In contrast to common HMM block models (Won, Prügel-Bennett, & Krogh, 2004), the transition parameters are shared within each block and one can ensure that the model...