## Structural Learning of Dynamic Bayesian Networks in Speech Recognition (2001)

Citations: 8 (4 self)

### BibTeX

@MISC{Deviren01structurallearning,

author = {Murat Deviren},

title = {Structural Learning of Dynamic Bayesian Networks in Speech Recognition},

year = {2001}

}

### Abstract

...this paper, $X_i$ denotes a continuous or discrete random variable. Values of the random variable are indicated by lowercase letters, as in $x_i$. For a discrete variable that takes $r$ values, $x_i^k$ denotes a specific assignment for $1 \le k \le r$. A set of variables is denoted in boldface letters: $\mathbf{X} = \{X_1, \ldots, X_n\}$.

### Citations

1079 | Bayesian method for the induction of probabilistic networks from data
- Cooper, Herskovits
- 1992
Citation Context ...ctural learning is updating the belief on each structure by computing the posterior $P(S \mid D)$. Hence, the Bayesian scoring metric is the posterior structure pdf, or equivalently $P(S, D)$. For discrete BNs, [7] derived a Bayesian scoring metric based on the following assumptions. • Assumption 1 (Multinomial sample): The network structure hypothesis is true if and only if the database $D$ can be partitioned i...

857 | A tutorial on learning with bayesian networks
- Heckerman
- 1999
Citation Context ...The inference problem in general BNs is still a hot research topic. Several researchers have developed exact and approximate inference algorithms for different distributions. A review can be found in [16]. The most commonly used exact inference algorithm for discrete BNs is known as the JLO algorithm [19]. The algorithm can also be used for continuous Gaussian BNs [8]. In the following we will describ...

298 | Bayesian networks
- Heckerman, Wellman
- 1995
Citation Context ...in. [18] defines a method to initialize parameter priors for BNs of discrete variables with multinomial distributions based on a set of assumptions. A similar method is extended to the continuous case in [17]. 4.3 EM algorithm for parameter learning in the case of incomplete data. In the case of incomplete data, the factorization property of the BN cannot be used to compute the data likelihood. The likelih...

244 | Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs
- Tarjan, Yannakakis
- 1984
Citation Context ...lgorithm to obtain a triangulation of a given undirected graph [5]. This algorithm can be implemented in linear time $O(N + l)$, where $N$ is the number of nodes and $l$ is the number of links in the graph [23]. Given an initial node, the algorithm constructs a triangulated graph and returns an ordering of the nodes. Algorithm: Triangulate • Input: An undirected graph $G = (X, L)$ with $N$ nodes and an initi...
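The triangulation step described in this context can be sketched in Python. This is a simplified illustration, not the linear-time implementation of [23] and not necessarily the exact procedure of [5]: it computes a maximum cardinality search ordering from a given start node, then eliminates nodes in reverse order and adds fill-in edges between their remaining neighbours. The function names, the adjacency-dictionary format, and the tie-breaking rule are assumptions made for the example.

```python
from itertools import combinations


def max_cardinality_order(adj, start):
    """Number the nodes by maximum cardinality search, beginning at `start`."""
    order = [start]
    numbered = {start}
    while len(order) < len(adj):
        # Choose the unnumbered node with the most numbered neighbours.
        # Ties are broken by node name (an assumption, for determinism).
        best = min(
            (v for v in adj if v not in numbered),
            key=lambda v: (-len(adj[v] & numbered), str(v)),
        )
        order.append(best)
        numbered.add(best)
    return order


def triangulate(adj, start):
    """Return (ordering, triangulated adjacency, set of fill-in edges)."""
    order = max_cardinality_order(adj, start)
    filled = {v: set(nbrs) for v, nbrs in adj.items()}
    fill_edges = set()
    remaining = set(adj)
    # Eliminate nodes from last-numbered to first-numbered; joining the
    # remaining neighbours of each eliminated node yields a chordal graph.
    for v in reversed(order):
        remaining.discard(v)
        neighbours = sorted(filled[v] & remaining, key=str)
        for a, b in combinations(neighbours, 2):
            if b not in filled[a]:
                filled[a].add(b)
                filled[b].add(a)
                fill_edges.add(frozenset((a, b)))
    return order, filled, fill_edges


# A 4-cycle has no chord, so one fill-in edge is needed.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
order, filled, fill_edges = triangulate(cycle, "a")
print(order, [sorted(e) for e in fill_edges])
```

The sketch favours clarity over speed: selecting the next node scans all unnumbered nodes, so it runs in quadratic time rather than the $O(N + l)$ bound quoted above.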

219 | Learning the structure of dynamic probabilistic networks
- Friedman, Murphy, et al.
- 1998
Citation Context ...$\Pi_{it}$ denotes the parents of $X_i[t]$. 6.2 Dependency range and structural inductions. In the BN literature, DBNs are defined using the assumption that $X[t]$ is Markovian and stationary [10] [9] [20] [15] [13]. The time dependency properties of $X_i[t]$ determine the parents, and hence the network structure at time $t$. If the process is stationary, then the network structure is repeating for each time instant...

218 | The Bayesian structural EM algorithm
- Friedman
- 1998
Citation Context ..., the scoring metrics described above are not decomposable and cannot be evaluated directly. The structural EM algorithm evaluates the expected score of a network based on some initial network [11], [12]. $Q(S', \Theta' \mid S, \Theta) = E_{S,\Theta}\{\mathrm{Score}(S, D)\}$ (69) The expectation is taken with respect to $P(X_h \mid D, S, \Theta)$. The computation of the expected score generally requires inference within ...

215 | The EM-algorithm for graphical association models with missing data
- Lauritzen
- 1995
Citation Context ... The expectation is taken with respect to $P(X_h \mid S, \Theta)$. The EM algorithm is based on forming the conditional expectation of the log-likelihood function for complete data, given the observed data [22]. $Q(\Theta' \mid \Theta) = E_{\Theta}\{\log P(D_l, X_h \mid S, \Theta')\}$ (44) The algorithm starts with an initialization of the parameters $\Theta$ and alternates between the E-step and the M-step. At the E-step, $\Theta$...

188 | Learning Bayesian Belief Networks: An Approach Based on
- Lam, Bacchus
- 1994
Citation Context ...L says that the best model of a collection of data items is the model that minimizes the sum of • the length of the encoding of the model • the length of the encoding of the data given the model. [21] uses the MDL principle to score a BN. The first term is related to the complexity of the model. The simpler the model, the shorter the encoding length will be. Therefore the MDL scoring penalizes complex...

182 | Theory refinement on Bayesian networks
- Buntine
- 1991
Citation Context ...l structure. In [18] the deviation is represented as the number of different arcs between two networks. For a single variable this is denoted as $\delta_i$. $P(S) = c \cdot \prod_{i=1}^{n} \kappa^{\delta_i}, \quad 0 < \kappa < 1$ (67) In [3], Buntine considers different penalty factors for different arcs. 5.3 Searching structure space. Searching the structure space for high scored structures is the next issue in structure learning. It ...

175 | Expert Systems and Probabilistic Network Models
- Castillo, Gutiérrez, et al.
- 1997
Citation Context ...ion with a minimum number of additional edges. However, this problem is NP-complete [24]. We use the Maximum Cardinality Search Fill-In algorithm to obtain a triangulation of a given undirected graph [5]. This algorithm can be implemented in linear time $O(N + l)$, where $N$ is the number of nodes and $l$ is the number of links in the graph [23]. Given an initial node, the algorithm constructs a triangulat...

171 | A guide to the literature on learning probabilistic networks from data
- Buntine
- 1996
Citation Context ...$\max_{\Theta}\, p(D \mid S, \Theta)$ (21) ML estimation is a powerful method, especially for large data sets. For small data sets, there are two basic problems with ML estimation: sparse data and over-fitting [4], [2]. • Sparse data: Consider a discrete BN and a dataset $D$. If there are no instances of a specific realization of a variable in $D$, then the sample likelihood does not exist. Hence, ML estimate ...
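The sparse-data problem mentioned in this context can be illustrated with a short Python sketch. It estimates one row of a conditional probability table from counts; with no pseudo-counts (`alpha = 0`) the estimate is undefined for a parent configuration that never occurs in the data. The function name, the `(parent, child)` data layout, and the use of a symmetric pseudo-count `alpha` as a remedy are assumptions for illustration, not taken from the paper.

```python
from collections import Counter


def conditional_estimate(data, child_values, parent_value, alpha=0.0):
    """Estimate P(child | parent = parent_value) from (parent, child) pairs.

    alpha = 0 gives the plain relative-frequency (ML) estimate;
    alpha > 0 adds a symmetric pseudo-count to every child value.
    Returns None when the estimate is undefined (no counts at all).
    """
    counts = Counter(child for parent, child in data if parent == parent_value)
    total = sum(counts.values()) + alpha * len(child_values)
    if total == 0:
        return None  # parent configuration never observed: ML is undefined
    return {x: (counts[x] + alpha) / total for x in child_values}


# Hypothetical data: the parent value "b" never appears.
data = [("a", "yes"), ("a", "no"), ("a", "yes")]
print(conditional_estimate(data, ["yes", "no"], "a"))
print(conditional_estimate(data, ["yes", "no"], "b"))
print(conditional_estimate(data, ["yes", "no"], "b", alpha=1.0))
```

With a pseudo-count the unseen configuration falls back to a uniform row, which is one common way to keep the estimate defined on small data sets.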

157 | Adaptive probabilistic networks with hidden variables
- Binder, Koller, et al.
- 1997
Citation Context ...ures is defined by the triples $(\kappa, \tau_p, \tau_f)$ for $1 \le \kappa \le \kappa_{\max}$, $0 \le \tau_p \le \tau_{p_{\max}}$, $0 \le \tau_f \le \tau_{f_{\max}}$. $(\kappa_{\max}, \tau_{p_{\max}}, \tau_{f_{\max}})$ is an upper limit which defines the size of the search class. [Figure 6: DBN structure with $(\kappa, \tau_p, \tau_f) = (2, 1, 1)$, $T = 4$; nodes $X_o[1], \ldots, X_o[4]$ and $X_h[1], \ldots, X_h[4]$.] 7.2 Learning DBNs for speech recognition. Now that we have defined the search class, the learning pro...

144 | Bayesian updating in recursive graphical models by local computation
- Jensen, Lauritzen, et al.
- 1990
Citation Context ... exact and approximate inference algorithms for different distributions. A review can be found in [16]. The most commonly used exact inference algorithm for discrete BNs is known as the JLO algorithm [19]. The algorithm can also be used for continuous Gaussian BNs [8]. In the following we will describe the discrete version of the algorithm. In the task of speech recognition we will consider hybrid net...

127 | Learning dynamic bayesian networks
- Ghahramani
Citation Context ...re $\Pi_{it}$ denotes the parents of $X_i[t]$. 6.2 Dependency range and structural inductions. In the BN literature, DBNs are defined using the assumption that $X[t]$ is Markovian and stationary [10] [9] [20] [15] [13]. The time dependency properties of $X_i[t]$ determine the parents, and hence the network structure at time $t$. If the process is stationary, then the network structure is repeating for each time in...

120 | Learning belief networks in the presence of missing values and hidden variables
- Friedman
- 1997
Citation Context ... Hence, the scoring metrics described above are not decomposable and cannot be evaluated directly. The structural EM algorithm evaluates the expected score of a network based on some initial network [11], [12]. $Q(S', \Theta' \mid S, \Theta) = E_{S,\Theta}\{\mathrm{Score}(S, D)\}$ (69) The expectation is taken with respect to $P(X_h \mid D, S, \Theta)$. The computation of the expected score generally requires inference w...

109 | Probabilistic temporal reasoning
- Dean, Kanazawa
- 1988
Citation Context ...t) (70) where $\Pi_{it}$ denotes the parents of $X_i[t]$. 6.2 Dependency range and structural inductions. In the BN literature, DBNs are defined using the assumption that $X[t]$ is Markovian and stationary [10] [9] [20] [15] [13]. The time dependency properties of $X_i[t]$ determine the parents, and hence the network structure at time $t$. If the process is stationary, then the network structure is repeating fo...

109 | Speech recognition with dynamic Bayesian networks
- Zweig
- 1998
Citation Context ... and go to Step 3. Now that the model is decomposable, the last step is to determine the cliques and to construct the tree. In constructing the clique tree we use the linear time algorithm defined in [25]. The algorithm proceeds in two steps. The first step is tree formation, where a clique is defined for each node with its lower numbered neighbors. Then, the cliques are linked to satisfy the RIP prop...

73 | Learning Bayesian networks: Search methods and experimental results
- Chickering, Geiger, et al.
- 1995
Citation Context ...s different penalty factors for different arcs. 5.3 Searching structure space. Searching the structure space for high scored structures is the next issue in structure learning. It has been shown in [6] that finding the structure with the maximum score is NP-hard. Therefore, for arbitrary structures, heuristic search algorithms are used. • K2 search: Initialize with an ordering of nodes such that th...

53 | Natural Statistical Models for Automatic Speech Recognition
- Bilmes
- 1999
Citation Context ...g. If we consider $T$ time slices of variables, the DBN can be considered as a (static) BN with $T \times n$ variables. Using the factorization property of BNs, the joint probability density of $X_1^T = \{X[1], \ldots, X[T]\}$ can be written as: $P(X[1], \ldots, X[T]) = \prod_{t=1}^{T} \prod_{i=1}^{n} P(X_i[t] \mid \Pi_{it})$ (70) where $\Pi_{it}$ denotes the parents of $X_i[t]$. 6.2 Dependency range and structural inductions. In the B...
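As a rough illustration of the factorization in equation (70), the following Python sketch sums the log of each local conditional probability over all variables and time slices of an unrolled DBN. The data structures (`assignment`, `parents`) and the callable `cpd` are hypothetical names chosen for the example, and the toy model (one binary variable with a first-order dependency and made-up probabilities) is an assumption, not a model from the paper.

```python
import math


def dbn_log_joint(assignment, parents, cpd):
    """Log of the joint probability of an unrolled DBN.

    assignment: dict mapping (i, t) to the value of variable i at time t
    parents:    dict mapping (i, t) to a tuple of parent keys (j, s)
    cpd:        callable (i, t, value, parent_values) -> probability
    """
    log_p = 0.0
    for (i, t), value in assignment.items():
        parent_values = tuple(assignment[p] for p in parents[(i, t)])
        # One factor P(X_i[t] | parents of X_i[t]) per variable and slice.
        log_p += math.log(cpd(i, t, value, parent_values))
    return log_p


def toy_cpd(i, t, value, parent_values):
    """Hypothetical CPD: uniform prior, then 'stay in the same state' w.p. 0.9."""
    if not parent_values:
        return 0.5
    return 0.9 if value == parent_values[0] else 0.1


# One binary variable (index 0) over T = 3 slices, each depending on the previous one.
assignment = {(0, 1): 0, (0, 2): 0, (0, 3): 1}
parents = {(0, 1): (), (0, 2): ((0, 1),), (0, 3): ((0, 2),)}
log_p = dbn_log_joint(assignment, parents, toy_cpd)
print(math.exp(log_p))  # product 0.5 * 0.9 * 0.1, up to floating-point rounding
```

Working in log space avoids numerical underflow when the number of time slices grows, which is why the sketch accumulates logarithms rather than multiplying probabilities directly.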

26 | Parameter priors for directed acyclic graphical models and the characterization of several probability distributions
- Geiger, Heckerman
- Annals of Statistics 30, 2002
Citation Context ... so that the functional form remains the same in the presence of data (i.e. the prior and posterior pdfs have the same functional form). Construction of parameter priors for DAG models is discussed in [14]. The initialization of the priors is another issue, which requires prior knowledge over the domain. [18] defines a method to initialize parameter priors for BNs of discrete variables with multinomial...

20 | Uncertain reasoning and forecasting
- Dagum, Galper, et al.
- 1995
Citation Context ...0) where $\Pi_{it}$ denotes the parents of $X_i[t]$. 6.2 Dependency range and structural inductions. In the BN literature, DBNs are defined using the assumption that $X[t]$ is Markovian and stationary [10] [9] [20] [15] [13]. The time dependency properties of $X_i[t]$ determine the parents, and hence the network structure at time $t$. If the process is stationary, then the network structure is repeating for ea...

9 | Learning Bayesian networks: The combination of knowledge and statistical data
- Chickering
- 1994
Citation Context ...have the same functional form). Construction of parameter priors for DAG models is discussed in [14]. The initialization of the priors is another issue, which requires prior knowledge over the domain. [18] defines a method to initialize parameter priors for BNs of discrete variables with multinomial distributions based on a set of assumptions. A similar method is extended to the continuous case in [17]. 4....

9 | Computing the minimal fill-in is NP-Complete
- Yannakakis
- 1981
Citation Context ...h greater than four. Hence, the triangulation process is not unique. In general it is desired to obtain a triangulation with a minimum number of additional edges. However, this problem is NP-complete [24]. We use the Maximum Cardinality Search Fill-In algorithm to obtain a triangulation of a given undirected graph [5]. This algorithm can be implemented in linear time $O(N + l)$, where $N$ is the number of...

2 | A computational scheme for reasoning in dynamic probabilistic networks
- Kjærulff
- 1992
Citation Context ...where $\Pi_{it}$ denotes the parents of $X_i[t]$. 6.2 Dependency range and structural inductions. In the BN literature, DBNs are defined using the assumption that $X[t]$ is Markovian and stationary [10] [9] [20] [15] [13]. The time dependency properties of $X_i[t]$ determine the parents, and hence the network structure at time $t$. If the process is stationary, then the network structure is repeating for each ti...