## Switching Kalman Filters (1998)

Citations: 61 (2 self)

### BibTeX

```bibtex
@techreport{Murphy98switchingkalman,
  author      = {Kevin P. Murphy},
  title       = {Switching Kalman Filters},
  institution = {},
  year        = {1998}
}
```

### Abstract

We show how many different variants of Switching Kalman Filter models can be represented in a unified way, leading to a single, general-purpose inference algorithm. We then show how to find approximate Maximum Likelihood Estimates of the parameters using the EM algorithm, extending previous results on learning using EM in the non-switching case [DRO93, GH96a] and in the switching, but fully observed, case [Ham90].

1 Introduction. Dynamical systems are often assumed to be linear and subject to Gaussian noise. This model, called the Linear Dynamical System (LDS) model, can be defined as

x_t = A_t x_{t-1} + v_t
y_t = C_t x_t + w_t

where x_t is the hidden state variable at time t, y_t is the observation at time t, and v_t ~ N(0, Q_t) and w_t ~ N(0, R_t) are independent Gaussian noise sources. Typically the parameters of the model Θ = {(A_t, C_t, Q_t, R_t)} are assumed to be time-invariant, so that they can be estimated from data using e.g., EM [GH96a]. One of the main adva...
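The LDS defined in the abstract admits exact recursive inference via the Kalman filter. As a minimal sketch (not code from the paper), the following simulates the model with time-invariant parameters and runs the standard predict/update recursion to track the belief state P(X_t | y_1:t); the constant-velocity dynamics and position-only observation matrix are illustrative choices:

```python
import numpy as np

# Sketch of the LDS  x_t = A x_{t-1} + v_t,  y_t = C x_t + w_t
# with v_t ~ N(0, Q), w_t ~ N(0, R), plus one Kalman filter cycle
# computing the belief state P(X_t | y_1:t) = N(mu_t, V_t).

def kalman_step(mu, V, y, A, C, Q, R):
    """One predict/update cycle of the Kalman filter."""
    # Predict: push the belief through the dynamics.
    mu_pred = A @ mu
    V_pred = A @ V @ A.T + Q
    # Update: correct with the new observation y.
    S = C @ V_pred @ C.T + R               # innovation covariance
    K = V_pred @ C.T @ np.linalg.inv(S)    # Kalman gain
    mu_new = mu_pred + K @ (y - C @ mu_pred)
    V_new = (np.eye(len(mu)) - K @ C) @ V_pred
    return mu_new, V_new

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0], [0.0, 1.0]])     # constant-velocity dynamics
C = np.array([[1.0, 0.0]])                 # observe position only
Q, R = 0.01 * np.eye(2), np.array([[0.1]])

x = np.zeros(2)
mu, V = np.zeros(2), np.eye(2)
for t in range(50):
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)
    y = C @ x + rng.multivariate_normal(np.zeros(1), R)
    mu, V = kalman_step(mu, V, y, A, C, Q, R)
```

The recursion is exact here precisely because there is no switch variable; the citation contexts below discuss what breaks once S_t is added.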

### Citations

7436 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference - Pearl - 1988 |
Citation Context: ...[Figure 1 panels: switching dynamics; switching observations; switching observations with factored state] Figure 1: Some switching Kalman filter models represented as Bayesian networks [Pea88]. Square nodes are discrete, oval ones are Gaussian. Shaded nodes are observed, clear nodes are hidden. If S_t were observed, we would know when to apply each submodel (i.e., the segmentation would be...

5289 | Neural Networks for Pattern Recognition - Bishop - 1995 |

4566 | A tutorial on hidden Markov models and selected applications in speech recognition - Rabiner - 1989 |
Citation Context: ...subject to non-Gaussian noise. One approach to this problem is to discretize the (hidden) state variables, resulting in Dynamic Bayesian Networks [DW91, Gha97], of which the Hidden Markov Model (HMM) [Rab89] is the simplest example. However, the resulting system will in general have a belief state that is exponential in the number of hidden state variables, resulting in intractable inference. In addition...

1154 | Graphical Models - Lauritzen - 1996 |
Citation Context: ...ized Pseudo Bayesian algorithm of order r (GPB(r)) (see e.g., [BSL93, Kim94]). When r = 1, we approximate a mixture of Gaussians with a single Gaussian using moment matching; this can be shown (e.g., [Lau96]) to be the best (in the KL sense) single Gaussian approximation. When r = 2, we "collapse" Gaussians which differ in their history two steps ago; in general, these will be more similar than Gaussians...
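The r = 1 collapse described in this excerpt can be written down directly: moment matching replaces a mixture of Gaussians by the single Gaussian with the same overall mean and covariance. A small sketch (the `collapse` helper is illustrative, not the paper's code):

```python
import numpy as np

# Moment-matching "Collapse" of a Gaussian mixture sum_i w_i N(mu_i, V_i)
# into a single Gaussian N(mu, V) -- the KL-optimal single-Gaussian fit.

def collapse(weights, means, covs):
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mus = np.asarray(means, dtype=float)      # shape (k, d)
    mu = w @ mus                              # matched mean
    V = np.zeros((mus.shape[1], mus.shape[1]))
    for wi, mi, Vi in zip(w, mus, covs):
        diff = (mi - mu)[:, None]
        V += wi * (Vi + diff @ diff.T)        # within- plus between-component spread
    return mu, V

mu, V = collapse([0.5, 0.5],
                 [np.array([0.0]), np.array([2.0])],
                 [np.eye(1), np.eye(1)])
# mu == [1.0]; V == [[2.0]], since the covariance absorbs the
# spread of the component means around the collapsed mean.
```

GPB(2) applies the same operator, but only to Gaussians whose switch histories agree on the most recent step.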

805 | Optimal Statistical Decisions - DeGroot - 1970 |
Citation Context: ...A well-known problem with mixtures-of-Gaussians models, even in the non-dynamic case, is that the covariance matrix can easily become singular. Hamilton [Ham90, Ham91] suggests using a Wishart prior [DeG70] to regularize the problem. In particular, suppose the prior is Q_i^{-1} ~ W(α_i, S_i), where α_i is our equivalent sample size for the precision matrix S_i. Then the MAP estimate of Q_i is given b...
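The MAP formula itself is truncated in the excerpt above. As a hedged illustration of the general idea only (a conjugate prior acting as α pseudo-observations, not the paper's exact update), blending a prior scale matrix with the empirical scatter keeps a regime's covariance estimate positive definite even when few samples fall in that regime:

```python
import numpy as np

# Hypothetical sketch of prior-regularized covariance estimation:
# with prior "equivalent sample size" alpha and prior matrix S0, the
# estimate blends S0 with the empirical scatter of the residuals X.
# This is a generic conjugate-prior form, NOT the paper's exact formula.

def map_covariance(X, S0, alpha):
    """X: (n, d) residuals assigned to one regime."""
    n, d = X.shape
    scatter = X.T @ X                        # empirical scatter matrix
    return (alpha * S0 + scatter) / (alpha + n)

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 4))                  # fewer samples than dimensions:
Q = map_covariance(X, np.eye(4), alpha=5.0)  # raw sample covariance is singular,
                                             # regularized estimate is not
```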

799 | A view of the EM algorithm that justifies incremental, sparse and other variants - Neal, Hinton - 1999 |
Citation Context: ...n compute Q as above, and then set the off-diagonal entries to 0. We have written the formula for Σ_i in the usual form, and also in a form which is easier to compute in an incremental fashion [NH98] from the sufficient statistics. Remember that x_1 and W_1^i are functions of θ. To achieve parameter tying, we pool the expected sufficient statistics for each parameter in the equivalence class. ...

771 | Tracking and data association - Bar-Shalom, Fortmann - 1988 |
Citation Context: ...output variable or as choosing a permutation matrix C_t to apply, to model the fact that we are uncertain about which process causes which observation [SS91]; this is called data association ambiguity [BSF88]. Of course, we can make both the dynamics and the observation model dependent on S_t (or on two separate Markov chains). This is the most general case that we will assume for the rest of this paper...

398 | Evaluating influence diagrams - Shachter - 1986 |

321 | Planning and control - Dean, Wellman - 1991 |

313 | Bayesian Forecasting and Dynamic Models - West, Harrison - 1997 |

269 | Tractable inference for complex stochastic processes - Boyen, Koller - 1998 |

234 | Dynamic Linear Models with Markov-Switching - Kim - 1994 |
Citation Context: ...x_{t|T}^{(·)k} = E[X_t | y_{1:T}, S_{t+1} = k] = Σ_j x_{t|T}^{(j)k} U_t^{j|k}; V_{t+1,t|T} = CollapseCross(x_{t+1|T}^k, x_{t|T}^{(·)k}, V_{t+1,t|T}^k, M_{t+1|T}(k)). The line marked * is a standard approximation [Kim94], derived as follows: Pr(S_t = j | S_{t+1} = k, y_{1:T}) ≈ Pr(S_t = j | S_{t+1} = k, y_{1:t}) = Pr(S_t = j | y_{1:t}) Pr(S_{t+1} = k | S_t = j) / Pr(S_{t+1} = k | y_{1:t}), where the approximation arises because S_t is not cond...
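The approximation in this excerpt is just Bayes' rule applied to the switch chain, ignoring the conditional dependence of S_t on data after time t. A small numeric sketch (the filtered distribution and transition matrix are hypothetical):

```python
import numpy as np

# Numeric sketch of the Kim (1994) smoothing-weight step: by Bayes' rule,
#   U[j, k] = Pr(S_t = j | S_{t+1} = k, y_1:t)
#           = Pr(S_t = j | y_1:t) * Pr(S_{t+1} = k | S_t = j)
#             / Pr(S_{t+1} = k | y_1:t).
filt = np.array([0.7, 0.3])            # Pr(S_t = j | y_1:t), hypothetical
Z = np.array([[0.9, 0.1],              # Z[j, k] = Pr(S_{t+1} = k | S_t = j)
              [0.2, 0.8]])
pred = filt @ Z                        # Pr(S_{t+1} = k | y_1:t)
U = filt[:, None] * Z / pred[None, :]  # smoothing weights U[j, k]
# each column of U is a proper distribution over j
```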

232 | Analysis of Time Series Subject to Changes in Regime - Hamilton - 1990 |
Citation Context: ...Likelihood Estimates of the parameters using the EM algorithm, extending previous results on learning using EM in the non-switching case [DRO93, GH96a] and in the switching, but fully observed, case [Ham90]. 1 Introduction. Dynamical systems are often assumed to be linear and subject to Gaussian noise. This model, called the Linear Dynamical System (LDS) model, can be defined as x_t = A_t x_{t-1} + v...

172 | Probabilistic independence networks for hidden Markov probability models - Smyth, Heckerman, et al. - 1997 |

163 | A generalized hidden Markov model for the recognition of human genes - Kulp, Haussler, et al. - 1996 |

162 | Parameter estimation for linear dynamical systems - Ghahramani, Hinton - 1996 |
Citation Context: ...independent Gaussian noise sources. Typically the parameters of the model Θ = {(A_t, C_t, Q_t, R_t)} are assumed to be time-invariant, so that they can be estimated from data using e.g., EM [GH96a]. One of the main advantages of this model is that there is an efficient algorithm for performing inference (i.e., computing the belief state P(X_t | y_{1:t})), the well-known Kalman filter, and its ge...

153 | A survey of design methods for failure detection systems - Willsky - 1976 |
Citation Context: ...e prior probability of S_t reflects how often we expect outliers to occur. This is a widely used technique for making linear regression more robust, see e.g., [PG88], and for modelling sensor failure [Wil76]. Recently, Ghahramani et al. [GH96b] have proposed the model shown in Figure 1(c). This also has switching observations, but the interpretation is different. The switch variable in this case can be...

150 | On convergence properties of the EM algorithm for Gaussian mixtures - Xu, Jordan - 1996 |

140 | Learning dynamic Bayesian networks - Ghahramani - 1998 |

97 | Deterministic annealing EM algorithm - Ueda, Nakano - 1998 |
Citation Context: ...torious for getting stuck in local minima. This is especially common in models of the kind we are considering, which have M^T possible segmentations. One solution is to use deterministic annealing EM [UN98], as suggested in [GH96b]. In DAEM, we replace the posterior Pr(H | o) = Pr(H, o) / Σ_H Pr(H, o) (where H are the hidden variables and o the observed values) with f(H | o) = Pr(H, o)^β / Σ_H Pr(H, o)^β, wher...
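The tempered posterior in the DAEM excerpt is easy to sketch: raise the joint to the power β and renormalize. Small β flattens the posterior (smoothing away local minima early in training); β = 1 recovers the ordinary EM posterior. The values below are illustrative:

```python
import numpy as np

# Annealed posterior from DAEM:
#   f(H | o) = Pr(H, o)^beta / sum_H Pr(H, o)^beta
# computed in log space for numerical stability.

def annealed_posterior(log_joint, beta):
    """log_joint: log Pr(H, o) for each hidden configuration H."""
    z = beta * np.asarray(log_joint, dtype=float)
    z -= z.max()                  # log-sum-exp stabilization
    p = np.exp(z)
    return p / p.sum()

log_joint = np.log([0.6, 0.3, 0.1])               # hypothetical joints
p_flat = annealed_posterior(log_joint, beta=0.1)  # nearly uniform
p_em = annealed_posterior(log_joint, beta=1.0)    # ordinary EM posterior
```

Annealing schedules then increase β toward 1 as EM iterations proceed.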

87 | ML estimation of a stochastic linear systems with the EM algorithm and its application to speech recognition - Digalakis, Rohlicek, et al. - 1993 |
Citation Context: ...V_{t|t}^{i(j)}, W_t^{i|j}). The definitions of the filter, smoother and collapse operators are given in Appendix A; for a derivation, see e.g., [BSL93]; for the derivation of the cross-variance term, see [DRO93]; the derivation of the mode update equation is as follows: Pr(S_{t-1} = i, S_t = j | y_t, y_{1:t-1}) = (1/c) Pr(S_{t-1} = i, S_t = j, y_t | y_{1:t-1}) = (1/c) Pr(y_t | S_{t-1} = i, S_t = j...

79 | State space modeling of time series - Aoki - 1987 |

77 | Estimation and tracking: principles, techniques, and software - Bar-Shalom, Li - 1998 |

71 | Dynamic linear models with switching - Shumway, Stoffer - 1991 |
Citation Context: ...one of the sub-processes to pass through to the output variable or as choosing a permutation matrix C_t to apply, to model the fact that we are uncertain about which process causes which observation [SS91]; this is called data association ambiguity [BSF88]. Of course, we can make both the dynamics and the observation model dependent on S_t (or on two separate Markov chains). This is the most general...

60 | Markov Chain Monte Carlo in Conditionally Gaussian State Space Models - Cater, Kohn - 1996 |

41 | Switching state-space models - Ghahramani, Hinton - 1996 |
Citation Context: ...w often we expect outliers to occur. This is a widely used technique for making linear regression more robust, see e.g., [PG88], and for modelling sensor failure [Wil76]. Recently, Ghahramani et al. [GH96b] have proposed the model shown in Figure 1(c). This also has switching observations, but the interpretation is different. The switch variable in this case can be thought of as "selecting" one of the s...

39 | Approximate learning of dynamic models - Boyen, Koller - 1998 |

33 | quasi-Bayesian approach to estimating parameters for mixtures of normal distributions - Hamilton - 1991 |

28 | State-space models - Hamilton - 1994 |

26 | Nonlinear Time Series Analysis: A Dynamical Systems Approach - Tong - 1990 |
Citation Context: ...stant. We can also imagine adding an arc from X_t to S_{t+1} without unduly increasing the complexity (since x_t is observed), cf. [Bil98]. This is somewhat related to Threshold Auto Regressive models [Ton90]. SAR models are computationally much simpler than SKFs (no approximations are necessary to do inference, as we will see), since the only hidden node is discrete. They can therefore be used to bootstr...

24 | Bayesian estimation of switching ARMA models - Billio, Monfort, et al. - 1999 |

22 | Learning mixtures of Bayesian networks - Thiesson, Meek, et al. - 1997 |
Citation Context: ...in the weight matrix correspond to missing arcs in the graph [?]. If these matrices are conditioned on S_t, we are effectively using a mixture of Gaussian models, each with different structure, cf. [TMCH98] for the static case. If we have multiple discrete variables (e.g., each sub-process has piecewise linear dynamics), and the size of each discrete state variable is |S_t| = M, then the size of the c...

18 | Data-driven extensions to HMM statistical dependencies - Bilmes - 1998 |
Citation Context: ...nd the number of measurements we receive at each time step, is not constant. We can also imagine adding an arc from X_t to S_{t+1} without unduly increasing the complexity (since x_t is observed), cf. [Bil98]. This is somewhat related to Threshold Auto Regressive models [Ton90]. SAR models are computationally much simpler than SKFs (no approximations are necessary to do inference, as we will see), since t...

7 | Bayesian Approach to Robustifying the Kalman Filter - Pena, Guttman - 1988 |
Citation Context: ...ariance (e.g., approximately uniform). The prior probability of S_t reflects how often we expect outliers to occur. This is a widely used technique for making linear regression more robust, see e.g., [PG88], and for modelling sensor failure [Wil76]. Recently, Ghahramani et al. [GH96b] have proposed the model shown in Figure 1(c). This also has switching observations, but the interpretation is different...

6 | Modeling volatility using state space models - Timmer, Weigend - 1997 |

4 | Bayesian detection and estimation of jumps in linear systems - Smith, Makov - 1980 |
Citation Context: ...he belief state p(X_t | y_{1:t}) will be a mixture of M^t Gaussians, one for each possible model history S_1, ..., S_t. There are several general approaches to dealing with this exponential growth [SM80]:
- Collapsing: approximate the mixture of M^t Gaussians with a mixture of r Gaussians. This is called the Generalized Pseudo Bayesian algorithm of order r (GPB(r)) (see e.g., [BSL93, Kim94]). When ...