## Switching State-Space Models (1996)

### Download Links

- [vocal.gatsby.ucl.ac.uk]
- [ftp.cs.toronto.edu]
- [www.gatsby.ucl.ac.uk]

### Other Repositories/Bibliography

Venue: | Technical Report, Department of Computer Science, University of Toronto, King's College Road, Toronto M5S 3H5 |

Citations: | 41 (2 self) |

### BibTeX

@TECHREPORT{Ghahramani96switchingstate-space,
  author      = {Zoubin Ghahramani and Geoffrey E. Hinton},
  title       = {Switching State-Space Models},
  institution = {Department of Computer Science, University of Toronto},
  address     = {King's College Road, Toronto M5S 3H5},
  year        = {1996}
}

### Abstract

We introduce a statistical model for time series data with nonlinear dynamics which iteratively segments the data into regimes with approximately linear dynamics and learns the parameters of each of those regimes. This model combines and generalizes two of the most widely used stochastic time series models---the hidden Markov model and the linear dynamical system---and is related to models that are widely used in the control and econometrics literatures. It can also be derived by extending the mixture of experts neural network model (Jacobs et al., 1991) to its fully dynamical version, in which both expert and gating networks are recurrent. Inferring the posterior probabilities of the hidden states of this model is computationally intractable, and therefore the exact Expectation Maximization (EM) algorithm cannot be applied. However, we present a variational approximation which maximizes a lower bound on the log likelihood and makes use of both the forward--backward recursions...
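The generative process the abstract describes can be sketched in a few lines of numpy. This is an illustrative toy, not the authors' code: the regime count, dimensions, transition matrix, and noise levels below are all made-up values. Each of M real-valued state chains evolves with its own linear dynamics, and a discrete Markov switch variable S_t selects which chain produces the observation.

```python
import numpy as np

# Illustrative sketch of sampling from a switching state-space model
# (assumed parameter values, not the paper's).
rng = np.random.default_rng(0)

M, T, dx, dy = 2, 100, 2, 1
P = np.array([[0.95, 0.05], [0.05, 0.95]])   # switch transition P(S_t | S_{t-1})
A = [np.eye(dx) * 0.9, np.eye(dx) * 0.5]     # per-regime dynamics A^(m)
C = [rng.standard_normal((dy, dx)) for _ in range(M)]  # per-regime output C^(m)

S = np.zeros(T, dtype=int)
X = np.zeros((M, T, dx))
Y = np.zeros((T, dy))
for t in range(T):
    if t > 0:
        S[t] = rng.choice(M, p=P[S[t - 1]])
        for m in range(M):
            # every real-valued chain evolves regardless of the switch setting
            X[m, t] = A[m] @ X[m, t - 1] + 0.1 * rng.standard_normal(dx)
    # the switch selects which chain generates the observation
    Y[t] = C[S[t]] @ X[S[t], t] + 0.1 * rng.standard_normal(dy)
```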

### Citations

9231 |
Elements of Information Theory
- Cover, Thomas
- 1991
Citation Context: ...d{X_t} (10) ≥ Σ_{{S_t}} ∫ Q({S_t, X_t}) log[ P({S_t, X_t, Y_t} | θ) / Q({S_t, X_t}) ] d{X_t} = B(Q, θ), (11) where θ denotes the parameters of the model and we have made use of Jensen's inequality (Cover and Thomas, 1991) to establish (11). Both steps of EM increase the lower bound on the log probability of the observed data. The E-step holds the parameters fixed and sets Q to be the posterior distribution over the h... |
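The Jensen-inequality lower bound referenced in this excerpt can be checked numerically on a toy discrete latent variable: for any distribution Q over the hidden state, Σ_s Q(s) log[P(s, y)/Q(s)] ≤ log P(y), with equality when Q is the exact posterior. The joint probabilities below are illustrative numbers.

```python
import numpy as np

# Toy check of the variational lower bound B(Q) <= log P(y).
joint = np.array([0.3, 0.1, 0.2])    # P(s, y) for three hidden states
log_py = np.log(joint.sum())         # log P(y)

def bound(q):
    # B(Q) = sum_s Q(s) [log P(s, y) - log Q(s)]
    return np.sum(q * (np.log(joint) - np.log(q)))

q_arbitrary = np.array([0.5, 0.25, 0.25])
q_posterior = joint / joint.sum()    # exact posterior P(s | y)

assert bound(q_arbitrary) <= log_py + 1e-12   # strict bound for arbitrary Q
assert abs(bound(q_posterior) - log_py) < 1e-12  # tight at the posterior
```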

9054 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...Ljung and Soderstrom, 1983). Similar gradient-based methods can be obtained for off-line methods. An alternative method for off-line learning makes use of the Expectation Maximization (EM) algorithm (Dempster et al., 1977). This procedure iterates between a step that fixes the current parameters and computes posterior probabilities over the hidden states given the observations (the E-step), and a step that uses these ... |

7493 |
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
- Pearl
- 1988
Citation Context: ...Smyth, Heckerman and Jordan (1997), the forward--backward algorithm is a special case of exact inference algorithms for more general graphical probabilistic models (Lauritzen and Spiegelhalter, 1988; Pearl, 1988). The same observation holds true for the Kalman smoothing recursions. The other inference problem commonly posed for HMMs is to compute the single most likely sequence of hidden states. The solution... |

1349 |
Local computations with probabilities on graphical structures and their application to expert systems (with discussion)
- Lauritzen, Spiegelhalter
- 1988
Citation Context: ...(personal communication, 1985) and Smyth, Heckerman and Jordan (1997), the forward--backward algorithm is a special case of exact inference algorithms for more general graphical probabilistic models (Lauritzen and Spiegelhalter, 1988; Pearl, 1988). The same observation holds true for the Kalman smoothing recursions. The other inference problem commonly posed for HMMs is to compute the single most likely sequence of hidden states.... |

1331 | A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle
- Hamilton
- 1989
Citation Context: ...inear operating regime to another. There is in fact a large literature on models of this kind in econometrics, signal processing, and other fields (Harrison and Stevens, 1976; Chang and Athans, 1978; Hamilton, 1989; Shumway and Stoffer, 1991; Bar-Shalom and Li, 1993). In this paper we extend some of these models to allow for multiple real-valued state vectors, draw connections between these fields and the liter... |

910 | An introduction to hidden markov models
- Rabiner, Juang
- 1986
Citation Context: ...., 1994), and fault detection (Smyth, 1994). Given an HMM with known parameters and a sequence of observations, two algorithms are commonly used to solve two different forms of the inference problem (Rabiner and Juang, 1986). The first computes the posterior probabilities of the hidden states using a recursive algorithm known as the forward--backward algorithm. The computations in the forward pass are exactly analogous ... |
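The forward--backward algorithm this excerpt refers to can be sketched for a tiny discrete HMM. All parameter values below are made-up for illustration; the alpha/beta passes yield the posterior gamma_t(s) = P(S_t = s | Y_1..T).

```python
import numpy as np

# Forward--backward recursions for a 2-state HMM (illustrative parameters).
pi = np.array([0.6, 0.4])                 # initial state distribution
P = np.array([[0.7, 0.3], [0.4, 0.6]])    # P[i, j] = P(S_t = j | S_{t-1} = i)
B = np.array([[0.9, 0.1], [0.2, 0.8]])    # B[s, y] = P(Y_t = y | S_t = s)
y = [0, 0, 1, 0]                          # observed symbol sequence

T, S = len(y), len(pi)
alpha = np.zeros((T, S))
beta = np.ones((T, S))
alpha[0] = pi * B[:, y[0]]
for t in range(1, T):                      # forward pass
    alpha[t] = (alpha[t - 1] @ P) * B[:, y[t]]
for t in range(T - 2, -1, -1):             # backward pass
    beta[t] = P @ (B[:, y[t + 1]] * beta[t + 1])

gamma = alpha * beta                       # unnormalized state posteriors
gamma /= gamma.sum(axis=1, keepdims=True)  # P(S_t | Y_1..T) at each t
```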

869 | An Introduction to Variational Methods for Graphical Models - Jordan, Ghahramani, et al. - 1999 |

843 |
Adaptive mixtures of local experts
- Jacobs, Jordan
- 1991
Citation Context: ...e linear dynamical system---and is related to models that are widely used in the control and econometrics literatures. It can also be derived by extending the mixture of experts neural network model (Jacobs et al., 1991) to its fully dynamical version, in which both expert and gating networks are recurrent. Inferring the posterior probabilities of the hidden states of this model is computationally intractable, and t... |

842 |
A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains
- Baum, Soules, et al.
- 1970
Citation Context: ...ich also consists of a forward and backward pass through the model. To learn maximum likelihood parameters for an HMM given sequences of observations, one can use the well-known Baum-Welch algorithm (Baum et al., 1970). This algorithm is a special case of EM that uses the forward--backward algorithm to infer the posterior probabilities of the hidden states in the E-step. The M-step uses expected counts of transiti... |

805 | A view of the EM algorithm that justifies incremental, sparse, and other variants - Neal, Hinton - 1998 |

785 |
Optimal Filtering
- Anderson, Moore
- 1979
Citation Context: ... the resulting posterior is also Gaussian. The special cases of the inference problem for state-space models play a prominent role in the engineering literature: filtering, smoothing, and prediction (Anderson and Moore, 1979; Goodwin and Sin, 1984). The goal of filtering is to compute the probability of the current hidden state X_t given the sequence of inputs and outputs up to time t: P(X_t | {Y}_1^t, {U}_1^t). Th... |
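The filtering problem described in this excerpt reduces, for a linear-Gaussian model, to one predict/update cycle per time step. The scalar sketch below uses made-up parameter values for a model x_t = a x_{t-1} + w, y_t = c x_t + v.

```python
# One Kalman filter predict/update step for a scalar state-space model
# (illustrative parameter values, not from the paper).
a, c, q, r = 0.9, 1.0, 0.1, 0.2   # dynamics, output, process/obs noise vars
x, p = 0.0, 1.0                   # filtered mean and variance at time t-1

def kalman_step(x, p, y):
    x_pred, p_pred = a * x, a * p * a + q        # predict P(x_t | y_{1:t-1})
    k = p_pred * c / (c * p_pred * c + r)        # Kalman gain
    x_new = x_pred + k * (y - c * x_pred)        # correct with observation y_t
    p_new = (1 - k * c) * p_pred                 # posterior variance shrinks
    return x_new, p_new

x, p = kalman_step(x, p, y=1.0)
```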

764 | Hierarchical mixtures of experts and the em algorithm
- Jordan, Jacobs
- 1994
Citation Context: ...g to explain the observations. With regard to the literature on neural computation, the model presented in this paper is a generalization of the mixtures of experts architecture (Jacobs et al., 1991; Jordan and Jacobs, 1994). Previous dynamical generalizations of the mixture of experts architecture consider the case in which the gating network has Markovian dynamics (Cacciatore and Nowlan, 1994; Kadirkamanathan and Ka... |

593 | Contour tracking by stochastic propagation of conditional density
- ISARD, BLAKE
- 1996
Citation Context: ...aussian state distributions via a set of samples which are stochastically propagated and reweighted. This approach has been successfully applied to the problem of contour tracking in computer vision (Isard and Blake, 1996; Blake et al., 1995). We have explored elsewhere the use of the EKF in deriving an EM algorithm for general stochastic nonlinear dynamical systems (Ghahramani and Roweis, in preparation). Switching s... |

515 | Factorial hidden Markov models
- Ghahramani, Jordan
- 1997
Citation Context: ...on matrix P(S_t | S_{t-1}). Note that the state vectors could be concatenated into one large state vector with factorized (block-diagonal) transition matrices (cf. factorial hidden Markov model; Ghahramani and Jordan, 1997). However, this obscures the decoupled structure of the model. [Figure 2: (a) Graphical model represe...] |
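The excerpt's observation, that independent state chains can be concatenated into one large state vector with a block-diagonal transition matrix, is easy to verify directly. The matrices below are illustrative values, not the paper's.

```python
import numpy as np

# Two decoupled chains with dynamics A1, A2 written as one chain with a
# block-diagonal joint transition matrix (made-up matrices for illustration).
A1 = np.array([[0.9, 0.1], [0.0, 0.9]])
A2 = np.array([[0.5]])
A = np.block([
    [A1, np.zeros((2, 1))],
    [np.zeros((1, 2)), A2],
])  # joint dynamics over the concatenated state [X^(1); X^(2)]

x1, x2 = np.array([1.0, 2.0]), np.array([3.0])
x = np.concatenate([x1, x2])
# evolving the concatenated state equals evolving each chain separately,
# which is exactly the decoupled structure the block-diagonal form hides
assert np.allclose(A @ x, np.concatenate([A1 @ x1, A2 @ x2]))
```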

436 |
Theory and Practice of Recursive Identification
- Ljung, Soderstrom
- 1983
Citation Context: ...ne approaches to learning. On-line recursive algorithms, favored in real-time adaptive control applications, can be obtained by computing the gradient or the second derivatives of the log likelihood (Ljung and Soderstrom, 1983). Similar gradient-based methods can be obtained for off-line methods. An alternative method for off-line learning makes use of the Expectation Maximization (EM) algorithm (Dempster et al., 1977). Th... |

386 |
On Gibbs sampling for state space models
- Carter, Kohn
- 1994
Citation Context: ...h the exponential Gaussian mixture is collapsed down to M Gaussians at each time step. Other authors have used Markov chain Monte Carlo methods for state and parameter estimation in switching models (Carter and Kohn, 1994; Athaide, 1995) and in other more general dynamic probabilistic networks (Dean and Kanazawa, 1989; Kanazawa et al., 1995). One can also model nonlinear processes using nonlinear generalizations of th... |

265 |
An Approach to Time Series Smoothing and Forecasting Using the EM Algorithm
- Shumway, Stoffer
- 1982
Citation Context: ... of the parameters (the M-step). For linear Gaussian state-space models, the E-step is exactly the Kalman smoothing problem as defined above, and the M-step simplifies to a linear regression problem (Shumway and Stoffer, 1982; Digalakis et al., 1993). Details on the EM algorithm for state-space models can be found in Ghahramani and Hinton (1996b), as well as in the original Shumway and Stoffer (1982) paper. It is worth po... |

250 |
Statistical field theory
- Parisi
- 1988
Citation Context: ...arameters. A completely factorized approximation is often used in statistical physics, where it provides the basis for simple yet powerful mean field approximations to statistical mechanical systems (Parisi, 1988). Theoretical arguments motivating approximate E-steps were presented by Neal and Hinton (1993). Saul and Jordan (1996) showed that approximate E-steps could be used to maximize a lower bound on the ... |

246 |
Adaptive Filtering Prediction and Control
- Goodwin, Sin
- 1984
Citation Context: ...s also Gaussian. The special cases of the inference problem for state-space models play a prominent role in the engineering literature: filtering, smoothing, and prediction (Anderson and Moore, 1979; Goodwin and Sin, 1984). The goal of filtering is to compute the probability of the current hidden state X_t given the sequence of inputs and outputs up to time t: P(X_t | {Y}_1^t, {U}_1^t). The recursive algorithm u... |

240 | Dynamic linear models with Markov-switching - Kim - 1994 |

234 | The EM algorithm for mixtures of factor analyzers
- Ghahramani, Hinton
- 1996
Citation Context: ...esentations. Before we proceed with the definition of the probabilistic model, it is important to place the work in this paper in the context of the literature we have just reviewed. [Footnote: It can also be seen as a generalization of mixtures of factor analyzers (Hinton et al., 1996; Ghahramani and Hinton, 1996b).] "Hybrid models", state-space models with switching, and jump-linear systems all assume that there is a single real-valued ... |

202 |
Multitarget-Multisensor Tracking
- Bar-Shalom, Blair
- 2000
Citation Context: ... in fact a large literature on models of this kind in econometrics, signal processing, and other fields (Harrison and Stevens, 1976; Chang and Athans, 1978; Hamilton, 1989; Shumway and Stoffer, 1991; Bar-Shalom and Li, 1993). In this paper we extend some of these models to allow for multiple real-valued state vectors, draw connections between these fields and the literature on neural computation, and derive a learning a... |

173 | Probabilistic Independence Networks for Hidden Markov Probability Models - Smyth, Heckerman, et al. - 1996 |

164 |
Hidden Markov models of biological primary sequence information
- Baldi, Chauvin, et al.
- 1994
Citation Context: ... different forms, such as a Gaussian, mixture of Gaussians, or a neural network. HMMs have been applied extensively to problems in speech recognition (Juang and Rabiner, 1991), computational biology (Baldi et al., 1994), and fault detection (Smyth, 1994). Given an HMM with known parameters and a sequence of observations, two algorithms are commonly used to solve two different forms of the inference problem (Rabiner... |

162 | Parameter estimation for linear dynamical systems
- Ghahramani, Hinton
- 1996
Citation Context: ...esentations. Before we proceed with the definition of the probabilistic model, it is important to place the work in this paper in the context of the literature we have just reviewed. [Footnote: It can also be seen as a generalization of mixtures of factor analyzers (Hinton et al., 1996; Ghahramani and Hinton, 1996b).] "Hybrid models", state-space models with switching, and jump-linear systems all assume that there is a single real-valued ... |

156 | Modeling the manifolds of images of handwritten digits
- Hinton, Dayan, et al.
- 1997
Citation Context: ...iscrete HMM-like representations. Before we proceed with the definition of the probabilistic model, it is important to place the work in this paper in the context of the literature we have just reviewed. [Footnote: It can also be seen as a generalization of mixtures of factor analyzers (Hinton et al., 1996; Ghahramani and Hinton, 1996b).] "Hybrid models", state-space models with switching, and jump-linear systems all assume that t... |

153 | Stochastic simulation algorithms for dynamic probabilistic networks
- Kanazawa, Koller, et al.
- 1995
Citation Context: ...in Monte Carlo methods for state and parameter estimation in switching models (Carter and Kohn, 1994; Athaide, 1995) and in other more general dynamic probabilistic networks (Dean and Kanazawa, 1989; Kanazawa et al., 1995). One can also model nonlinear processes using nonlinear generalizations of the state-space model which do not explicitly represent a switching state. The conditional mean and variance of the hi... |

144 |
An Introduction to Latent Variable Models
- Everitt
- 1984
Citation Context: ... Gaussian state-space model is a generalization of a statistical method known as factor analysis. Factor analysis models high dimensional data through a smaller number of latent variables or factors (Everitt, 1984). The model relating the factors to the observations is exactly as specified by equation (3): X_t is a Gaussian distributed vector of factor values; Y_t is the observation vector; C is known as the f... |
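The factor-analysis relation this excerpt describes can be sketched and sanity-checked numerically: latent factors X ~ N(0, I) generate Y = C X + v with diagonal noise, so the marginal covariance of Y is C Cᵀ + Ψ. The loading matrix and noise level below are made-up values.

```python
import numpy as np

# Factor analysis generative model, illustrative numbers only.
rng = np.random.default_rng(0)
n = 50_000
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5],
              [0.2, -0.3]])          # factor loading matrix (4 obs, 2 factors)
psi = 0.1 * np.ones(4)               # diagonal observation noise variances

X = rng.standard_normal((n, 2))                          # factors ~ N(0, I)
Y = X @ C.T + rng.standard_normal((n, 4)) * np.sqrt(psi) # observations

# the implied marginal covariance of Y is C C^T + Psi
assert np.allclose(np.cov(Y.T), C @ C.T + np.diag(psi), atol=0.05)
```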

112 | An input output HMM architecture
- Bengio, Frasconi
- 1995
Citation Context: ...d). The HMM can be augmented to allow for input variables, such that it models the conditional distribution of sequences of output observations given sequences of inputs (Cacciatore and Nowlan, 1994; Bengio and Frasconi, 1995; Meila and Jordan, 1996). The approach used in Bengio and Frasconi's Input Output HMMs (IOHMMs) suggests modeling P(S_t | S_{t-1}, U_t), where U_t is the input, as M separate neural networks, on... |

104 | Exploiting tractable substructures in intractable networks
- Saul, Jordan
- 1996
Citation Context: ... multiple real-valued state vectors. We present a learning algorithm for all of the parameters of the model, including the Markov switching parameters. Using a structured variational approximation (Saul and Jordan, 1996), we show that this algorithm maximizes a strict lower bound on the log likelihood of the data, rather than a heuristically motivated pseudo-likelihood. The resulting algorithm has a simple and intui... |

100 | Learning to track the visual motion of contours
- Blake, Isard, et al.
Citation Context: ...ions via a set of samples which are stochastically propagated and reweighted. This approach has been successfully applied to the problem of contour tracking in computer vision (Isard and Blake, 1996; Blake et al., 1995). We have explored elsewhere the use of the EKF in deriving an EM algorithm for general stochastic nonlinear dynamical systems (Ghahramani and Roweis, in preparation). Switching state-space models ca... |

98 | Hidden Markov Models for Speech Recognition
- Juang, Rabiner
- 1991
Citation Context: ...tion vector, P(Y_t | S_t) can be modeled in many different forms, such as a Gaussian, mixture of Gaussians, or a neural network. HMMs have been applied extensively to problems in speech recognition (Juang and Rabiner, 1991), computational biology (Baldi et al., 1994), and fault detection (Smyth, 1994). Given an HMM with known parameters and a sequence of observations, two algorithms are commonly used to solve two diffe... |

87 | ML estimation of a stochastic linear system with the EM algorithm and its application to speech recognition
- Digalakis, Rohlicek, et al.
- 1993
Citation Context: ...tep). For linear Gaussian state-space models, the E-step is exactly the Kalman smoothing problem as defined above, and the M-step simplifies to a linear regression problem (Shumway and Stoffer, 1982; Digalakis et al., 1993). Details on the EM algorithm for state-space models can be found in Ghahramani and Hinton (1996b), as well as in the original Shumway and Stoffer (1982) paper. It is worth pointing out that the line... |

72 |
Dynamic linear models with switching
- SHUMWAY, STOFFER
- 1991
Citation Context: ...regime to another. There is in fact a large literature on models of this kind in econometrics, signal processing, and other fields (Harrison and Stevens, 1976; Chang and Athans, 1978; Hamilton, 1989; Shumway and Stoffer, 1991; Bar-Shalom and Li, 1993). In this paper we extend some of these models to allow for multiple real-valued state vectors, draw connections between these fields and the literature on neural computation... |

63 |
Bayesian Forecasting (with discussion)
- Harrison, Stevens
- 1976
Citation Context: ...amics can transition in a discrete manner from one linear operating regime to another. There is in fact a large literature on models of this kind in econometrics, signal processing, and other fields (Harrison and Stevens, 1976; Chang and Athans, 1978; Hamilton, 1989; Shumway and Stoffer, 1991; Bar-Shalom and Li, 1993). In this paper we extend some of these models to allow for multiple real-valued state vectors, draw connec... |

59 | Mixtures of controllers for jump linear and non-linear plants
- Cacciatore, Nowlan
- 1994
Citation Context: ...tions are Gaussian distributed). The HMM can be augmented to allow for input variables, such that it models the conditional distribution of sequences of output observations given sequences of inputs (Cacciatore and Nowlan, 1994; Bengio and Frasconi, 1995; Meila and Jordan, 1996). The approach used in Bengio and Frasconi's Input Output HMMs (IOHMMs) suggests modeling P(S_t | S_{t-1}, U_t), where U_t is the input, as M s... |

58 |
Solutions to the linear smoothing problem
- Rauch
- 1963
Citation Context: ...ard direction to compute the probability of X_t given {Y}_1^t and {U}_1^t. A similar set of backward recursions from T to t complete the computation by accounting for the observations after time t (Rauch, 1963). We will refer to the combined forward and backward recursions for smoothing as the Kalman smoothing recursions (also known as the RTS or Rauch-Tung-Striebel smoother). Finally, the goal of predicti... |
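One backward step of the Kalman (RTS) smoothing recursion mentioned in this excerpt can be sketched for a scalar model x_t = a x_{t-1} + w: it combines the filtered estimate at t with the smoothed estimate at t+1. All numeric values below are illustrative.

```python
# One backward Rauch-Tung-Striebel smoothing step (illustrative values).
a, q = 0.9, 0.1              # dynamics coefficient and process noise variance
xf, pf = 0.5, 0.3            # filtered mean/variance at time t
xs_next, ps_next = 0.7, 0.2  # smoothed mean/variance at time t+1

x_pred, p_pred = a * xf, a * pf * a + q   # one-step prediction from t
J = pf * a / p_pred                        # smoother gain
xs = xf + J * (xs_next - x_pred)           # smoothed mean at t
ps = pf + J * (ps_next - p_pred) * J       # smoothed variance at t
```

Note that the smoothed variance `ps` is never larger than the filtered variance `pf`, since the future observations can only reduce uncertainty.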

51 | On state estimation in switching environments - Ackerson, Fu - 1970 |

42 |
Hidden Markov Models for Fault Detection in Dynamic Systems
- Smyth
- 1994
Citation Context: ...ure of Gaussians, or a neural network. HMMs have been applied extensively to problems in speech recognition (Juang and Rabiner, 1991), computational biology (Baldi et al., 1994), and fault detection (Smyth, 1994). Given an HMM with known parameters and a sequence of observations, two algorithms are commonly used to solve two different forms of the inference problem (Rabiner and Juang, 1986). The first comput... |

40 | Forecasting probability densities by using hidden Markov models with mixed states - Fraser, Dimitriadis - 1993 |

32 | New Results in Linear Filtering and Prediction - Kalman, Bucy - 1961 |

27 |
State estimation for discrete systems with switching parameters
- Chang, Athans
- 1978
Citation Context: ...screte manner from one linear operating regime to another. There is in fact a large literature on models of this kind in econometrics, signal processing, and other fields (Harrison and Stevens, 1976; Chang and Athans, 1978; Hamilton, 1989; Shumway and Stoffer, 1991; Bar-Shalom and Li, 1993). In this paper we extend some of these models to allow for multiple real-valued state vectors, draw connections between these fiel... |

23 |
Deterministic Annealing Variant of the EM Algorithm
- Ueda, Nakano
- 1995
Citation Context: ...ture parameter, which is initialized to a large value and gradually reduced to 1. The above equations maximize a modified form of the bound B in (11), where the entropy of Q has been multiplied by T (Ueda and Nakano, 1995). 5 Simulations 5.1 Experiment 1: Variational Segmentation and Deterministic Annealing The goal of this experiment was to assess the quality of solutions found by the variational inference algorithm... |
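The deterministic-annealing idea this excerpt describes, scaling the entropy of Q by a temperature T that starts large and is annealed toward 1, amounts to computing responsibilities through a tempered softmax. The per-regime log likelihoods below are made-up numbers.

```python
import numpy as np

# Tempered responsibilities: high T flattens the posterior over regimes
# (soft segmentation early in training); T = 1 recovers the usual posterior.
log_lik = np.array([-1.0, -3.0, -2.0])   # illustrative per-regime log likelihoods

def responsibilities(log_lik, T):
    z = log_lik / T
    z = z - z.max()                      # subtract max for numerical stability
    q = np.exp(z)
    return q / q.sum()

q_hot = responsibilities(log_lik, T=10.0)   # near uniform
q_cold = responsibilities(log_lik, T=1.0)   # the usual posterior
assert q_hot.max() < q_cold.max()            # annealing flattens the posterior
```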

20 | A Mixture-of-Experts Framework for Adaptive Kalman Filtering
- Chaer, Bishop, et al.
- 1997
Citation Context: ...linear dynamics of SSMs (Harrison and Stevens, 1976; Chang and Athans, 1978; Hamilton, 1989; Shumway and Stoffer, 1991; Bar-Shalom and Li, 1993; Deng, 1993; Kadirkamanathan and Kadirkamanathan, 1996; Chaer et al., 1997). These models are known alternately as hybrid models, state-space models with switching, and jump-linear systems. We briefly review some of the main results in this literature including some recent ... |

19 | Learning fine motion by markov mixture of experts
- Meilă, Jordan
- 1996
Citation Context: ...d to allow for input variables, such that it models the conditional distribution of sequences of output observations given sequences of inputs (Cacciatore and Nowlan, 1994; Bengio and Frasconi, 1995; Meila and Jordan, 1996). The approach used in Bengio and Frasconi's Input Output HMMs (IOHMMs) suggests modeling P(S_t | S_{t-1}, U_t), where U_t is the input, as M separate neural networks, one for each setting of S ... |

15 | Multi-channel physiological data: Description and analysis - Rigney, Goldberger, et al. - 1993 |

11 |
A stochastic model of speech incorporating hierarchical nonstationarity
- Deng
- 1993
Citation Context: ...ine the discrete transition structure of HMMs with the linear dynamics of SSMs (Harrison and Stevens, 1976; Chang and Athans, 1978; Hamilton, 1989; Shumway and Stoffer, 1991; Bar-Shalom and Li, 1993; Deng, 1993; Kadirkamanathan and Kadirkamanathan, 1996; Chaer et al., 1997). These models are known alternately as hybrid models, state-space models with switching, and jump-linear systems. We briefly review som... |

11 | On Structured Variational Approximations
- Ghahramani
- 1997
Citation Context: ...nce, the zeros of the derivatives of KL with respect to the variational parameters can be obtained simply by equating derivatives of <H> and <H_Q> with respect to corresponding sufficient statistics (Ghahramani, 1997): d<H_Q - H>/d<S_t^(m)> = 0 (42), d<H_Q - H>/d<X_t^(m)> = 0 (43), d<H_Q - H>/d<P_t^(m)> = 0 (44), where P_t^(m) = <X_t^(m) X_t^(m)'> - <X_t^(m)><X_t^(m)>' is the covariance of X... |