## Modeling changing dependency structure in multivariate time series (2007)


### Download Links

- [www.cs.ubc.ca]
- [www.ai.mit.edu]
- [people.cs.ubc.ca]
- [imls.engr.oregonstate.edu]
- [www.machinelearning.org]
- DBLP


Venue: International Conference on Machine Learning

Citations: 31 (0 self-citations)

### BibTeX

    @INPROCEEDINGS{Xuan07modelingchanging,
      author    = {Xiang Xuan and Kevin Murphy},
      title     = {Modeling changing dependency structure in multivariate time series},
      booktitle = {International Conference on Machine Learning},
      year      = {2007}
    }


### Abstract

We show how to apply the efficient Bayesian changepoint detection techniques of Fearnhead in the multivariate setting. We model the joint density of vector-valued observations using undirected Gaussian graphical models, whose structure we estimate. We show how we can exactly compute the MAP segmentation, as well as how to draw perfect samples from the posterior over segmentations, simultaneously accounting for uncertainty about the number and location of changepoints, as well as uncertainty about the covariance structure. We illustrate the technique by applying it to financial data and to bee tracking data.
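To make the exact MAP computation concrete, here is a minimal Python sketch of an $O(T^2)$ max-product dynamic program over changepoint positions. It is not the authors' code. The assumptions are all hypothetical: a per-segment scorer `seg_loglik(s, t)` returns the log marginal likelihood of the inclusive slice `y[s..t]`; segment lengths have a geometric prior with parameter `p_change`; and the toy scorer `gaussian_mean_seg_loglik` (unit-variance Gaussian with an unknown mean) stands in for the Gaussian graphical model likelihood used in the paper.

```python
import numpy as np


def log_len_pmf(n, p_change):
    """log P(segment length == n) under a geometric length prior (assumption)."""
    return np.log(p_change) + (n - 1) * np.log1p(-p_change)


def log_len_tail(n, p_change):
    """log P(segment length >= n); used for the last segment, which is cut off at T."""
    return (n - 1) * np.log1p(-p_change)


def gaussian_mean_seg_loglik(y, tau2=100.0):
    """Toy segment model (assumption): y_i ~ N(mu, 1) with mu ~ N(0, tau2)."""
    y = np.asarray(y, dtype=float)
    csum = np.concatenate([[0.0], np.cumsum(y)])
    csq = np.concatenate([[0.0], np.cumsum(y ** 2)])

    def seg_loglik(s, t):
        # Log marginal likelihood of the inclusive slice y[s..t], with mu integrated out.
        n = t - s + 1
        sy, syy = csum[t + 1] - csum[s], csq[t + 1] - csq[s]
        quad = syy - tau2 * sy ** 2 / (1 + n * tau2)
        return -0.5 * (n * np.log(2 * np.pi) + np.log1p(n * tau2) + quad)

    return seg_loglik


def map_segmentation(seg_loglik, T, p_change):
    """Return (sorted segment start indices other than 0, MAP log joint score)."""
    best = np.full(T, -np.inf)     # best[t]: best log score of y[0..t-1], boundary before t
    best[0] = 0.0
    back = np.zeros(T, dtype=int)  # back[t]: start of the segment that ends at t - 1
    for t in range(1, T):
        scores = [best[s] + seg_loglik(s, t - 1) + log_len_pmf(t - s, p_change)
                  for s in range(t)]
        k = int(np.argmax(scores))
        back[t], best[t] = k, scores[k]
    # The last segment y[s..T-1] is right-censored, so it gets the tail probability.
    final = [best[s] + seg_loglik(s, T - 1) + log_len_tail(T - s, p_change)
             for s in range(T)]
    s = int(np.argmax(final))
    score = float(final[s])
    starts = []
    while s > 0:  # follow back-pointers to recover every earlier boundary
        starts.append(s)
        s = int(back[s])
    return sorted(starts), score
```

For example, on a series of ten zeros followed by ten values of 10.0, `map_segmentation(gaussian_mean_seg_loglik(y), len(y), 0.1)` places a single changepoint at index 10. Replacing the `max` with a log-sum-exp turns the same recursion into the evidence computation needed for posterior sampling.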

### Citations

1099 | Graphical Models
- Lauritzen
- 1996
Citation Context ...sian graphical models provide a good compromise between these two extremes. We can either use directed graphs (i.e., Bayes nets; see e.g., (Geiger & Heckerman, 1994)) or undirected models (see e.g., (**Lauritzen, 1996**; Giudici & Green, 1999; Carvalho & West, 2006)). In this paper, we use undirected graphs, since there are very efficient procedures for estimating undirected graph structures (see Section 4), which i...

628 | Probabilistic Networks and Expert Systems
- Cowell, Dawid, et al.
- 1999
Citation Context ...n of the marginal likelihood assumes the graph is decomposable, we convert each non-decomposable graph in our set M into its “closest” decomposable approximation by computing a min-fill triangulation (**Cowell et al., 1999**). To get the process started, we use the following heuristic. We slide a window of width $w = 0.2T$ across the data, shifting by $\sigma = 0.1w$ at each st...
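The two steps in this excerpt can be sketched as follows. `min_fill_triangulation` is a generic greedy min-fill elimination heuristic (repeatedly eliminate the vertex whose neighbours need the fewest extra edges to become a clique); it is offered as an illustration, not as the specific routine the authors used. `sliding_windows` produces the overlapping windows of width $w = 0.2T$ shifted by $0.1w$; the rounding to integers and the half-open `[start, end)` convention are assumptions. Both function names are hypothetical.

```python
from itertools import combinations


def min_fill_triangulation(n_nodes, edges):
    """Greedy min-fill elimination; returns the sorted edge list of a chordal supergraph."""
    nbrs = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    chordal = {tuple(sorted(e)) for e in edges if e[0] != e[1]}

    def fill_edges(v):
        # Pairs of v's current neighbours that are not yet adjacent to each other.
        return [(a, b) for a, b in combinations(sorted(nbrs[v]), 2) if b not in nbrs[a]]

    remaining = set(range(n_nodes))
    while remaining:
        # Pick the vertex needing the fewest fill-in edges; break ties by index.
        v = min(remaining, key=lambda u: (len(fill_edges(u)), u))
        for a, b in fill_edges(v):
            nbrs[a].add(b)
            nbrs[b].add(a)
            chordal.add((a, b))
        for u in nbrs[v]:
            nbrs[u].discard(v)  # eliminate v from the working graph
        del nbrs[v]
        remaining.discard(v)
    return sorted(chordal)


def sliding_windows(T, width_frac=0.2, shift_frac=0.1):
    """Half-open windows [start, start + w) with w ~ width_frac*T, shifted by ~shift_frac*w."""
    w = max(1, round(width_frac * T))
    step = max(1, round(shift_frac * w))
    return [(start, start + w) for start in range(0, T - w + 1, step)]
```

Triangulating the 4-cycle 0-1-2-3-0 adds the single chord (1, 3), and `sliding_windows(100)` yields 41 windows from `(0, 20)` to `(80, 100)`.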

489 | The infinite hidden Markov model
- Beal, Ghahramani, et al.
- 2002
Citation Context ... the number of segments, whereas Talih and Hengartner assume this is known. It is interesting to compare the product partition model (PPM) to a hidden Markov model (HMM) and an “infinite HMM” (IHMM) (**Beal et al., 2002**). In an HMM, we have a fixed number of states $S$; we label each time step with a state, which specifies which parameters to use to generate an observation. In a PPM, we have an unbounded number of sta...

383 | High dimensional graphs and variable selection with the lasso
- Meinshausen, Bühlmann
Citation Context ...s. Note that this thresholding-based approach takes $O(d^3)$ time, so it scales much better than the block coordinate descent method. In the future, we would also like to try L1-based regression methods (**Meinshausen & Bühlmann, 2006**) for estimating $K$, which are also $O(d^3)$. The above methods can learn arbitrary graph structures. Since our computation of the marginal likelihood assumes the graph is decomposable, we convert each ...

144 | Prior distributions for variance parameters in hierarchical models. Bayesian Analysis
- Gelman
- 2006
Citation Context ...in informal experiments, that the full covariance model is much more sensitive to the hyperparameter $V_0$ than the GGM approach. (Setting priors for variance parameters is known to be a delicate issue (**Gelman, 2006**), especially when considering model uncertainty.) In all the experiments, we used $V_0 = \hat{\sigma}^2 I$ as a reasonable default prior. 5.3. Financial data Finally we applied the method to some financial data, ...

131 | A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat Appl Genet Mol Biol 2005, 4: Article 32
- Schäfer, Strimmer
Citation Context ...is the number of iterations. Other, faster methods exist for estimating sparse undirected GGMs. One method that we tried is the following: compute a shrinkage estimate of $\hat{\Sigma}$, using the technique of (**Schaefer & Strimmer, 2005**). From this, compute the precision $\hat{K} = \hat{\Sigma}^{-1}$ and the partial correlation coefficients $\rho_{ij} = -K_{ij} / \sqrt{K_{ii} K_{jj}}$. Then set edge $G_{ij} = 0$ if $|\rho_{ij}| < \theta$, where $\theta$ is some threshold (we use 0.2); otherwise set...
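A minimal NumPy sketch of the thresholding recipe in this excerpt. One assumption stands out: the shrinkage intensity is a fixed, hypothetical parameter `shrink` toward the diagonal of the sample covariance, whereas Schäfer & Strimmer estimate the intensity from the data, which this sketch does not reproduce. The excerpt is truncated after "otherwise set", so setting the edge to 1 is also an assumption. The matrix inverse is the $O(d^3)$ step; `threshold_ggm` is a hypothetical name.

```python
import numpy as np


def threshold_ggm(Y, shrink=0.1, theta=0.2):
    """Estimate a sparse GGM structure by thresholding partial correlations.

    Y is an (n, d) data matrix. `shrink` is a fixed shrinkage intensity toward the
    diagonal of the sample covariance (an assumption, not a data-driven choice).
    Returns the 0/1 adjacency matrix G and the matrix of partial correlations.
    """
    S = np.cov(Y, rowvar=False)
    Sigma_hat = (1 - shrink) * S + shrink * np.diag(np.diag(S))
    K = np.linalg.inv(Sigma_hat)            # precision matrix K = Sigma^{-1}
    scale = np.sqrt(np.diag(K))
    rho = -K / np.outer(scale, scale)       # rho_ij = -K_ij / sqrt(K_ii K_jj)
    G = (np.abs(rho) >= theta).astype(int)  # no edge where |rho_ij| < theta
    np.fill_diagonal(G, 0)
    return G, rho
```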

119 | Hyper Markov laws in the statistical analysis of decomposable graphical models. Annals of Statistics
- Dawid, Lauritzen
- 1993
Citation Context ...d (Roverato, 2002), but these are slow. So we shall assume that the graph structure is decomposable. Given a decomposable graph structure $G$, and assuming a hyper-inverse Wishart prior $\Sigma \sim \mathrm{HIW}_G(b_0, V_0)$ (**Dawid & Lauritzen, 1993**), where $b_0 = N_0 + 1 - d > 0$ is the degrees of freedom and $V_0$ is the location parameter, the marginal likelihood can be written as follows (Dawid & Lauritzen, 1993; Giudici & Green, 1999; Carvalho & West,...
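The excerpt stops before the formula itself. For reference, a standard way to evaluate the marginal likelihood under a hyper-inverse Wishart prior on a decomposable graph is to factor it over the cliques $C$ and separators $S$ of a junction tree, $p(Y \mid G) = \prod_C p(Y_C) / \prod_S p(Y_S)$, where each factor is an inverse-Wishart marginal likelihood for the corresponding sub-block. The sketch below assumes zero-mean data, takes the cliques and separators as given (computing them is not shown), and uses $N_0 = b_0 + |C| - 1$ for each sub-block, which reduces to $b_0 = N_0 + 1 - d$ when the graph is complete. The function names are hypothetical, and the sketch may differ in detail from the expression in the paper.

```python
import math
import numpy as np


def log_multigamma(a, d):
    """log Gamma_d(a), the multivariate gamma function."""
    return d * (d - 1) / 4 * math.log(math.pi) + sum(
        math.lgamma(a + (1 - j) / 2) for j in range(1, d + 1))


def log_iw_marginal(Y, N0, V0):
    """log p(Y) when the rows of Y are i.i.d. N(0, Sigma) and Sigma ~ IW(N0, V0)."""
    n, d = Y.shape
    Vn = V0 + Y.T @ Y
    return (-n * d / 2 * math.log(math.pi)
            + log_multigamma((N0 + n) / 2, d) - log_multigamma(N0 / 2, d)
            + N0 / 2 * np.linalg.slogdet(V0)[1]
            - (N0 + n) / 2 * np.linalg.slogdet(Vn)[1])


def log_hiw_marginal(Y, cliques, separators, b0, V0):
    """log p(Y | G) for decomposable G, given its cliques and separators (with repeats)."""
    def block(idx):
        idx = list(idx)
        # On a complete subset, HIW_G(b0, V0) is inverse Wishart with N0 = b0 + |idx| - 1.
        return log_iw_marginal(Y[:, idx], b0 + len(idx) - 1, V0[np.ix_(idx, idx)])
    return sum(block(C) for C in cliques) - sum(block(S) for S in separators)
```

For the chain graph 0-1-2, one would pass `cliques=[[0, 1], [1, 2]]` and `separators=[[1]]`.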

98 | Bayesian Methods for Nonlinear Classification and Regression
- Denison, Holmes, et al.
- 2002
Citation Context ...s such as finance. Our model can segment data based on all of these kinds of changes, as we will see. 2. Previous work A product partition model (PPM) (Barry & Hartigan, 1992; Barry & Hartigan, 1993; **Denison et al., 2002**) is a density model in which we assume we can partition the data into an unknown number $K$ of partitions, $\pi_1, \ldots, \pi_K$, such that the data is independent across segments: $p(y_{1:T} \mid \pi) = \prod_{k=1}^{K} p(y_{\pi_k})$. (A ...
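The product-partition likelihood above factorizes across segments, so for a contiguous segmentation its log is a sum of per-segment terms. The sketch below makes that explicit with a hypothetical per-segment scorer `seg_loglik(s, t)` on the inclusive, 0-based slice `y[s..t]`, and also enumerates every contiguous segmentation of a length-`T` series. There are $2^{T-1}$ of them (each of the $T-1$ gaps either is or is not a changepoint), which is why exhaustive enumeration only works for tiny `T` and dynamic programming is needed in practice. Both function names are hypothetical.

```python
from itertools import combinations


def ppm_loglik(seg_loglik, T, starts):
    """log p(y | partition) = sum over segments of log p(y_segment).

    `starts` lists the 0-based indices where new segments begin (excluding 0).
    """
    bounds = [0, *sorted(starts), T]
    return sum(seg_loglik(a, b - 1) for a, b in zip(bounds[:-1], bounds[1:]))


def contiguous_partitions(T):
    """Yield every partition of 0..T-1 into contiguous segments, as lists of starts."""
    for k in range(T):
        for starts in combinations(range(1, T), k):
            yield list(starts)
```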

87 | Bayesian methods for hidden Markov models: Recursive computing in the 21st century
- Scott
- 2002
Citation Context ...tion, it is of course possible to compute the posterior over models and parameters in each segment, $p(m, \theta \mid y_{s:t})$.) The algorithm is very similar to the forwards-filtering backwards-sampling algorithm (**Scott, 2002**) for HMMs, except the “hidden variable” is not a discrete state index, but rather a time index encoding where the last change point occurred. Hence the algorithm takes $O(T^2)$ time and $O(T)$ space. In...
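A minimal sketch of exact posterior sampling over segmentations, under assumptions that are all hypothetical: a scorer `seg_loglik(s, t)` for the inclusive slice `y[s..t]`, a geometric segment-length prior with parameter `p_change`, and a toy unit-variance Gaussian mean model standing in for the paper's segment likelihood. It runs the recursion backwards, computing $\log Q(t) = \log p(y_{t:T-1} \mid \text{a segment starts at } t)$, and then samples forwards, which mirrors the forwards-filtering backwards-sampling order described in the excerpt; both orderings take $O(T^2)$ segment evaluations and store $O(T)$ values of $Q$. This illustrates the idea and is not the authors' implementation.

```python
import numpy as np


def log_len_pmf(n, p_change):
    """log P(segment length == n) under a geometric length prior (assumption)."""
    return np.log(p_change) + (n - 1) * np.log1p(-p_change)


def log_len_tail(n, p_change):
    """log P(segment length >= n), for the last segment."""
    return (n - 1) * np.log1p(-p_change)


def gaussian_mean_seg_loglik(y, tau2=100.0):
    """Toy segment model (assumption): y_i ~ N(mu, 1) with mu ~ N(0, tau2)."""
    y = np.asarray(y, dtype=float)
    csum = np.concatenate([[0.0], np.cumsum(y)])
    csq = np.concatenate([[0.0], np.cumsum(y ** 2)])

    def seg_loglik(s, t):
        n = t - s + 1
        sy, syy = csum[t + 1] - csum[s], csq[t + 1] - csq[s]
        quad = syy - tau2 * sy ** 2 / (1 + n * tau2)
        return -0.5 * (n * np.log(2 * np.pi) + np.log1p(n * tau2) + quad)

    return seg_loglik


def first_segment_logweights(seg_loglik, T, p_change, logQ, t):
    """Unnormalized log-probabilities that the segment starting at t ends at t..T-1."""
    w = [seg_loglik(t, s) + log_len_pmf(s - t + 1, p_change) + logQ[s + 1]
         for s in range(t, T - 1)]
    w.append(seg_loglik(t, T - 1) + log_len_tail(T - t, p_change))
    return np.array(w)


def backward_recursion(seg_loglik, T, p_change):
    """logQ[t] = log p(y[t..T-1] | a segment starts at t); logQ[T] is unused."""
    logQ = np.zeros(T + 1)
    for t in range(T - 1, -1, -1):
        logQ[t] = np.logaddexp.reduce(
            first_segment_logweights(seg_loglik, T, p_change, logQ, t))
    return logQ


def sample_changepoints(seg_loglik, T, p_change, logQ, rng):
    """Draw one segmentation exactly from the posterior; returns start indices > 0."""
    t, starts = 0, []
    while True:
        logw = first_segment_logweights(seg_loglik, T, p_change, logQ, t) - logQ[t]
        probs = np.exp(logw)
        probs /= probs.sum()  # guard against floating-point rounding
        k = int(rng.choice(len(probs), p=probs))
        if k == len(probs) - 1:  # the current segment runs to the end of the series
            return starts
        t = t + k + 1  # segment t..t+k ends here; the next one starts at t+k+1
        starts.append(t)
```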

63 | Decomposable graphical Gaussian model determination
- Giudici, Green
- 1999
Citation Context ...dels provide a good compromise between these two extremes. We can either use directed graphs (i.e., Bayes nets; see e.g., (Geiger & Heckerman, 1994)) or undirected models (see e.g., (Lauritzen, 1996; **Giudici & Green, 1999**; Carvalho & West, 2006)). In this paper, we use undirected graphs, since there are very efficient procedures for estimating undirected graph structures (see Section 4), which is needed to compute the...

54 | Bayesian curve fitting using MCMC with applications to signal segmentation
- Punskaya, Andrieu, et al.
- 2002
Citation Context ...ead developed efficient dynamic programming algorithms for exactly computing the posterior over the number and location of changepoints in time series. This improved upon earlier approaches, such as (**Punskaya et al., 2002**), which relied on reversible jump MCMC. All of the examples that Fearnhead considered were univariate (one-dimensional) time series. In this paper, we show how to apply Fearnhead’s algorithms to mult...

49 | A Bayesian Analysis for Change Point Problems
- Barry, Hartigan
- 1993
Citation Context ...tice, especially in areas such as finance. Our model can segment data based on all of these kinds of changes, as we will see. 2. Previous work A product partition model (PPM) (Barry & Hartigan, 1992; **Barry & Hartigan, 1993**; Denison et al., 2002) is a density model in which we assume we can partition the data into an unknown number $K$ of partitions, $\pi_1, \ldots, \pi_K$, such that the data is independent across segments: $p(y_{1:T} \mid \pi)$...

45 | Product partition models for change point problems. The Annals of Statistics
- Barry, Hartigan
- 1992
Citation Context ... but often arise in practice, especially in areas such as finance. Our model can segment data based on all of these kinds of changes, as we will see. 2. Previous work A product partition model (PPM) (**Barry & Hartigan, 1992**; Barry & Hartigan, 1993; Denison et al., 2002) is a density model in which we assume we can partition the data into an unknown number $K$ of partitions, $\pi_1, \ldots, \pi_K$, such that the data is independent acr...

45 | Exact and efficient Bayesian inference for multiple change-point problems. Statistics and Computing
- Fearnhead
- 2006
Citation Context ...omplexity (number of segments) and model fit. It also allows one to express uncertainty about the number, and location, of changepoints. In a series of papers (Fearnhead, 2004; Fearnhead & Liu, 2005; **Fearnhead, 2006**), Fearnhead developed efficient dynamic programming algorithms for exactly computing the posterior over the number and location of changepoints in time series. This improved upon earlier approaches, ...

31 | On-line inference for multiple changepoint problems
- Fearnhead, Liu
Citation Context ...radeoff between model complexity (number of segments) and model fit. It also allows one to express uncertainty about the number, and location, of changepoints. In a series of papers (Fearnhead, 2004; **Fearnhead & Liu, 2005**; Fearnhead, 2006), Fearnhead developed efficient dynamic programming algorithms for exactly computing the posterior over the number and location of changepoints in time series. This improved upon ear...

31 | Hyper inverse Wishart distribution for non-decomposable graphs and its application to Bayesian inference for Gaussian graphical models
- Roverato
- 2002
Citation Context ...ion 4), which is needed to compute the model space M. Computing the marginal likelihood for non-decomposable graphical models cannot be done in closed form. Various approximations have been proposed (**Roverato, 2002**), but these are slow. So we shall assume that the graph structure is decomposable. Given a decomposable graph structure $G$, and assuming a hyper-inverse Wishart prior $\Sigma \sim \mathrm{HIW}_G(b_0, V_0)$ (Dawid & Lauritzen...

23 | Exact Bayesian curve fitting and signal segmentation
- Fearnhead
- 2005
Citation Context ...ally captures a tradeoff between model complexity (number of segments) and model fit. It also allows one to express uncertainty about the number, and location, of changepoints. In a series of papers (**Fearnhead, 2004**; Fearnhead & Liu, 2005; Fearnhead, 2006), Fearnhead developed efficient dynamic programming algorithms for exactly computing the posterior over the number and location of changepoints in time series....

10 | Dynamic matrix-variate graphical models. Bayesian Analysis 2
- Carvalho, West
- 2007
Citation Context ...ointly. This allows us to segment based on a changing correlation structure, as well as changing mean, variance, etc., which is particularly useful in financial applications (Talih & Hengartner, 2005; **Carvalho & West, 2006**). Furthermore, the sparse structure within each segment is often interpretable. Figure 1 illustrates the basic problem we are trying to solve. In the case of 1D time series, a segmentation might be i...

9 | Parameterized duration modeling for switching linear dynamic systems
- Oh, Rehg, et al.
- 2006
Citation Context ... not improve results significantly, suggesting that our initial heuristic oversegmentation is adequate for recovering M. 5.1. Bee data We first applied our method to a 3-dimensional data set used in (**Oh et al., 2006**). This consists of the $x$ and $y$ coordinates of a honeybee, and its head angle $\theta$, as it moves around an enclosure, as observed by an overhead camera. Some examples of the data, together with a ground...

5 | Modal clustering in a univariate class of product partition models
- Dahl
- 2003
Citation Context ... $\pi_1, \ldots, \pi_K$, such that the data is independent across segments: $p(y_{1:T} \mid \pi) = \prod_{k=1}^{K} p(y_{\pi_k})$. (A Dirichlet process mixture model is a special case of a PPM, in which we assume a specific form for $p(\pi)$ (**Dahl, 2003**).) If we assume that the partitions are non-overlapping partitions of the interval $1{:}T$, then we can efficiently compute the posterior over $K$ and $\pi$ using dynamic programming, as Fearnhead showed; we...

2 | Some matrix-variate distribution theory: some notational considerations and a Bayesian application
- Dawid
- 1981
Citation Context ... the length of the segment, $q$ is the number of input features per time slice), $\beta$ is a $q \times d$ matrix of regression parameters, and $\epsilon \sim \mathcal{N}(0, I_n, \Sigma)$, where $\mathcal{N}(M, V, \Sigma)$ is the matrix Gaussian distribution (**Dawid, 1981**)

$$\mathcal{N}(A; M, V, \Sigma) = \frac{|\Sigma|^{n/2}}{|2\pi V|^{d/2}} \exp\left(-\frac{1}{2}\operatorname{tr}\left((A - M)^T V^{-1} (A - M)\Sigma\right)\right) \qquad (4)$$

where $M$ is an $n \times d$ matrix representing the means, $V$ is an $n \times n$ matrix representing covariance amongst the rows (time slices)...

1 | Learning Gaussian networks. UAI
- Geiger, Heckerman
- 1994
Citation Context ...imation to $\Sigma$ results in the independent features model. Gaussian graphical models provide a good compromise between these two extremes. We can either use directed graphs (i.e., Bayes nets; see e.g., (**Geiger & Heckerman, 1994**)) or undirected models (see e.g., (Lauritzen, 1996; Giudici & Green, 1999; Carvalho & West, 2006)). In this paper, we use undirected graphs, since there are very efficient procedures for estimating u...

1 | Bayesian linear regression (Technical Report)
- Minka
- 2000
Citation Context ...d $\Sigma$ is a $d \times d$ matrix representing covariance amongst the columns (features). If we assume $\beta \mid \Sigma \sim \mathcal{N}(0, D, \Sigma)$ where $D = \mathrm{diag}(\delta_1^2, \ldots, \delta_q^2)$, and $\Sigma \sim \mathrm{IW}(N_0, V_0)$, then the marginal likelihood is given by (**Minka, 2000**)

$$p(y_{s:t}) = \pi^{-nd/2} \left(\frac{|M|}{|D|}\right)^{d/2} \frac{|V_0|^{N_0/2}}{|V_n|^{(n+N_0)/2}} \frac{\Gamma_d\left((N_0 + n)/2\right)}{\Gamma_d\left(N_0/2\right)}$$

$$M = (H^T H + D^{-1})^{-1}, \qquad P = I - H M H^T, \qquad V_n = V_0 + Y^T P Y$$

Using this framework, we can model multivariate au...
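The marginal likelihood quoted from Minka (2000) can be evaluated directly with NumPy. The sketch below is a hedged illustration: `Y` is the $n \times d$ block of observations in one segment, `H` the $n \times q$ design matrix, and `delta2` the diagonal of $D$; the function names are hypothetical. Log-determinants use `np.linalg.slogdet` for numerical stability, and $N_0 > d - 1$ is required for the inverse Wishart prior to be proper.

```python
import math
import numpy as np


def log_multigamma(a, d):
    """log Gamma_d(a), the multivariate gamma function."""
    return d * (d - 1) / 4 * math.log(math.pi) + sum(
        math.lgamma(a + (1 - j) / 2) for j in range(1, d + 1))


def log_marglik_regression(Y, H, delta2, N0, V0):
    """log p(Y) for Y = H beta + E, E ~ N(0, I_n, Sigma), beta | Sigma ~ N(0, D, Sigma),
    D = diag(delta2) and Sigma ~ IW(N0, V0)."""
    n, d = Y.shape
    D = np.diag(delta2)
    M = np.linalg.inv(H.T @ H + np.linalg.inv(D))  # M = (H^T H + D^{-1})^{-1}
    P = np.eye(n) - H @ M @ H.T                     # P = I - H M H^T
    Vn = V0 + Y.T @ P @ Y                           # V_n = V_0 + Y^T P Y

    def logdet(A):
        return np.linalg.slogdet(A)[1]

    return (-n * d / 2 * math.log(math.pi)
            + d / 2 * (logdet(M) - logdet(D))
            + N0 / 2 * logdet(V0) - (n + N0) / 2 * logdet(Vn)
            + log_multigamma((N0 + n) / 2, d) - log_multigamma(N0 / 2, d))
```

To mimic the default prior mentioned in the Gelman (2006) excerpt, $V_0 = \hat{\sigma}^2 I$, one could set `V0 = Y.var(axis=0).mean() * np.eye(d)`; treating $\hat{\sigma}^2$ as the average per-dimension sample variance is an assumption, since the excerpt does not define it.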