## SIMULTANEOUSLY SEGMENTING MULTIPLE GENE EXPRESSION TIME COURSES BY ANALYZING CLUSTER DYNAMICS

### BibTeX

@MISC{Tadepalli_simultaneouslysegmenting,

author = {Satish Tadepalli and Naren Ramakrishnan and Layne T. Watson and Bhubaneshwar Mishra and Richard F. Helm},

title = {SIMULTANEOUSLY SEGMENTING MULTIPLE GENE EXPRESSION TIME COURSES BY ANALYZING CLUSTER DYNAMICS},

year = {}

}

### Abstract

We present a new approach to segmenting multiple time series by analyzing the dynamics of cluster formation and re-arrangement around putative segment boundaries. This approach finds application in distilling large numbers of gene expression profiles into temporal relationships underlying biological processes. By directly minimizing information-theoretic measures of segmentation quality derived from Kullback-Leibler (KL) divergences, our formulation reveals clusters of genes along with a segmentation such that clusters show concerted behavior within segments but exhibit significant re-grouping across segmentation boundaries. The results of the segmentation algorithm can be summarized as Gantt charts revealing temporal dependencies in the ordering of key biological processes. Applications to the yeast metabolic cycle and the yeast cell cycle are described.
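
As a rough illustration of the KL-based measure the abstract refers to, the sketch here evaluates the contingency-table objective quoted in the citation contexts on this page: the average KL divergence of each row-wise and column-wise distribution from the uniform distribution. It is a minimal sketch, not the authors' implementation. The function names `kl_to_uniform` and `contingency_objective` are hypothetical, the table is assumed to hold hard cluster-overlap counts between two adjacent windows, and every row and column is assumed to be non-empty.

```python
import math

def kl_to_uniform(p):
    """Base-2 KL divergence of distribution p from the uniform distribution over len(p) outcomes."""
    k = len(p)
    # Terms with p(x) = 0 contribute nothing to the sum.
    return sum(x * math.log2(x * k) for x in p if x > 0)

def contingency_objective(table):
    """F = (1/r) sum_i KL(R_i || U(1/c)) + (1/c) sum_j KL(C_j || U(1/r)) for an r x c table of counts.

    Assumption: no row or column of the table sums to zero.
    """
    r, c = len(table), len(table[0])
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(r)) for j in range(c)]
    # Normalize each row and each column into a probability distribution.
    rows = [[table[i][j] / row_sums[i] for j in range(c)] for i in range(r)]
    cols = [[table[i][j] / col_sums[j] for i in range(r)] for j in range(c)]
    return sum(map(kl_to_uniform, rows)) / r + sum(map(kl_to_uniform, cols)) / c

# Clusters fully re-grouped across the boundary: every row and column is uniform.
regrouped = [[2, 2, 2], [2, 2, 2], [2, 2, 2]]
# Clusters carried over unchanged: every row and column is concentrated on one cell.
preserved = [[6, 0, 0], [0, 6, 0], [0, 0, 6]]

print(contingency_objective(regrouped))
print(contingency_objective(preserved))
```

Under these assumptions the fully re-grouped table scores essentially zero, while the preserved table scores 2·log2(3), about 3.17, the largest value possible for a 3×3 table. A low score therefore marks a plausible segment boundary, which matches the [1/3, 1/3, 1/3] versus [0, 1, 0] comparison in the quoted context.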

### Citations

791 |
Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization, Molecular Biology of the Cell
- Spellman, Sherlock, et al.
- 1998
Citation Context ...ell cycle are described. 1. Introduction. Time course analysis has become an important tool for the study of developmental, disease progression, and cyclical biological processes, e.g., the cell cycle [8], metabolic cycle [9], and even entire life cycles. The growing affordability of microarray screens has fostered the generation of many time series datasets. Recent research efforts have considered us... |

220 |
Botstein D, Futcher B. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization, Mol Biol Cell
- PT, Sherlock, et al.
- 1998
Citation Context ... as a normalized probability distribution. In the case of table (a) each distribution is [1/3, 1/3, 1/3], which is a uniform distribution, while in the case of tables (b) and (c), each distribution is [0, 1, 0], which has a maximum deviation from the uniform distribution. We use this observation to formulate our criterion as described below. Formally, given two windows $w_{t_a}^{t_b}$... |

87 | LANCELOT: A Fortran package for Large-scale Nonlinear Optimization (Release - Conn, Gould, et al. - 1992 |

50 | Segmenting Time Series: A Survey and Novel Approach. Data mining in Time Series Databases
- Keogh, Chu, et al.
- 1993
Citation Context ...time series segmentation where the segmentation criterion is driven by measures over cluster dynamics. It is important to contrast our goals with prior work. Typical works on time series segmentation [3] are focused on segmenting a single time series whereas we are focused on simultaneously segmenting multiple time series. Typical works on segmentation view it as a problem of clustering time points w... |

45 |
Exact and efficient Bayesian inference for multiple change-point problems, Stat
- Fearnhead
- 2006
Citation Context ...ularized objective function: $F = \frac{\lambda}{r} \sum_{i=1}^{r} D_{KL}\left(p_{R_i} \,\|\, U(\tfrac{1}{c})\right) + \frac{\lambda}{c} \sum_{j=1}^{c} D_{KL}\left(p_{C_j} \,\|\, U(\tfrac{1}{r})\right) - \frac{1}{N} \sum_{k=1}^{N} D_{KL}\left(p_{V(x_k)} \,\|\, U(\tfrac{1}{r})\right) - \frac{1}{N} \sum_{k=1}^{N} D_{KL}\left(p_{V(y_k)} \,\|\, U(\tfrac{1}{c})\right)$, (12) where λ is the weight, set to a value greater than 1, to give more emphasis to minimizing the row and column distributions. This also enforces equal cluster sizes. The role of λ is to enforce a “bala... |

31 | Modeling changing dependency structure in multivariate time series
- Xuan, Murphy
- 2007
Citation Context ...′ ) j ∑N k ′=1 v(y k ′ ) j / 1 r )] ) ∇ m (x) i k ′=1 v(x k i ′ ]} / )] [ 1 r v (yk) j ∑N k ′=1 v(y k ′ ) j ]} v (xk) i ′ , (13) ∇ (x) v m i (xk) i = 2ρ(xk − m (x) i ) v D (xk) i (δi′,i + v (xk) i ). =-=(14)-=- Here δi ′,i is the Kronecker delta. The index variables i, i ′,andi′′ are over the clusters in the x vectors, j over the clusters in the y vectors, and k and k ′ over the data vectors. The gradients ... |

28 |
GJ, Bar-Joseph Z: Clustering short time series gene expression data
- Ernst, Nau
Citation Context ...e non-normalized cluster assignment probabilities are given by $\hat{v}_i^{(x_k)} = \exp\left(\rho \min_{i'} \gamma_{(i,i')}(x_k)\right)$, (7) and the normalized probabilities are then given by $v_i^{(x_k)} = \hat{v}_i^{(x_k)} / \sum_{i'} \hat{v}_{i'}^{(x_k)}$. (8) A well known approximation to $\min_{i'} \gamma_{(i,i')}(x_k)$ is the Kreisselmeier-Steinhauser (KS) envelope function 17,18 given by $KS_i(x_k)$ ... |

26 |
Logic of the yeast metabolic cycle: temporal compartmentalization of cellular processes
- Tu, Kudlicki, et al.
- 2005
Citation Context ...ed. 1. Introduction. Time course analysis has become an important tool for the study of developmental, disease progression, and cyclical biological processes, e.g., the cell cycle [8], metabolic cycle [9], and even entire life cycles. The growing affordability of microarray screens has fostered the generation of many time series datasets. Recent research efforts have considered using static measuremen... |

23 |
Schonhuth A, Steinhoff C: Using hidden Markov models to analyze gene expression time course data
- Schliep
- 2003
Citation Context ...inition of KL-divergence, the objective function in Eq. (4) can be expressed as $F = -\frac{1}{r} \sum_{i=1}^{r} H(R_i) + \log_2(c) - \frac{1}{c} \sum_{j=1}^{c} H(C_j) + \log_2(r) = -\frac{1}{r} \sum_{i=1}^{r} H(\beta \mid \alpha = i) - \frac{1}{c} \sum_{j=1}^{c} H(\alpha \mid \beta = j) + \log_2(r \cdot c)$, (5) where H(X) is the entropy of the random variable X with probability mass function p(x) and is defined as $H(X) = -\sum_x p(x) \log_2 p(x)$. Entropy is a measure of the uncertainty of a random variable. Mini... |

21 |
Kudlicki A, Rowicka M, McKnight SL: Logic of the yeast metabolic cycle: temporal compartmentalization of cellular processes
- BP
Citation Context ...arameter, to avoid degenerate solutions. Hence the exact value of λ is not as crucial as the regime in which we conduct the optimization. In order to adjust λ, we vary its value over a range (typically [1, 2] in small step sizes). Based on experimentation, the cluster assignments of most gene vectors (more than 90%) do not change after a particular value of λ and we use this criterion to set λ. All the te... |

18 |
The Information in Contingency Tables
- Gokhale, Kullback
- 1978
Citation Context ... the Bayes factor. The objective function defined in Eq. 1 has connections to the principle of minimum discrimination information (MDI) introduced by Kullback for the analysis of contingency tables [4]. The MDI principle states that if q is the assumed or true distribution, the estimated distribution p must be chosen such that $D_{KL}(p \| q)$ is minimized. In our case, q is the uniform distribution desir... |

18 | Associative clustering for exploring dependencies between functional genomics data sets
- Kaski, Nikkila, et al.
Citation Context ...tuitive since the converse problem of finding highly dependent clusters then reduces to simply maximizing Eq. 1. This converse problem is known as associative clustering and has been previously studied [2] using measures such as the Bayes factor. The objective function defined in Eq. 1 has connections to the principle of minimum discrimination information (MDI) introduced by Kullback for the analysis... |

15 | Systematic control design by optimizing a vector performance index - Kreisselmeier, Steinhauser - 1979 |

15 | Scalable clustering algorithms with balancing constraints. Data Mining and Knowledge Discovery 13(3 - Banerjee, Ghosh - 2006 |

12 |
An Indirect Method for Numerical Optimization Using the Kreisselmeier–Steinhauser Function, NASA CR-4220
- Wrenn
- 1989
Citation Context ...ere D is the point-set diameter $D = \max_{k,k'} \|x_k - x_{k'}\|^2$, $1 \le k, k' \le \nu$, $1 \le i, i' \le r$. A well known approximation to $\min_{i'} \gamma_{(i,i')}(x_k)$ is the Kreisselmeier-Steinhauser (KS) envelope function [10] given by $KS_i(x_k) = -\frac{1}{\rho} \ln\left[\sum_{i'=1}^{r} \exp\left(-\rho\, \gamma_{(i,i')}(x_k)\right)\right]$, where ρ ≫ 0. The KS function is a smooth function which is differentiable to any degree. Using this the cluster memberships are rede... |

10 | Disruption of Yeast Forkhead-associated Cell Cycle Transcription by Oxidative Stress - Shapira, Segal, et al. |

9 |
Molecular Biology of the cell 9
- Spellman
- 1998
Citation Context ...g with probability $n_{ij}/n_{\cdot j}$. We capture the deviation of these row-wise and column-wise distributions w.r.t. the uniform distribution as $F = \frac{1}{r} \sum_{i=1}^{r} D_{KL}\left(R_i \,\|\, U(\tfrac{1}{c})\right) + \frac{1}{c} \sum_{j=1}^{c} D_{KL}\left(C_j \,\|\, U(\tfrac{1}{r})\right)$, (1) where $D_{KL}(p \| q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}$ is the Kullback-Leibler (KL) divergence between two probability distributions p(x) and q(x), and U(·) denotes the uniform distribution whose argument is ... |

9 |
Metabolic cycle, cell cycle, and the finishing kick to Start. Genome Biol
- Futcher
- 2006
Citation Context ...g. 1 and Fig. 2) and noticed that the oxidative metabolism phase of YMC typically precedes the transition from G1 to S in the YCC. Such a connection has been investigated in the literature by Futcher [1] but through the use of a custom experiment observing metabolism during the course of the cell cycle. As the budding yeast grows in size, it is hypothesized that the accumulation of carbohydrates is ... |

9 |
Bar-Joseph Z: Combined static and dynamic analysis for determining the quality of time-Series expression profiles. Nature Biotechnology 2005
- Simon, Siegfried, et al.
Citation Context ...ability of microarray screens has fostered the generation of many time series datasets. Recent research efforts have considered using static measurements to “fill in the gaps” in the time series data [7], quantifying timing differences in gene expression [11], and reconstructing regulatory relationships [6]. One of the attractions of time series analysis is its promise to reveal temporal relationship... |

8 |
Bar-Joseph Z: Inferring pairwise regulatory relationships from multiple time series datasets
- Shi, Mitchell
Citation Context ...fforts have considered using static measurements to “fill in the gaps” in the time series data [7], quantifying timing differences in gene expression [11], and reconstructing regulatory relationships [6]. One of the attractions of time series analysis is its promise to reveal temporal relationships underlying biological processes: which process occurs before what, what are the “checkpoints” that must... |

7 |
Mántaras, R.: A distance-based attribute selection measure for decision tree induction
- de
- 1991
Citation Context ...calculated for each GO biological process term, and an appropriate cutoff is chosen using a false discovery rate (FDR) q-level of 0.01. The segmentation quality is calculated as a partition distance [5] between the “true” segmentation (from the literature of the YMC and YCC) and the segmentations computed by our algorithm. Since this measure requires partitions with no overlap between blocks, we view... |

7 |
Simon: Continuous representations of time series gene expression data
- Bar-Joseph, Gerber, et al.
- 2003
Citation Context ...ibution is $U(\tfrac{1}{r})$. We capture the deviation of these row-wise and column-wise distributions w.r.t. the uniform distribution as: $F = \frac{1}{r} \sum_{i=1}^{r} D_{KL}\left(p_{R_i} \,\|\, U(\tfrac{1}{c})\right) + \frac{1}{c} \sum_{j=1}^{c} D_{KL}\left(p_{C_j} \,\|\, U(\tfrac{1}{r})\right)$, (4) where $D_{KL}(p \| q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}$ is the Kullback-Leibler (KL) divergence between two probability distributions with probability mass functions p(x) and q(x), and U(·) denotes the uniform dist... |

7 | Odstrcil EA, Tu BP, McKnight SL: Restriction of DNA Replication to the Reductive - Chen |

6 | Fuzzy clustering based segmentation of timeseries
- Abonyi, Feil, et al.
- 2003
Citation Context ...′ ) i ′ v(y k ′ ) j ∑N k ′=1(v(x k ′ ) i ′ )2 (∑N k ′=1 v(x k ′ ) i ′ v(y k ′ ) j ∑N k ′=1 v(y k ′ ) j / 1 r )] ) ∇ m (x) i k ′=1 v(x k i ′ ]} / )] [ 1 r v (yk) j ∑N k ′=1 v(y k ′ ) j ]} v (xk) i ′ , =-=(13)-=- ∇ (x) v m i (xk) i = 2ρ(xk − m (x) i ) v D (xk) i (δi′,i + v (xk) i ). (14) Here δi ′,i is the Kronecker delta. The index variables i, i ′,andi′′ are over the clusters in the x vectors, j over the cl... |

6 | Regulation of yeast oscillatory dynamics - Murray, Beckmann, et al. - 2007 |

3 | B (2005) Reconstructing formal temporal logic models of cellular events using the GO process ontology [abstract - Ramakrishnan, Antoniotti, et al. |

3 |
A hidden Markov model-based approach for identifying timing differences in gene expression under different experimental factors. Bioinformatics, 23, 842–849.
- Yoneya, Mamitsuka
- 2007
Citation Context ...on of many time series datasets. Recent research efforts have considered using static measurements to “fill in the gaps” in the time series data [7], quantifying timing differences in gene expression [11], and reconstructing regulatory relationships [6]. One of the attractions of time series analysis is its promise to reveal temporal relationships underlying biological processes: which process occurs ... |

3 |
B (2005) Active learning for sampling in timeseries experiments with application to gene expression analysis
- Singh, Palmer, et al.
Citation Context ..., i′ ≤ r, (6) where $D = \max_{k,k'} \|x_k - x_{k'}\|^2$, $1 \le k, k' \le N$, is the pointset diameter. The non-normalized cluster assignment probabilities are given by $\hat{v}_i^{(x_k)} = \exp\left(\rho \min_{i'} \gamma_{(i,i')}(x_k)\right)$, (7) and the normalized probabilities are then given by $v_i^{(x_k)} = \hat{v}_i^{(x_k)} / \sum_{i'} \hat{v}_{i'}^{(x_k)}$. (8) A well known approximation to $\min_{i'}$ γ (... |

3 | Discovering temporal knowledge in multivariate time series
- Mörchen, Ultsch
- 2005
Citation Context ... tb ta ∈S1 ∑ z td tc ∈S2 ∑ z t d tc ∈S2 ∑ w t b ta ∈S1 |w tb ∩ ztd ta tc | log |w 2 tb ∩ ztd ta tc | |w tb ta | and ztd respectively, the tc |w tb ta ∩ ztd tc | log |w 2 tb ta ∩ ztd tc | |z td tc | . =-=(15)-=- The segmentation sensitivity to variations in the number of clusters is calculated as the average of the ratios of KL-divergences between the segments to the maximum possible KL divergence between th... |

2 | Genome Biology 7 - Futcher - 2006 |

2 |
Transcriptional profile of aging in C. elegans Curr Biol 12
- Lund, Tedesco, et al.
- 2002
Citation Context ...al probabilities of the cluster variables α and β as follows: $P(R_i = j) = P(\beta = j \mid \alpha = i) = \frac{P(\alpha = i, \beta = j)}{P(\alpha = i)} = \frac{n_{ij}}{n_{i\cdot}}$, (2) $P(C_j = i) = P(\alpha = i \mid \beta = j) = \frac{P(\alpha = i, \beta = j)}{P(\beta = j)} = \frac{n_{ij}}{n_{\cdot j}}$. (3) Each row variable $R_i$ takes c values from the columns corresponding to the ith row as given by the probability mass function $p_{R_i}$, and similarly each column variable $C_j$ takes r values from the rows... |

2 | Mántaras RL: A distance-based attribute selection measure for decision tree induction - De - 1991 |

1 |
Gokhale DV, The Information
- Kullback
- 1978
Citation Context ...at $t_1$ as well as for those ending at $t_l$, depending on $l_{\min}$ and $l_{\max}$). We find the shortest path using dynamic programming (Dijkstra’s algorithm) where the path length is defined as $D_{avg}$, given by Eq. (16), described later. 6. Experiments. 6.1. Datasets. We analyzed the following datasets using our segmentation algorithm. YMC: As stated earlier, the YMC dataset 2 consists of 36 time points collected over... |

1 | Improved multi-level optimization approach for the design of complex engineering systems - JFM, MF - 1988 |

1 | Toint PL, LANCELOT: A Fortran Package for Large-scale Nonlinear Optimization (Release - AR, NIM - 1992 |