## Constructing comprehensive summaries of large event sequences

Citations: 9 (1 self-citation)

### BibTeX

@MISC{Kiernan_constructingcomprehensive,

author = {Jerry Kiernan and Evimaria Terzi},

title = {Constructing comprehensive summaries of large event sequences},

year = {}

}

### Citations

1174 | Mining Sequential Patterns
- Agrawal, Srikant
- 1995

Citation context: ...(b) Illustration of the segmental grouping; (c) Output summary. Figure 1: Visual representation of an event sequence that contains events of three event types {A, B, C} and spans timeline **[1, 30]**. Figure 1(a) shows the input sequence; Figure 1(b) shows the segmental grouping and Figure 1(c) shows the high-level view of our summary. The same tone of gray corresponds to the same group. Example 1. Figure...

1160 | Modeling by shortest data description
- Rissanen
- 1978

Citation context: ...rst review the Minimum Description Length (MDL) principle. Then, we show how to apply this principle to formalize the Summarization problem. 3.1 Minimum Description Length Principle. The MDL principle **[15, 16]** allows us to transform the requirement of balance between over-generalizing and overfitting into a computational requirement. In brief, the MDL principle states the following: assume two parties P and...
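The MDL trade-off described in this context can be made concrete with a two-part code: the bits needed to describe a model plus the bits needed to describe the data given that model. The Python sketch below scores a 0/1 occurrence sequence of a single event type under a piecewise-constant-rate model. The names `data_bits` and `description_length` are hypothetical, and the model costs ($\log_2 n$ bits per boundary, $\tfrac{1}{2}\log_2 m$ bits per rate parameter of a length-$m$ segment) are common textbook choices assumed for illustration; this is not the encoding scheme the paper defines.

```python
import math
from typing import Sequence


def data_bits(segment: Sequence[int]) -> float:
    """Bits to encode a 0/1 segment under its own maximum-likelihood Bernoulli rate."""
    m = len(segment)
    ones = sum(segment)
    bits = 0.0
    for count in (ones, m - ones):
        if count:  # symbols that never occur in the segment cost nothing
            bits -= count * math.log2(count / m)
    return bits


def description_length(seq: Sequence[int], cuts: Sequence[int]) -> float:
    """Two-part MDL cost (model bits + data bits) of splitting seq at the given cuts.

    cuts are sorted interior positions with 0 < c < len(seq).
    Assumed model costs: log2(n) bits per boundary, 0.5*log2(m) bits per segment rate.
    """
    n = len(seq)
    edges = [0, *cuts, n]
    model = len(cuts) * math.log2(n)
    data = 0.0
    for a, b in zip(edges, edges[1:]):
        model += 0.5 * math.log2(b - a)
        data += data_bits(seq[a:b])
    return model + data


shifted = [1] * 10 + [0] * 10  # rate changes abruptly at position 10
alternating = [1, 0] * 10      # same rate everywhere
print(description_length(shifted, [10]) < description_length(shifted, []))          # True
print(description_length(alternating, [10]) < description_length(alternating, []))  # False
```

A split pays off only when the data bits it saves exceed the extra model bits it costs, which is the balance between over-generalizing and overfitting that the context refers to.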

834 | An introduction to Hidden Markov Models
- Rabiner, Juang
- 1986

Citation context: ...not on time series, while at the same time the local models we consider per segment are quite distinct from the models considered before. A similar high-level connection exists between our model and HMMs **[14]**. However, the assumptions behind HMMs are different from the assumptions we make in this model. 7. CONCLUSIONS. We proposed a framework and an algorithmic solution to the problem of summarizing delugi...

550 | Mining Sequential Patterns: Generalizations and Performance Improvements
- Srikant, Agrawal
- 1996

Citation context: ...o ours is the work on mining episodes and sequential patterns ([Agrawal and Srikant 1995; Bettini et al. 1998; Chudova and Smyth 2002; Mannila and Toivonen 1996; Mannila et al. 1997; Pei et al. 2007; **Srikant and Agrawal 1996**; Yang et al. 2002]). That work mostly focused on developing algorithms that identify configurations of discrete events clustered in time. Although these algorithms identify local event patterns, they...

300 | Discovery of frequent episodes in event sequences (Data Mining and Knowledge Discovery)
- Mannila, Toivonen, et al.
- 1997

Citation context: ...ime-series analysis. Closely related to ours is the work on mining episodes and sequential patterns ([Agrawal and Srikant 1995; Bettini et al. 1998; Chudova and Smyth 2002; Mannila and Toivonen 1996; **Mannila et al. 1997**; Pei et al. 2007; Srikant and Agrawal 1996; Yang et al. 2002]). That work mostly focused on developing algorithms that identify configurations of discrete events clustered in time. Although these alg...

259 | Bursty and hierarchical structure in streams
- Kleinberg

167 | StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time
- Zhu, Shasha
- 2002

Citation context: ...t types {A, B, C} and it spans timeline [1, 30] that consists of 30 discrete timestamps. Figure 1(b) shows the actual segmental grouping that our method finds. Three segments are identified: [1, 11], **[12, 20]** and [21, 30]. Within each segment the events are grouped into two groups; event types with similar frequency of appearance within a segment are grouped together. In the first segment, the two groups...

147 | An online algorithm for segmenting time series
- Keogh, Chu, et al.
- 2001

Citation context: ...an NP-hard task, while in our case this task is polynomial. At a high level there is an obvious connection between our model and the standard segmentation model used for time-series segmentation (see **[4, 5, 6, 18]** for an indicative, though not complete, set of references). Similarly, there is an equally interesting line of work that deals with the discovery of local patterns in time-series data, e.g., [12, 17, 20...

129 | Data-streams and histograms
- Guha, Koudas, et al.
- 2001

Citation context: ...an NP-hard task, while in our case this task is polynomial. At a high level there is an obvious connection between our model and the standard segmentation model used for time-series segmentation (see **[4, 5, 6, 18]** for an indicative, though not complete, set of references). Similarly, there is an equally interesting line of work that deals with the discovery of local patterns in time-series data, e.g., [12, 17, 20...

127 | Efficient mining of partial periodic patterns in time series database
- Han, Dong, et al.
- 1999

Citation context: ...ies to optimize a different optimization function. Periodicity detection in event sequences has been the focus of many sequential data-analysis techniques (e.g., [Elfeky et al. 2004; Han et al. 1998; **Han et al. 1999**; Ma and Hellerstein 2001]). Although periodicity detection in event sequences is an interesting topic by itself, it is not the focus of our paper. We focus on finding local associations across differ...

123 | Discovering generalized episodes using minimal occurrences
- Mannila, Toivonen
- 1996

Citation context: ...ummarization model for event sequences, our work clearly overlaps with work on sequence mining and time-series analysis. Closely related to ours is the work on mining episodes and sequential patterns (**[1, 3, 10, 13, 19]**). That work mostly focuses on developing algorithms that identify configurations of discrete events clustered in time. Although those algorithms identify local event patterns, known as frequent episo...

75 | Temporal summaries of new topics
- Allan, Gupta, et al.
- 2001

Citation context: ...ior. Identifying time intervals at which an event or a combination of events makes bursty appearances has been the focus of many papers associated with mining and analysis of document streams, e.g., [**Allan et al. 2001**; Brants and Chen 2003; Kleinberg 2003; Swan and Allan 2000; Yang et al. 2000]. In that setting events correspond to specific words appearing on a stream of documents. Our methodology can also be appl...

63 | MDL-based decision tree pruning
- Mehta, Rissanen, et al.
- 1995

Citation context: ...hree event types {A, B, C} and it spans timeline [1, 30] that consists of 30 discrete timestamps. Figure 1(b) shows the actual segmental grouping that our method finds. Three segments are identified: **[1, 11]**, [12, 20] and [21, 30]. Within each segment the events are grouped into two groups; event types with similar frequency of appearance within a segment are grouped together. In the first segment, the t...

63 | Automatic generation of overview timelines
- Swan, Allan
- 2000

Citation context: ...mbination of events makes bursty appearances has been the focus of many papers associated with mining and analysis of document streams, e.g., [Allan et al. 2001; Brants and Chen 2003; Kleinberg 2003; **Swan and Allan 2000**; Yang et al. 2000]. In that setting events correspond to specific words appearing on a stream of documents. Our methodology can also be applied to the analysis of document streams. Since the class of...

62 | On the approximation of curves by line segments using dynamic programming
- Bellman
- 1961

Citation context: ...on. For every $1 \le i \le n$, $T_{l^*}(S[1,i]) = \min_{1 \le j \le i} \{ T_{l^*}(S[1,j]) + L_{l^*}(S[j+1,i]) \}$ (5). The proof of optimality is omitted due to space constraints. However, a similar proof can be found in **[2]**. We call the dynamic-programming algorithm that implements Recursion (5) the Segment-DP algorithm. If $T_L$ is the time required to evaluate $L_{l^*}(S[I])$, then the running time of the Segment-DP algorit...
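Recursion (5) in the context above is a prefix dynamic program: the best cost of a prefix is the best cost of a shorter prefix plus the cost of one final segment. The Python sketch below shows that shape under stated assumptions. `segment_dp` and `bernoulli_segment_cost` are hypothetical names; the base case is an empty prefix of cost 0, so the last segment is `seq[j:i]` for `0 <= j < i`; and the local cost is a simple stand-in for a single event type (a Bernoulli code plus assumed boundary and rate charges), not the paper's local cost $L_{l^*}$.

```python
import math
from typing import Callable, List, Sequence, Tuple


def segment_dp(n: int, local_cost: Callable[[int, int], float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Optimal segmentation of positions 0..n-1 under an additive segment cost.

    local_cost(a, b) is the cost of one segment covering positions a..b-1.
    best[i] is the optimal cost of the length-i prefix; best[0] = 0 (empty prefix).
    """
    best = [0.0] + [math.inf] * n
    back = [0] * (n + 1)  # back[i]: start of the last segment in the best length-i prefix
    for i in range(1, n + 1):
        for j in range(i):  # try every start j for the last segment [j, i)
            cost = best[j] + local_cost(j, i)
            if cost < best[i]:
                best[i], back[i] = cost, j
    # Walk the back-pointers from the end to recover the segments in order.
    segments: List[Tuple[int, int]] = []
    i = n
    while i > 0:
        segments.append((back[i], i))
        i = back[i]
    segments.reverse()
    return best[n], segments


def bernoulli_segment_cost(seq: Sequence[int]) -> Callable[[int, int], float]:
    """Hypothetical local cost for the 0/1 occurrence sequence of one event type."""
    n = len(seq)

    def cost(a: int, b: int) -> float:
        length = b - a
        ones = sum(seq[a:b])
        data = 0.0
        for count in (ones, length - ones):
            if count:  # code length under the segment's own maximum-likelihood rate
                data -= count * math.log2(count / length)
        # Assumed model bits: one boundary position plus half a log per rate parameter.
        return data + math.log2(n) + 0.5 * math.log2(length)

    return cost


seq = [1] * 5 + [0] * 5
total, segments = segment_dp(len(seq), bernoulli_segment_cost(seq))
print(segments)  # [(0, 5), (5, 10)]
```

This sketch evaluates `local_cost` $O(n^2)$ times; serving the per-segment sums from prefix counts would make each evaluation constant-time.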

56 | Mining Partially Periodic Event Patterns with Unknown Periods
- Ma, Hellerstein
- 2001

52 | Mining Segment-Wise Periodic Patterns in Time Related Databases
- Han, Gong, et al.
- 1998

Citation context: ...rent model and tries to optimize a different optimization function. Periodicity detection in event sequences has been the focus of many sequential data-analysis techniques (e.g., [Elfeky et al. 2004; **Han et al. 1998**; Han et al. 1999; Ma and Hellerstein 2001]). Although periodicity detection in event sequences is an interesting topic by itself, it is not the focus of our paper. We focus on finding local associati...

52 | Mining long sequential patterns in a noisy environment
- Yang, Watson, et al.
- 2002

Citation context: ...ummarization model for event sequences, our work clearly overlaps with work on sequence mining and time-series analysis. Closely related to ours is the work on mining episodes and sequential patterns (**[1, 3, 10, 13, 19]**). That work mostly focuses on developing algorithms that identify configurations of discrete events clustered in time. Although those algorithms identify local event patterns, known as frequent episo...

49 | A system for New Event Detection
- Brants, Chen, et al.

43 | A linear time algorithm for finding all maximal scoring subsequences
- Ruzzo, Tompa
- 1999

Citation context: ...al problem of finding homogeneous DNA sequences. This problem has been extensively studied in bioinformatics, leading to a variety of segmentation algorithms [Gionis and Mannila 2003; Li 2001a; 2001b; **Ruzzo and Tompa 1999**]. There is only a high-level connection between these pieces of work and ours: although we both deal with segmentation problems and in many cases we use dynamic programming as our main algorithmic tool...

32 | Improving text categorization methods for event tracking
- Yang, Ault, et al.
- 2000

Citation context: ...akes bursty appearances has been the focus of many papers associated with mining and analysis of document streams, e.g., [Allan et al. 2001; Brants and Chen 2003; Kleinberg 2003; Swan and Allan 2000; **Yang et al. 2000**]. In that setting events correspond to specific words appearing on a stream of documents. Our methodology can also be applied to the analysis of document streams. Since the class of models we are con...

31 | Constraint-based Sequential Pattern Mining: the Pattern-growth Methods
- Pei, Han, et al.
- 2007

Citation context: ...ummarization model for event sequences, our work clearly overlaps with work on sequence mining and time-series analysis. Closely related to ours is the work on mining episodes and sequential patterns (**[1, 3, 10, 13, 19]**). That work mostly focuses on developing algorithms that identify configurations of discrete events clustered in time. Although those algorithms identify local event patterns, known as frequent episo...

30 | BRAID: Stream mining through group lag correlations
- Sakurai, Papadimitriou, et al.
- 2005

Citation context: ..., 5, 6, 18] for an indicative, though not complete, set of references). Similarly, there is an equally interesting line of work that deals with the discovery of local patterns in time-series data, e.g., **[12, 17, 20]**. However, the connection to our work remains at a high level since we focus on event sequences and not on time series, while at the same time the local models we consider per segment are quite distin...

27 | An MDL method for finding haplotype blocks and for estimating the strength of haplotype block boundaries
- Koivisto, Perola, et al.
- 2003

Citation context: ...intensities. Also related is the segmentation framework developed by [**Koivisto et al. 2003**] in order to identify block structures in genetic sequences. A minimum description length approach is also used there for identifying the number and positions of segment boundaries. However, the mode...

25 | Mining temporal relationships with multiple granularities in time sequences
- Bettini, Wang, et al.
- 1998

Citation context: ...sequences, our work clearly overlaps with work on sequence mining and time-series analysis. Closely related to ours is the work on mining episodes and sequential patterns ([Agrawal and Srikant 1995; **Bettini et al. 1998**; Chudova and Smyth 2002; Mannila and Toivonen 1996; Mannila et al. 1997; Pei et al. 2007; Srikant and Agrawal 1996; Yang et al. 2002]). That work mostly focused on developing algorithms that identify...

24 | Finding Recurrent Sources in Sequences
- Gionis, Mannila
- 2003

23 | Stochastic Complexity in Statistical Inquiry
- Rissanen
- 1989

Citation context: ...rst review the Minimum Description Length (MDL) principle. Then, we show how to apply this principle to formalize the Summarization problem. 3.1 Minimum Description Length Principle. The MDL principle **[15, 16]** allows us to transform the requirement of balance between over-generalizing and overfitting into a computational requirement. In brief, the MDL principle states the following: assume two parties P and...

19 | Efficient algorithms for sequence segmentation
- Terzi, Tsaparas
- 2006

Citation context: ...an NP-hard task, while in our case this task is polynomial. At a high level there is an obvious connection between our model and the standard segmentation model used for time-series segmentation (see **[4, 5, 6, 18]** for an indicative, though not complete, set of references). Similarly, there is an equally interesting line of work that deals with the discovery of local patterns in time-series data, e.g., [12, 17, 20...

18 | New stopping criteria for segmenting DNA sequences
- Li
- 2001

17 | Pattern discovery in sequences under a Markov assumption
- Chudova, Smyth
- 2002

17 | MDL learning of unions of simple pattern languages from positive examples
- Kilpeläinen, Mannila, et al.
- 1995

Citation context: ...f modeling assumptions. It has been previously successfully applied in a variety of settings that range from decision-tree classifiers [11], genetic-sequence modeling [8], patterns in sets of strings **[7]** and many more. We devote the rest of the section to describing our instantiation of the MDL principle. 3.2 The Encoding Scheme. Recall that we model event sequences using a segmentation model that parti...

15 | Using convolution to mine obscure periodic patterns in one pass
- Elfeky, Aref, et al.

Citation context: ...aper studies a different model and tries to optimize a different optimization function. Periodicity detection in event sequences has been the focus of many sequential data-analysis techniques (e.g., [**Elfeky et al. 2004**; Han et al. 1998; Han et al. 1999; Ma and Hellerstein 2001]). Although periodicity detection in event sequences is an interesting topic by itself, it is not the focus of our paper. We focus on findin...

11 | Optimal multi-scale patterns in time series streams
- Papadimitriou

Citation context: ...t types {A, B, C} and it spans timeline [1, 30] that consists of 30 discrete timestamps. Figure 1(b) shows the actual segmental grouping that our method finds. Three segments are identified: [1, 11], **[12, 20]** and [21, 30]. Within each segment the events are grouped into two groups; event types with similar frequency of appearance within a segment are grouped together. In the first segment, the two groups...

10 | DNA segmentation as a model selection process
- Li

8 | Exploiting duality in summarization with deterministic guarantees
- Karras, Sacharidis, et al.
- 2007

8 | Finding simple intensity descriptions from event sequence data
- Mannila, Salmenkivi
- 2001

Citation context: ...do not provide a global description of the event sequence, nor do they care about the conciseness of the produced patterns. Summarization of event sequences via a segmentation model is proposed in **[9]**. However, the technique presented there can only model sequences of single event types; within each local interval, the appearances of events are modelled by a constant intensity model. In fact, one...

6 | An MDL method for finding haplotype blocks and for estimating the strength of haplotype block boundaries
- Koivisto, Perola, et al.
- 2003

Citation context: ...ns that are determined by a set of modeling assumptions. It has been previously successfully applied in a variety of settings that range from decision-tree classifiers [11], genetic-sequence modeling **[8]**, patterns in sets of strings [7] and many more. We devote the rest of the section to describing our instantiation of the MDL principle. 3.2 The Encoding Scheme. Recall that we model event sequences usin...

2 | EventSummarizer: A tool for summarizing large event sequences
- Kiernan, Terzi
