A generic motif discovery algorithm for sequential data (2006)

by K L Jensen
Venue: Bioinformatics

Citing documents: results 1 - 10 of 35

Discovering Characteristic Actions from On-Body Sensor Data

by David Minnen, Thad Starner, Irfan Essa, Charles Isbell - In Proc. of IEEE International Symposium on Wearable Computing, 2006
Cited by 36 (2 self)
We present an approach to activity discovery, the unsupervised identification and modeling of human actions embedded in a larger sensor stream. Activity discovery can be seen as the inverse of the activity recognition problem. Rather than learn models from hand-labeled sequences, we attempt to discover motifs, sets of similar subsequences within the raw sensor stream, without the benefit of labels or manual segmentation. These motifs are statistically unlikely and thus typically correspond to important or characteristic actions within the activity. The problem

Citation Context

...protein sequences [1]. Many other specialized systems have been developed since then, though few are applicable to time series analysis since they were designed to work with categorical sequences (see [11] for a brief review). Recently, Jensen et al. generalized motif discovery over both categorical and continuous data and across arbitrary similarity metrics [11]. This represents a major improvement, b...
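The citation context above credits Jensen et al. with motif discovery that works on both categorical and continuous data under arbitrary similarity metrics. The Python sketch below illustrates only that idea: it assumes fixed-length windows, a caller-supplied distance function and threshold, and uses connected components as a simple stand-in for a clustering step. The function names (`similar_pairs`, `group_motifs`, `hamming`, `euclidean`) are hypothetical, and this is not the published GEMODA code.

```python
from itertools import combinations
from typing import Callable, Sequence


def similar_pairs(seq: Sequence, length: int,
                  distance: Callable[[Sequence, Sequence], float],
                  threshold: float) -> list[tuple[int, int]]:
    """Start positions (i, j) of non-overlapping windows whose distance is <= threshold.

    Every pair of windows is compared, so this step is quadratic in the number of windows.
    """
    starts = range(len(seq) - length + 1)
    pairs = []
    for i, j in combinations(starts, 2):  # always i < j
        if j - i < length:
            continue  # overlapping windows are trivial matches; skip them
        if distance(seq[i:i + length], seq[j:j + length]) <= threshold:
            pairs.append((i, j))
    return pairs


def group_motifs(pairs: list[tuple[int, int]]) -> list[list[int]]:
    """Merge similar pairs into groups (connected components) using union-find."""
    parent: dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)
    groups: dict[int, list[int]] = {}
    for x in list(parent):
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())


# Two interchangeable metrics: one for categorical symbols, one for real values.
def hamming(a: Sequence, b: Sequence) -> float:
    return sum(x != y for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


if __name__ == "__main__":
    dna = "ACGTTACGTAACGTT"
    print(group_motifs(similar_pairs(dna, 4, hamming, 0)))
    series = [0.0, 1.0, 0.0, 3.0, 0.1, 1.1, 0.0, 3.0]
    print(group_motifs(similar_pairs(series, 3, euclidean, 0.2)))
```

Only the `distance` argument changes between the DNA string and the real-valued series; the comparison and grouping code is shared, which is the sense in which such a pipeline is metric-agnostic.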

Discovering Multivariate Motifs using Subsequence Density Estimation and Greedy Mixture Learning

by David Minnen, Charles L. Isbell, Irfan Essa, Thad Starner - In Twenty-Second Conf. on Artificial Intelligence (AAAI-07)
Cited by 24 (1 self)
The problem of locating motifs in real-valued, multivariate time series data involves the discovery of sets of recurring patterns embedded in the time series. Each set is composed of several non-overlapping subsequences and constitutes a motif because all of the included subsequences are similar. The ability to automatically discover such motifs allows intelligent systems to form endogenously meaningful representations of their environment through unsupervised sensor analysis. In this paper, we formulate a unifying view of motif discovery as a problem of locating regions of high density in the space of all time series subsequences. Our approach is efficient (sub-quadratic in the length of the data), requires fewer user-specified parameters than previous methods, and naturally allows variable length motif occurrences and non-linear temporal warping. We evaluate the performance of our approach using four data sets from different domains including on-body inertial sensors and speech.

Citation Context

...probabilistic model for each motif (Bailey & Elkan 1994). GEMODA unifies several earlier methods but requires computing pairwise distances between subsequences, leading to a quadratic expected run time (Jensen et al. 2006). Finally, Blekas et al. (2003) adapted a method for spatial greedy mixture learning to sequential motif discovery, which inspired our use of continuous recognition and information gain. Data mining ...
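The abstract frames motif discovery as locating dense regions in the space of all subsequences, and the citation context contrasts this with the quadratic pairwise comparison attributed to GEMODA. A brute-force Python sketch of the density view follows, assuming fixed-length windows, Euclidean distance, and the distance to the k-th nearest non-overlapping window as a (hypothetical) density proxy. Because it compares every pair, this sketch is itself quadratic, unlike the sub-quadratic approach the abstract reports; `knn_distances` and `densest_window` are illustrative names only.

```python
import math
from typing import Sequence


def knn_distances(series: Sequence[float], length: int, k: int) -> dict[int, float]:
    """Map each window start to the distance of its k-th nearest non-overlapping window.

    A small value means many similar subsequences nearby, i.e. a dense region.
    All pairs are compared, so the cost is quadratic in the number of windows.
    """
    starts = range(len(series) - length + 1)
    scores = {}
    for i in starts:
        dists = sorted(
            math.dist(series[i:i + length], series[j:j + length])
            for j in starts
            if abs(i - j) >= length  # exclude trivial (overlapping) matches
        )
        scores[i] = dists[k - 1] if len(dists) >= k else math.inf
    return scores


def densest_window(series: Sequence[float], length: int, k: int) -> int:
    """Start of the window in the densest region: a candidate motif seed."""
    scores = knn_distances(series, length, k)
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # A short pattern (0, 1, 0) recurs three times among unrelated values.
    data = [0, 1, 0, 5, 7, 2, 0, 1, 0, 9, 3, 6, 0, 1, 0.1]
    print(densest_window(data, length=3, k=2))
```

Raising `k` demands more supporting occurrences before a region counts as dense; windows without `k` non-overlapping neighbors get an infinite score and are never chosen as seeds.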

Improving activity discovery with automatic neighborhood estimation

by David Minnen, Thad Starner, Irfan Essa, Charles Isbell - In International Joint Conference on Artificial Intelligence, 2007
Cited by 23 (3 self)
A fundamental problem for artificial intelligence is identifying perceptual primitives from raw sensory signals that are useful for higher-level reasoning. We equate these primitives with initially unknown recurring patterns called motifs. Autonomously learning the motifs is difficult because their number, location, length, and shape are all unknown. Furthermore, nonlinear temporal warping may be required to ensure the similarity of motif occurrences. In this paper, we extend a leading motif discovery algorithm by allowing it to operate on multidimensional sensor data, incorporating automatic parameter estimation, and providing for motif-specific similarity adaptation. We evaluate our algorithm on several data sets and show how our approach leads to faster real-world discovery and more accurate motifs compared to other leading methods.

Detecting Subdimensional Motifs: An Efficient Algorithm for Generalized Multivariate Pattern Discovery

by David Minnen, Charles Isbell, Irfan Essa, Thad Starner
Cited by 20 (0 self)
Discovering recurring patterns in time series data is a fundamental problem for temporal data mining. This paper addresses the problem of locating subdimensional motifs in real-valued, multivariate time series, which requires the simultaneous discovery of sets of recurring patterns along with the corresponding relevant dimensions. While many approaches to motif discovery have been developed, most are restricted to categorical data, univariate time series, or multivariate data in which the temporal patterns span all of the dimensions. In this paper, we present an expected linear-time algorithm that addresses a generalization of multivariate pattern discovery in which each motif may span only a subset of the dimensions. To validate our algorithm, we discuss its theoretical properties and empirically evaluate it using several data sets including synthetic data and motion capture data collected by an on-body inertial sensor.

Unsupervised simultaneous learning of gestures, actions and their associations for human-robot interaction

by Yasser Mohammad, Toyoaki Nishida, Shogo Okada - In IEEE IROS, 2009
Cited by 18 (12 self)
Human-Robot Interaction using free hand gestures is gaining importance as more untrained humans are operating robots in home and office environments. To be operated by free hand gestures, the robot needs to solve three problems: gesture (command) detection, action generation (related to the domain of the task), and association between gestures and actions. In this paper we propose a novel technique that allows the robot to solve these three problems together, learning the action space, the command space, and their relations just by watching another robot operated by a human operator. The main technical contribution of this paper is the introduction of a novel algorithm that allows the robot to segment and discover patterns in its perceived signals without any prior knowledge of the number of different patterns, their occurrences or their lengths. The second contribution is using a Granger-causality based test to limit the search space for the delay between actions and commands, utilizing their relations and taking into account the autonomy level of the robot. The paper also presents a feasibility study in which the learning robot was able to predict the actor’s behavior with 95.2% accuracy after monitoring a single interaction between a novice operator and a Wizard-of-Oz (WOZ) operated robot representing the actor.

Citation Context

...extensively studied in the data mining literature. Refer to [16] for a recent review. The research in motif discovery has led to many techniques including the PROJECTIONS algorithm [5], PERUSE [19], Gemoda [10] among many others. With the exception of Gemoda, which is quadratic in time and space complexities, these algorithms aim to achieve sub-quadratic time complexity by first looking for candidate motif s...

Efficient motif search in ranked lists and applications to variable gap motifs

by Limor Leibovich, Zohar Yakhini - Nucleic Acids Research
Cited by 10 (3 self)
Sequence elements at all levels (DNA, RNA and protein) play a central role in mediating molecular recognition and thereby molecular regulation and signaling. Studies that focus on measuring and investigating sequence-based recognition make use of statistical and computational tools, including approaches to searching sequence motifs. State-of-the-art motif searching tools are limited in their coverage and ability to address large motif spaces. We develop and present statistical and algorithmic approaches that take as input ranked lists of sequences and return significant motifs. The efficiency of our approach, based on suffix trees, allows searches over motif spaces that are not covered by existing tools. This includes searching variable gap motifs (two half-sites with a flexible-length gap in between) and searching long motifs over large alphabets. We used our approach to analyze several high-throughput measurement data sets and report some validation results as well as novel suggested motifs and motif refinements. We suggest a refinement of the known estrogen receptor 1 motif in humans, where we observe gaps other than three nucleotides that also serve as significant recognition sites, as well as a variable length motif related to potential tyrosine phosphorylation.

Citation Context

...4). Computational models of dimers binding to two half-sites that feature certain spacing rules were suggested in a handful of recent studies. Several algorithms, including BioProspector (25), Gemoda (26), SPACER (27), SPACE (28) and GLAM2 (29), deal with the problem of discovering gapped motifs. van Helden et al. (30) consider a model of a spaced pair of trinucleotides, separated by a spacer of a fixe...
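The abstract and citation context concern gapped (variable gap) motifs: two half-sites separated by a spacer whose length may vary. As a hedged illustration of why a single fixed spacer can miss occurrences, the Python sketch below counts, for each spacer length in a range, how many sequences contain left half-site + spacer + right half-site. It is a brute-force regular-expression scan, not the suffix-tree, ranked-list method the abstract describes; the function name `gap_profile` and the toy half-sites are assumptions.

```python
import re
from collections import Counter
from typing import Iterable


def gap_profile(sequences: Iterable[str], left: str, right: str,
                min_gap: int, max_gap: int) -> Counter:
    """For each spacer length g in [min_gap, max_gap], count the sequences that
    contain `left`, then exactly g arbitrary characters, then `right`.
    A sequence is counted at most once per spacer length."""
    patterns = {
        g: re.compile(re.escape(left) + "." * g + re.escape(right))
        for g in range(min_gap, max_gap + 1)
    }
    counts: Counter = Counter()
    for seq in sequences:
        for g, pattern in patterns.items():
            if pattern.search(seq):
                counts[g] += 1
    return counts


if __name__ == "__main__":
    # Toy sequences: the same two half-sites with spacers of 3, 2 and 4 letters.
    seqs = ["TTAGGTCAGGGTGACCTAA", "AGGTCAGGTGACCT", "CCCCCC", "AGGTCAGGGGTGACCT"]
    print(gap_profile(seqs, "AGGTCA", "TGACCT", 0, 5))
```

A search restricted to a spacer of exactly three letters would report only one of the three toy occurrences; scanning a range of spacer lengths recovers all of them, at a cost that grows with the width of that range.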

Constrained Motif Discovery

by Yasser Mohammad, Toyoaki Nishida
Cited by 6 (6 self)
The goal of motif discovery algorithms is to efficiently find unknown recurring patterns in time series. Most available algorithms cannot utilize domain knowledge in any way, which results in quadratic or at least sub-quadratic time and space complexity. For large time series datasets for which domain knowledge may be available, this is a severe limitation. In this paper we define the Constrained Motif Discovery problem, which enables the use of domain knowledge in the motif discovery process. We also show that most unconstrained motif discovery problems can be converted into constrained motif discovery problems using a change point detection algorithm. We provide two algorithms for solving this problem and compare their performance to state-of-the-art motif discovery algorithms on a large set of synthetic time series. The proposed algorithms can provide linear time and constant space complexity. They provided a four- to tenfold increase in speed compared to two state-of-the-art motif discovery algorithms without loss of accuracy, and provided better noise robustness at high noise levels.

Citation Context

...provided better noise robustness in high noise levels. I. INTRODUCTION The research in unsupervised motif discovery has led to many techniques including the PROJECTIONS algorithm [1], PERUSE [2], Gemoda [3] among many others ([4]). With the exception of Gemoda, which is quadratic in time and space complexities, these algorithms aim to achieve sub-quadratic time complexity by first looking for candidate m...
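The abstract states that most unconstrained motif discovery problems can be converted into constrained ones with a change point detection algorithm, so that only subsequences near likely pattern boundaries need to be compared. A minimal Python sketch of that preprocessing idea follows, assuming a simple mean-shift score over windows of `w` samples and a hand-picked threshold. The score is a stand-in, not the change point detector used in the paper, and `change_scores` and `candidate_points` are hypothetical names.

```python
from statistics import fmean
from typing import Sequence


def change_scores(series: Sequence[float], w: int) -> list[float]:
    """Score each point t by |mean(series[t:t+w]) - mean(series[t-w:t])|.
    Points too close to either end of the series get a score of 0."""
    scores = [0.0] * len(series)
    for t in range(w, len(series) - w + 1):
        scores[t] = abs(fmean(series[t:t + w]) - fmean(series[t - w:t]))
    return scores


def candidate_points(series: Sequence[float], w: int, threshold: float) -> list[int]:
    """Local maxima of the change score that reach the threshold.
    A constrained search would only consider subsequences starting near these points."""
    s = change_scores(series, w)
    return [
        t for t in range(1, len(s) - 1)
        if s[t] >= threshold and s[t] >= s[t - 1] and s[t] > s[t + 1]
    ]


if __name__ == "__main__":
    # Flat signal with one raised plateau covering samples 10 to 19.
    signal = [0.0] * 10 + [5.0] * 10 + [0.0] * 10
    print(candidate_points(signal, w=3, threshold=1.0))
```

The design point is the cost trade: scoring is linear in the series length, and if only c candidate points survive, pairwise comparison touches about c² window pairs instead of one pair for every two positions in the series.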

Motif-based Classification of Time Series with Bayesian Networks and SVMs

by Krisztian Buza, Lars Schmidt-Thieme
Cited by 6 (6 self)
Classification of time series is an important task with many challenging applications like brain wave (EEG) analysis, signature verification or speech recognition. In this paper we show how characteristic local patterns (motifs) can improve the classification accuracy. We introduce a new motif class, generalized semi-continuous motifs. To allow flexibility and noise robustness, these motifs may include gaps of various lengths as well as generic and more specific wildcards. We propose an efficient algorithm for mining generalized sequential motifs. In experiments on real medical data, we show how generalized semi-continuous motifs improve the accuracy of SVMs and Bayesian Networks for time series classification.

Citation Context

...ter time series, and calculate the “compromise” time series for each cluster. Such a “compromise” time series is regarded as a representative pattern of the time series in the cluster. Jensen et al. [13] and Ferreira et al. [7] also use clustering, however in a more local fashion: they do not cluster the whole sequences, but subsequences of them. Predefining a (minimal) length L for motifs, scanning ...

Activity Discovery: Sparse Motifs from Multivariate Time Series

by David Minnen, Thad Starner, Irfan Essa, Charles Isbell
Cited by 4 (0 self)
In a set of time series or other sequence data, a motif is a collection of relatively short subsequences that exhibit high self-similarity yet are distinguishable from other subsequences of the data. Typically, the occurrence of a motif corresponds to some meaningful aspect of the data such as a particular structure or binding site in biological sequences, a spoken word in speech data, or a specific robot behavior or response pattern. We address the problem of activity discovery, which deals with locating and modeling motifs in multivariate time series such as those captured by on-body sensors or from a video camera observing people engaged in some activity. We extend previous work in motif discovery to derive an algorithm that handles non-linear time warping and variable-length motifs, and which is efficient even when the motif occurrences are sparse relative to the full dataset. In bioinformatics, systems such as MEME [1] were developed to discover motifs in DNA and protein sequences, while Jensen et al. [4] recently generalized motif discovery over both categorical and continuous data and across arbitrary similarity metrics. These algorithms were developed for sequences, however, and do not account for the dynamic nature of time series data. Within the data mining community, an efficient, probabilistic algorithm for motif discovery using locality-sensitive hashing was developed [2]. This approach only discovers fixed-length motifs in univariate data, however. Tanaka and Uehara generalized the approach to

Citation Context

...t even when the motif occurrences are sparse relative to the full dataset. In bioinformatics, systems such as MEME [1] were developed to discover motifs in DNA and protein sequences, while Jensen et al. [4] recently generalized motif discovery over both categorical and continuous data and across arbitrary similarity metrics. These algorithms were developed for sequences, however, and do not account for ...

VOGUE: A Variable Order Hidden Markov Model with Duration based on Frequent Sequence Mining

by Mohammed J. Zaki, Christopher D. Carothers, Boleslaw K. Szymanski
Cited by 4 (0 self)
We present VOGUE, a novel, variable order hidden Markov model with state durations, that combines two separate techniques for modeling complex patterns in sequential data: pattern mining and data modeling. VOGUE relies on a variable gap sequence mining method to extract frequent patterns with different lengths and gaps between elements. It then uses these mined sequences to build a variable order hidden Markov model that explicitly models the gaps. The gaps implicitly model the order of the HMM, and they explicitly model the duration of each state. We apply VOGUE to a variety of real sequence data taken from domains such as protein sequence classification, web usage logs, intrusion detection, and spelling correction. We show that VOGUE has superior classification accuracy compared to regular HMMs, higher-order HMMs, and even special purpose HMMs like HMMER, which is a state-of-the-art method for protein classification. The VOGUE implementation and the datasets used in this paper are available as open-source at: www.cs.rpi.edu/~zaki/software/VOGUE.

Citation Context

...proposed for sequence pattern mining in both data mining [Srikant and Agrawal 1996; Mannila et al. 1995; Zaki 2001; Pei et al. 2001] and bioinformatics [Gusfield 1997; Jensen et al. 2006]. For sequence data modeling, Hidden Markov Models (HMMs) [Rabiner 1989] have been widely employed in a broad range of applications such as speech recognition, web usage analysis, and biological sequ...
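The VOGUE abstract says the model is built from frequent patterns mined with different lengths and gaps between elements, and that the gaps model state durations. A much-simplified Python sketch of such a mining stage follows, restricted (as an assumption) to length-2 patterns with a bounded gap. The function name `mine_gapped_pairs` and its parameters are hypothetical, and this is not the VOGUE implementation.

```python
from collections import Counter, defaultdict
from typing import Sequence


def mine_gapped_pairs(sequences: Sequence[Sequence], max_gap: int,
                      min_support: int) -> dict[tuple, tuple[int, Counter]]:
    """Frequent ordered pairs (a, b) where b follows a with at most max_gap symbols between.

    Returns {(a, b): (support, gap_histogram)}, where support counts the sequences
    containing the pair (once per sequence) and gap_histogram counts every
    occurrence by the number of intervening symbols.
    """
    support: Counter = Counter()
    gaps: defaultdict = defaultdict(Counter)
    for seq in sequences:
        seen = set()
        for i, a in enumerate(seq):
            for g in range(max_gap + 1):
                j = i + 1 + g
                if j >= len(seq):
                    break
                pair = (a, seq[j])
                gaps[pair][g] += 1
                seen.add(pair)
        support.update(seen)  # each pair counted once per sequence
    return {p: (n, gaps[p]) for p, n in support.items() if n >= min_support}


if __name__ == "__main__":
    logs = ["AXB", "AB", "ACCB"]
    for pair, (n, hist) in mine_gapped_pairs(logs, max_gap=2, min_support=3).items():
        print(pair, n, dict(hist))
```

In the toy input, the pair A then B is frequent only because gaps are allowed: it appears with zero, one and two intervening symbols, and its gap histogram is the kind of per-pattern gap information that, per the abstract, VOGUE turns into state durations.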
