Results 1–10 of 82
ℓ1 Trend Filtering
, 2007
Abstract

Cited by 51 (7 self)
The problem of estimating underlying trends in time series data arises in a variety of disciplines. In this paper we propose a variation on Hodrick-Prescott (HP) filtering, a widely used method for trend estimation. The proposed ℓ1 trend filtering method substitutes a sum of absolute values (i.e., an ℓ1-norm) for the sum of squares used in HP filtering to penalize variations in the estimated trend. The ℓ1 trend filtering method produces trend estimates that are piecewise linear, and is therefore well suited to analyzing time series with an underlying piecewise-linear trend. The kinks, knots, or changes in slope of the estimated trend can be interpreted as abrupt changes or events in the underlying dynamics of the time series. Using specialized interior-point methods, ℓ1 trend filtering can be carried out with not much more effort than HP filtering; in particular, the number of arithmetic operations required grows linearly with the number of data points. We describe the method and some of its basic properties, and give some illustrative examples. We show how the method is related to ℓ1-regularization-based methods in sparse signal recovery and feature selection, and list some extensions of the basic method.
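The contrast between the two penalties can be made concrete. The sketch below (hypothetical helper names, not the paper's code) evaluates the HP objective, which penalizes squared second differences, against the ℓ1 objective, which penalizes their absolute values; a real solver would minimize these, e.g. with the interior-point method the paper proposes.

```python
def second_differences(x):
    """D x: discrete second differences x[t-1] - 2*x[t] + x[t+1]."""
    return [x[t - 1] - 2 * x[t] + x[t + 1] for t in range(1, len(x) - 1)]

def hp_objective(y, x, lam):
    """Hodrick-Prescott: squared residuals + lam * sum of squared 2nd diffs."""
    fit = sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return 0.5 * fit + lam * sum(d ** 2 for d in second_differences(x))

def l1_objective(y, x, lam):
    """l1 trend filtering: same fit term, but an l1-norm on the 2nd diffs,
    which drives many of them exactly to zero, yielding piecewise-linear trends."""
    fit = sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return 0.5 * fit + lam * sum(abs(d) for d in second_differences(x))

# A perfectly linear trend has zero second differences, so it incurs
# no penalty under either objective.
line = [2.0 * t + 1.0 for t in range(6)]
assert second_differences(line) == [0.0, 0.0, 0.0, 0.0]
assert hp_objective(line, line, lam=10.0) == 0.0
assert l1_objective(line, line, lam=10.0) == 0.0
```

The ℓ1 penalty is what makes estimated trends piecewise linear: at the optimum most second differences are exactly zero, and the few nonzero ones mark the kinks.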
A Formal Study of Shot Boundary Detection
 IEEE Transactions on Circuits and Systems for Video Technology
, 2007
Abstract

Cited by 44 (3 self)
Abstract—This paper conducts a formal study of the shot boundary detection problem. First, a general formal framework of shot boundary detection techniques is proposed. Three critical techniques, i.e., the representation of visual content, the construction of the continuity signal, and the classification of continuity values, are identified and formulated from the perspective of pattern recognition. Meanwhile, the major challenges to the framework are identified. Second, a comprehensive review of the existing approaches is conducted. The representative approaches are categorized and compared according to their roles in the formal framework. Based on the comparison of the existing approaches, optimal criteria for each module of the framework are discussed, which provide a practical guide for developing novel methods. Third, with all the above issues considered, we present a unified shot boundary detection system based on a graph-partition model. Extensive experiments are carried out on the platform of TRECVID. The experiments not only verify the optimal criteria discussed above, but also show that the proposed approach is among the best in the evaluation of TRECVID 2005. Finally, we conclude the paper and present some further discussion of what shot boundary detection can learn from other related fields. Index Terms—Formal framework, graph partition model, multiresolution analysis, shot boundary detection, support vector machine (SVM).
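The "continuity signal" idea from the framework can be illustrated in a few lines. This sketch (all names and the threshold are hypothetical; the paper's actual system uses a graph-partition model with an SVM classifier) represents each frame by a grayscale histogram, measures continuity between consecutive frames by normalized histogram intersection, and flags a cut where continuity drops below a threshold.

```python
def histogram(frame, bins=4, levels=256):
    """Coarse intensity histogram of a frame given as a flat pixel list."""
    h = [0] * bins
    for pixel in frame:
        h[pixel * bins // levels] += 1
    return h

def continuity(h1, h2):
    """Histogram intersection, normalized to [0, 1]."""
    return sum(min(a, b) for a, b in zip(h1, h2)) / sum(h1)

def detect_cuts(frames, threshold=0.5):
    """Return frame indices where the continuity signal falls below threshold."""
    hists = [histogram(f) for f in frames]
    return [t for t in range(1, len(hists))
            if continuity(hists[t - 1], hists[t]) < threshold]

# Two "dark" frames followed by two "bright" frames: an abrupt cut at t=2.
frames = [[10] * 8, [12] * 8, [240] * 8, [250] * 8]
assert detect_cuts(frames) == [2]
```

Gradual transitions (dissolves, fades) produce a slow slide in the continuity signal rather than a sharp drop, which is exactly why the classification module of the framework is harder than simple thresholding.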
Periodicity detection in time series databases
 IEEE Transactions on Knowledge and Data Engineering
, 2005
Abstract

Cited by 41 (3 self)
Periodicity mining is used for predicting trends in time series data. Discovering the rate at which the time series is periodic has always been an obstacle to fully automated periodicity mining. Existing periodicity mining algorithms assume that the periodicity rate (or simply the period) is user-specified. This assumption is a considerable limitation, especially in time series data where the period is not known a priori. In this paper, we address the problem of detecting the periodicity rate of a time series database. Two types of periodicities are defined, and a scalable, computationally efficient algorithm is proposed for each type. The algorithms perform in O(n log n) time for a time series of length n. Moreover, the proposed algorithms are extended in order to discover the periodic patterns of unknown periods at the same time without affecting the time complexity. Experimental results show that the proposed algorithms are highly accurate with respect to the discovered periodicity rates and periodic patterns. Real-data experiments demonstrate the practicality of the discovered periodic patterns.
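The underlying notion of symbol periodicity can be stated directly: for a candidate period p, measure what fraction of positions carry the same symbol p steps later. The sketch below (hypothetical names) checks every candidate period by brute force in O(n²); the paper's contribution is achieving O(n log n) for all periods at once via convolution.

```python
def period_confidence(series, p):
    """Fraction of positions whose symbol recurs exactly p steps later."""
    n = len(series)
    matches = sum(1 for i in range(n - p) if series[i] == series[i + p])
    return matches / (n - p)

def detect_periods(series, min_confidence=0.9):
    """All candidate periods up to n/2 whose confidence clears the threshold."""
    n = len(series)
    return [p for p in range(1, n // 2 + 1)
            if period_confidence(series, p) >= min_confidence]

series = list("abcabcabcabc")          # perfectly periodic with period 3
assert period_confidence(series, 3) == 1.0
assert detect_periods(series) == [3, 6]
```

Note that multiples of a true period also score highly (here 6 as well as 3), so a practical miner reports the smallest period above the confidence threshold.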
Optimizing time series discretization for knowledge discovery
 Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’05)
, 2005
Abstract

Cited by 21 (2 self)
Knowledge discovery in time series usually requires symbolic time series. Many discretization methods that convert numeric time series to symbolic time series ignore the temporal order of values. This often leads to symbols that do not correspond to states of the process generating the time series and cannot be interpreted meaningfully. We propose a new method for meaningful unsupervised discretization of numeric time series called Persist. The algorithm is based on the Kullback-Leibler divergence between the marginal and the self-transition probability distributions of the discretization symbols. Its performance is evaluated on both artificial and real-life data in comparison to the most common discretization methods. Persist achieves significantly higher accuracy than existing static methods and is robust against noise. It also outperforms Hidden Markov Models for all but very simple cases.
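The persistence idea can be sketched as follows. For each symbol, compare its self-transition probability P(sₜ₊₁ = s | sₜ = s) with its marginal frequency P(s); symbols of a meaningful discretization correspond to states the process stays in, so the self-transition probability should exceed the marginal. The signed score below is a simplification of the paper's Kullback-Leibler-based measure, and all names are hypothetical.

```python
from math import log

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), with 0*log(0) = 0."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * log(a / b)
    return term(p, q) + term(1.0 - p, 1.0 - q)

def persistence_scores(symbols):
    """Signed per-symbol persistence: positive if the symbol self-transitions
    more often than its marginal frequency predicts, negative otherwise."""
    n = len(symbols)
    scores = {}
    for s in set(symbols):
        marginal = symbols.count(s) / n
        from_s = sum(1 for a in symbols[:-1] if a == s)
        self_trans = sum(1 for a, b in zip(symbols, symbols[1:])
                         if a == s and b == s)
        p_self = self_trans / from_s
        direction = 1.0 if p_self >= marginal else -1.0
        scores[s] = direction * kl_bernoulli(p_self, marginal)
    return scores

persistent = list("aaaabbbbaaaabbbb")   # long runs of each symbol
noisy = list("abababababababab")        # rapid alternation, no persistence
assert all(v > 0 for v in persistence_scores(persistent).values())
assert all(v < 0 for v in persistence_scores(noisy).values())
```

Persist searches over candidate cut points for the discretization that maximizes this kind of score across symbols, which is why it rewards bins corresponding to enduring states rather than arbitrary value ranges.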
Approximate embedding-based subsequence matching of time series
 In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
, 2008
Abstract

Cited by 21 (6 self)
A method for approximate subsequence matching is introduced that significantly improves the efficiency of subsequence matching in large time series data sets under the dynamic time warping (DTW) distance measure. Our method is called EBSM, shorthand for Embedding-Based Subsequence Matching. The key idea is to convert subsequence matching to vector matching using an embedding. This embedding maps each database time series into a sequence of vectors, so that every step of every time series in the database is mapped to a vector. The embedding is computed by applying full dynamic time warping between reference objects and each database time series. At runtime, given a query object, an embedding of that object is computed in the same manner, by running dynamic time warping between the reference objects and the query. Comparing the embedding of the query with the database vectors is used to efficiently identify relatively few areas of interest in the database sequences. Those areas of interest are then fully explored using the exact DTW-based subsequence matching algorithm. Experiments on a large, public time series data set produce speedups of over one order of magnitude compared to brute-force search, with very small losses (< 1%) in retrieval accuracy.
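The embedding step rests on subsequence DTW with a free starting point: aligning a short reference object against a long series yields, at every position j of the series, the cost of the best warping path ending at j, and that per-position cost becomes one coordinate of the embedding vector. A minimal sketch (hypothetical names, absolute-difference local cost):

```python
def subsequence_dtw_row(ref, series):
    """Last DTW row: cost of the best alignment of `ref` ending at each
    position of `series`, with the match allowed to start anywhere."""
    inf = float("inf")
    prev = [0.0] * (len(series) + 1)          # zero top row = free start
    for r in ref:
        cur = [inf]                           # no path enters from column 0
        for j, s in enumerate(series, start=1):
            cur.append(abs(r - s) + min(prev[j],        # repeat series step
                                        prev[j - 1],    # diagonal advance
                                        cur[j - 1]))    # repeat ref step
        prev = cur
    return prev[1:]

series = [0, 0, 1, 2, 3, 0, 0]
costs = subsequence_dtw_row([1, 2, 3], series)
best_end = min(range(len(costs)), key=costs.__getitem__)
assert best_end == 4 and costs[4] == 0.0     # exact match ends at index 4
```

In EBSM one such row is computed per reference object per database series offline; at query time only the (short) query is aligned against the references, and candidate areas are found by cheap vector comparison before the exact DTW verification step.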
Time series knowledge mining
, 2006
Abstract

Cited by 20 (2 self)
An important goal of knowledge discovery is the search for patterns in data that can help explain the underlying process that generated the data. The patterns are required to be new, useful, and understandable to humans. In this work we present a new method for the understandable description of local temporal relationships in multivariate data, called Time Series Knowledge Mining (TSKM). We define the Time Series Knowledge Representation (TSKR) as a new language for expressing temporal knowledge. The patterns have a hierarchical structure in which each level corresponds to a single temporal concept. On the lowest level, intervals are used to represent duration. Overlapping parts of intervals represent coincidence on the next level. Several such blocks of intervals are connected with a partial-order relation on the highest level. Each pattern element consists of a semiotic triple connecting syntactic and semantic information with pragmatics. The patterns are very compact, but offer details for each element on demand. In comparison with related approaches, the TSKR is shown to have advantages in robustness, expressivity, and comprehensibility. Efficient algorithms for the discovery of the patterns are proposed. The search for coincidence as well as partial order can be formulated as variants of the well-known frequent-itemset problem. One of the best-known algorithms for this problem is therefore adapted for our purposes. Human interaction is used during the mining to analyze and validate partial results as early as possible and guide further processing steps. The efficacy of the methods is demonstrated using several data sets. In an application to sports medicine the results were recognized as valid and useful by an expert in the field.
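The reduction of coincidence mining to the frequent-itemset problem can be sketched concretely (hypothetical names): a sweep over interval endpoints lists which labeled intervals are simultaneously active in each elementary segment, and each resulting label set is an itemset that a standard frequent-itemset miner can then count.

```python
def coincidences(intervals):
    """intervals: list of (label, start, end). Returns (start, end, labels)
    for every elementary segment where at least two labels are active."""
    points = sorted({t for _, s, e in intervals for t in (s, e)})
    out = []
    for a, b in zip(points, points[1:]):
        active = frozenset(lbl for lbl, s, e in intervals if s <= a and b <= e)
        if len(active) >= 2:
            out.append((a, b, active))
    return out

ivs = [("A", 0, 10), ("B", 4, 8), ("C", 6, 12)]
segs = coincidences(ivs)
assert (6, 8, frozenset({"A", "B", "C"})) in segs
```

Each `labels` set plays the role of a transaction; frequent coincidences are then frequent itemsets weighted by segment duration, which is the adaptation TSKM makes to the classical algorithm.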
A Pattern Mining Approach for Classifying Multivariate Temporal Data
Abstract

Cited by 18 (6 self)
Abstract—We study the problem of learning classification models from complex multivariate temporal data encountered in electronic health record systems. The challenge is to define a good set of features that are able to represent well the temporal aspect of the data. Our method relies on temporal abstractions and temporal pattern mining to extract the classification features. Temporal pattern mining usually returns a large number of temporal patterns, most of which may be irrelevant to the classification task. To address this problem, we present the minimal predictive temporal patterns framework to generate a small set of predictive and non-spurious patterns. We apply our approach to the real-world clinical task of predicting patients who are at risk of developing heparin-induced thrombocytopenia. The results demonstrate the benefit of our approach in learning accurate classifiers, which is a key step for developing intelligent clinical monitoring systems.
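The temporal-abstraction step the method relies on can be illustrated with a toy sketch (names, values, and the threshold are hypothetical, not taken from the paper): raw numeric measurements are abstracted into qualitative trend states before any pattern mining takes place.

```python
def trend_abstraction(values, eps=0.5):
    """Abstract consecutive differences into 'up' / 'down' / 'steady' states."""
    states = []
    for prev, cur in zip(values, values[1:]):
        if cur - prev > eps:
            states.append("up")
        elif prev - cur > eps:
            states.append("down")
        else:
            states.append("steady")
    return states

# Toy lab-value series: a drop followed by recovery becomes a short
# symbolic sequence that pattern mining can operate on.
readings = [250, 248, 180, 120, 121, 200]
assert trend_abstraction(readings, eps=5) == \
    ["steady", "down", "down", "steady", "up"]
```

Temporal patterns are then mined over such state sequences (e.g. "down followed by down"), and the minimal-predictive-patterns framework prunes the mined set down to patterns that actually improve class prediction.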
On the Feasibility of Effective Opportunistic Spectrum Access
 Proc. of the 10th Internet Measurement Conference (IMC 2010)
, 2010
Abstract

Cited by 15 (1 self)
Dynamic spectrum access networks are designed to allow today’s bandwidth-hungry “secondary devices” to share spectrum allocated to legacy devices, or “primary users.” The success of this wireless communication model relies on the availability of unused spectrum, and the ability of secondary devices to utilize spectrum without disrupting transmissions of primary users. While recent measurement studies have shown that there is sufficient underutilized spectrum available, little is known about whether secondary devices can efficiently make use of available spectrum while minimizing disruptions to primary users.
In this paper, we present the first comprehensive study of the presence of “usable” spectrum in opportunistic spectrum access systems, and of whether sufficient spectrum can be extracted by secondary devices to support traditional networking applications. We use for our study fine-grained usage traces of a wide spectrum range (20 MHz–6 GHz) taken at four locations in Germany, the Netherlands, and Santa Barbara, California. Our study shows that on average, 54% of spectrum is never used and 26% is only partially used. Surprisingly, in this 26% of partially used spectrum, secondary devices can utilize very little spectrum using conservative access policies to minimize interference with primary users. Even assuming an optimal access scheme and extensive statistical knowledge of primary user access patterns, a user can only extract between 20–30% of the total available spectrum. To provide better spectrum availability, we propose frequency bundling, where secondary devices build reliable channels by combining multiple unreliable frequencies into virtual frequency bundles. Analyzing our traces, we find that there is little correlation of spectrum availability across channels, and that bundling random channels together can provide sustained periods of reliable transmission with only short interruptions.
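The frequency-bundling idea can be sketched with toy availability traces (hypothetical names and data): a virtual channel is usable at time t if at least one member frequency is free, so bundling several individually unreliable, weakly correlated channels yields much longer uninterrupted transmission periods.

```python
def bundle_availability(traces):
    """OR together per-channel boolean availability series: the bundle is
    usable whenever any member frequency is free."""
    return [any(slot) for slot in zip(*traces)]

def longest_run(avail):
    """Length of the longest uninterrupted period of availability."""
    best = run = 0
    for free in avail:
        run = run + 1 if free else 0
        best = max(best, run)
    return best

# Two anticorrelated toy channels: each alone supports only 1-slot bursts,
# but their bundle is continuously available.
ch1 = [True, False, True, False, True, False]
ch2 = [False, True, False, True, False, True]
assert longest_run(ch1) == 1
assert longest_run(bundle_availability([ch1, ch2])) == 6
```

The toy example is the best case; the paper's point is that measured channels are close enough to uncorrelated that even randomly chosen bundles behave similarly, at the cost of the secondary device hopping between member frequencies.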
Efficient mining of understandable patterns from multivariate interval time series
 Data Mining and Knowledge Discovery
, 2007
Abstract

Cited by 13 (0 self)
Abstract. We present a new method for the understandable description of local temporal relationships in multivariate data, called Time Series Knowledge Mining (TSKM). We define the Time Series Knowledge Representation (TSKR) as a new language for expressing temporal knowledge in time interval data. The patterns have a hierarchical structure, with levels corresponding to the temporal concepts duration, coincidence, and partial order. The patterns are very compact, but offer details for each element on demand. In comparison with related approaches, the TSKR is shown to have advantages in robustness, expressivity, and comprehensibility. The search for coincidence and partial order in interval data can be formulated as instances of the well-known frequent-itemset problem. Efficient algorithms for the discovery of the patterns are adapted accordingly. A novel form of search space pruning effectively reduces the size of the mining result to ease interpretation and speed up the algorithms. Human interaction is used during the mining to analyze and validate partial results as early as possible and guide further processing steps. The efficacy of the methods is demonstrated using two real-life data sets. In an application to sports medicine the results were recognized as valid and useful by an expert in the field. Keywords: knowledge discovery, time series, interval patterns, Allen’s relations
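Allen's interval relations, which the TSKR positions itself against, classify how two intervals relate in time. The sketch below covers a simplified subset (the full algebra has 13 relations, counting inverses; names are hypothetical):

```python
def allen_relation(a, b):
    """Classify intervals a, b (each a (start, end) pair with start < end)
    into a simplified subset of Allen's 13 interval relations."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"              # a ends strictly before b starts
    if e1 == s2:
        return "meets"               # a ends exactly where b starts
    if s1 == s2 and e1 == e2:
        return "equals"
    if s2 <= s1 and e1 <= e2:
        return "during"              # a contained in b
    if s1 < s2 < e1 < e2:
        return "overlaps"
    return "other"                   # remaining relations, not distinguished

assert allen_relation((0, 2), (3, 5)) == "before"
assert allen_relation((0, 3), (3, 5)) == "meets"
assert allen_relation((1, 4), (2, 6)) == "overlaps"
assert allen_relation((2, 3), (1, 5)) == "during"
```

Part of the TSKR's robustness argument is precisely that these relations flip under small endpoint jitter (e.g. "meets" becomes "before" or "overlaps"), whereas its duration/coincidence/partial-order concepts do not.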