Results 1–10 of 22
Clustering of Time Series Subsequences is Meaningless: Implications for Past and Future Research
 In Proc. of the 3rd IEEE International Conference on Data Mining
, 2003
"... Time series data is perhaps the most frequently encountered type of data examined by the data mining community. Clustering is perhaps the most frequently used data mining algorithm, being useful in it’s own right as an exploratory technique, and also as a subroutine in more complex data mining algor ..."
Abstract

Cited by 78 (15 self)
Time series data is perhaps the most frequently encountered type of data examined by the data mining community. Clustering is perhaps the most frequently used data mining algorithm, being useful in its own right as an exploratory technique, and also as a subroutine in more complex data mining algorithms such as rule discovery, indexing, summarization, anomaly detection, and classification. Given these two facts, it is hardly surprising that time series clustering has attracted much attention. The data to be clustered can be in one of two formats: many individual time series, or a single time series, from which individual time series are extracted with a sliding window. Given the recent explosion of interest in streaming data and online algorithms, the latter case has received much attention. In this work we make a surprising claim. Clustering of streaming time series is completely meaningless. More concretely, clusters extracted from streaming time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature. We can justify calling our claim surprising, since it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. Although the primary contribution of our work is to draw attention to the fact that an apparent solution to an important problem is incorrect and should no longer be used, we also introduce a novel method which, based on the concept of time series motifs, is able to meaningfully cluster some streaming time series datasets.
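The sliding-window extraction step this abstract refers to can be sketched as follows; the function name and the example window length are illustrative, not taken from the paper:

```python
import numpy as np

def sliding_window_subsequences(series, w):
    """Extract all length-w subsequences of a 1-D series with a sliding window.

    This is the extraction step the paper argues leads to meaningless
    clusters: consecutive subsequences overlap in w - 1 points, so cluster
    centers are pushed toward smoothed shapes regardless of the data.
    """
    series = np.asarray(series, dtype=float)
    n = len(series)
    if w > n:
        raise ValueError("window longer than series")
    return np.array([series[i:i + w] for i in range(n - w + 1)])

# A series of length 10 with window 4 yields 7 overlapping subsequences.
subs = sliding_window_subsequences(np.arange(10), 4)
```

Note that adjacent rows of `subs` share all but one value, which is the degeneracy at the heart of the paper's claim.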
Predictive Ability with Cointegrated Variables
 Journal of Econometrics
, 2001
"... In this paper we outline conditions under which the Diebold and Mariano (DM: 1995) test for predictive ability can be extended to the case of two forecasting models, each of which may include cointegrating relations, when allowing for parameter estimation error. We show that in the cases where eithe ..."
Abstract

Cited by 18 (5 self)
In this paper we outline conditions under which the Diebold and Mariano (DM: 1995) test for predictive ability can be extended to the case of two forecasting models, each of which may include cointegrating relations, when allowing for parameter estimation error. We show that in the cases where either the loss function is quadratic or the length of the prediction period, P, grows at a slower rate than the length of the regression period, R, the standard DM test can be used. On the other hand, in the case of a generic loss function, if P/R → π as T → ∞, with 0 < π < 1, then the asymptotic normality result of West (1996) no longer holds. We also extend the "data snooping" technique of White (2000) for comparing the predictive ability of multiple forecasting models to the case of cointegrated variables. In a series of Monte Carlo experiments, we examine the impact of both short run and cointegrating vector parameter estimation error on DM, data snooping, and related tests. Our results sugge...
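The Diebold–Mariano test this paper extends can be sketched as follows. This is a deliberately simplified version: it uses the naive sample variance of the loss differential where the actual test uses a HAC long-run variance estimator for multi-step forecasts, and all names and the simulated errors are illustrative:

```python
import numpy as np

def diebold_mariano(e1, e2, loss=lambda e: e ** 2):
    """Simplified Diebold-Mariano statistic for equal predictive accuracy.

    e1, e2: forecast errors of two competing models over the same P periods.
    Under the null of equal expected loss, the statistic is approximately
    N(0, 1) in large samples (here ignoring serial correlation in the
    loss differential, which the real test corrects for).
    """
    d = loss(np.asarray(e1, float)) - loss(np.asarray(e2, float))
    P = len(d)
    return d.mean() / np.sqrt(d.var(ddof=1) / P)

# Illustrative comparison: model B's errors are noisier than model A's,
# so the statistic should be strongly negative under squared-error loss.
rng = np.random.default_rng(0)
e_a = rng.normal(0.0, 1.0, 500)
e_b = rng.normal(0.0, 1.5, 500)
dm = diebold_mariano(e_a, e_b)
```

A large negative value here indicates that the first model has lower expected loss; the paper's contribution concerns when this normal approximation survives cointegration and parameter estimation error.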
Estimation of copula models for time series of possibly different lengths
, 2001
"... The theory of conditional copulas provides a means of constructing flexible multivariate density models, allowing for timevarying conditional densities of each individual variable, and for timevarying conditional dependence between the variables. Further, the use of copulas in constructing these m ..."
Abstract

Cited by 16 (3 self)
The theory of conditional copulas provides a means of constructing flexible multivariate density models, allowing for time-varying conditional densities of each individual variable, and for time-varying conditional dependence between the variables. Further, the use of copulas in constructing these models often allows for the partitioning of the parameter vector into elements relating only to a marginal distribution, and elements relating to the copula. This paper presents a two-stage (or multi-stage) maximum likelihood estimator for the case that such a partition is possible. We extend the existing statistics literature on the estimation of copula models to consider data that exhibit temporal dependence and heterogeneity. The estimator is flexible enough that the case that unequal amounts of data are available on each variable is easily handled. We investigate the small sample properties of the estimator in a Monte Carlo study, and find that it performs well in comparisons with the standard (one-stage) maximum likelihood estimator. Finally, we present an application of the estimator to a model of the joint distribution of daily Japanese yen–U.S. dollar and euro–U.S. dollar exchange rates. We find some evidence that a copula that captures...
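The two-stage idea in this abstract, estimating marginal parameters first and the copula parameter second, can be sketched under strong simplifying assumptions: Gaussian marginals and a Gaussian copula, with the copula correlation estimated from the overlapping sample. The paper itself covers general marginals, copulas, and temporal dependence; everything below is an illustrative reduction:

```python
import numpy as np

def two_stage_gaussian_copula(x, y):
    """Two-stage estimator sketch, assuming Gaussian marginals and a
    Gaussian copula (a simplification of the paper's general setting).

    Stage 1: estimate each marginal's parameters by MLE, using all data
    available for that series (the two series may differ in length).
    Stage 2: estimate the copula correlation from the overlapping sample.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    # Stage 1: marginal MLEs (mean and standard deviation per series).
    mx, sx = x.mean(), x.std()
    my, sy = y.mean(), y.std()
    # Stage 2: Gaussian-copula correlation from the standardized overlap.
    n = min(len(x), len(y))
    zx = (x[:n] - mx) / sx
    zy = (y[:n] - my) / sy
    rho = (zx * zy).mean()
    return (mx, sx), (my, sy), rho

# Simulated pair with true correlation 0.8; the second series is shorter,
# mirroring the unequal-length case the paper handles.
rng = np.random.default_rng(1)
z = rng.standard_normal((2, 1000))
x = z[0]
y = 0.8 * z[0] + 0.6 * z[1]
params_x, params_y, rho_hat = two_stage_gaussian_copula(x, y[:800])
```

The point of the partition is that stage 1 never touches the copula parameter, so each marginal can use its full sample even when the joint sample is shorter.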
The presidential puzzle: Political cycles and the stock market
 JOURNAL OF FINANCE
, 2003
"... The excess return in the stock market is higher under Democratic than Republican presidencies: 9 percent for the valueweighted and 16 percent for the equalweighted portfolio.The difference comes from higher real stock returns and lower real interest rates, is statistically significant, and is robu ..."
Abstract

Cited by 12 (0 self)
The excess return in the stock market is higher under Democratic than Republican presidencies: 9 percent for the value-weighted and 16 percent for the equal-weighted portfolio. The difference comes from higher real stock returns and lower real interest rates, is statistically significant, and is robust in subsamples. The difference in returns is not explained by business-cycle variables related to expected returns, and is not concentrated around election dates. There is no difference in the riskiness of the stock market across presidencies that could justify a risk premium. The difference in returns through the political cycle is therefore a puzzle.
A Consistent Test for Nonlinear Out of Sample Predictive Accuracy
, 2000
"... In this paper, we draw on both the consistent specification testing and the predictive ability testing literatures and propose a test for predictive accuracy which is consistent against generic nonlinear alternatives. Broadly speaking, given a particular reference model, assume that the objective is ..."
Abstract

Cited by 11 (3 self)
In this paper, we draw on both the consistent specification testing and the predictive ability testing literatures and propose a test for predictive accuracy which is consistent against generic nonlinear alternatives. Broadly speaking, given a particular reference model, assume that the objective is to test whether there exists any alternative model, among an infinite number of alternatives, that has better predictive accuracy than the reference model, for a given loss function. A typical example is the case in which the reference model is a simple autoregressive model and the objective is to check whether a more accurate forecasting model can be constructed by including possibly unknown (non)linear functions of the past of the process or of the past of some other process(es). We propose a statistic which is similar in spirit to that of White (2000), although our approach differs from his as we allow for an infinite number of competing models that may be nested. In addition, we allow for non ...
Reexamining the profitability of technical analysis with data snooping checks
 Journal of Financial Econometrics
, 2005
"... and the participants of the Taipei conference on “Analysis of HighFrequency Financial Data and Market Microstructure ” for their valuable comments and suggestions. We also thank P. R. Hansen for sharing his ..."
Abstract

Cited by 10 (1 self)
and the participants of the Taipei conference on “Analysis of High-Frequency Financial Data and Market Microstructure” for their valuable comments and suggestions. We also thank P. R. Hansen for sharing his
VALUE VERSUS GLAMOUR
"... The fragility of the CAPM has led to a resurgence of research that frequently uses trading strategies based on sorting procedures to uncover relations between firm characteristics (such as “value ” or “glamour”) and equity returns. We examine the propensity of these strategies to generate statistic ..."
Abstract

Cited by 10 (0 self)
The fragility of the CAPM has led to a resurgence of research that frequently uses trading strategies based on sorting procedures to uncover relations between firm characteristics (such as “value” or “glamour”) and equity returns. We examine the propensity of these strategies to generate statistically and economically significant profits due to our familiarity with the data. Under plausible assumptions, data-snooping can account for up to 50 percent of the in-sample relations between firm characteristics and returns uncovered using single (one-way) sorts. The biases can be much larger if we simultaneously condition returns on two (or more) characteristics.
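The one-way sort this abstract refers to can be sketched as follows; the function name, the decile grouping, and the pure-noise example are illustrative, not the paper's design:

```python
import numpy as np

def one_way_sort_spread(characteristic, returns, n_groups=10):
    """Sketch of a one-way sort: rank firms on a characteristic, form
    n_groups portfolios, and take the top-minus-bottom spread in average
    returns (the kind of statistic the paper's snooping analysis targets).
    """
    characteristic = np.asarray(characteristic, float)
    returns = np.asarray(returns, float)
    order = np.argsort(characteristic)
    groups = np.array_split(order, n_groups)          # decile portfolios
    means = np.array([returns[g].mean() for g in groups])
    return means[-1] - means[0]                       # high-minus-low spread

# Pure-noise example: with no true relation between characteristic and
# return, repeated sorts still produce nonzero spreads, which is the
# data-snooping concern the paper quantifies.
rng = np.random.default_rng(2)
spread = one_way_sort_spread(rng.standard_normal(1000), rng.standard_normal(1000))
```

Searching over many candidate characteristics and keeping the largest such spread is exactly how in-sample "anomalies" can arise from familiarity with the data.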
Clustering of streaming time series is meaningless
 In Proc. of the SIGMOD workshop in Data Mining and Knowledge Discovery
, 2003
"... Time series data is perhaps the most frequently encountered type of data examined by the data mining community. Clustering is perhaps the most frequently used data mining algorithm, being useful in it’s own right as an exploratory technique, and also as a subroutine in more complex data mining algor ..."
Abstract

Cited by 9 (0 self)
Time series data is perhaps the most frequently encountered type of data examined by the data mining community. Clustering is perhaps the most frequently used data mining algorithm, being useful in its own right as an exploratory technique, and also as a subroutine in more complex data mining algorithms such as rule discovery, indexing, summarization, anomaly detection, and classification. Given these two facts, it is hardly surprising that time series clustering has attracted much attention. The data to be clustered can be in one of two formats: many individual time series, or a single time series, from which individual time series are extracted with a sliding window. Given the recent explosion of interest in streaming data and online algorithms, the latter case has received much attention. In this work we make a surprising claim. Clustering of streaming time series is completely meaningless. More concretely, clusters extracted from streaming time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature. We can justify calling our claim surprising, since it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. Although the primary contribution of our work is to draw attention to the fact that an apparent solution to an important problem is incorrect and should no longer be used, we also introduce a novel method which, based on the concept of time series motifs, is able to meaningfully cluster some streaming time series datasets.
Data snooping, dredging and fishing: The dark side of data mining (a SIGKDD-99 panel report)
 SIGKDD Explorations
, 2000
"... This article briefly describes a panel discussion at SIGKDD99. ..."
Abstract

Cited by 7 (0 self)
This article briefly describes a panel discussion at SIGKDD99.
High volatility, thick tails and extreme value theory in value-at-risk estimation
 Insurance: Mathematics and Economics
, 2003
"... In this paper, the performance of the extreme value theory in ValueatRisk calculations is compared to the performances of other wellknown modeling techniques, such as GARCH, variancecovariance method and historical simulation in a volatile stock market. The models studied can be classified into ..."
Abstract

Cited by 7 (0 self)
In this paper, the performance of extreme value theory in Value-at-Risk calculations is compared to the performances of other well-known modeling techniques, such as GARCH, the variance-covariance method and historical simulation, in a volatile stock market. The models studied can be classified into two groups. The first group consists of GARCH(1,1) and GARCH(1,1)-t models, which yield highly volatile quantile forecasts. The other group, consisting of historical simulation, the variance-covariance approach, adaptive generalized Pareto distribution (GPD) and non-adaptive GPD models, leads to more stable quantile forecasts. The quantile forecasts of GARCH(1,1) models are excessively volatile relative to the GPD quantile forecasts. This makes the GPD model a robust quantile forecasting tool which is practical to implement and regulate for VaR measurements. Key Words: Value-at-Risk, financial risk management, extreme value theory.
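Two of the approaches this abstract compares, historical simulation and a GPD tail estimator, can be sketched as follows. This is a simplified illustration: the GPD excesses are fitted by the method of moments rather than the maximum likelihood fitting such studies typically use, and the threshold choice and simulated loss series are assumptions:

```python
import numpy as np

def historical_var(losses, q=0.99):
    """Historical-simulation VaR: the empirical q-quantile of losses."""
    return np.quantile(np.asarray(losses, float), q)

def gpd_var(losses, u, q=0.99):
    """Peaks-over-threshold VaR sketch: fit a generalized Pareto
    distribution to the excesses over threshold u by the method of
    moments (a simplification of ML fitting), then invert the tail
    estimator for the q-quantile of the loss distribution.
    """
    losses = np.asarray(losses, float)
    excess = losses[losses > u] - u
    n, n_u = len(losses), len(excess)
    m, s2 = excess.mean(), excess.var(ddof=1)
    xi = 0.5 * (1.0 - m * m / s2)        # GPD shape (method of moments)
    beta = m * (1.0 - xi)                # GPD scale
    return u + (beta / xi) * (((n / n_u) * (1.0 - q)) ** (-xi) - 1.0)

# Heavy-tailed simulated daily losses (Student-t with 4 degrees of
# freedom, an assumption for illustration); threshold at the 95th
# percentile, VaR at the 99th.
rng = np.random.default_rng(3)
losses = rng.standard_t(4, 20000)
v_hs = historical_var(losses)
v_gpd = gpd_var(losses, u=np.quantile(losses, 0.95))
```

The contrast the paper draws is about stability over time: quantile forecasts from the GPD tail tend to move less from day to day than those implied by a GARCH volatility forecast.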