Clustering of Time Series Subsequences is Meaningless: Implications for Past and Future Research
- In Proc. of the 3rd IEEE International Conference on Data Mining, 2003
Cited by 58 (7 self)
Time series data is perhaps the most frequently encountered type of data examined by the data mining community. Clustering is perhaps the most frequently used data mining algorithm, being useful in its own right as an exploratory technique, and also as a subroutine in more complex data mining algorithms such as rule discovery, indexing, summarization, anomaly detection, and classification. Given these two facts, it is hardly surprising that time series clustering has attracted much attention. The data to be clustered can be in one of two formats: many individual time series, or a single time series, from which individual time series are extracted with a sliding window. Given the recent explosion of interest in streaming data and online algorithms, the latter case has received much attention. In this work we make a surprising claim. Clustering of streaming time series is completely meaningless. More concretely, clusters extracted from streaming time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature. We can justify calling our claim surprising, since it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. Although the primary contribution of our work is to draw attention to the fact that an apparent solution to an important problem is incorrect and should no longer be used, we also introduce a novel method which, based on the concept of time series motifs, is able to meaningfully cluster some streaming time series datasets.
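The second input format mentioned in the abstract above, subsequences taken from a single series with a sliding window, can be sketched as follows. This is a minimal illustration only: the function name `sliding_subsequences` and the parameter `w` are assumptions made for this example and are not taken from the paper.

```python
def sliding_subsequences(series, w):
    """Return every contiguous length-w window of `series`, in order.

    Hypothetical helper for illustration; not the paper's code.
    """
    series = list(series)
    if not 1 <= w <= len(series):
        raise ValueError("window length must be between 1 and len(series)")
    # A series of length n yields n - w + 1 overlapping windows.
    return [series[i:i + w] for i in range(len(series) - w + 1)]


windows = sliding_subsequences([1, 2, 3, 4, 5], 3)
```

Each window shares all but one point with its neighbour, so the extracted subsequences overlap heavily. These overlapping windows are the input that subsequence clustering operates on.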
Predictive Ability with Cointegrated Variables
- Journal of Econometrics, 2001
Cited by 13 (4 self)
In this paper we outline conditions under which the Diebold and Mariano (DM: 1995) test for predictive ability can be extended to the case of two forecasting models, each of which may include cointegrating relations, when allowing for parameter estimation error. We show that in the cases where either the loss function is quadratic or the length of the prediction period, P, grows at a slower rate than the length of the regression period, R, the standard DM test can be used. On the other hand, in the case of a generic loss function, if P/R → π as T → ∞, 0 < π < ∞, then the asymptotic normality result of West (1996) no longer holds. We also extend the "data snooping" technique of White (2000) for comparing the predictive ability of multiple forecasting models to the case of cointegrated variables. In a series of Monte Carlo experiments, we examine the impact of both short run and cointegrating vector parameter estimation error on DM, data snooping, and related tests. Our results sugge...
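For orientation, the basic Diebold-Mariano statistic referred to above can be sketched as below. This is a simplified illustration under stated assumptions: one-step-ahead forecasts, a quadratic loss by default, a plain sample variance with no autocorrelation correction, and no adjustment for parameter estimation error or cointegration, which are the paper's actual subject. The name `dm_statistic` and its parameters are hypothetical.

```python
import math


def dm_statistic(errors_a, errors_b, loss=lambda e: e * e):
    """Simplified Diebold-Mariano statistic for two forecast-error series.

    Assumptions (not from the paper): one-step-ahead forecasts, so the
    loss differential is treated as serially uncorrelated, and no
    correction for parameter estimation error.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equal-length series with at least 2 errors")
    # Loss differential d_t = L(e_a,t) - L(e_b,t) over the prediction period.
    d = [loss(a) - loss(b) for a, b in zip(errors_a, errors_b)]
    p = len(d)
    mean_d = sum(d) / p
    var_d = sum((x - mean_d) ** 2 for x in d) / (p - 1)
    if var_d == 0:
        raise ValueError("loss differential has zero variance")
    # Under the null of equal predictive ability, the statistic is
    # compared with a standard normal distribution.
    return mean_d / math.sqrt(var_d / p)


stat = dm_statistic([1, -1, 2, -2], [0.5, -0.5, 1, -1])
```

A positive value indicates that the first model incurred the larger average loss on this sample. With multi-step forecasts the variance would need a long-run (autocorrelation-robust) estimate instead of the plain sample variance used here.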
Estimation of copula models for time series of possibly different lengths
- 2001
Cited by 9 (1 self)
The theory of conditional copulas provides a means of constructing flexible multivariate density models, allowing for time-varying conditional densities of each individual variable, and for time-varying conditional dependence between the variables. Further, the use of copulas in constructing these models often allows for the partitioning of the parameter vector into elements relating only to a marginal distribution, and elements relating to the copula. This paper presents a two-stage (or multi-stage) maximum likelihood estimator for the case that such a partition is possible. We extend the existing statistics literature on the estimation of copula models to consider data that exhibit temporal dependence and heterogeneity. The estimator is flexible enough that the case that unequal amounts of data are available on each variable is easily handled. We investigate the small sample properties of the estimator in a Monte Carlo study, and find that it performs well in comparisons with the standard (one-stage) maximum likelihood estimator. Finally, we present an application of the estimator to a model of the joint distribution of daily Japanese yen-U.S. dollar and euro-U.S. dollar exchange rates. We find some evidence that a copula that captures...
A Consistent Test for Nonlinear Out of Sample Predictive Accuracy
- 2000
Cited by 9 (3 self)
In this paper, we draw on both the consistent specification testing and the predictive ability testing literatures and propose a test for predictive accuracy which is consistent against generic nonlinear alternatives. Broadly speaking, given a particular reference model, assume that the objective is to test whether there exists any alternative model, among an infinite number of alternatives, that has better predictive accuracy than the reference model, for a given loss function. A typical example is the case in which the reference model is a simple autoregressive model and the objective is to check whether a more accurate forecasting model can be constructed by including possibly unknown (non)linear functions of the past of the process or of the past of some other process(es). We propose a statistic which is similar in spirit to that of White (2000), although our approach differs from his as we allow for an infinite number of competing models that may be nested. In addition, we allow for non ...
Reexamining the profitability of technical analysis with data snooping checks
- Journal of Financial Econometrics, 2005
Cited by 8 (0 self)
and the participants of the Taipei conference on “Analysis of High-Frequency Financial Data and Market Microstructure” for their valuable comments and suggestions. We also thank P. R. Hansen for sharing his
Data snooping, dredging and fishing: The dark side of data mining. A SIGKDD99 panel report
- SIGKDD Explorations, 2000
Cited by 5 (0 self)
This article briefly describes a panel discussion at SIGKDD99.
VALUE VERSUS GLAMOUR
Cited by 5 (0 self)
The fragility of the CAPM has led to a resurgence of research that frequently uses trading strategies based on sorting procedures to uncover relations between firm characteristics (such as “value” or “glamour”) and equity returns. We examine the propensity of these strategies to generate statistically and economically significant profits due to our familiarity with the data. Under plausible assumptions, data-snooping can account for up to 50 percent of the in-sample relations between firm characteristics and returns uncovered using single (one-way) sorts. The biases can be much larger if we simultaneously condition returns on two (or more) characteristics.
Some recent developments in predictive accuracy testing with nested models and (generic) non-linear alternatives
- International Journal of Forecasting, 2004
Cited by 3 (2 self)
Forecasters and applied econometricians are often interested in comparing the predictive accuracy of nested competing models. A leading example of a context in which competing models are nested is when predictive ability is equated with “out-of-sample Granger causality”. In particular, it is often of interest to assess whether historical data from one variable are useful when constructing a forecasting model for another variable, and hence our use of terminology such as “out-of-sample Granger causality” (see e.g. Ashley, Granger and Schmalensee (1980)). In this paper we examine and discuss three key issues one is faced with when constructing predictive accuracy tests, namely: the contribution of parameter estimation error, the choice of linear versus nonlinear models, and the issue of (dynamic) misspecification, with primary focus on the latter of these issues. One of our main conclusions is that there are a number of easy-to-apply statistics constructed using out-of-sample conditional moment conditions which are robust to the presence of dynamic misspecification under both hypotheses. We provide some new Monte Carlo findings and empirical evidence based on the use of such tests. In particular, we analyze the finite sample properties of the consistent out-of-sample test of Corradi and Swanson (2002) using data generating processes calibrated with
The presidential puzzle: Political cycles and the stock market
- Journal of Finance, 2003
Cited by 3 (0 self)
The excess return in the stock market is higher under Democratic than Republican presidencies: 9 percent for the value-weighted and 16 percent for the equal-weighted portfolio. The difference comes from higher real stock returns and lower real interest rates, is statistically significant, and is robust in subsamples. The difference in returns is not explained by business-cycle variables related to expected returns, and is not concentrated around election dates. There is no difference in the riskiness of the stock market across presidencies that could justify a risk premium. The difference in returns through the political cycle is therefore a puzzle.

