Results 1-10 of 286
Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences
In SIGMOD, 1997
Cited by 108 (15 self)
Ad hoc querying is difficult on very large datasets, since it is usually not possible to have the entire dataset on disk. While compression can be used to decrease the size of the dataset, compressed data is notoriously difficult to index or access. In this paper we consider a very large dataset comprising multiple distinct time sequences. Each point in the sequence is a numerical value. We show how to compress such a dataset into a format that supports ad hoc querying, provided that a small error can be tolerated when the data is uncompressed. Experiments on large, real-world datasets (AT&T customer calling patterns) show that the proposed method achieves an average of less than 5% error in any data value after compressing to a mere 2.5% of the original space (i.e., a 40:1 compression ratio), with these numbers not very sensitive to dataset size. Experiments on aggregate queries achieved a 0.5% reconstruction error with a space requirement under 2%.
Surrogate time series
Physica D
Cited by 97 (0 self)
Before we apply nonlinear techniques, for example those inspired by chaos theory, to dynamical phenomena occurring in nature, it is necessary to first ask if the use of such advanced techniques is justified by the data. While many processes in nature seem very unlikely a priori to be linear, the possible nonlinear nature might not be evident in specific aspects of their dynamics. The method of surrogate data has become a very popular tool to address such a question. However, while it was meant to provide a statistically rigorous, foolproof framework, some limitations and caveats have shown up in its practical use. In this paper, recent efforts to understand the caveats, avoid the pitfalls, and to overcome some of the limitations, are reviewed and augmented by new material. In particular, we will discuss specific as well as more general approaches to constrained randomisation, providing a full range of examples. New algorithms will be introduced for unevenly sampled and multivariate data and for surrogate spike trains. The main limitation, which lies in the interpretability of the test results, will be illustrated through instructive case studies. We will also discuss some implementational aspects of the realisation of these methods in the TISEAN software package. PACS 05.45.+b. Keywords: time series, surrogate data, nonlinearity
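To make "constrained randomisation" concrete, here is a minimal sketch of one widely used constrained surrogate, the phase-randomised Fourier surrogate. The abstract does not name this scheme, so treating it as representative is an assumption, and the code does not reproduce anything from TISEAN. The function names `phase_randomized_surrogate` and `spectrum_gap` are hypothetical. The surrogate keeps the amplitude spectrum of the input (and hence its linear autocorrelation) while scrambling the Fourier phases.

```python
import numpy as np


def phase_randomized_surrogate(x, seed=None):
    """Return a series with the same amplitude spectrum as x but random phases.

    Illustrative sketch only; the name and interface are assumptions.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    # Draw an independent uniform phase for every frequency bin.
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.size)
    randomized = np.abs(spectrum) * np.exp(1j * phases)
    # The zero-frequency bin carries the mean and must stay as it was.
    randomized[0] = spectrum[0]
    # For even n the last bin is the Nyquist term, which must stay real.
    if n % 2 == 0:
        randomized[-1] = spectrum[-1]
    return np.fft.irfft(randomized, n=n)


def spectrum_gap(x, y):
    """Largest absolute difference between the amplitude spectra of x and y."""
    ax = np.abs(np.fft.rfft(np.asarray(x, dtype=float)))
    ay = np.abs(np.fft.rfft(np.asarray(y, dtype=float)))
    return float(np.max(np.abs(ax - ay)))
```

A test statistic computed on the original series can then be compared with its distribution over many such surrogates. Any nonlinearity statistic plugged in at that point is the user's choice and is not specified by the abstract.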
NeuroAnimator: Fast Neural Network Emulation and Control of Physics-Based Models
1998
Cited by 89 (3 self)
Animation through the numerical simulation of physics-based graphics models offers unsurpassed realism, but it can be computationally demanding. Likewise, finding controllers that enable physics-based models to produce desired animations usually entails formidable computational cost. This paper demonstrates the possibility of replacing the numerical simulation and control of model dynamics with a dramatically more efficient alternative. In particular, we propose the NeuroAnimator, a novel approach to creating physically realistic animation that exploits neural networks. NeuroAnimators are automatically trained offline to emulate physical dynamics through the observation of physics-based models in action. Depending on the model, its neural network emulator can yield physically realistic animation one or two orders of magnitude faster than conventional numerical simulation. Furthermore, by exploiting the network structure of the NeuroAnimator, we introduce a fast algorithm for learning controllers that enables either physics-based models or their neural network emulators to synthesize motions satisfying prescribed animation goals. We demonstrate NeuroAnimators for passive and active (actuated) rigid body, articulated, and deformable physics-based models.
Annealed Competition of Experts for a Segmentation and Classification of Switching Dynamics
1996
Cited by 71 (21 self)
We present a method for the unsupervised segmentation of data streams originating from different unknown sources which alternate in time. We use an architecture consisting of competing neural networks. Memory is included in order to resolve ambiguities of input-output relations. In order to obtain maximal specialization, the competition is adiabatically increased during training. Our method achieves almost perfect identification and segmentation in the case of switching chaotic dynamics where input manifolds overlap and input-output relations are ambiguous. Only a small dataset is needed for the training procedure. Applications to time series from complex systems demonstrate the potential relevance of our approach for time series analysis and short-term prediction.
Online Data Mining for Co-Evolving Time Sequences
In Proceedings of the 16th International Conference on Data Engineering, 2000
Cited by 66 (4 self)
In many applications, the data of interest comprises multiple sequences that evolve over time. Examples include currency exchange rates and network traffic data. We develop a fast method to analyze such co-evolving time sequences jointly to allow (a) estimation/forecasting of missing/delayed/future values, (b) quantitative data mining, and (c) outlier detection. Our method, MUSCLES, adapts to changing correlations among time sequences. It can handle indefinitely long sequences efficiently using an incremental algorithm and requires only a small amount of storage and fewer I/O operations. To make it scale for a large number of sequences, we present a variation, the Selective MUSCLES method, and propose an efficient algorithm to reduce the problem size. Experiments on real datasets show that MUSCLES outperforms popular competitors in prediction accuracy by up to 10 times, and discovers interesting correlations. Moreover, Selective MUSCLES scales up very well for large numbers of sequences, reducing response time by up to 110 times over MUSCLES, and sometimes even improves the prediction quality.
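The abstract does not say which incremental algorithm is used. As a minimal sketch of how one sequence can be estimated from its companions with constant storage per step, the code below assumes a recursive least-squares update with an exponential forgetting factor. That choice, the class name `RecursiveLeastSquares`, the `demo` helper, and all parameter values are illustrative assumptions, not a description of MUSCLES.

```python
import numpy as np


class RecursiveLeastSquares:
    """Hypothetical incremental linear regressor (an assumption, not the paper's code)."""

    def __init__(self, n_features, forgetting=0.98, delta=1e3):
        self.w = np.zeros(n_features)        # current regression coefficients
        self.P = np.eye(n_features) * delta  # running inverse-covariance estimate
        self.lam = forgetting                # values below 1 discount old samples

    def predict(self, x):
        return float(self.w @ np.asarray(x, dtype=float))

    def update(self, x, y):
        """Fold in one (x, y) sample; cost is quadratic in n_features."""
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)
        error = y - self.w @ x               # prediction error before the update
        self.w = self.w + gain * error
        self.P = (self.P - np.outer(gain, Px)) / self.lam
        return float(error)


def demo(n_steps=500, seed=0):
    """Estimate a synthetic target sequence from two companion sequences."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n_steps)
    b = rng.normal(size=n_steps)
    target = 2.0 * a - 1.0 * b + 0.01 * rng.normal(size=n_steps)
    model = RecursiveLeastSquares(n_features=2)
    for t in range(n_steps):
        model.update([a[t], b[t]], target[t])
    return model.w
```

Because the forgetting factor down-weights older samples, the coefficients can follow correlations that drift over time. A large error returned by `update` relative to recent errors is one simple way to flag a candidate outlier.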
On the Out-of-Sample Importance of Skewness and Asymmetric Dependence for Asset Allocation
Journal of Financial Econometrics, 2004
Cited by 65 (3 self)
Recent studies in the empirical finance literature have reported evidence of two types of asymmetries in the joint distribution of stock returns. The first is skewness in the distribution of individual stock returns. The second is an asymmetry in the dependence between stocks: stock returns appear to be more highly correlated during market downturns than during market upturns. In this article we examine the economic and statistical significance of these asymmetries for asset allocation decisions in an out-of-sample setting. We consider the problem of a constant relative risk aversion (CRRA) investor allocating wealth between the risk-free asset, a small-cap portfolio, and a large-cap portfolio. We use models that can capture time-varying moments up to the fourth order, and we use copula theory to construct models of the time-varying dependence structure that allow for different dependence during bear markets than bull markets. The importance of these two asymmetries for asset allocation is assessed by comparing the performance of a portfolio based on a normal distribution model with a portfolio based on a more flexible distribution model. For investors with no short-sales constraints, we find that knowledge of higher moments and asymmetric dependence leads to gains that are economically significant and statistically significant in some cases. For short-sales-constrained investors the gains are limited.
Interdisciplinary application of nonlinear time series methods
Phys. Reports, 1998
Cited by 56 (5 self)
This paper reports on the application to field measurements of time series methods developed on the basis of the theory of deterministic chaos. The major difficulties are pointed out that arise when the data cannot be assumed to be purely deterministic and the potential that remains in this situation is discussed. For signals with weakly nonlinear structure, the presence of nonlinearity in a general sense has to be inferred statistically. The paper reviews the relevant methods and discusses the implications for deterministic modeling. Most field measurements yield nonstationary time series, which poses a severe problem for their analysis. Recent progress in the detection and understanding of nonstationarity is reported. If a clear signature of approximate determinism is found, the notions of phase space, attractors, invariant manifolds etc. provide a convenient framework for time series analysis. Although the results have to be interpreted with great care, superior performance can be achieved for typical signal processing tasks. In particular, prediction and filtering of signals are discussed, as well as the classification of system states by means of time series recordings.
Searching for Structure in Multiple Streams of Data
In Proceedings of the Thirteenth International Conference on Machine Learning, 1996
Cited by 54 (9 self)
Finding structure in multiple streams of data is an important problem. Consider the streams of data flowing from a robot's sensors, the monitors in an intensive care unit, or periodic measurements of various indicators of the health of the economy. There is clearly utility in determining how current and past values in those streams are related to future values. We formulate the problem of finding structure in multiple streams of categorical data as search over the space of dependencies, unexpectedly frequent or infrequent co-occurrences, between complex patterns of values that can appear in the streams. Based on that formulation, we develop the Multi-Stream Dependency Detection (MSDD) algorithm that performs an efficient systematic search over the space of all possible dependencies. Dependency strength is evaluated with a statistical measure of non-independence, and bounds that we derive for the value of that measure allow the search to be pruned. Due to the pruning, MSDD can find the k strongest dependencies in the streams by examining only a fraction of the search space.
Constrained-Realization Monte-Carlo Method for Hypothesis Testing
Physica D
Cited by 44 (1 self)
We compare two theoretically distinct approaches to generating artificial (or "surrogate") data for testing hypotheses about a given data set. The first and more straightforward approach is to fit a single "best" model to the original data, and then to generate surrogate data sets that are "typical realizations" of that model. The second approach concentrates not on the model but directly on the original data; it attempts to constrain the surrogate data sets so that they exactly agree with the original data for a specified set of sample statistics. Examples of these two approaches are provided for two simple cases: a test for deviations from a Gaussian distribution, and a test for serial dependence in a time series. Additionally, we consider tests for nonlinearity in time series based on a Fourier transform (FT) method and on more conventional autoregressive moving-average (ARMA) fits to the data. The comparative performance of hypothesis testing schemes based on these two approaches...
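The second, data-constrained approach can be sketched for the serial-dependence case. The sketch below assumes that random permutations of the series serve as surrogates (each one reproduces the sample distribution of the original exactly) and that the lag-1 autocorrelation is the test statistic. Both choices, and the names `lag1_autocorr`, `shuffle_surrogate_test`, and `ar1_series`, are illustrative assumptions rather than the paper's own procedure.

```python
import numpy as np


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a one-dimensional series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[:-1] * d[1:]) / np.sum(d * d))


def shuffle_surrogate_test(x, n_surrogates=199, seed=0):
    """Monte Carlo p-value for 'no serial dependence' using shuffled surrogates.

    Every surrogate is a permutation of x, so its sample distribution
    matches the original exactly; only the temporal order is destroyed.
    """
    rng = np.random.default_rng(seed)
    observed = abs(lag1_autocorr(x))
    count = 0
    for _ in range(n_surrogates):
        surrogate = rng.permutation(x)
        if abs(lag1_autocorr(surrogate)) >= observed:
            count += 1
    # Count the original series itself as one member of the ensemble.
    return (count + 1) / (n_surrogates + 1)


def ar1_series(n=300, phi=0.8, seed=1):
    """Synthetic serially dependent series used only for illustration."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x
```

With 199 surrogates the smallest attainable p-value is 1/200, so a strongly autocorrelated input such as `ar1_series()` is expected to reject the null at the 1% level.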