Querying and Mining of Time Series Data: Experimental Comparison of Representations and Distance Measures
Cited by 33 (13 self)
The last decade has witnessed tremendous growth of interest in applications that deal with querying and mining of time series data. Numerous representation methods for dimensionality reduction and similarity measures geared towards time series have been introduced. Each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive set of time series experiments re-implementing 8 different representation methods and 9 similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. Our experiments have provided both a unified validation of some of the existing achievements, and in some cases, suggested that certain claims in the literature may be unduly optimistic.
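To make the notion of a time series similarity measure concrete, here is a minimal Python sketch of two widely used measures, lock-step Euclidean distance and dynamic time warping (DTW). The abstract does not list which nine measures were evaluated, so the choice of these two, the function names, and the sample series are assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Lock-step Euclidean distance; both series must have equal length."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Dynamic time warping distance (squared point cost, no warping window)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[n][m])


# Hypothetical data: y has the same shape as x, shifted by one step.
x = [0.0, 1.0, 2.0, 1.0, 0.0]
y = [0.0, 0.0, 1.0, 2.0, 1.0]
d_euclid = euclidean(x, y)
d_dtw = dtw(x, y)
```

On this pair the elastic measure reports a smaller distance than the lock-step one, because DTW may align the shifted peak with itself.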
iSAX: Indexing and Mining Terabyte Sized Time Series, SIGKDD, 2008
Cited by 12 (3 self)
Current research in indexing and mining time series data has produced many interesting algorithms and representations. However, it has not led to algorithms that can scale to the increasingly massive datasets encountered in science, engineering, and business domains. In this work, we show how a novel multiresolution symbolic representation can be used to index datasets which are several orders of magnitude larger than anything else considered in the literature. Our approach allows both fast exact search and ultra fast approximate search. We show how to exploit the combination of both types of search as sub-routines in data mining algorithms, allowing for the exact mining of truly massive real world datasets, containing millions of time series.
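The abstract does not spell out the representation, so the following is only a sketch of the general idea behind a multiresolution symbolic word, assuming a SAX-style pipeline: z-normalize, average over equal-width segments, then quantize each mean with breakpoints that cut a standard normal into equiprobable regions. The function names, the fixed alphabet of size 4, and the sample series are illustrative assumptions, not the authors' implementation.

```python
import math

# Breakpoints that split a standard normal into 4 equiprobable regions.
BREAKPOINTS_4 = [-0.6745, 0.0, 0.6745]


def znorm(series):
    """Rescale a series to zero mean and unit variance."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    if std == 0:
        return [0.0] * len(series)
    return [(v - mean) / std for v in series]


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each equal-width segment.

    Assumes len(series) is divisible by segments.
    """
    width = len(series) // segments
    return [sum(series[i * width:(i + 1) * width]) / width
            for i in range(segments)]


def symbolic_word(series, segments):
    """Map each segment mean to a symbol in {0, 1, 2, 3}."""
    return [sum(1 for b in BREAKPOINTS_4 if v > b)
            for v in paa(znorm(series), segments)]


def coarsen(word):
    """Drop the low bit of every symbol: cardinality 4 becomes cardinality 2.

    This works because the single cardinality-2 breakpoint (0.0) is also
    a cardinality-4 breakpoint.
    """
    return [s >> 1 for s in word]


# Hypothetical input series.
word = symbolic_word([1, 2, 3, 4, 5, 6, 7, 8], segments=4)
coarse = coarse_word = coarsen(word)
```

Because a coarse word is a bit-prefix of the fine one, words at different resolutions remain comparable, which is the property a multiresolution index can exploit.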
Discovery of geospatial discriminating patterns from remote sensing datasets
- In SIAM International Conference on Data Mining (SDM), 2009
Cited by 5 (5 self)
Large amounts of remotely sensed data call for data mining techniques to fully utilize their rich information content. In this paper, we study new means of discovery and summarization of knowledge contained in the spatial patterns of remote sensing datasets. Several geospatial feature variables are fused together, and the vector of their values at each spatial cell is considered as a transaction to be used in association analysis. The concept of emerging patterns is applied to ascertain the variables that exert dominant influence on the distribution of a selected class variable. A new value-iteration method is introduced to optimally split the spatial domain of the selected variable into two classes. This division is used to calculate the set of patterns that are emerging with respect to the two classes; these patterns are the controlling factors responsible for the spatial distribution of the class variable. A method for concise summarization of controlling factors is introduced using a similarity measure custom-made for the type of patterns stemming from remote sensing measurements. Using this similarity measure, controlling factors are clustered, providing a brief description of the different ways in which the class variable is constrained by the explanatory variables. We evaluate our method in a real-world application pertaining to the density of vegetation within the continental United States. Examination of patterns related to high vegetation cover provides a summary of data dependencies that helps to develop a better empirical model of vegetation growth.
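As an illustration of the emerging-pattern idea, the sketch below treats each spatial cell as a transaction (a set of discretized variable values) and scores an itemset by its growth rate, that is, the ratio of its support in one class to its support in the other. This is the usual textbook definition of an emerging pattern; the function names, item labels, and toy transactions are hypothetical and are not taken from the paper.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def growth_rate(itemset, background, target):
    """Support in the target class divided by support in the background class."""
    s_bg = support(itemset, background)
    s_tg = support(itemset, target)
    if s_bg == 0.0:
        # Absent from the background: infinite growth if present in the target.
        return float("inf") if s_tg > 0.0 else 0.0
    return s_tg / s_bg


# Hypothetical cells, split into high- and low-vegetation classes.
high_veg = [
    frozenset({"precip=high", "elev=low"}),
    frozenset({"precip=high", "elev=mid"}),
    frozenset({"precip=high", "elev=low"}),
    frozenset({"precip=low", "elev=low"}),
]
low_veg = [
    frozenset({"precip=low", "elev=high"}),
    frozenset({"precip=low", "elev=low"}),
    frozenset({"precip=high", "elev=high"}),
    frozenset({"precip=low", "elev=mid"}),
]

pattern = frozenset({"precip=high"})
rate = growth_rate(pattern, background=low_veg, target=high_veg)
```

A pattern would be reported as emerging when its growth rate exceeds a chosen threshold; the threshold itself is a tuning choice that this sketch leaves open.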
Land Cover Change Detection: A Case Study
Cited by 3 (1 self)
The study of land cover change is an important problem in the Earth Science domain because of its impacts on local climate, radiation balance, biogeochemistry, hydrology, and the diversity and abundance of terrestrial species. Most well-known change detection techniques from statistics, signal processing and control theory are not well-suited for the massive high-dimensional spatio-temporal data sets from Earth Science due to limitations such as high computational complexity and the inability to take advantage of seasonality and spatio-temporal autocorrelation inherent in Earth Science data. In our work, we seek to address these challenges with new change detection techniques that are based on data mining approaches. Specifically, in this paper we have performed a case study for a new change detection technique for the land cover change detection problem. We study land cover change in the state of California, focusing on the San Francisco Bay Area and perform an extended study on the entire state. We also perform a comparative evaluation on forests in the entire state. These results demonstrate the utility of data mining techniques for the land cover change detection problem.
Discovery of Changes from the Global Carbon Cycle and Climate System Using Data Mining
- NASA Earth Science Technology Conference, 2004
Cited by 2 (2 self)
Abstract — The goal of our NASA sponsored project, “Discovery
Predicting Land Temperature Using Ocean Data
Cited by 1 (0 self)
To analyze the effect of the oceans and atmosphere on land climate, Earth Scientists have developed climate indices, which are time series that summarize the behavior of selected regions of the Earth’s oceans and atmosphere. In the past, Earth scientists have used observation and, more recently, eigenvalue analysis techniques, such as principal components analysis (PCA) and singular value decomposition (SVD), to discover climate indices. Recently, an alternative clustering-based methodology has been developed for identifying climate indices. This paper presents preliminary work evaluating the effectiveness of Sea Surface Temperature (SST) and Sea Level Pressure (SLP) cluster-based indices in predicting land temperature and their relative performance with respect to known climate indices. As part of
New Generation Of Data Mining Applications
Contents
1 Discovery of Patterns in Earth Science Data Using Data Mining
P. Zhang, M. Steinbach, V. Kumar, S. Shekhar, P. Tan, S. Klooster, and C. Potter
1.1 Introduction
1.2 Data Description and Data Sources
1.3 Data Preprocessing
1.4 Clustering
1.5 Association Analysis
1.6 Query Processing
1.7 Other Techniques
1.8 Conclusions
References
Appendix: List of Tables
Appendix: List of Figures

Discovery of Patterns in Earth Science Data Using Data Mining
Pusheng Zhang, Michael Steinbach, Vipin Kumar, and Shashi Shekhar, Department of Computer Science & Engineering, University of Minnesota
Pang-Ning Tan, Department of Computer Science & Engineering, Michigan State University
Steven Klooster and Christopher Potter, NASA Ames Research Center

1.1 INTRODUCTION
NASA's Earth Observing System (EOS) consists of a series of satellites that generate global observations of the land surface, biosphere, solid Earth, atmosphere, and oceans. This remote sensing data, co
Predicting Land Temperature Using Ocean Data
To analyze the effect of the oceans and atmosphere on land climate, Earth Scientists have developed climate indices, which are time series that summarize the behavior of selected regions of the Earth's oceans and atmosphere. In the past, Earth scientists have used observation and, more recently, eigenvalue analysis techniques, such as principal components analysis (PCA) and singular value decomposition (SVD), to discover climate indices. Recently, an alternative clustering-based methodology has been developed for identifying climate indices. This paper presents preliminary work evaluating the effectiveness of Sea Surface Temperature (SST) and Sea Level Pressure (SLP) cluster-based indices in predicting land temperature and their relative performance with respect to known climate indices. As part of our effort, we studied the North Atlantic Oscillation (NAO) index, which is known to impact land temperature in the US, and its cluster-based counterpart, which is derived using daily SLP data from the Atlantic Ocean for a 25 year period (1979-2003). We also studied the predictive power of 28 SST clusters that were identified as the most promising clusters derived from monthly SST data for a 41-year period (1958-1998) [14]. These clusters were shown to be similar to well known climate indices in terms of area weighted correlation to global land temperature, and were considered as prime candidates for further evaluation. Our preliminary results are very encouraging. They show that many of the cluster-based indices can outperform known climate indices in predicting anomalies in land temperature for certain parts of the world.
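One way to read "cluster-based index" and "area weighted correlation" is sketched below in Python. It assumes that a cluster index is the pointwise mean of the member time series and that area weighting uses the cosine of each land cell's latitude, a common convention on latitude-longitude grids. The abstract confirms neither detail, and all names and numbers here are hypothetical.

```python
import math


def cluster_index(series_list):
    """Cluster centroid: the pointwise mean of the member time series."""
    n = len(series_list)
    return [sum(vals) / n for vals in zip(*series_list)]


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def area_weighted_correlation(index, land_cells):
    """Average the per-cell correlations, weighting each cell by cos(latitude).

    land_cells is a list of (latitude_in_degrees, temperature_series) pairs.
    The cos(latitude) weight is an assumption, not the paper's stated method.
    """
    weights = [math.cos(math.radians(lat)) for lat, _ in land_cells]
    corrs = [pearson(index, series) for _, series in land_cells]
    return sum(w * r for w, r in zip(weights, corrs)) / sum(weights)


# Hypothetical ocean cluster with two member series, and two land cells.
index = cluster_index([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
cells = [(0.0, [10.0, 20.0, 30.0]), (60.0, [30.0, 20.0, 10.0])]
score = area_weighted_correlation(index, cells)
```

The equatorial cell counts twice as much as the cell at 60 degrees, so the positive correlation dominates the weighted average.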
When Is a Brain Like the Planet?, 2005
Time series of macroscopic quantities that are aggregates of microscopic quantities, with unknown one-many relations between macroscopic and microscopic states, are common in applied sciences, from economics to climate studies. When such time series of macroscopic quantities are claimed to be causal, the causal relations postulated are representable by a directed acyclic graph and associated probability distribution, sometimes called a dynamical Bayes net. Causal interpretations of such series imply claims that hypothetical manipulations of macroscopic variables have unambiguous effects on variables “downstream” in the graph, and such macroscopic variables may be predictably produced or altered even while particular microstates are not. This paper argues that such causal time series of macroscopic aggregates of microscopic processes are the appropriate model for mental causation.
1. Can There Be Mental Causes?
All of us talk as if some thoughts cause some actions. We distinguish deliberations that guide a course of action from random thoughts, fantasies, rejected plans, and even intended consequences that are brought about by our intentions but in ways not intended. We say that the causal role of some of our thoughts is part of
Data Mining for Climate Change and Impacts
- 2008 IEEE International Conference on Data Mining Workshops
Knowledge discovery from temporal, spatial and spatiotemporal data is critical for climate change science and climate impacts. Climate statistics is a mature area. However, recent growth in observations and model outputs, combined with the increased availability of geographical data, presents new opportunities for data miners. This paper maps climate requirements to solutions available in temporal, spatial and spatiotemporal data mining. The challenges result from long-range, long-memory and possibly nonlinear dependence, nonlinear dynamical behavior, presence of thresholds, importance of extreme events or extreme regional stresses caused by global climate change, uncertainty quantification, and the interaction of climate change with the natural and built environments. This paper makes a case for the development of novel algorithms to address these issues, discusses the recent literature, and proposes new directions. An illustrative case study presented here suggests that even relatively simple data mining approaches can provide new scientific insights with high societal impacts.

