A framework for regional association rule mining in spatial datasets
- In The 6th IEEE International Conference on Data Mining (ICDM), 2006
Cited by 9 (7 self)
The immense explosion of geographically referenced data calls for efficient discovery of spatial knowledge. One critical requirement for spatial data mining is the capability to analyze datasets at different levels of granularity. One of the special challenges for spatial data mining is that information is usually not uniformly distributed in spatial datasets. Consequently, the discovery of regional knowledge is of fundamental importance for spatial data mining. Unfortunately, most of the current data mining techniques are ill-prepared for discovering regional knowledge. For example, when using traditional association rule mining, regional patterns frequently fail to be discovered due to insufficient global confidence and/or support. This raises the questions of how to measure the interestingness of a set of regions and how to search effectively and efficiently for interesting regions. This paper centers on discovering regional association rules in spatial datasets. In particular, we introduce a novel framework to mine regional association rules relying on a given class structure. A reward-based regional discovery methodology is introduced, and a divisive, grid-based supervised clustering algorithm is presented that identifies interesting subregions in spatial datasets. Then, an integrated approach is discussed to systematically mine regional rules. The proposed framework is evaluated in a real-world case study that identifies spatial risk patterns from arsenic in the Texas water supply.
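The abstract's point that regional patterns can be missed because of insufficient global confidence and/or support can be illustrated with a small sketch. This is not the paper's framework: the regions here are given up front rather than discovered by clustering, and the records, item names (`deep_well`, `high_arsenic`), region labels, and the helper `rule_stats` are all hypothetical.

```python
from collections import defaultdict


def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n = len(transactions)
    has_a = [t for t in transactions if antecedent <= t]
    has_both = [t for t in has_a if consequent <= t]
    support = len(has_both) / n if n else 0.0
    confidence = len(has_both) / len(has_a) if has_a else 0.0
    return support, confidence


# Hypothetical toy data: (region label, items observed at one well).
records = (
    [("west", {"deep_well", "high_arsenic"})] * 3
    + [("west", {"deep_well"})]
    + [("east", {"deep_well"})] * 10
    + [("east", {"shallow_well"})] * 2
)

A, C = {"deep_well"}, {"high_arsenic"}

# Global view: every record is pooled, regardless of location.
global_stats = rule_stats([items for _, items in records], A, C)

# Regional view: the same rule evaluated separately inside each region.
by_region = defaultdict(list)
for region, items in records:
    by_region[region].append(items)
regional = {r: rule_stats(ts, A, C) for r, ts in by_region.items()}

print("global:", global_stats)
for region, stats in regional.items():
    print(region, stats)
```

In this toy data the rule holds for three of the four western wells but for none of the eastern ones, so a global confidence threshold of, say, 0.5 would discard a rule that is strong inside one region.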
A Framework for Regional Association Rule Mining and Scoping in Spatial Datasets
Hierarchical, Parameter-Free Community Discovery
Abstract. Given a large bipartite graph (like document-term, or user-product graph), how can we find meaningful communities, quickly, and automatically? We propose to look for community hierarchies, with communities-within-communities. Our proposed method, the Context-specific Cluster Tree (CCT), finds such communities at multiple levels, with no user intervention, based on information-theoretic principles (MDL). More specifically, it partitions the graph into progressively more refined subgraphs, allowing users to quickly navigate from the global, coarse structure of a graph to more focused and local patterns. As a fringe benefit, and also as an additional indication of its quality, it also achieves better compression than typical, non-hierarchical methods. We demonstrate its scalability and effectiveness on real, large graphs.
Discovering the Intrinsic Cardinality and Dimensionality of Time Series using MDL
Abstract—Most algorithms for mining or indexing time series data do not operate directly on the original data, but instead they consider alternative representations that include transforms, quantization, approximation, and multi-resolution abstractions. Choosing the best representation and abstraction level for a given task/dataset is arguably the most critical step in time series data mining. In this paper, we investigate techniques to discover the natural intrinsic representation model, dimensionality and alphabet cardinality of a time series. The ability to discover these intrinsic features has implications beyond selecting the best parameters for particular algorithms, as characterizing data in such a manner is useful in its own right and an important sub-routine in algorithms for classification, clustering and outlier discovery. We will frame the discovery of these intrinsic features in the Minimal Description Length (MDL) framework. Extensive empirical tests show that our method is simpler, more general and significantly more accurate than previous methods, and has the important advantage of being essentially parameter-free.
Discovering the Intrinsic Cardinality and Dimensionality of Time Series using MDL
- In 11th IEEE International Conference on Data Mining, 2011
Abstract—Most algorithms for mining or indexing time series data do not operate directly on the original data, but instead they consider alternative representations that include transforms, quantization, approximation, and multiresolution abstractions. Choosing the best representation and abstraction level for a given task/dataset is arguably the most critical step in time series data mining. In this paper, we investigate techniques to discover the natural intrinsic representation model, dimensionality and alphabet cardinality of a time series. The ability to discover these intrinsic features has implications beyond selecting the best parameters for particular algorithms, as characterizing data in such a manner is useful in its own right and an important sub-routine in algorithms for classification, clustering and outlier discovery. We will frame the discovery of these intrinsic features in the Minimal Description Length (MDL) framework. Extensive empirical tests show that our method is simpler, more general and significantly more accurate than previous methods, and has the important advantage of being essentially parameter-free.

