From data mining to knowledge discovery in databases
- AI Magazine
, 1996
Cited by 215 (0 self)
Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field. Across a wide variety of fields, data are
Distance-Based Outliers: Algorithms and Applications
, 2000
Cited by 104 (0 self)
This paper deals with finding outliers (exceptions) in large, multidimensional datasets. The identification of outliers can lead to the discovery of truly unexpected knowledge in areas such as electronic commerce, credit card fraud, and even the analysis of performance statistics of professional athletes. Existing methods that we have seen for finding outliers can only deal efficiently with two dimensions/attributes of a dataset. In this paper, we study the notion of DB- (Distance-Based) outliers. Specifically, we show that: (i) outlier detection can be done efficiently for large datasets, and for k-dimensional datasets with large values of k (e.g., k ≥ 5); and (ii) outlier detection is a meaningful and important knowledge discovery task. First, we present two simple algorithms, both having a complexity of O(kN^2), k being the dimensionality and N being the number of objects in the dataset. These algorithms readily support datasets with many more than two attributes. Second, we ...
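The quadratic cost quoted in this abstract can be illustrated with a naive nested-loop sketch. It is not the paper's algorithm. It assumes the definition commonly given for a DB(p, D)-outlier (at least a fraction p of the objects lie farther than distance D from it), which the truncated abstract does not spell out. The function name `db_outliers`, the parameter names, and the choice to measure the fraction over the other N - 1 objects are all illustrative assumptions.

```python
import math

def db_outliers(points, p, d):
    """Return indices of points that are DB(p, d)-outliers (naive sketch).

    Assumption: a point is an outlier when at least a fraction p of the
    *other* points lie farther than distance d from it.
    """
    n = len(points)
    outliers = []
    for i, a in enumerate(points):
        # Each distance costs O(k) for k-dimensional points, and there are
        # O(N^2) pairs overall, which gives the O(kN^2) total cost.
        far = sum(
            1
            for j, b in enumerate(points)
            if j != i and math.dist(a, b) > d
        )
        if far >= p * (n - 1):
            outliers.append(i)
    return outliers

# Four points clustered near the origin and one point far away.
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
result = db_outliers(sample, p=0.9, d=3.0)
```

With this sample, only the last point has at least 90% of the other points farther than 3.0 away, so `result` is `[4]`.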
Constructing Knowledge From Multivariate Spatiotemporal Data: Integrating Geographic Visualization (GVis) with Knowledge Discovery in Database (KDD) Methods
- International Journal of Geographical Information Science
, 1999
Cited by 49 (15 self)
In this paper, we develop an approach to the process of constructing knowledge through structured exploration of large spatiotemporal data sets. We begin by introducing our problem context and defining both Geographic Visualization (GVis) and Knowledge Discovery in Databases (KDD), the source domains for methods being integrated. Next, we review and compare recent GVis and KDD developments and consider the potential for their integration, emphasizing that an iterative process with user interaction is a central focus for uncovering interesting and meaningful patterns through each. We then introduce an approach to design of an integrated GVis-KDD environment directed to exploration and discovery in the context of spatiotemporal environmental data. The approach emphasizes a matching of GVis and KDD meta-operations. Following description of the GVis and KDD methods that are linked in our prototype system, we present a demonstration of the prototype applied to a typical spatiotemporal datas...
Discovering Spatial Co-location Patterns: A Summary of Results
- Lecture Notes in Computer Science
, 2001
Cited by 44 (5 self)
Given a collection of boolean spatial features, the co-location pattern discovery process finds the subsets of features frequently located together. For example, the analysis of an ecology dataset may reveal the frequent co-location of a fire ignition source feature with a needle vegetation type feature and a drought feature. The spatial co-location rule problem is different from the association rule problem. Even though boolean spatial feature types (also called spatial events) may correspond to items in association rules over market-basket datasets, there is no natural notion of transactions. This creates difficulty in using traditional measures (e.g. support, confidence) and applying association rule mining algorithms which use support-based pruning. We propose a notion of user-specified neighborhoods in place of transactions to specify groups of items. New interest measures for spatial co-location patterns are proposed which are robust in the face of potentially infinite overlapping neighborhoods. We also propose an algorithm to mine frequent spatial co-location patterns and analyze its correctness and completeness. We plan to carry out experimental evaluations and performance tuning in the near future.
Mining Recurrent Items in Multimedia with Progressive Resolution Refinement
- In Int. Conf. on Data Engineering (ICDE’2000)
, 2000
Cited by 19 (6 self)
Despite the overwhelming amounts of multimedia data recently generated and the significance of such data, very few people have systematically investigated multimedia data mining. With our previous studies on content-based retrieval of visual artifacts, we study in this paper the methods for mining content-based associations with recurrent items and with spatial relationships from large visual data repositories. A progressive resolution refinement approach is proposed in which frequent item-sets at rough resolution levels are mined, and progressively, finer resolutions are mined only on the candidate frequent item-sets derived from mining rough resolution levels. Such a multi-resolution mining strategy substantially reduces the overall data mining cost without loss of the quality and completeness of the results.
A Progressive Refinement Approach To Spatial Data Mining
, 1999
Cited by 19 (1 self)
Spatial data mining, i.e., mining knowledge from large amounts of spatial data, is a demanding field since huge amounts of spatial data have been collected in various applications, ranging from remote sensing to geographical information systems (GIS), computer cartography, environmental assessment and planning. The collected data far exceed people's ability to analyze it. Thus, new and efficient methods are needed to discover knowledge from large spatial databases. The goal of this thesis is to analyze methods for mining of spatial data, and to determine environments in which efficient spatial data mining methods can be implemented. In the spatial data mining process, we use (1) non-spatial properties of the spatial objects and (2) attributes, predicates and functions describing spatial relations between described objects and other features located in the spatial proximity of the described objects. The descriptions are generalized, transformed into predicates, and the discovered knowle...
Scalable Exploratory Data Mining of Distributed Geoscientific Data
, 1996
Cited by 14 (6 self)
Geoscience studies produce data from various observations, experiments, and simulations at an enormous rate. Exploratory data mining extracts "content information" from massive geoscientific datasets to extract knowledge and provide a compact summary of the dataset. In this paper, we discuss how database query processing and distributed object management techniques can be used to facilitate geoscientific data mining and analysis. Some special requirements of large scale geoscientific data mining that are addressed include geoscientific data modeling, parallel query processing, and heterogeneous distributed data access. Introduction: A tremendous amount of raw spatio-temporal data is generated as a result of various observations, experiments, and model simulations. For example, NASA EOS expects to produce over 1 TByte of raw data and scientific data products per day by the year 2000, and a 100-year UCLA AGCM simulation (Mechoso et al. 1991) running at a resolution of 1° × 1.25...
Spatial Data Mining
, 2003
Cited by 12 (2 self)
Spatial data mining is the process of discovering interesting and previously unknown, but potentially useful, patterns from large spatial datasets. Extracting interesting and useful patterns from spatial datasets is more difficult than extracting the corresponding patterns from traditional numeric and categorical data due to the complexity of spatial data types, spatial relationships, and spatial autocorrelation. This chapter will discuss some of the accomplishments and research needs of spatial data mining in the following categories: location prediction, spatial outlier detection, co-location mining, and clustering.
Qualitative spatial reasoning: extracting and reasoning with spatial aggregates
- AI Magazine
, 2003
Cited by 10 (0 self)
Reasoning about spatial data is a key task in many applications, including geographic information systems, meteorological and fluid flow analysis, computer-aided design, and protein structure databases. Such applications often require the identification and manipulation of qualitative spatial representations, for example, to detect whether one “object” will soon occlude another in a digital image, or to efficiently determine relationships between a proposed road and wetland regions in a geographic data set. Qualitative spatial reasoning (QSR) provides representational primitives (a spatial “vocabulary”) and inference mechanisms for these tasks. This paper first reviews representative work on QSR for data-poor scenarios, where the goal is to design representations that can answer qualitative queries without much numerical information. It then turns to the data-rich case, where the goal is to derive and manipulate qualitative spatial representations that efficiently and correctly abstract important spatial aspects of the underlying data, for use in subsequent tasks. This paper focuses on how a particular QSR system, Spatial Aggregation (SA), can help answer spatial queries for scientific and engineering data sets. A case study application of weather analysis illustrates the effective representation and reasoning supported by both data-poor and data-rich forms of QSR.

