Clustering for Edge-Cost Minimization
Cited by 30 (4 self)
Leonard J. Schulman, College of Computing, Georgia Institute of Technology, Atlanta, GA 30332-0280
Abstract: We address the problem of partitioning a set of n points into clusters so as to minimize the sum, over all intra-cluster pairs of points, of the cost associated with each pair. We obtain a randomized approximation algorithm for this problem for the cost functions $\ell_2^2$, $\ell_1$, and $\ell_2$, as well as for any cost function isometrically embeddable in $\ell_2^2$.
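To make the objective concrete, here is a minimal Python sketch that scores a given partition and finds the best one by exhaustive search. It assumes the number of clusters k is fixed in advance (without such a constraint, putting every point in its own cluster trivially gives zero cost). The helper names are hypothetical, and the brute-force search is only a reference for tiny inputs; it is not the randomized approximation algorithm from the paper.

```python
from itertools import combinations, product
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]
Cost = Callable[[Point, Point], float]


def l1(p: Point, q: Point) -> float:
    """Manhattan (l1) distance."""
    return sum(abs(a - b) for a, b in zip(p, q))


def l2_squared(p: Point, q: Point) -> float:
    """Squared Euclidean (l2^2) distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def l2(p: Point, q: Point) -> float:
    """Euclidean (l2) distance."""
    return l2_squared(p, q) ** 0.5


def intracluster_cost(points: Sequence[Point], labels: Sequence[int], cost: Cost) -> float:
    """Sum of cost(p, q) over all unordered pairs of points that share a label."""
    return sum(
        cost(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
        if labels[i] == labels[j]
    )


def brute_force_clustering(
    points: Sequence[Point], k: int, cost: Cost
) -> Tuple[float, Tuple[int, ...]]:
    """Try every assignment of points to k labels (k**n of them); tiny n only."""
    best_cost, best_labels = float("inf"), ()
    for labels in product(range(k), repeat=len(points)):
        c = intracluster_cost(points, labels, cost)
        if c < best_cost:  # keep the first labeling that attains the minimum
            best_cost, best_labels = c, labels
    return best_cost, best_labels


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(brute_force_clustering(points, 2, l2_squared))  # (2.0, (0, 0, 1, 1))
```

Because the search enumerates k^n labelings, it is only useful for checking a faster method on very small inputs.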
Statistics and Data Mining: Intersecting Disciplines
SIGKDD Explorations, 1999
Cited by 28 (1 self)
... is generally meant by data mining nowadays. Statistics and data mining have much in common, but they also have differences. The nature of the two disciplines is examined, with emphasis on their similarities and differences.
Data Mining At The Interface Of Computer Science And Statistics
2001
Cited by 7 (0 self)
This chapter is written for computer scientists, engineers, mathematicians, and scientists who wish to gain a better understanding of the role of statistical thinking in modern data mining. Data mining has attracted considerable attention in both research and commercial arenas in recent years, involving the application of a variety of techniques from both computer science and statistics. The chapter discusses how computer scientists and statisticians approach data from different but complementary viewpoints and highlights the fundamental differences between statistical and computational views of data mining. In doing so, we review the historical importance of statistical contributions to machine learning and data mining, including neural networks, graphical models, and flexible predictive modeling. The primary conclusion is that closer integration of computational methods with statistical thinking is likely to become increasingly important in data mining applications.
Keywords: data mining, statistics, pattern recognition, transaction data, correlation.
An Overview of Temporal Data Mining
Cited by 7 (0 self)
Temporal data mining is a rapidly evolving area of research at the intersection of several disciplines, including statistics, temporal pattern recognition, temporal databases, optimisation, visualisation, high-performance computing, and parallel computing. This paper is intended, first, to serve as an overview of temporal data mining in research and applications.
A Theory Of Empirical Spatial Knowledge Supporting Rough Set Based Knowledge Discovery in Geographic Databases
1998
Cited by 4 (3 self)
This research addresses the problem of obtaining useful knowledge from multiple-theme geographic data where the size and complexity of the dataset challenge human comprehension. A theoretical foundation for geographic knowledge discovery in databases is developed, commencing with Pawlak’s (1982, 1991) theory of abstract knowledge. Pawlak’s theory is founded on the notions of equivalence relations and classification. These ideas are combined with the well-known mathematical concepts of set definition by extension and by intension to develop the concepts of extensional knowledge (i.e. facts) and intensional knowledge (e.g. rules). The theory traverses the concepts of generalisation, specialisation, induction, deduction, unsupervised learning, and supervised learning. Further considerations lead to the proposal that empirical objects are dependent upon, and a consequence of, an intelligent agent’s sensory experience of real-world phenomena. The a priori existence of objects in nature is rejected. A theory of empirical spatial knowledge is developed, based on the idea that spatial experience of reality is fundamentally dependent upon the spatial configuration of the sensors of the sensing entity. It is shown that the spatial relationships ...
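Pawlak's framework, which the abstract takes as its starting point, can be illustrated with a short Python sketch: objects that agree on the chosen attributes form equivalence (indiscernibility) classes, and a concept given by extension (a set of objects) is bracketed by a lower approximation (classes wholly inside it) and an upper approximation (classes that overlap it). This shows only the standard rough-set construction, not the spatial theory developed in the thesis; the cell data, attribute names, and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple

Objects = Dict[Hashable, Dict[str, Hashable]]


def indiscernibility_classes(objects: Objects, attributes: List[str]) -> List[FrozenSet]:
    """Equivalence classes of objects that agree on every chosen attribute."""
    classes: Dict[tuple, Set] = defaultdict(set)
    for obj_id, values in objects.items():
        classes[tuple(values[a] for a in attributes)].add(obj_id)
    return [frozenset(c) for c in classes.values()]


def approximations(classes: List[FrozenSet], target: Set) -> Tuple[Set, Set]:
    """Lower approximation (classes inside target) and upper (classes touching it)."""
    lower: Set = set()
    upper: Set = set()
    for c in classes:
        if c <= target:
            lower |= c
        if c & target:
            upper |= c
    return lower, upper


# Hypothetical map cells described by two thematic layers.
cells = {
    "c1": {"soil": "clay", "slope": "flat"},
    "c2": {"soil": "clay", "slope": "flat"},
    "c3": {"soil": "sand", "slope": "steep"},
    "c4": {"soil": "sand", "slope": "flat"},
}
wetland = {"c1", "c3"}  # hypothetical concept defined by extension
classes = indiscernibility_classes(cells, ["soil", "slope"])
lower, upper = approximations(classes, wetland)
print(sorted(lower), sorted(upper))  # ['c3'] ['c1', 'c2', 'c3']
```

Cells c1 and c2 cannot be told apart with these two layers, so the concept can only be bounded, not defined exactly, from the available attributes.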
Formal Logics of Discovery and Hypothesis Formation By Machine
Cited by 3 (2 self)
The following are the aims of the paper: (1) to call the attention of the Discovery Science (DS) community to certain existing formal systems suitable for DS, developed in Prague from the 1960s to the 1980s and unfortunately largely unknown; (2) to illustrate the use of these calculi on the example of the GUHA method of hypothesis generation by computer, subjecting this method to a critical evaluation in the context of contemporary data mining; (3) to stress the importance of Fuzzy Logic for DS and report on the present state of the mathematical foundations of Fuzzy Logic; and (4) finally, to present an ongoing research program of developing calculi of symbolic fuzzy logic for DS and for a fuzzy GUHA method.
1 Introduction
The term "logic of discovery" is admittedly not new: let us mention at least Popper's philosophical work [42], Buchanan's dissertation [4] analyzing the notion of a logic of discovery in relation to Artificial Intelligence, and Plotkin's paper [41] with his notion of a ...
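As a rough illustration of how GUHA-style hypothesis generation works, the Python sketch below enumerates conjunctions of Boolean attributes as antecedents, builds the four-fold contingency table (a, b, c, d) of each against a fixed succedent, and keeps those satisfying the founded implication quantifier: a/(a+b) ≥ p and a ≥ Base. This is a simplified sketch of one well-known GUHA quantifier, not the fuzzy calculi the paper proposes; the function names, attributes, and thresholds are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, bool]


def fourfold(
    data: Sequence[Row], antecedent: Tuple[str, ...], succedent: str
) -> Tuple[int, int, int, int]:
    """Counts of rows where (antecedent, succedent) is (T,T), (T,F), (F,T), (F,F)."""
    a = b = c = d = 0
    for row in data:
        phi = all(row[attr] for attr in antecedent)  # conjunction of attributes
        psi = row[succedent]
        if phi and psi:
            a += 1
        elif phi:
            b += 1
        elif psi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def founded_implications(
    data: Sequence[Row],
    attributes: Sequence[str],
    succedent: str,
    p: float,
    base: int,
    max_len: int = 2,
) -> List[Tuple[Tuple[str, ...], int, int]]:
    """Antecedents of up to max_len attributes with a/(a+b) >= p and a >= base."""
    candidates = [attr for attr in attributes if attr != succedent]
    hypotheses = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(candidates, size):
            a, b, _, _ = fourfold(data, antecedent, succedent)
            if a + b > 0 and a >= base and a / (a + b) >= p:
                hypotheses.append((antecedent, a, b))
    return hypotheses


data = [
    {"A": True, "B": True, "S": True},
    {"A": True, "B": False, "S": True},
    {"A": True, "B": True, "S": True},
    {"A": False, "B": True, "S": False},
    {"A": False, "B": False, "S": False},
    {"A": True, "B": False, "S": False},
]
print(founded_implications(data, ["A", "B", "S"], "S", p=0.7, base=2))
# [(('A',), 3, 1), (('A', 'B'), 2, 0)]
```

Here the antecedent B alone is rejected because only 2 of its 3 supporting rows satisfy S, which falls below p = 0.7.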
GigaMining
1998
Cited by 1 (0 self)
We describe an industrial-strength data mining application in telecommunications. The application requires building a short (7-byte) profile for all telephone numbers seen on a large telecom network. By large, we mean very large: we maintain approximately 350 million profiles. In addition, the procedure for updating these profiles is based on processing approximately 275 million call records per day. We discuss the motivation for massive tracking and fully describe the definition and the computation of one of the more interesting bytes in the profile.
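The scale figures in the abstract translate into a concrete budget: 350 million profiles at 7 bytes each is 2.45 × 10⁹ bytes (about 2.45 GB), and 275 million call records per day averages roughly 3,200 records per second. Below is a minimal Python sketch of one way to hold such fixed-width profiles, assuming a single contiguous byte array indexed by a slot number. The ProfileStore class, the slot mapping, and the exponentially smoothed update are hypothetical stand-ins, since the abstract does not say how the paper's profile bytes are defined.

```python
PROFILE_BYTES = 7            # profile width stated in the abstract
N_PROFILES = 350_000_000     # approximate profile count from the abstract
CALLS_PER_DAY = 275_000_000  # approximate daily call-record volume from the abstract


class ProfileStore:
    """Hypothetical fixed-width profile table packed into one bytearray."""

    def __init__(self, n_profiles: int) -> None:
        self.data = bytearray(n_profiles * PROFILE_BYTES)

    def get(self, slot: int) -> bytes:
        """Copy of the 7-byte profile stored at the given slot."""
        offset = slot * PROFILE_BYTES
        return bytes(self.data[offset:offset + PROFILE_BYTES])

    def update_byte(self, slot: int, index: int, observation: int, weight: float = 0.1) -> None:
        """Blend a new observation (0..255) into one byte by exponential smoothing."""
        if not 0 <= index < PROFILE_BYTES:
            raise IndexError("byte index out of range")
        offset = slot * PROFILE_BYTES + index
        blended = round((1 - weight) * self.data[offset] + weight * observation)
        self.data[offset] = max(0, min(255, blended))  # clamp to one unsigned byte


print(N_PROFILES * PROFILE_BYTES)     # 2450000000 bytes
print(round(CALLS_PER_DAY / 86_400))  # 3183 records per second on average

store = ProfileStore(3)               # tiny table for demonstration
store.update_byte(1, 0, 200)
print(store.get(1))                   # b'\x14\x00\x00\x00\x00\x00\x00'
```

Packing profiles into one flat array avoids per-object overhead, which matters when the table has hundreds of millions of entries.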
Applications of Data Mining Techniques in Healthcare and Prediction of Heart Attacks
Cited by 1 (0 self)
The healthcare environment is generally perceived as being ‘information rich’ yet ‘knowledge poor’. There is a wealth of data available within healthcare systems. However, there is a lack of effective analysis tools to discover hidden relationships and trends in data. Knowledge discovery and data mining have found numerous applications in business and scientific domains. Valuable knowledge can be discovered by applying data mining techniques in healthcare systems. In this study, we briefly examine the potential use of classification-based data mining techniques, such as rule-based methods, decision trees, Naïve Bayes, and artificial neural networks, on massive volumes of healthcare data. The healthcare industry collects huge amounts of healthcare data which, unfortunately, are not “mined” to discover hidden information. For data preprocessing and effective decision making, the One Dependency Augmented Naïve Bayes classifier (ODANB) and the naive credal classifier 2 (NCC2) are used. NCC2 is an extension of naive Bayes to imprecise probabilities that aims to deliver robust classifications even when dealing with small or incomplete data sets. Discovery of hidden patterns and relationships often goes unexploited. Using medical profiles such as age, sex, blood pressure, and blood sugar, the approach can predict the likelihood of patients getting heart disease. It enables significant knowledge, e.g. patterns and relationships between medical factors related to heart disease, to be established.
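For readers unfamiliar with the baseline that ODANB and NCC2 build on, a plain categorical naive Bayes classifier with Laplace (add-one) smoothing fits in a few lines of Python. This is a sketch of ordinary naive Bayes only, not ODANB or NCC2; the class name, the discretized features (age band, blood pressure, blood sugar), and the records are invented for illustration and carry no clinical meaning.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with add-one (Laplace) smoothing."""

    def fit(self, X: Sequence[Sequence[Hashable]], y: Sequence[Hashable]) -> "CategoricalNaiveBayes":
        self.n = len(y)
        self.class_counts = Counter(y)
        # (class, feature index) -> counts of each observed value
        self.value_counts: Dict[tuple, Counter] = defaultdict(Counter)
        # feature index -> set of values seen in training
        self.vocab: Dict[int, set] = defaultdict(set)
        for row, label in zip(X, y):
            for j, value in enumerate(row):
                self.value_counts[(label, j)][value] += 1
                self.vocab[j].add(value)
        return self

    def log_scores(self, row: Sequence[Hashable]) -> Dict[Hashable, float]:
        """Unnormalized log posterior for each class."""
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / self.n)  # log prior
            for j, value in enumerate(row):
                seen = self.value_counts[(label, j)][value]
                score += math.log((seen + 1) / (count + len(self.vocab[j])))
            scores[label] = score
        return scores

    def predict(self, row: Sequence[Hashable]) -> Hashable:
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical records: (age band, blood pressure, blood sugar) -> disease label
X = [
    ("old", "high", "high"), ("old", "high", "normal"),
    ("young", "normal", "normal"), ("young", "normal", "high"),
    ("old", "normal", "normal"), ("young", "high", "normal"),
]
y = ["yes", "yes", "no", "no", "no", "no"]
model = CategoricalNaiveBayes().fit(X, y)
print(model.predict(("old", "high", "high")))        # yes
print(model.predict(("young", "normal", "normal")))  # no
```

Working in log space avoids underflow when many per-feature probabilities are multiplied, and add-one smoothing keeps a value never seen with a class from forcing that class's probability to zero.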
Data Mining: Data Analysis on a Grand Scale?
Statistical Methods in Medical Research, 2000
Modern data mining has evolved largely as a result of efforts by computer scientists to address the needs of "data owners" in extracting useful information from massive observational data sets. Because of this historical context, data mining to date has largely focused on computational and algorithmic issues rather than the more traditional statistical aspects of data analysis. This paper provides a brief review of the origins of data mining and discusses some of the primary themes in current data mining research, including scalable algorithms for massive data sets, discovering novel patterns in data, and analysis of text, Web, and related multimedia data sets.
1 Introduction
The phrase "data mining" has had a varied history within the past 30 to 40 years. In the 1960s, as digital computers were beginning to be applied to data analysis problems, it was noticed that if one searched long enough (using the computer), one could always find some relatively complex ...
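The observation that a long enough computer search will always turn up some pattern is easy to reproduce. The Python sketch below, a hypothetical illustration rather than anything from the paper, draws a pure-noise target and many pure-noise candidate features, then reports the strongest sample correlation found. With 20 observations and 1,000 candidates, the best absolute correlation almost always exceeds 0.5 even though no real relationship exists.

```python
import random
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_spurious_correlation(n_obs: int, n_features: int, seed: int = 0) -> float:
    """Largest |r| between a noise target and many independent noise features."""
    rng = random.Random(seed)
    target = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        best = max(best, abs(pearson(feature, target)))
    return best


print(round(best_spurious_correlation(n_obs=20, n_features=1000), 2))
```

Raising n_features makes the best spurious correlation grow, which is the multiple-comparisons effect behind the early concern about searching data "long enough".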
Automatic Aggregation using Explicit Metadata
2000
The paper presents a logical data model for statistical data with explicit modeling of metadata, which allows aggregation to be performed automatically. The data are stored in standard relations of the relational model, while the metadata defining the semantics of the relations are represented by a kind of functional dependency for summary attributes, called numerical dependencies, which specify the function defining the summary values in terms of microdata, as well as the interrelationships among summary values. Relations with numerical dependencies can be considered statistical views over initial relations of microdata, which are generally not accessible. The main benefit of the present model is to support declarative queries that do not need to deal with the intricacy of...
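Why explicit metadata matters for automatic aggregation can be shown with a small Python sketch: if each summary attribute records how it was computed from microdata, a roll-up can choose the right re-aggregation rule, for example summing counts but weighting means by their counts instead of averaging them. The NumericalDependency class and roll_up function are hypothetical simplifications, not the paper's formal model, and the data are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class NumericalDependency:
    """Hypothetical metadata: how a summary attribute was derived from microdata."""
    attribute: str
    function: str                 # "sum" or "mean" in this sketch
    weight: Optional[str] = None  # count attribute needed to re-aggregate a mean


def roll_up(rows: List[dict], group_by: str, deps: List[NumericalDependency]) -> Dict[str, dict]:
    """Aggregate summary rows to a coarser level using each attribute's declared function."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row)
    result = {}
    for key, members in groups.items():
        out = {}
        for dep in deps:
            if dep.function == "sum":
                out[dep.attribute] = sum(r[dep.attribute] for r in members)
            elif dep.function == "mean" and dep.weight is not None:
                total = sum(r[dep.weight] for r in members)
                out[dep.attribute] = sum(r[dep.attribute] * r[dep.weight] for r in members) / total
            else:
                raise ValueError(f"no roll-up rule for {dep}")
        result[key] = out
    return result


city_summary = [
    {"region": "North", "city": "A", "households": 100, "mean_income": 30.0},
    {"region": "North", "city": "B", "households": 300, "mean_income": 50.0},
    {"region": "South", "city": "C", "households": 200, "mean_income": 40.0},
]
deps = [
    NumericalDependency("households", "sum"),
    NumericalDependency("mean_income", "mean", weight="households"),
]
print(roll_up(city_summary, "region", deps))
# {'North': {'households': 400, 'mean_income': 45.0}, 'South': {'households': 200, 'mean_income': 40.0}}
```

An unweighted average of the two North cities would give 40.0 instead of 45.0, which is the kind of error that declaring the aggregation rule in the metadata, rather than leaving it to each query, is meant to prevent.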