From data mining to knowledge discovery in databases
- AI Magazine
, 1996
Cited by 215 (0 self)
Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field. Across a wide variety of fields, data are ...
Knowledge Discovery and Data Mining: Towards a Unifying Framework
, 1996
Cited by 108 (0 self)
This paper presents a first step towards a unifying framework for Knowledge Discovery in Databases. We describe links between data mining, knowledge discovery, and other related fields. We then define the KDD process and basic data mining algorithms, discuss application issues and conclude with an analysis of challenges facing practitioners in the field.
Keywords: Knowledge Discovery in Databases (KDD), data mining, overview article, large databases, automated analysis, issues and challenges in data mining.
To appear: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, Oregon, August 2-4, 1996, AAAI Press. http://www-aig.jpl.nasa.gov/kdd96
Authors: Usama Fayyad (Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA, fayyad@microsoft.com); Gregory Piatetsky-Shapiro (GTE Laboratories, MS 44, Waltham, MA 02154, USA, gps@gte.com); Padhraic Smyth (Information and Computer S...)
A Guided Tour through the Data Mining Jungle
Cited by 17 (5 self)
An important success factor for the field of KDD lies in the development and integration of methods for supporting the construction and execution of KDD processes. Crucial aspects ...
Using a Data Metric for Preprocessing Advice for Data Mining Applications
, 1998
Cited by 8 (0 self)
This paper describes research performed in the course of a project in which a methodology for providing user support for KDD processes plays a central role. Although methodologically we aim at supporting the whole process of applying inductive learning techniques, the current paper focuses on one part of this process: the support of data preprocessing for KDD. We give some insight into the metadata we calculate from a dataset as part of the method for user support. DCT (Data Characterisation Tool) is implemented in a software environment (Clementine). Some examples are given that resulted from running the UGM/DCT (User Guidance Module combined with DCT) on the data.
Data Mining At The Interface Of Computer Science And Statistics
, 2001
Cited by 6 (0 self)
This chapter is written for computer scientists, engineers, mathematicians, and scientists who wish to gain a better understanding of the role of statistical thinking in modern data mining. Data mining has attracted considerable attention both in the research and commercial arenas in recent years, involving the application of a variety of techniques from both computer science and statistics. The chapter discusses how computer scientists and statisticians approach data from different but complementary viewpoints and highlights the fundamental differences between statistical and computational views of data mining. In doing so we review the historical importance of statistical contributions to machine learning and data mining, including neural networks, graphical models, and flexible predictive modeling. The primary conclusion is that closer integration of computational methods with statistical thinking is likely to become increasingly important in data mining applications. Keywords: Data mining, statistics, pattern recognition, transaction data, correlation.
From Massive Data Sets to Science Catalogs: Applications and Challenges
, 1995
Cited by 6 (0 self)
With hardware advances in scientific instruments and data gathering techniques comes the inevitable flood of data that can render traditional approaches to science data analysis severely inadequate. The traditional approach of manual and exhaustive analysis of a data set is no longer feasible for many tasks ranging from remote sensing, astronomy, and atmospherics to medicine, molecular biology, and biochemistry. In this paper we present our views as practitioners engaged in building computational systems to help scientists deal with large data sets. We focus on what we view as challenges and shortcomings of the current state-of-the-art in data analysis in view of the massive data sets that are still awaiting analysis. The presentation is grounded in applications in astronomy, planetary sciences, solar physics, and atmospherics that are currently driving much of our work at JPL. keywords: science data analysis, limitations of current methods, challenges for massive data sets, ...
Information, Divergence and Risk for Binary Experiments
- Journal of Machine Learning Research
, 2009
Cited by 4 (2 self)
We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC-curves and statistical information. We do this by systematically studying integral and variational representations of these various objects and in so doing identify their primitives which all are related to cost-sensitive binary classification. As well as developing relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate regret bounds and generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint also illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates Maximum Mean Discrepancy to Fisher Linear Discriminants.
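As a rough illustration of two objects named in this abstract, the sketch below computes the KL divergence and the variational divergence as f-divergences of two discrete distributions and checks the classical Pinsker inequality, KL >= V^2 / 2. The convention sum_i q_i f(p_i / q_i), the function names, and the example distributions are assumptions made for illustration; the paper may use a different convention, and its generalised inequalities are not reproduced here.

```python
import math

def f_divergence(p, q, f):
    """Discrete f-divergence, sum of q_i * f(p_i / q_i); assumes every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def kl_generator(t):
    # f(t) = t log t, with the usual convention f(0) = 0
    return t * math.log(t) if t > 0 else 0.0

def variational_generator(t):
    # f(t) = |t - 1| gives the variational (L1) divergence
    return abs(t - 1.0)

# Hypothetical example distributions over two outcomes
p = [0.7, 0.3]
q = [0.4, 0.6]

kl = f_divergence(p, q, kl_generator)
v = f_divergence(p, q, variational_generator)

# Classical Pinsker inequality with V taken as the L1 distance
pinsker_holds = kl >= v ** 2 / 2
print(kl, v, pinsker_holds)
```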
Composite Binary Losses
, 2009
Cited by 2 (2 self)
We study losses for binary classification and class probability estimation and extend the understanding of them from margin losses to general composite losses which are the composition of a proper loss with a link function. We characterise when margin losses can be proper composite losses, explicitly show how to determine a symmetric loss in full from half of one of its partial losses, introduce an intrinsic parametrisation of composite binary losses and give a complete characterisation of the relationship between proper losses and “classification calibrated” losses. We also consider the question of the “best” surrogate binary loss. We introduce a precise notion of “best” and show there exist situations where two convex surrogate losses are incommensurable. We provide a complete explicit characterisation of the convexity of composite binary losses in terms of the link function and the weight function associated with the proper loss which make up the composite loss. This characterisation suggests new ways of “surrogate tuning”. Finally, in an appendix we present some new algorithm-independent results on the relationship between properness, convexity and robustness to misclassification noise for binary losses and show that all convex proper losses are non-robust to misclassification noise.
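The phrase "composition of a proper loss with a link function" can be made concrete with a standard textbook instance, sketched here under stated assumptions: log loss (a proper loss) composed with the inverse of the logit link yields the familiar logistic margin loss. The function names and the label convention y in {-1, +1} are chosen for illustration and are not taken from the paper.

```python
import math

def log_loss(y, p):
    """Proper loss for a class-probability estimate p in (0, 1), label y in {-1, +1}."""
    return -math.log(p) if y == 1 else -math.log(1.0 - p)

def inverse_logit(v):
    """Inverse of the logit link: maps a real-valued score v to a probability."""
    return 1.0 / (1.0 + math.exp(-v))

def composite_loss(y, v):
    """Composite loss: the proper loss applied to the inverse-linked score."""
    return log_loss(y, inverse_logit(v))

def logistic_margin_loss(y, v):
    """Logistic loss written as a margin loss, a function of y * v only."""
    return math.log(1.0 + math.exp(-y * v))

# The two forms agree on a few hypothetical scores
for score in (-2.0, 0.0, 1.5):
    for label in (-1, 1):
        gap = abs(composite_loss(label, score) - logistic_margin_loss(label, score))
        print(label, score, gap < 1e-12)
```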

