Results 1 - 10 of 13
The Representation Race - Preprocessing for Handling Time Phenomena
- In Ramon Lopez de Mantaras and Enric Plaza, editors, Machine Learning: ECML 2000, Lecture Notes in Artificial Intelligence
, 2000
Cited by 13 (4 self)
Designing the representation languages for the input, L_E, and output, L_H, of a learning algorithm is the hardest task within machine learning applications. This paper emphasizes the importance of constructing an appropriate representation L_E for knowledge discovery applications using the example of time-related phenomena. Given the same raw data -- most frequently a database with time-stamped data -- rather different representations have to be produced for the learning methods that handle time. In this paper, a set of learning tasks dealing with time is given together with the input required by learning methods which solve the tasks. Transformations from raw data to the desired representation are illustrated by three case studies.
1 Introduction
Designing the representation languages for the input and output of a learning algorithm is the hardest task within machine learning applications. The "no free lunch theorem" actually implies that if a hard learning task becomes e...
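The idea of reshaping time-stamped raw data for a particular learner can be sketched as follows. This is a minimal illustration only, not one of the paper's case studies: the function name `sliding_windows`, the window width, and the sample rows are all assumptions made for the example. It turns a time-ordered series into fixed-width windows paired with the next value, a representation that an attribute-value learner could consume.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Tuple[float, ...], float]


def sliding_windows(values: Sequence[float], width: int) -> List[Example]:
    """Turn a time-ordered series into (window, next value) examples."""
    if width < 1:
        raise ValueError("width must be at least 1")
    examples: List[Example] = []
    # Each window of `width` consecutive values predicts the value after it.
    for start in range(len(values) - width):
        window = tuple(values[start:start + width])
        target = values[start + width]
        examples.append((window, target))
    return examples


# Hypothetical raw data: (timestamp, measurement) rows in arbitrary order.
raw = [(3, 12.0), (1, 10.0), (2, 11.0), (4, 15.0), (5, 14.0)]

# Sort by timestamp first, then drop the timestamp to get the plain series.
series = [value for _, value in sorted(raw)]
examples = sliding_windows(series, width=3)
```

A different learning task on the same rows (for example, learning rules over events and intervals) would need a different transformation, which is the point the abstract makes.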
An Intelligent Assistant for the Knowledge Discovery Process
, 2001
Cited by 12 (0 self)
A knowledge discovery (KD) process involves preprocessing data, choosing a data-mining algorithm, and postprocessing the mining results. There are very many choices for each of these stages, and non-trivial interactions between them. Consequently, both novices and data-mining specialists need assistance in navigating the space of possible KD processes. We present the concept of Intelligent Discovery Assistants (IDAs), which provide users with (i) systematic enumerations of valid KD processes, so that important, potentially fruitful options are not overlooked, and (ii) effective rankings of these valid processes by different criteria, to facilitate the choice of KD processes to execute. We use a prototype to show that an IDA can indeed provide useful enumerations and effective rankings.
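The two services named in the abstract, enumerating valid KD processes and ranking them, can be sketched roughly as below. This is not the authors' prototype: the operator names, the property labels ("numeric", "categorical", "tree"), and the cost figures are hypothetical and chosen only to make the example run. A process counts as valid when every operator's requirements hold in the data state produced by the steps before it.

```python
from itertools import product
from typing import FrozenSet, List, NamedTuple, Tuple


class Operator(NamedTuple):
    name: str
    requires: FrozenSet[str]  # properties that must hold before the step
    adds: FrozenSet[str]      # properties the step establishes
    removes: FrozenSet[str]   # properties the step destroys
    cost: float               # hypothetical run-time score, lower is better


# Hypothetical operator catalogue, one list per stage.
PRE = [Operator("discretize", frozenset({"numeric"}),
                frozenset({"categorical"}), frozenset({"numeric"}), 1.0)]
MINE = [
    Operator("naive_bayes", frozenset({"categorical"}),
             frozenset({"model"}), frozenset(), 1.0),
    Operator("decision_tree", frozenset(),
             frozenset({"model", "tree"}), frozenset(), 3.0),
]
POST = [Operator("prune_tree", frozenset({"tree"}),
                 frozenset(), frozenset(), 0.5)]

Plan = Tuple[Operator, ...]


def is_valid(plan: Plan, start: FrozenSet[str]) -> bool:
    """Check each operator's requirements against the evolving data state."""
    state = set(start)
    for op in plan:
        if not op.requires <= state:
            return False
        state = (state - op.removes) | op.adds
    return True


def enumerate_plans(start: FrozenSet[str]) -> List[Plan]:
    """List every valid plan: optional pre step, one miner, optional post step."""
    plans: List[Plan] = []
    for pre, mine, post in product([None] + PRE, MINE, [None] + POST):
        plan = tuple(op for op in (pre, mine, post) if op is not None)
        if is_valid(plan, start):
            plans.append(plan)
    return plans


def rank_by_cost(plans: List[Plan]) -> List[Plan]:
    """Rank plans by one criterion, here the summed hypothetical cost."""
    return sorted(plans, key=lambda plan: sum(op.cost for op in plan))


ranked = rank_by_cost(enumerate_plans(frozenset({"numeric"})))
names = [[op.name for op in plan] for plan in ranked]
```

Ranking by a different criterion (for instance an accuracy estimate) would only require swapping the key function in `rank_by_cost`.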
Coordinating Agent Activities in Knowledge Discovery Processes
- IN INT’L JOINT CONF. ON WORK ACTIVITIES COORDINATION AND COLLABORATION
, 1999
Cited by 10 (3 self)
Knowledge discovery in databases (KDD) is an increasingly widespread activity. KDD processes may entail the use of a large number of data manipulation and analysis techniques, and new techniques are being developed on an ongoing basis. A challenge for the effective use of KDD is coordinating the use of these techniques, which may be highly specialized, conditional and contingent. Additionally, the understanding and validity of KDD results can depend critically on the processes by which they were derived. We propose to use process programming to address the coordination of agents in the use of KDD techniques. We illustrate this approach using the process language Little-JIL to program a representative bivariate regression process. With Little-JIL programs we can clearly capture the coordination of KDD activities, including control flow, pre- and post-requisites, exception handling, and resource usage.
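To make the notions of pre-requisites, post-requisites, and exception handling around a bivariate regression step more concrete, here is a rough sketch in plain Python. It is not Little-JIL and does not reproduce the authors' process program; the names `RequisiteError`, `run_regression_step`, and `run_process` are assumptions introduced for illustration.

```python
import math
from typing import Sequence, Tuple


class RequisiteError(Exception):
    """Raised when a pre- or post-requisite of a step fails."""


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def run_regression_step(xs: Sequence[float],
                        ys: Sequence[float]) -> Tuple[float, float]:
    # Pre-requisites: paired data, enough points, and variation in x.
    if len(xs) != len(ys) or len(xs) < 2:
        raise RequisiteError("need at least two paired observations")
    if len(set(xs)) < 2:
        raise RequisiteError("x has no variance")
    slope, intercept = fit_line(xs, ys)
    # Post-requisite: the fitted coefficients must be finite numbers.
    if not (math.isfinite(slope) and math.isfinite(intercept)):
        raise RequisiteError("regression produced non-finite coefficients")
    return slope, intercept


def run_process(xs: Sequence[float], ys: Sequence[float]) -> str:
    """Parent step: runs the regression and handles requisite failures."""
    try:
        slope, intercept = run_regression_step(xs, ys)
    except RequisiteError as err:
        return f"handled: {err}"
    return f"ok: y = {slope:.2f}x + {intercept:.2f}"
```

Calling `run_process([1, 1], [2, 3])` takes the exception path, while well-formed input returns the fitted line.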
Using a Data Metric for Preprocessing Advice for Data Mining Applications
, 1998
Cited by 8 (0 self)
This paper describes research that is performed in the course of a project where a methodology for providing user support for KDD processes plays a central role. Although methodologically we aim at supporting the whole process of applying inductive learning techniques, the current paper focuses on a part of this process. The main issue in this paper is the support of data preprocessing for KDD. We give some insight into the metadata we calculate from a dataset as part of the method for user support. DCT (Data Characterisation Tool) is implemented in a software environment (Clementine). Some examples are given that resulted from running the UGM/DCT (User Guidance Module combined with DCT) on the data.
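As a rough idea of what metadata calculated from a dataset might look like, the sketch below computes a few simple data characteristics. The specific measures (record count, missing-value fraction, numeric attributes, class entropy), the function name `characterise`, and the sample rows are assumptions for illustration; the abstract does not list the measures DCT actually computes.

```python
import math
from collections import Counter
from typing import Any, Dict, List


def characterise(rows: List[Dict[str, Any]], target: str) -> Dict[str, Any]:
    """Compute a few simple characteristics of a non-empty dataset."""
    n = len(rows)
    attributes = [name for name in rows[0] if name != target]

    # Fraction of attribute cells that are missing (None).
    missing = sum(1 for row in rows for a in attributes if row[a] is None)

    # An attribute counts as numeric if all its known values are numbers.
    numeric = [
        a for a in attributes
        if all(isinstance(row[a], (int, float))
               for row in rows if row[a] is not None)
    ]

    # Entropy of the class distribution, in bits.
    counts = Counter(row[target] for row in rows)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "records": n,
        "attributes": len(attributes),
        "missing_fraction": missing / (n * len(attributes)),
        "numeric_attributes": numeric,
        "class_entropy": entropy,
    }


# Hypothetical example data with one missing value.
rows = [
    {"age": 31, "colour": "red", "label": "yes"},
    {"age": None, "colour": "blue", "label": "no"},
    {"age": 45, "colour": "red", "label": "yes"},
    {"age": 52, "colour": "green", "label": "no"},
]
summary = characterise(rows, target="label")
```

Characteristics like these could then drive preprocessing advice, for example suggesting imputation when the missing-value fraction is high.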
Concepts for Reuse in the Experience Factory and Their Implementation for CBR-System Development
- In Proceedings of the Eleventh German Workshop on Machine Learning (FGML-98). http://demolab.iese.fhg.de:8080/Publications/fgml98
, 1998
Cited by 4 (4 self)
An Experience Factory is an infrastructure for organizational learning in software development that includes an Experience Base as an organizational memory. We introduce a system architecture showing how such an infrastructure can be technically supported based on Case-Based Reasoning (CBR) technology. As a first instantiation of this architecture we present the CBR-PEB application, a publicly accessible WWW-based experience base for CBR-system development. Based on first experiments, some results about the evaluation of the success of this application are described.
Keywords. Case-based reasoning, continuous learning from experience, experimental software engineering, experience factory, experience base, GQM-based measurement
1 Introduction
In software development, reuse is regarded as a means to handle today's increasing quality, productivity, and time-to-market requirements [BCR94a, GFW94]. But reuse is not just a by-product of software development. To support reuse one has to establish...
A Process Model for Developing Inductive Applications
- PROCEEDINGS OF THE SEVENTH BELGIAN-DUTCH CONFERENCE ON MACHINE LEARNING
, 1997
Cited by 4 (1 self)
A growing interest in real-world applications of inductive techniques signifies the need for methodologies for applying them. So far a number of methodologies for applying inductive learning techniques have been described. After reviewing several published approaches, a number of unsolved problems are discussed, two major problems being the lack of attention to nontechnical issues and the focus of most approaches on specific, well-defined problems with a limited scope. We propose the MeDIA-model as a reference structure for the application of inductive learning techniques that covers the issues mentioned in other approaches and generalises from problem-specific approaches. The model is part of a methodology that aims at supporting the application of inductive learning techniques in various settings, and helps to plan projects where such techniques are involved.
Benutzerunterstützung eines KDD-Prozesses anhand von Datencharakteristiken [User Support for a KDD Process Based on Data Characteristics]
- University Berlin
, 1998
Cited by 3 (0 self)
This article describes research carried out within a project in which providing user support within the KDD process plays the central role. Although the aim in principle is to support the whole process, this article deals with only one part of it. The focus is on the data preprocessing phase, and a first approach for the data mining phase is also presented. The support is based on data characteristics (metadata) that describe a given dataset as precisely as possible and are computed by DCT (Data Characterisation Tool), which is implemented within a software environment (Clementine). We present these measures and give some examples of user support based on concrete datasets.
Support for Data Transformation in Machine Learning Applications
, 1998
Cited by 1 (0 self)
This paper describes research that is performed in the course of a project where a methodology for providing user support plays a central role. Although methodologically we aim at supporting the whole process of applying inductive learning techniques, the current paper focuses on support of the data preprocessing phase and on gaining insight into the data. One of our experiences is that preprocessing of data is possibly the most time-consuming part of machine learning applications. We briefly describe the metadata we calculate from a dataset as part of the method for user support and focus on how metadata can be used to guide preprocessing in combination with a top-down approach. Some examples are given that resulted from running the UGM/DCT (User Guidance Module/Data Characterisation Tool) on example data. Finally we consider the improvements we made w.r.t. other approaches as well as what we gained using this extension to our User Guidance Module (UGM) for user support.
Data Mining In Direct Marketing Databases
, 1998
In this paper we give a basic introduction to the application of data mining to direct marketing. Best practices for data selection, algorithm selection and evaluation of results are described and illustrated with a number of real-world cases. We suggest two lines of research which we consider important to put data mining in the hands of the marketeer: automating data mining techniques and integration of data mining in an open knowledge management framework.
Specification of Pre-Processing Operators Requirements
, 2000
In this workpackage a specification of the pre-processing operations needed in order to analyse the available data warehouses is described. The collection of pre-processing requirements will serve as the unifying and organizing basis for all other workpackages, which shall provide methods for satisfying each of them. Based on partners' previous experience, those aspects of data selection and preparation that have proved to be most critical to the success in previous data mining activities have been identified, as well as the application context in which they arose.

