Results 1–6 of 6
Toward Intelligent Assistance for a Data Mining Process: An Ontology-Based Approach for Cost-Sensitive Classification
 IEEE Transactions on Knowledge and Data Engineering
, 2005
Generating data analysis programs from statistical models
 J. Functional Programming
, 2002
An Ensemble Approach to Building Mercer Kernels with Prior Information, presented at
 IEEE Systems, Man and Cybernetics Conference Workshop on Ensemble Methods in Extreme Environments
, 2005
Abstract

Cited by 1 (0 self)
Abstract — This paper presents a new methodology for automatic knowledge-driven data mining based on the theory of Mercer Kernels, which are highly nonlinear, symmetric, positive definite mappings from the original image space to a very high, possibly infinite-dimensional feature space. We describe a new method called Mixture Density Mercer Kernels to learn the kernel function directly from data, rather than using predefined kernels. These data-adaptive kernels can encode prior knowledge in the kernel using a Bayesian formulation, thus allowing physical information to be encoded in the model. Specifically, we demonstrate the use of the algorithm in situations with extremely small samples of data. We compare the results with existing algorithms on data from the Sloan Digital Sky Survey (SDSS) and demonstrate the method’s superior performance against standard methods. The code for these experiments has been generated with the AUTOBAYES tool, which automatically generates efficient and documented C/C++ code from abstract statistical model specifications. The core of the system is a schema library which contains templates for learning and knowledge discovery algorithms, such as different versions of EM, or numeric optimization methods such as conjugate gradient methods. The template instantiation is supported by symbolic-algebraic computations, which allows AUTOBAYES to find closed-form solutions and, where possible, to integrate them into the code. The results show that the Mixture Density Mercer Kernel described here outperforms tree-based classification in distinguishing high-redshift galaxies from low-redshift galaxies by approximately 16% on test data, bagged trees by approximately 7%, and bagged trees built on a much larger sample of data by approximately 2%.
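The kernel construction described in this abstract can be illustrated in a few lines. The following is a minimal sketch (not the authors' code): build an ensemble of mixture densities, use each point's posterior membership probabilities as a feature map, and take the inner product of feature maps as the kernel, which is symmetric positive semi-definite by construction. The randomly seeded isotropic mixtures below are a simplified stand-in for the Bayesian-fitted mixture models of the paper.

```python
import numpy as np

def posteriors(X, centers, var=1.0):
    # Responsibilities under an equal-weight isotropic Gaussian mixture;
    # a simplified stand-in for the paper's Bayesian-fitted mixtures.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * var))
    return w / w.sum(axis=1, keepdims=True)

def mixture_density_kernel(X, n_models=5, n_components=3, seed=0):
    # phi(x) stacks the posterior-membership vectors from every ensemble
    # member, so K = phi @ phi.T is symmetric positive semi-definite.
    rng = np.random.default_rng(seed)
    feats = []
    for _ in range(n_models):
        centers = X[rng.choice(len(X), n_components, replace=False)]
        feats.append(posteriors(X, centers))
    phi = np.hstack(feats) / np.sqrt(n_models)
    return phi @ phi.T

X = np.vstack([np.random.default_rng(1).normal(0.0, 1.0, (20, 2)),
               np.random.default_rng(2).normal(4.0, 1.0, (20, 2))])
K = mixture_density_kernel(X)
print(K.shape)              # (40, 40)
print(np.allclose(K, K.T))  # True: the kernel matrix is symmetric
```

Because the kernel is an explicit inner product of finite feature maps, the Mercer (positive semi-definiteness) property holds by construction rather than needing to be verified after the fact.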
AutoBayes: A System for Synthesizing Data Analysis Programs
, 2000
Abstract

Cited by 1 (0 self)
Introduction. Statistical approaches to data analysis, which use methods from probability theory and numerical analysis, are well-founded but difficult to implement: the development of a statistical data analysis program for any given application is time-consuming and requires knowledge and experience in several areas. AutoBayes [BFP99, FSP00] is a fully automatic, high-level generator system for data analysis programs from statistical models which aims to overcome these barriers. AutoBayes follows the schema-based deductive approach to program synthesis. This means that programs are constructed by the instantiation of generic algorithm schemas, e.g., EM. This process is supported by logic-based deduction to ensure consistency between the specification (i.e., the statistical model) and the synthesized program. AutoBayes uses a textual notation based on graphical models (more precisely, Bayesian networks) to specify the statist…
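To make the schema-based approach concrete, here is a hand-written sketch (in Python for readability; AutoBayes itself emits C/C++) of the kind of algorithm an instantiated EM schema produces for a two-component one-dimensional Gaussian mixture. The function name and initialization strategy are illustrative, not generated output.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    # Illustrative initialization; a generated program would derive
    # its own from the model specification.
    mu = np.array([x.min(), x.max()], dtype=float)
    sigma = np.array([x.std(), x.std()], dtype=float)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma ** 2)) \
               / (sigma * np.sqrt(2 * np.pi))
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form parameter updates, of the kind the
        # synthesizer derives symbolically from the model.
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.5, 200), rng.normal(5.0, 0.5, 200)])
pi, mu, sigma = em_gmm_1d(x)
print(np.sort(mu))  # component means recovered near 0.0 and 5.0
```

The point of the schema library is that this E-step/M-step skeleton is fixed once, while the closed-form update formulas are re-derived for each statistical model at synthesis time.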
IEEE Transactions on Knowledge and Data Engineering: Towards Intelligent Assistance for a Data Mining Process: An Ontology-based Approach for Cost-sensitive Classification
Abstract
A data mining (DM) process involves multiple stages. A simple, but typical, process might include preprocessing data, applying a data-mining algorithm, and post-processing the mining results. There are many possible choices for each stage, and only some combinations are valid. Because of the large space and non-trivial interactions, both novices and data-mining specialists need assistance in composing and selecting DM processes. Extending notions developed for statistical expert systems, we present a prototype Intelligent Discovery Assistant (IDA), which provides users with (i) systematic enumerations of valid DM processes, so that important, potentially fruitful options are not overlooked, and (ii) effective rankings of these valid processes by different criteria, to facilitate the choice of DM processes to execute. We use the prototype to show that an IDA can indeed provide useful enumerations and effective rankings in the context of simple classification processes. We discuss how an IDA could be an important tool for knowledge sharing among a team of data miners. Finally, we illustrate the claims with a comprehensive demonstration of cost-sensitive classification using a more involved process and data from the 1998 KDD-CUP competition.
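The enumeration component of such an IDA can be sketched as a small search over operators annotated with preconditions and effects. The operator names and conditions below are invented placeholders for illustration, not the prototype's actual ontology:

```python
# Hypothetical DM operators tagged with what they require and produce.
OPS = {
    "discretize":  {"needs": {"raw"},      "gives": {"discrete"}},
    "naive_bayes": {"needs": {"discrete"}, "gives": {"model"}},
    "c4.5":        {"needs": {"raw"},      "gives": {"model"}},
    "prune":       {"needs": {"model"},    "gives": {"model", "pruned"}},
}

def enumerate_processes(state, goal, plan=(), depth=3):
    # Depth-limited search: only chain operators whose preconditions
    # hold, so every emitted plan is a valid DM process.
    if goal <= state:
        yield plan
        return
    if depth == 0:
        return
    for name, op in OPS.items():
        if op["needs"] <= state and name not in plan:
            yield from enumerate_processes(state | op["gives"], goal,
                                           plan + (name,), depth - 1)

plans = list(enumerate_processes({"raw"}, {"model"}))
for p in plans:
    print(" -> ".join(p))
```

A real IDA would then rank these enumerated plans by criteria such as expected accuracy, speed, or comprehensibility; here the search only guarantees validity.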
Under consideration for publication in J. Functional Programming. AutoBayes: A System for Generating Data Analysis Programs from Statistical Models
Abstract
Data analysis is an important scientific task which is required whenever information needs to be extracted from raw data. Statistical approaches to data analysis, which use methods from probability theory and numerical analysis, are well-founded but difficult to implement: the development of a statistical data analysis program for any given application is time-consuming and requires substantial knowledge and experience in several areas. In this paper, we describe AutoBayes, a program synthesis system for the generation of data analysis programs from statistical models. A statistical model specifies the properties for each problem variable (i.e., observation or parameter) and its dependencies in the form of a probability distribution. It is a fully declarative problem description, similar in spirit to a set of differential equations. From such a model, AutoBayes generates optimized and fully commented C/C++ code which can be linked dynamically into the Matlab and Octave environments. Code is produced by a schema-guided deductive synthesis process. A schema consists of a code template and applicability constraints which are checked against the model during synthesis using theorem-proving technology. AutoBayes augments schema-guided synthesis with symbolic-algebraic computation and can thus derive closed-form solutions for many problems. It is well suited for tasks like estimating best-fitting model parameters for the given data. Here, we describe AutoBayes’s system architecture, in particular the schema-guided synthesis kernel. Its capabilities are illustrated by a number of advanced textbook examples and benchmarks.
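The symbolic-algebraic step mentioned in the abstract can be illustrated with a toy version of the derivation: differentiating a Gaussian log-likelihood and solving for the parameter recovers the sample mean in closed form. The sketch below uses SymPy purely for illustration; AutoBayes performs this kind of computation with its own symbolic subsystem, not SymPy.

```python
import sympy as sp

mu = sp.Symbol("mu")
x1, x2, x3 = sp.symbols("x1 x2 x3")

# Negative log-likelihood of three i.i.d. Gaussian observations,
# up to additive constants and the variance scale factor.
nll = (x1 - mu) ** 2 + (x2 - mu) ** 2 + (x3 - mu) ** 2

# Differentiate and solve: the closed-form maximum-likelihood
# estimator for the mean is the sample average.
mu_hat = sp.solve(sp.diff(nll, mu), mu)[0]
print(mu_hat)
```

When such a closed form exists, a synthesis system can emit it directly as straight-line code; only when no closed form is found does it need to fall back on an iterative numeric schema such as conjugate gradient.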