HITON, A Novel Markov Blanket Algorithm for Optimal Variable Selection
, 2003
Cited by 25 (8 self)
We introduce a novel, sound and highly-scalable algorithm for variable selection for classification, regression and prediction called HITON. The algorithm first induces the Markov Blanket of the variable to be classified or predicted, and then performs search to further reduce the number of variables. A wide variety of biomedical tasks with different characteristics were used for an empirical evaluation. Namely, (i) bioactivity prediction for drug discovery, (ii) clinical diagnosis of arrhythmias, (iii) bibliographic text categorization, (iv) lung cancer diagnosis from gene expression array data, and (v) proteomics-based prostate cancer detection. The state-of-the-art algorithms for each domain are selected for baseline comparison. HITON outperforms the baseline algorithms: it selects more than two orders-of-magnitude smaller variable sets in the selected tasks and datasets while achieving the best accuracy. It also reduces the number of variables in the prediction models by 1,000 times relative to the original variable set while improving accuracy.
The WoRLD: Knowledge Discovery from Multiple Distributed Databases
- In Proceedings of the Florida Artificial Intelligence Research Symposium (FLAIRS-97)
, 1997
Cited by 17 (4 self)
Inductive machine learning offers techniques for discovering new knowledge from business, medical, and scientific databases. Most techniques assume that all the relevant information for discovery has been gathered and assembled into a single table or database. With multiple databases it is possible to combine features from several perspectives and thus move beyond the confines of an ontology that was fixed by the designers of a single database. We introduce WoRLD ("Worldwide Relational Learning Daemon"), a system that uses spreading activation to enable inductive learning from multiple tables in multiple databases spread across the network. We describe the paradigm and the system, provide demonstrations on synthetic data sets, and then replicate two real-world successes of automated discovery. 1 INTRODUCTION Inductive machine learning offers methods for discovering new knowledge from business, medical, and scientific databases. Although the need to learn across multiple tables has bee...
The TETRAD Project: Constraint Based Aids to Causal Model Specification
- MULTIVARIATE BEHAVIORAL RESEARCH
Feature subset selection by genetic algorithms and estimation of distribution algorithms. A case study in the survival of cirrhotic patients treated with TIPS
, 2000
Cited by 9 (4 self)
The transjugular intrahepatic portosystemic shunt (TIPS) is an interventional treatment for cirrhotic patients with portal hypertension. In the light of our medical staff's experience, the consequences of TIPS are not homogeneous for all the patients and a subgroup dies in the first six months after TIPS placement. Currently, there is no risk indicator to identify this subgroup of patients before treatment. An investigation for predicting the survival of cirrhotic patients treated with TIPS is carried out using a clinical database with 107 cases and 77 attributes. Four supervised machine learning classifiers are applied to discriminate between the two subgroups of patients. The application of several Feature Subset Selection (FSS) techniques has significantly improved the predictive accuracy of these classifiers and considerably reduced the number of attributes in the classification models. Among FSS techniques, FSS-TREE, a new randomized algorithm inspired by the new EDA (Estimation of Di...
Rule-Space Search for Knowledge-Based Discovery
- CIIO Working Paper IS 99-012, Stern School of Business
, 1999
Cited by 7 (0 self)
Because the knowledge discovery process is ill-defined, iterative, and requires intense interaction, algorithm flexibility is crucial. In this paper, we present a straightforward, heuristic generate-and-test search algorithm for knowledge discovery. An analysis of the literature shows that this basic algorithm underlies many of the systems that have had practical success in data mining and knowledge discovery over the past twenty years. We argue that this search algorithm has persevered because it is flexible and well behaved as background knowledge is introduced in various forms - exactly what is needed to support the ill-defined knowledge discovery process.
TrendFinder: Automated Detection of Alarmable Trends
- Stoch. Proc. Appl
, 2000
Cited by 4 (1 self)
in partial fulfillment of the requirements for the degree of
Large-Scale Feature Selection Using Markov Blanket Induction For The Prediction Of Protein-Drug Binding
, 2002
Cited by 3 (1 self)
In this paper we empirically evaluate a recently introduced Markov blanket induction algorithm (iterative association Markov blanket - IAMB) for the purpose of large-scale feature selection in the task of finding the optimal subset among 139,351 molecular structural properties that predict binding to thrombin (and thus the potential of these substances as anti-clotting agents). We also develop and evaluate parallel and chunked variants of IAMB for either high-performance parallel computing or for situations where the data exceeds the available RAM. As baseline feature selection comparison methods we use the state-of-the-art support vector machine-based recursive feature elimination (RFE) algorithm, as well as a basic univariate association filtering (UAF) method. The full set of features as well as the feature subsets chosen by each selection method are used to create KNN, linear SVM, polynomial SVM, RBF SVM, Simple Bayes, and Neural Network classifiers. The area under the ROC curve (AUC) is used as the classification performance metric. The results of these preliminary experiments are very encouraging: the chunked version of IAMB achieves superior classification performance to RFE for 4 out of 6 classifiers. The best overall classification model is achieved by combining IAMB and Simple Bayes; this model outperforms linear and non-linear SVM models in this task. In addition, IAMB's selected feature set is ~270 times smaller than that of RFE. We empirically demonstrate that the parallel versions run in minutes on a small (14 node) parallel computer cluster. The chunked version runs in a few hours on a single desktop CPU in the massive thrombin task.
Feature Subset Selection using probabilistic tree structures. A case study in the survival of cirrhotic patients treated with TIPS
, 2000
Cited by 1 (0 self)
The transjugular intrahepatic portosystemic shunt (TIPS) is an interventional treatment for cirrhotic patients with portal hypertension. In the light of our medical staff's experience, the consequences of the TIPS are not homogeneous for all the patients and a subgroup of them dies in the first six months after the TIPS placement. Currently, there is no risk indicator to identify this group before treatment. An investigation for predicting the survival of cirrhotic patients treated with TIPS is carried out using a clinical database with 107 cases and 77 attributes. Naive-Bayes, C4.5 and CN2 supervised classifiers are applied to identify this group. The application of several Feature Subset Selection (FSS) techniques has significantly improved the predictive accuracy of these classifiers and considerably reduced the number of attributes in the classification models. Among FSS techniques, FSS-TREE, a new randomized algorithm inspired by the EDA (Estimation of Distribution ...
Aprendizaje Automático de Modelos Gráficos II. Aplicaciones a la Clasificación Supervisada
Supervised Classification, Bayesian Networks, Statistics, Artificial Intelligence, Validation, Melanoma. 1 Introduction. The word classification is used with different meanings, so it seems appropriate to clarify from the outset the terminology used in this article. From a general point of view, we can distinguish between so-called unsupervised classification (cluster analysis, or unsupervised pattern recognition) and supervised classification (supervised pattern recognition). Unsupervised classification (Figure 1), see for example Kaufman and Rousseeuw (1990) [17], refers to the process of defining classes of objects. That is, starting from a collection of N objects, O_1, O_2, ..., O_i, ..., O_N, characterized by p var

