Results 1 - 8 of 8
HITON, A Novel Markov Blanket Algorithm for Optimal Variable Selection
, 2003
Cited by 25 (8 self)
We introduce a novel, sound and highly-scalable algorithm for variable selection for classification, regression and prediction called HITON. The algorithm first induces the Markov Blanket of the variable to be classified or predicted, and then performs search to further reduce the number of variables. A wide variety of biomedical tasks with different characteristics were used for an empirical evaluation. Namely, (i) bioactivity prediction for drug discovery, (ii) clinical diagnosis of arrhythmias, (iii) bibliographic text categorization, (iv) lung cancer diagnosis from gene expression array data, and (v) proteomics-based prostate cancer detection. The state-of-the-art algorithms for each domain are selected for baseline comparison. HITON outperforms the baseline algorithms: it selects more than two orders-of-magnitude smaller variable sets in the selected tasks and datasets while achieving the best accuracy. It also reduces the number of variables in the prediction models by 1,000 times relative to the original variable set while improving accuracy.
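To make the term concrete: in a Bayesian network, the Markov Blanket of a node consists of its parents, its children, and the other parents of its children. The sketch below only reads the blanket off a graph that is already known; it does not reproduce HITON, which induces the blanket from data. The function name `markov_blanket` and the toy network are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return parents, children, and spouses of `target` in a DAG.

    `parents` maps every node to the set of its direct parents.
    """
    direct_parents = set(parents.get(target, set()))
    # Children are the nodes that list the target among their parents.
    children = {node for node, pa in parents.items() if target in pa}
    # Spouses are the other parents of those children.
    spouses: Set[str] = set()
    for child in children:
        spouses |= parents[child]
    blanket = direct_parents | children | spouses
    blanket.discard(target)
    return blanket


# Hypothetical toy network (an assumption for illustration only).
toy_dag = {
    "Smoking": set(),
    "Genetics": set(),
    "Cancer": {"Smoking", "Genetics"},
    "Asbestos": set(),
    "Cough": {"Cancer", "Asbestos"},
    "Fatigue": {"Cough"},
}

mb = markov_blanket(toy_dag, "Cancer")
print(sorted(mb))
```

In this toy graph, "Fatigue" is a descendant of the target but lies outside the blanket, which is the sense in which blanket-based selection can discard variables that are still correlated with the target.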
Feature subset selection by genetic algorithms and estimation of distribution algorithms. A case study in the survival of cirrhotic patients treated with TIPS
, 2000
Cited by 9 (4 self)
The transjugular intrahepatic portosystemic shunt (TIPS) is an interventional treatment for cirrhotic patients with portal hypertension. In the light of our medical staff's experience, the consequences of TIPS are not homogeneous for all the patients and a subgroup dies in the first six months after TIPS placement. Currently, there is no risk indicator to identify this subgroup of patients before treatment. An investigation for predicting the survival of cirrhotic patients treated with TIPS is carried out using a clinical database with 107 cases and 77 attributes. Four supervised machine learning classifiers are applied to discriminate between the two subgroups of patients. The application of several Feature Subset Selection (FSS) techniques has significantly improved the predictive accuracy of these classifiers and considerably reduced the number of attributes in the classification models. Among FSS techniques, FSS-TREE, a new randomized algorithm inspired by the new EDA (Estimation of Di...
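As a rough illustration of how an estimation of distribution algorithm can search over feature subsets, here is a minimal sketch of a univariate EDA (UMDA-style) on bit masks. It is not FSS-TREE, whose details the truncated abstract does not give. The function name, the parameters, and the toy fitness function are all assumptions; in a real wrapper the fitness would be an estimate of classifier accuracy on the selected attributes.

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[int, ...]


def umda_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 30,
    n_select: int = 10,
    generations: int = 20,
    seed: int = 0,
) -> Tuple[Mask, float, List[float]]:
    """Univariate EDA: sample masks, keep the best, re-estimate marginals."""
    rng = random.Random(seed)
    probs = [0.5] * n_features  # P(feature i is selected)
    best_mask: Mask = tuple([0] * n_features)
    best_score = fitness(best_mask)
    history: List[float] = []  # best score seen so far, per generation
    for _ in range(generations):
        population = [
            tuple(1 if rng.random() < p else 0 for p in probs)
            for _ in range(pop_size)
        ]
        population.sort(key=fitness, reverse=True)
        elite = population[:n_select]
        if fitness(elite[0]) > best_score:
            best_mask, best_score = elite[0], fitness(elite[0])
        history.append(best_score)
        # Re-estimate each marginal from the elite; clamp to keep diversity.
        probs = [
            min(0.95, max(0.05, sum(m[i] for m in elite) / len(elite)))
            for i in range(n_features)
        ]
    return best_mask, best_score, history


# Hypothetical fitness: reward features 0, 2, 5; penalize all others.
RELEVANT = {0, 2, 5}


def toy_fitness(mask: Mask) -> float:
    hits = sum(bit for i, bit in enumerate(mask) if i in RELEVANT)
    noise = sum(bit for i, bit in enumerate(mask) if i not in RELEVANT)
    return hits - 0.5 * noise


mask, score, history = umda_feature_selection(8, toy_fitness)
print(mask, score)
```

A genetic algorithm would replace the re-estimation step with crossover and mutation on the selected masks; the surrounding wrapper loop stays the same.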
F.: PCX: Markov blanket classification for large data sets with few cases
, 2004
Cited by 3 (0 self)
Data sets with many discrete variables and relatively few cases arise in many domains. Several studies have sought to identify the Markov Blanket (MB) of a target variable by filtering variables using statistical decisions for conditional independence and then applying a classifier using the MB predictors. Other studies have applied the PC algorithm or heuristic procedures to estimate a DAG model of the MB and classify by Bayesian updating. The PC output is not a DAG or MB, and how a DAG representation of the MB is formed in these studies is not specified. Using a filter from the HITON feature selection procedure, we find a Markov equivalence class using the PC algorithm, provide an explicit algorithm for converting the output to a graphical Markov Blanket, and classify by Bayesian updating. We apply this procedure (PCX) to five empirical data sets from different domains, and compare it with results from HITON, which applies several state-of-the-art classifiers. The PCX classifier has fewer variables than those found by the HITON procedure, and gives comparable classification accuracy while supplying insight into possible causal relations among the variables.
Bayesian ANN Classifier for ECG Arrhythmia Diagnostic System: A Comparison Study
- Proc. of Int. Joint Conf. on Neural Networks
, 2005
Cited by 2 (0 self)
This paper outlines a system for detection of cardiac arrhythmias within ECG signals, based on a Bayesian Artificial Neural Network (ANN) classifier. The Bayesian (or Probabilistic) ANN Classifier is built by the use of a logistic regression model and the back propagation algorithm based on a Bayesian framework. Its performance for this task is evaluated by comparison with other classifiers including Naive Bayes, Decision Trees, Logistic Regression, and RBF Networks. A paired t-test is employed in comparing classifiers to select the optimum model. The system is evaluated using noisy ECG data, to simulate a real-world environment. It is hoped that the system can be further developed and fine-tuned for practical application.
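The paired t-test mentioned above can be sketched as follows. The abstract does not say how the paired samples were obtained, so this example assumes two classifiers scored on the same cross-validation folds; the per-fold accuracies are made up for illustration and are not results from the paper.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic for paired samples; degrees of freedom are len(a) - 1."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical per-fold accuracies for two classifiers on the same 5 folds.
acc_model_a = [0.90, 0.92, 0.88, 0.91, 0.89]
acc_model_b = [0.85, 0.90, 0.86, 0.88, 0.86]

t = paired_t_statistic(acc_model_a, acc_model_b)
print(round(t, 3))
```

With 4 degrees of freedom, the two-sided critical value at the 0.05 level is 2.776, so a statistic above that magnitude would be read as a significant difference between the two classifiers on these folds.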
Arrhythmia Identification from ECG Signals with a Neural Network Classifier Based on a Bayesian Framework
Cited by 2 (0 self)
This paper presents a diagnostic system for cardiac arrhythmias from ECG data, using an Artificial Neural Network (ANN) classifier based on a Bayesian framework. The Bayesian ANN Classifier is built by the use of a logistic regression model and the back propagation algorithm. A dual threshold method is applied to determine the diagnosis strategy and suppress false alarm signals. The experimental results presented in this paper show that more than 90% prediction accuracy may be obtained using the improved methods in the study. It is hoped that the system can be further developed and fine-tuned for practical application.
A Classification Learning Algorithm
The presence of irrelevant features is a fact of life in many real-world applications of classification learning. Although nearest-neighbor classification algorithms have emerged as a promising approach to machine learning tasks with their high predictive accuracy, they are adversely affected by the presence of such irrelevant features. In this paper, we describe a recently proposed classification algorithm called VFI5, which achieves comparable accuracy to nearest-neighbor classifiers while remaining robust with respect to irrelevant features. The paper compares the nearest-neighbor classifier and the VFI5 algorithm in the presence of irrelevant features on both artificially generated and real-world data sets selected from the UCI repository.
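The sensitivity of nearest-neighbor classification to irrelevant features can be seen in a small example. The sketch below is a plain 1-nearest-neighbor classifier with Euclidean distance, not VFI5, and the data points are invented for illustration: adding one feature that carries no class information changes which training point is closest.

```python
import math
from typing import List, Sequence


def nearest_neighbor_label(
    train: Sequence[Sequence[float]],
    labels: Sequence[str],
    query: Sequence[float],
) -> str:
    """Return the label of the training point closest to `query`."""
    best = min(range(len(train)), key=lambda i: math.dist(train[i], query))
    return labels[best]


labels: List[str] = ["neg", "pos"]

# One relevant feature: the query at 1.5 is clearly closer to the "neg" point.
relevant_only = [(1.0,), (5.0,)]
print(nearest_neighbor_label(relevant_only, labels, (1.5,)))

# Same points plus a second, irrelevant feature with arbitrary values.
with_noise = [(1.0, 9.0), (5.0, 0.0)]
print(nearest_neighbor_label(with_noise, labels, (1.5, 0.5)))
```

The second call returns the other class because the irrelevant coordinate dominates the distance, which is the failure mode the abstract refers to.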
Large-scale attribute selection using wrappers
Scheme-specific attribute selection with the wrapper and variants of forward selection is a popular attribute selection technique for classification that yields good results. However, it can run the risk of overfitting because of the extent of the search and the extensive use of internal cross-validation. Moreover, although wrapper evaluators tend to achieve superior accuracy compared to filters, they face a high computational cost. The problems of overfitting and high runtime occur in particular on high-dimensional datasets, like microarray data. We investigate Linear Forward Selection, a technique to reduce the number of attribute expansions in each forward selection step. Our experiments demonstrate that this approach is faster, finds smaller subsets and can even increase the accuracy compared to standard forward selection. We also investigate a variant that applies explicit subset size determination in forward selection to combat overfitting, where the search is forced to stop at a precomputed “optimal” subset size. We show that this technique reduces subset size while maintaining comparable accuracy.
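One way to limit the number of attribute expansions per step is to rank attributes individually and let forward selection consider only the top k of them. The sketch below assumes that reading; the abstract does not spell out the mechanism, so this is not necessarily the authors' exact procedure. The names `linear_forward_selection`, `k`, `max_size`, and the toy scoring function are hypothetical. `max_size` stands in for a precomputed subset size at which the search is forced to stop.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

Score = Callable[[FrozenSet[str]], float]


def linear_forward_selection(
    attributes: Sequence[str],
    score: Score,
    k: int,
    max_size: Optional[int] = None,
) -> Tuple[List[str], float]:
    """Forward selection restricted to the k individually best attributes."""
    # Rank attributes by their score when used alone; keep the top k.
    ranked = sorted(attributes, key=lambda a: score(frozenset([a])), reverse=True)
    pool = ranked[:k]
    selected: List[str] = []
    best = score(frozenset())
    while pool and (max_size is None or len(selected) < max_size):
        # Expand the current subset by each remaining candidate in the pool.
        trials = [(score(frozenset(selected + [a])), a) for a in pool]
        top_score, top_attr = max(trials, key=lambda t: t[0])
        if top_score <= best:
            break  # no candidate improves the score
        selected.append(top_attr)
        pool.remove(top_attr)
        best = top_score
    return selected, best


# Hypothetical score: additive usefulness minus a per-attribute penalty.
# In a real wrapper this would be cross-validated classifier accuracy.
WEIGHTS = {"a": 0.30, "b": 0.20, "c": 0.05, "d": 0.01, "e": 0.0}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(WEIGHTS[x] for x in subset) - 0.04 * len(subset)


subset, value = linear_forward_selection(list(WEIGHTS), toy_score, k=3)
capped, _ = linear_forward_selection(list(WEIGHTS), toy_score, k=3, max_size=2)
print(subset, capped)
```

With n attributes, unrestricted forward selection evaluates up to n candidates per step, while this variant evaluates at most k, which is where the runtime saving on high-dimensional data would come from.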
HITON: A Novel Markov Blanket Algorithm for Optimal Variable Selection
We introduce a novel, sound, sample-efficient, and highly-scalable algorithm for variable selection for classification, regression and prediction called HITON. The algorithm works by inducing the Markov Blanket of the variable to be classified or predicted. A wide variety of biomedical tasks with different characteristics were used for an empirical evaluation. Namely, (i) bioactivity prediction for drug discovery, (ii) clinical diagnosis of arrhythmias, (iii) bibliographic text categorization, (iv) lung cancer diagnosis from gene expression array data, and (v) proteomics-based prostate cancer detection. State-of-the-art algorithms for each domain were selected for baseline comparison. Results: (1) HITON reduces the number of variables in the prediction models by three orders of magnitude relative to the original variable set while improving or maintaining accuracy. (2) HITON outperforms the baseline algorithms by selecting more than two orders-of-magnitude smaller variable sets than the baselines, in the selected tasks and datasets.

