## Protein Classification with Multiple Algorithms (2005)

Venue: 10th Panhellenic Conference on Informatics (PCI 2005), P. Bozanis and E.N. Houstis (Eds.), Springer-Verlag, LNCS 3746

Citations: 15 (4 self)

### BibTeX

@INPROCEEDINGS{Diplaris05proteinclassification,
  author = {Sotiris Diplaris and Grigorios Tsoumakas and Pericles A. Mitkas and Ioannis Vlahavas},
  title = {Protein Classification with Multiple Algorithms},
  booktitle = {10th Panhellenic Conference on Informatics (PCI 2005), P. Bozanis and E.N. Houstis (Eds.), Springer-Verlag, LNCS 3746},
  year = {2005},
  pages = {448--456}
}

### Abstract

Nowadays, the number of protein sequences stored in central protein databases by labs all over the world is constantly increasing. Of these proteins, only a fraction has been experimentally analyzed to determine their structure, and hence their function in the corresponding organism, because experimental determination of structure is labor-intensive and quite time-consuming. There is therefore a need for automated tools that can classify new proteins into structural families. This paper presents a comparative evaluation of several algorithms that learn such classification models from data concerning patterns of proteins with known structure. In addition, several approaches that combine multiple learning algorithms to increase the accuracy of predictions are evaluated. The results of the experiments provide insights that can help biologists and computer scientists design high-quality protein classification systems.

### Citations

5232 | C4.5: Programs for Machine Learning
- Quinlan
- 1993
Citation Context ...ogy [5]. A plethora of algorithms to address this problem have been proposed, by both the artificial intelligence and the pattern recognition communities. Some of the algorithms create decision trees [6,7], others exploit artificial neural networks [8] or statistical models [9]. An important issue however that remains is which from the multitude of machine learning algorithms to use for training a clas...

5119 | Neural Networks for Pattern Recognition
- Bishop
- 1995
Citation Context ... problem have been proposed, by both the artificial intelligence and the pattern recognition communities. Some of the algorithms create decision trees [6,7], others exploit artificial neural networks [8] or statistical models [9]. An important issue however that remains is which from the multitude of machine learning algorithms to use for training a classifier in order to achieve the best results. Th...

4074 | Pattern classification and scene analysis
- Duda, Hart
- 1973
Citation Context ...d, by both the artificial intelligence and the pattern recognition communities. Some of the algorithms create decision trees [6,7], others exploit artificial neural networks [8] or statistical models [9]. An important issue however that remains is which from the multitude of machine learning algorithms to use for training a classifier in order to achieve the best results. The plot thickens if we also...

3214 | Data mining: practical machine learning tools and techniques with Java implementations
- Witten, Frank
- 2000
Citation Context ...ese are general-purpose machine learning algorithms spanning several different learning paradigms (instance-based, rules, trees, statistical). They were obtained from the WEKA machine learning library [26], and used with default parameter settings unless otherwise stated: - DT, the decision table algorithm of Kohavi [27]. - JRip, the RIPPER rule learning algorithm [28]. - PART, the PART rule learning a...

1101 | Instance-based learning algorithms
- Aha, Kibler, et al.
- 1991
Citation Context ...[28]. - PART, the PART rule learning algorithm [29]. - J48, the decision tree learning algorithm C4.5 [7], using Laplace smoothing for predicted probabilities. - IBk, the k nearest neighbor algorithm [30]. - K*, an instance-based learning algorithm with entropic distance measure [31]. - NB, the Naive Bayes algorithm [32] using the kernel density estimator rather than assume normal distributions for nu...

1075 | Fast training of support vector machines using sequential minimal optimization
- Platt
- 1999
Citation Context ...ensity estimator rather than assume normal distributions for numeric attributes. - SMO, the sequential minimal optimization algorithm for training a support vector classifier using polynomial kernels [33]. - RBF, WEKA implementation of an algorithm for training a radial basis function network [34]. The above algorithms were used alone and in conjunction with the following five different classifier com...

1033 | Fast effective rule induction
- Cohen
- 1995
Citation Context ...the WEKA machine learning library [26], and used with default parameter settings unless otherwise stated: - DT, the decision table algorithm of Kohavi [27]. - JRip, the RIPPER rule learning algorithm [28]. - PART, the PART rule learning algorithm [29]. - J48, the decision tree learning algorithm C4.5 [7], using Laplace smoothing for predicted probabilities. - IBk, the k nearest neighbor algorithm [30]...

376 | The Pfam protein families database. Nucleic Acids Res
- Bateman, Coin, et al.
- 2004
Citation Context ...e will refer to both profiles and patterns as motifs. Motifs have been widely used for the prediction of a protein’s properties, since the latter are mainly defined by their motifs. Prosite [1], Pfam [2] and Prints [3] are the most common databases where motifs are being recorded. Machine learning (ML) algorithms [4] can offer the most cost effective approach to automated discovery of a priori unknow...

337 | Estimating continuous distributions in Bayesian classifiers
- John, Langley
- 1995
Citation Context ...ce smoothing for predicted probabilities. - IBk, the k nearest neighbor algorithm [30]. - K*, an instance-based learning algorithm with entropic distance measure [31]. - NB, the Naive Bayes algorithm [32] using the kernel density estimator rather than assume normal distributions for numeric attributes. - SMO, the sequential minimal optimization algorithm for training a support vector classifier using ...

332 | A Machine Learning Approach
- Baldi
- 2001
Citation Context ...rded. Machine learning (ML) algorithms [4] can offer the most cost effective approach to automated discovery of a priori unknown predictive relationships from large data sets in computational biology [5]. A plethora of algorithms to address this problem have been proposed, by both the artificial intelligence and the pattern recognition communities. Some of the algorithms create decision trees [6,7], ...

322 | Decision combination in multiple classifier systems
- Ho, Hull, et al.
- 1994
Citation Context ...ning time. In [17,18] the accuracy of the algorithms is estimated locally on a number of examples that surround each test example. Such approaches belong to the family of Dynamic Classifier Selection [19] and use a different algorithm in different parts of the instance space. Two similar, but more complicated approaches that were developed by Merz [20] are Dynamic Selection and Dynamic Weighting. The s...
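The local-accuracy idea behind Dynamic Classifier Selection can be sketched in a few lines of Python. This is an illustrative sketch, not the experimental setup of [17,18]: the 1-D feature space, absolute-distance neighbor search, and toy classifiers below are all assumptions made for clarity.

```python
def dcs_predict(classifiers, train, test_x, k=3):
    """Dynamic Classifier Selection sketch: estimate each classifier's accuracy
    on the k training examples nearest to the test example, then answer with
    the locally most accurate classifier."""
    # Nearest neighbours by absolute distance (toy 1-D feature space).
    neighbours = sorted(train, key=lambda xy: abs(xy[0] - test_x))[:k]

    def local_accuracy(clf):
        return sum(clf(x) == y for x, y in neighbours)

    best = max(classifiers, key=local_accuracy)
    return best(test_x)

# Two toy classifiers and a 1-D training set of (feature, label) pairs:
classifiers = [lambda x: 0, lambda x: 1]
train = [(0.1, 1), (0.2, 1), (0.85, 0), (0.9, 0), (0.95, 0)]
dcs_predict(classifiers, train, 0.9)   # neighbours are all class 0 -> predicts 0
dcs_predict(classifiers, train, 0.15)  # neighbours are mostly class 1 -> predicts 1
```

Each classifier is thus used only in the region of the instance space where it is locally strongest, which is the defining trait of this family of methods.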

206 | Generating accurate rule sets without global optimization
- Witten, Frank
- 1998
Citation Context ...d with default parameter settings unless otherwise stated: - DT, the decision table algorithm of Kohavi [27]. - JRip, the RIPPER rule learning algorithm [28]. - PART, the PART rule learning algorithm [29]. - J48, the decision tree learning algorithm C4.5 [7], using Laplace smoothing for predicted probabilities. - IBk, the k nearest neighbor algorithm [30]. - K*, an instance-based learning algorithm wi...

137 | Combination of multiple classifiers using local accuracy estimates
- Woods, Kegelmeyer, et al.
- 1997
Citation Context ... [16], algorithms are ranked based on Data Envelopment Analysis, a multicriteria evaluation technique that can combine various performance metrics, like accuracy, storage space, and learning time. In [17,18] the accuracy of the algorithms is estimated locally on a number of examples that surround each test example. Such approaches belong to the family of Dynamic Classifier Selection [19] and use a differ...

111 | The power of decision tables
- Kohavi
- 1995
Citation Context ...es, trees, statistical). They were obtained from the WEKA machine learning library [26], and used with default parameter settings unless otherwise stated: - DT, the decision table algorithm of Kohavi [27]. - JRip, the RIPPER rule learning algorithm [28]. - PART, the PART rule learning algorithm [29]. - J48, the decision tree learning algorithm C4.5 [7], using Laplace smoothing for predicted probabilit...

82 | Issues in stacked generalization
- Ting, Witten
- 1999
Citation Context ...new instance appears for classification, the output of all base-level classifiers is first calculated and then propagated to the meta-level classifier, which outputs the final result. Ting and Witten [22] have shown that Stacking works well when meta-instances are formed by probability distributions for each class instead of just a class label. A recent study [11] has shown that Stacking with Multi-Re...

68 | Meta-learning by landmarking various learning algorithms
- Pfahringer, Bensusan, et al.
- 2000
Citation Context ...ance on similar learning domains. Several approaches have been proposed for the characterization of learning domain, including general, statistical and information-theoretic measures [12], landmarking [13], histograms [14] and model-based data characterizations [15]. Apart from the characterization of each domain, the performance of each learning algorithm on that domain is recorded. When a new domain...

64 | Stacked generalization, Neural Networks 5
- Wolpert
- 1992
Citation Context ...n Weighted Voting, the classification models are not treated equally. Each model is associated with a coefficient (weight), usually proportional to its classification accuracy. Stacked Generalization [21], also known as Stacking, is a method that combines multiple classifiers by learning a meta-level (or level-1) model that predicts the correct class based on the decisions of the base-level (or level-...
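The Stacking pipeline described in this context (base-level decisions fed to a meta-level model) can be sketched minimally in pure Python. The base and meta models below are toy stand-ins chosen for illustration, not the classifiers evaluated in the paper:

```python
def stack_features(base_models, x):
    """Level-1 meta-instance for x: the prediction of every base-level model."""
    return [model(x) for model in base_models]

def stacked_predict(base_models, meta_model, x):
    """Propagate base-level outputs to the meta-level model for the final decision."""
    return meta_model(stack_features(base_models, x))

# Toy base-level classifiers: threshold rules on a 1-D feature (illustrative).
base = [lambda x: int(x > 0.3), lambda x: int(x > 0.7)]
# Toy meta-level model: predict class 1 only when all base models agree on it.
meta = lambda level1: int(sum(level1) == len(level1))

stacked_predict(base, meta, 0.9)  # base outputs [1, 1] -> meta outputs 1
stacked_predict(base, meta, 0.5)  # base outputs [1, 0] -> meta outputs 0
```

In the full method the meta-model is itself learned from level-1 data built via cross-validation; per Ting and Witten [22], using each base model's class-probability distributions instead of crisp labels as the level-1 features works better.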

61 | K*: An instance-based learner using an entropic distance measure
- Cleary, Trigg
- 1995
Citation Context ...arning algorithm C4.5 [7], using Laplace smoothing for predicted probabilities. - IBk, the k nearest neighbor algorithm [30]. - K*, an instance-based learning algorithm with entropic distance measure [31]. - NB, the Naive Bayes algorithm [32] using the kernel density estimator rather than assume normal distributions for numeric attributes. - SMO, the sequential minimal optimization algorithm for train...

50 | Ranking learning algorithms: Using ibl and meta-learning on accuracy and time results
- Brazdil, Carlos, et al.
- 2003
Citation Context ...sed on its performance on similar learning domains. Several approaches have been proposed for the characterization of learning domain, including general, statistical and information-theoretic measures [12], landmarking [13], histograms [14] and model-based data characterizations [15]. Apart from the characterization of each domain, the performance of each learning algorithm on that domain is recorded. ...

49 | Is Combining Classifiers with Stacking Better than Selecting the Best One
- Dzeroski, Zenko
Citation Context ...ng set and selects the best one for application on the test set. Although this method is simple, it has been found to be highly effective and comparable to other more complex state-of-the-art methods [11]. Another line of research proposes the selection of a learning algorithm based on its performance on similar learning domains. Several approaches have been proposed for the characterization of learni...
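The select-best strategy mentioned here (evaluate every algorithm, keep the winner for the test set) reduces to a few lines. This is a sketch under simplifying assumptions: held-out accuracy stands in for whatever evaluation protocol is used, and the constant classifiers are purely illustrative.

```python
def select_best(classifiers, held_out):
    """Select-best sketch: score each trained classifier on held-out (x, y)
    pairs and keep the most accurate one for use on the test set."""
    def accuracy(clf):
        return sum(clf(x) == y for x, y in held_out) / len(held_out)
    return max(classifiers, key=accuracy)

# Two toy classifiers; the constant-1 rule fits this held-out set better.
def always_one(x):
    return 1

def always_zero(x):
    return 0

best = select_best([always_zero, always_one], [(1, 1), (2, 0), (3, 1)])
best is always_one  # True: accuracy 2/3 beats 1/3
```

The simplicity is the point: despite doing no combination at all, [11] found this baseline competitive with far more complex ensemble methods.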

41 | The Prosite Database: its status in 2002. Nucleic Acids Res
- Falquet, Pagni, et al.
- 2002
Citation Context ... models. We will refer to both profiles and patterns as motifs. Motifs have been widely used for the prediction of a protein’s properties, since the latter are mainly defined by their motifs. Prosite [1], Pfam [2] and Prints [3] are the most common databases where motifs are being recorded. Machine learning (ML) algorithms [4] can offer the most cost effective approach to automated discovery of a pri...

34 | Noemon: Design, implementation and performance results of an intelligent assistant for classifier selection
- Kalousis, Theoharis
- 1999
Citation Context ...earning domains. Several approaches have been proposed for the characterization of learning domain, including general, statistical and information-theoretic measures [12], landmarking [13], histograms [14] and model-based data characterizations [15]. Apart from the characterization of each domain, the performance of each learning algorithm on that domain is recorded. When a new domain arrives, the perf...

23 | Dynamical selection of learning algorithms
- Merz
- 1995
Citation Context ...long to the family of Dynamic Classifier Selection [19] and use a different algorithm in different parts of the instance space. Two similar, but more complicated approaches that were developed by Merz [20] are Dynamic Selection and Dynamic Weighting. The selection of algorithms is based on their local performance, but not around the test instance itself, rather around the meta-instance comprising the p...

21 | Adaptive selection of image classifiers
- Giacinto, Roli
- 1997
Citation Context ... [16], algorithms are ranked based on Data Envelopment Analysis, a multicriteria evaluation technique that can combine various performance metrics, like accuracy, storage space, and learning time. In [17,18] the accuracy of the algorithms is estimated locally on a number of examples that surround each test example. Such approaches belong to the family of Dynamic Classifier Selection [19] and use a differ...

12 | Prosite: A dictionary of protein sites and patterns (Department de Biochimie Medicale, Universite de
- Bairoch
- 1990
Citation Context ... fixed set of attributes. A very important issue in the data mining process is the efficient choice of attributes. In our case, protein chains are represented using a proper motif sequence vocabulary [10]. Suppose the vocabulary contains N motifs. Any given protein sequence typically contains a few of these motifs. We encode each sequence as an N-bit binary pattern where the i-th bit is 1 if the corre...
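The N-bit encoding described in this context is straightforward to sketch in Python; the function name and the motif identifiers in the example are illustrative, not taken from the paper:

```python
def encode_protein(motif_vocabulary, protein_motifs):
    """Encode a protein as an N-bit binary vector over a motif vocabulary:
    bit i is 1 if the i-th vocabulary motif occurs in the protein, else 0."""
    present = set(protein_motifs)
    return [1 if motif in present else 0 for motif in motif_vocabulary]

# A protein containing two motifs (hypothetical IDs) against a 4-motif vocabulary:
vocabulary = ["PS00018", "PS00027", "PF00036", "PR00450"]
bits = encode_protein(vocabulary, ["PF00036", "PS00018"])  # [1, 0, 1, 0]
```

Each protein thus becomes a fixed-length binary attribute vector, which is exactly the representation the base-level learning algorithms require.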

11 | Effective Voting of Heterogeneous Classifiers
- Tsoumakas, Katakis, et al.
- 2004
Citation Context ...ulti-Response Model Trees as the meta-level learning algorithm and probability distributions, is the most accurate heterogeneous classifier combination method of the Stacking family. Selective Fusion [23,24] is a recent method for combining different classification algorithms that exhibits low computational complexity and high accuracy. It uses statistical procedures for the selection of the best subgrou...

9 | Data-Driven Generation of Decision Trees for Motif-Based Assignment of Protein Sequences to Functional Families
- Wang, Wang, et al.
- 2001
Citation Context ...ogy [5]. A plethora of algorithms to address this problem have been proposed, by both the artificial intelligence and the pattern recognition communities. Some of the algorithms create decision trees [6,7], others exploit artificial neural networks [8] or statistical models [9]. An important issue however that remains is which from the multitude of machine learning algorithms to use for training a clas...

9 | An integrated concept for multicriteria ranking of data mining algorithms
- Keller, Paterson, et al.
- 2000
Citation Context ...ranked according to their average performance. In [12], algorithms are ranked based on a measure called Adjusted Ratio of Ratios (ARR), that combines accuracy and learning time of algorithm, while in [16], algorithms are ranked based on Data Envelopment Analysis, a multicriteria evaluation technique that can combine various performance metrics, like accuracy, storage space, and learning time. In [17,1...

7 | PRINT-S: the database formerly known as PRINTS. Nucleic Acids Re
- Attwood, Croning, et al.
- 2000
Citation Context ... both profiles and patterns as motifs. Motifs have been widely used for the prediction of a protein’s properties, since the latter are mainly defined by their motifs. Prosite [1], Pfam [2] and Prints [3] are the most common databases where motifs are being recorded. Machine learning (ML) algorithms [4] can offer the most cost effective approach to automated discovery of a priori unknown predictive re...

4 | Selective Fusion of Heterogeneous Classifiers. Intelligent Data Analysis 9
- Tsoumakas, Angelis, et al.
- 2005
Citation Context ...ulti-Response Model Trees as the meta-level learning algorithm and probability distributions, is the most accurate heterogeneous classifier combination method of the Stacking family. Selective Fusion [23,24] is a recent method for combining different classification algorithms that exhibits low computational complexity and high accuracy. It uses statistical procedures for the selection of the best subgrou...

2 | Higher-order Approach to Metalearning. The ECML’2000 workshop on Meta-Learning: Building Automatic Advice Strategies for Model Selection and Method Combination
- Bensusan, Giraud-Carrier, et al.
- 2000
Citation Context ...n proposed for the characterization of learning domain, including general, statistical and information-theoretic measures [12], landmarking [13], histograms [14] and model-based data characterizations [15]. Apart from the characterization of each domain, the performance of each learning algorithm on that domain is recorded. When a new domain arrives, the performance of the algorithms in the k-nearest n...

1 | GenMiner: A Data Mining Tool for Protein Analysis
- Hatzidamianos, Diplaris, et al.
- 2003
Citation Context ...n that belonged in two or more classes, a new class was created that was named after both the protein classes in which the protein belonged. This resulted in a total of 32 different classes. GenMiner [25] was used for the preparation of data. 3.2 Learning Algorithms and Combination Methods We used 9 different learning algorithms at the base-level. These are general-purpose machine learning algorithms ...