## Troika – An Improved Stacking Schema for Classification Tasks

Citations: 4 (4 self-citations)

### BibTeX

@MISC{Menahem_troika,
  author = {Eitan Menahem and Lior Rokach and Yuval Elovici and Deutsche Telekom},
  title = {Troika – An Improved Stacking Schema for Classification Tasks},
  year = {}
}

### Abstract

Stacking is a general ensemble method in which a number of base classifiers are combined by a single meta-classifier that learns from their outputs. This approach offers several advantages: simplicity, performance comparable to the best base classifier, and the ability to combine classifiers induced by different inducers. Its disadvantage is that on multiclass problems stacking tends to perform worse than other meta-learning approaches. In this paper we present Troika, a new stacking method for improving ensemble classifiers. The new scheme is built from three layers of combining classifiers. The method was tested on various datasets, and the results indicate the superiority of the proposed method over two legacy ensemble schemes, Stacking and StackingC, especially when the classification task consists of more than two classes.
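The abstract's description of plain stacking (the baseline Troika improves on) can be sketched with scikit-learn's off-the-shelf `StackingClassifier`. This is an illustrative sketch of the baseline scheme only, not the paper's Troika implementation; the dataset, inducers, and parameters below are assumptions chosen for the demo.

```python
# Plain stacking baseline (not Troika): base classifiers' outputs
# feed one meta-classifier, as described in the abstract.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A synthetic multiclass problem (k = 4), the setting in which the
# paper argues plain stacking degrades.
X, y = make_classification(n_samples=600, n_classes=4,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base level: classifiers induced by different inducers.
base = [("tree", DecisionTreeClassifier(random_state=0)),
        ("nb", GaussianNB())]

# Meta level: one classifier trained on the base classifiers'
# cross-validated class-probability outputs.
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression(max_iter=1000),
                           stack_method="predict_proba", cv=5)
stack.fit(X_tr, y_tr)
print("test accuracy:", stack.score(X_te, y_te))
```

Troika replaces this single meta-level with three combining layers (specialists, meta-classifiers, and a super-classifier), as the citation contexts below indicate.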

### Citations

3339 | Data Mining: Practical machine learning tools and techniques. 2nd Edition - Witten, Frank - 2005 |

3058 | UCI repository of machine learning databases - Blake, Merz - 1998 |

1132 | A Bayesian method for the induction of probabilistic networks from data - Cooper, Herskovits - 1992 |

846 | C4.5: Programs for Machine Learning - Quinlan - 1993 |

592 | Solving multiclass learning problems via error-correcting output codes - Dietterich, Bakiri - 1995 |

576 | An empirical comparison of voting classification algorithms: Bagging, boosting and variants. Machine Learning 36:105–142 - Bauer, Kohavi - 1999 |

Citation context: "... [fragment of the meta-dataset layout table: specialist predictions P_{a,a+1,1} … P_{a,k,n} against a binary 'Class = a?' column; table did not survive extraction] The volume of each meta-dataset can be computed as follows: V_meta-dataset = (k − 1) · n (3), where k is the number of classes in the problem's domain and n is the number of instances in the original dataset. StackingC's dataset volume is a function of the number of base classifiers, l. Each ..." |

343 | Estimating continuous distributions in Bayesian classifiers - John, Langley - 1995 |

326 | Statistical comparisons of classifiers over multiple data sets - Demšar |

285 | Bagging, Boosting, and C4.5 - Quinlan - 1996 |

271 | Robust classification for imprecise environments - Provost, Fawcett - 2001 |

206 | Popular ensemble methods : an empirical study - Opitz, Maclin - 1999 |

110 | Generating accurate and diverse members of a neural-network ensemble - Opitz, Shavlik - 1996 |

86 | Issues in stacked generalization - Ting, Witten - 1999 |

78 | Single-layer learning revisited: A stepwise procedure for building and training a neural network - Knerr, Personnaz, et al. - 1990 |

66 | Stacked generalization, Neural Networks 5 - Wolpert - 1992 |

58 | Using correspondence analysis to combine classifiers - Merz - 1999 |

50 | Is combining classifiers with stacking better than selecting the best one? Machine Learning 255–273 - Dzeroski, Zenko - 2004 |

50 | Task decomposition and module combination based on class relations: a modular neural network for pattern classification - Lu, Ito - 1999 |

36 | Neural Networks and the Bias/Variance Dilemma - Geman, Bienenstock, et al. - 1992 |

36 | Using diversity in preparing ensembles of classifiers based on different feature subsets to minimize generalization error - Zenobi, Cunningham - 2001 |

24 | Classification by pairwise coupling. The annals of statistics - Hastie, Tibshirani - 1998 |

24 | An Evaluation of Grading Classifiers - Seewald, Fürnkranz - 2001 |

21 | How to make stacking better and faster while also taking care of an unknown weakness - Seewald - 2002 |

17 | Bagging predictors. Machine Learning - Breiman |

14 | A theory of learning classification rules. Doctoral dissertation - Buntine - 1990 |

Citation context: "..., P_meta-k(inst_j)}. This single instance will be fed to the super-classifier, which in turn will produce its prediction, Troika's final prediction: FinalDecision(x) = {P(C_1 | x), P(C_2 | x), ..., P(C_K | x)} (5). 5. Evaluation Description. 5.1. Experiment setup. In this section we specify the conditions under which Troika was tested. Our goal was to create the means for correctly comparing Troika w..." |

14 | Using partitioning to speed up specific-togeneral rule induction - Domingos - 1996 |

12 | Pairwise Classification as an Ensemble Technique - Fürnkranz - 2002 |

7 | Pairwise classification and support vector machines - Kreßel - 1998 |

5 | Stacking with multi-response model trees, in - Dzeroski, Zenko - 2002 |

4 | A neural network ensemble method with jittered training data for time series forecasting - Zhang - 2007 |

2 | A training algorithm for optimal margin classifiers - Boser, Guyon, et al. - 1992 |

Citation context: "...instance to class 1, etc. Using these meta-classifiers' predictions, a new instance for the layer-3 classifier is created: the super-instance. SuperInstance = {p_0(inst), p_1(inst), ..., p_k(inst), Class} (4). Each instance in the super-dataset has a corresponding instance in the original dataset. The class attribute of the super-dataset is copied from the corresponding instance of the original datas..." |

2 | Minimal Classification Method With Error-Correcting Codes For Multiclass Recognition - Sivalingam, Pandian, et al. - 2005 |

1 | Efficient classification for multiclass problems using modular neural networks - Anand, Mehrotra, Mohan, Ranka |

Citation context: "...in any specific class. We can see that as k increases, the percentage of classifiers which may classify correctly decreases, and descends practically to zero: lim_{k→∞} (k − 1) / (k(k − 1)/2) = lim_{k→∞} 2/k = 0 (1). The second reason is that in one-against-one binarization we use only instances of two classes (out of the k possible), while in one-against-all we use all instances. Thus, the number of ..." |
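The limit in equation (1) above is plain arithmetic and can be checked numerically. The sketch below is illustrative only; the function name is not from the paper.

```python
# In one-against-one binarization there are k*(k-1)/2 pairwise
# classifiers, of which only (k-1) involve a given fixed class.
# Their fraction is therefore 2/k, which vanishes as k grows.
def relevant_fraction(k: int) -> float:
    pairs = k * (k - 1) / 2        # all one-against-one classifiers
    involving_class = k - 1        # pairs that include the fixed class
    return involving_class / pairs

for k in (3, 10, 100, 1000):
    print(k, relevant_fraction(k))  # fraction shrinks toward 0
```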

1 | Incremental construction of classifier and discriminant ensembles - Ulaş, Semerci, Yıldız, Alpaydın - 2009 |

Citation context: "...example number m. Compared to the Stacking meta-level dataset, the reduction of dimensionality in the dataset of Troika's first combining layer is: r = specialist_dataset_volume / stacking_meta_level_dataset_volume = (4 · l · n/k) / (k · l · n) = 4/k² (2). We can see that as k, the number of classes in the problem, increases, there is a growth linear in k in Stacking's meta-level dataset's volume ..." |
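The ratio in equation (2) above can likewise be sanity-checked in a few lines. This assumes the volumes quoted in the snippet (specialist volume 4·l·n/k versus Stacking meta-level volume k·l·n); the function name is illustrative.

```python
# Ratio of Troika's first-layer specialist-dataset volume to
# Stacking's meta-level dataset volume, per equation (2): 4/k^2.
def volume_ratio(k: int, l: int, n: int) -> float:
    specialist = 4 * l * (n / k)   # Troika first combining layer
    stacking_meta = k * l * n      # Stacking meta-level dataset
    return specialist / stacking_meta

# The ratio is independent of l and n and shrinks quadratically in k.
print(volume_ratio(4, 5, 1000))    # 4 / 4**2 = 0.25
```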

1 | Evolutionary Design of Code-matrices for Multiclass Problems, Soft Computing for Knowledge Discovery and Data Mining - Lorena, Carvalho |

1 | Decision Tree Instance Space Decomposition with Grouped Gain-Ratio, Information Sciences - Cohen, Rokach, Maimon - 2007 |

1 | Data-driven decomposition for multi-class classification - Zhou, Peng, Suen - 2008 |