## Boosting Diverse Learners for Domain Agnostic Time Series Classification

### BibTeX

```bibtex
@MISC{Minnen_boostingdiverse,
  author = {David Minnen and Peng Zang and Charles Isbell and Thad Starner},
  title  = {Boosting Diverse Learners for Domain Agnostic Time Series Classification},
  year   = {}
}
```

### Abstract

Although most classification methods benefit from the incorporation of domain knowledge, some situations call for a single algorithm that applies to a wide range of diverse domains. In such cases, the techniques and biases that prove useful in one domain may be irrelevant or even harmful in another. This paper addresses the problem of constructing a domain agnostic time series classification algorithm that allows safe inclusion of domain-specific methods that may be highly effective in some domains yet detrimental in others. Our approach combines MBoost, an extension to AdaBoost that allows robust boosting of multiple weak learners, with SAMME, a multiclass extension of AdaBoost which does not rely on a reduction to a set of binary problems. The resulting algorithm allows the safe and efficient combination of multiple learning algorithms for multiclass classification.

### Citations

4288 | A tutorial on hidden Markov models and selected applications in speech recognition
- Rabiner
- 1989
Citation Context ...el to induce a distribution over time series. A common choice is the hidden Markov model (HMM), which can be thought of as a switching-state mixture model (see Rabiner’s tutorial for more information [12]). We experimented with fully-connected HMMs with ten states and Gaussian output distributions relying on the standard Baum-Welch algorithm for parameter estimation. The Viterbi algorithm was used to c...
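
The context above describes scoring sequences under a per-class HMM. As an illustrative sketch only (the paper used ten-state Gaussian-output HMMs trained with Baum-Welch; the discrete-output model and parameters below are hypothetical), the forward algorithm computes the log-likelihood that model-based maximum-likelihood classification compares across classes:

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood log P(obs | model) for a discrete-output HMM.

    obs: sequence of symbol indices
    pi:  (S,) initial state distribution
    A:   (S, S) transition matrix, A[i, j] = P(s_j | s_i)
    B:   (S, V) emission matrix, B[i, k] = P(symbol k | s_i)
    """
    alpha = pi * B[:, obs[0]]          # forward variable at t = 0
    log_like = 0.0
    for t in range(1, len(obs)):
        # Rescale to avoid underflow; accumulate the log of the scale.
        scale = alpha.sum()
        log_like += np.log(scale)
        alpha = (alpha / scale) @ A * B[:, obs[t]]
    log_like += np.log(alpha.sum())
    return log_like

# Hypothetical two-state model; classification would pick the class
# whose HMM assigns the sequence the highest log-likelihood.
pi = np.array([1.0, 0.0])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
ll = forward_log_likelihood([0, 0, 1, 1], pi, A, B)
```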

3468 | LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2001.
- Chang, Lin
Citation Context ...ication [6]. Typically, a kernel is used to implicitly project the data into a high-dimensional space where linear classification is more likely to be effective. Our implementation is based on libSVM [5] and uses a simple grid search to find good values for the slack variable and RBF kernel scale parameter. Since our implementation of the SVM only supports binary classification, we adopt a one-vs-one...

2329 | A decision-theoretic generalization of on-line learning and an application to boosting
- Freund, Schapire
- 1997
Citation Context ...periments on several benchmark UCI data sets show MBoost performs at least as well as any single model, any boosted single model, and sometimes outperforms them both [18]. MBoost is based on AdaBoost [15, 8], an ensemble learning technique that iteratively constructs an ensemble of hypotheses by applying a weak learner repeatedly on different distributions over data. In AdaBoost, distributions are chosen...

945 | An Introduction to Support Vector Machines
- Cristianini, Shawe-Taylor
- 2000
Citation Context ...ier returns ω_j, where j = arg max_{i=1..m} w_i and ω_i is the class label associated with each w_i. Support Vector Machines Support Vector Machines (SVMs) are large-margin classifiers based on linear classification [6]. Typically, a kernel is used to implicitly project the data into a high-dimensional space where linear classification is more likely to be effective. Our implementation is based on libSVM [5] and use...

708 | Probabilistic outputs for support vector machines and comparison to regularized likelihood methods
- Platt
- 1999
Citation Context ...one scheme to generate a multiclass composite classifier. Here, however, we learn a mapping for each binary classifier from the margin to a probabilistic confidence value using Platt’s sigmoid method [11] and then use the confidence values for weighted voting. Model-based Maximum Likelihood The generative approach to classification learns a probabilistic model from the training examples for each class...
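
Platt's method maps raw SVM margins to probabilities with a fitted sigmoid. The sketch below is a simplified stand-in: it fits P(y=1 | f) = 1/(1 + exp(A·f + B)) by plain gradient descent on the cross-entropy loss, whereas Platt's actual procedure uses a Newton-style solver with smoothed targets; the margins and labels are made-up data, not from the paper.

```python
import numpy as np

def fit_platt(margins, labels, lr=0.01, steps=5000):
    """Fit P(y=1 | f) = 1 / (1 + exp(A*f + B)) by gradient descent on
    the logistic cross-entropy loss. labels are in {0, 1}; margins are
    raw binary-classifier decision values."""
    A, B = 0.0, 0.0
    f, y = np.asarray(margins), np.asarray(labels, dtype=float)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(A * f + B))   # current P(y=1 | f)
        # d(loss)/dA = sum((y - p) * f), d(loss)/dB = sum(y - p)
        A -= lr * np.sum((y - p) * f)
        B -= lr * np.sum(y - p)
    return lambda fq: 1.0 / (1.0 + np.exp(A * fq + B))

# Hypothetical margins from a binary SVM and their true labels:
margins = np.array([-2.0, -1.0, 1.0, 2.0])
labels = np.array([0, 0, 1, 1])
to_prob = fit_platt(margins, labels)
```

The resulting confidence values can then feed the weighted vote described in the context.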

666 | The strength of weak learnability
- Schapire
- 1990
Citation Context ...line methods in Section 5, and then we discuss the results in Section 6. 2. MBOOST Ensemble learning methods have been empirically shown to be more powerful than any single method alone [4]. Boosting [14] is a particularly popular ensemble technique with strong theoretical and empirical support. For our time series classification algorithm, we use MBoost [18], an ensemble algorithm designed for boosti...

123 | A decision theoretic generalization of on-line learning and an application to boosting
- Freund, Schapire
- 1995
Citation Context ...enabling us to take advantage of the empirical power of using multiple models, even those that can be brittle. 3. MULTICLASS ADABOOST The original AdaBoost algorithm introduced by Freund and Schapire [7] combined multiple, weighted hypotheses from a single binary classification algorithm. Several approaches to boost multiclass classifiers have been proposed, but these methods typically rely on a redu...

116 | Discovery and segmentation of activities in video
- Brand, Kettnaker
- 2000
Citation Context ...learning the model structure, estimating the proper number of states, using different output distributions, using different duration models, and using different parameter estimation algorithms (e.g., [2, 12, 3, 16, 1]). Including some of these more sophisticated methods would likely improve results, but further evaluation is needed to verify this intuition and measure the trade-off between improved accuracy and run ...

93 | A Hybrid Discriminative/Generative Approach for Modelling Human Activities
- Lester, Choudhury, et al.
Citation Context ...sembles 1D ensembles use AdaBoost to combine a set of decision stumps in order to classify data. The method was introduced by Yin et al. [17] and extended to use probabilistic voting by Lester et al. [10]. These models act like soft decision trees, essentially selecting a different feature and splitting location during each round of boosting. The key insight that led to the development of this techniq...
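
A minimal sketch of such a boosted-stump ensemble, assuming binary labels in {-1, +1} and a single 1-D feature (the cited methods select among many features per round of boosting; the toy data is hypothetical):

```python
import numpy as np

def best_stump(X, y, w):
    """Weighted-error-minimizing decision stump on a single feature."""
    best = (np.inf, None, None)
    for t in np.unique(X):
        for sign in (1, -1):
            pred = sign * np.where(X >= t, 1, -1)
            err = w[pred != y].sum()
            if err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda Xq, t=t, s=sign: s * np.where(Xq >= t, 1, -1)

def boosted_stumps(X, y, rounds=5):
    """Binary AdaBoost over decision stumps (labels in {-1, +1})."""
    w = np.full(len(y), 1.0 / len(y))    # uniform initial distribution
    ensemble = []
    for _ in range(rounds):
        h = best_stump(X, y, w)
        pred = h(X)
        err = w[pred != y].sum()
        if err == 0.0 or err >= 0.5:     # no useful weak hypothesis left
            break
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)   # up-weight the mistakes
        w /= w.sum()
        ensemble.append((alpha, h))
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in ensemble))

# A 1-D pattern that no single stump can separate:
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([-1, 1, 1, -1])
clf = boosted_stumps(X, y)
```

Each round picks a new threshold, so the weighted vote behaves like the soft decision trees the context describes.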

79 | Improved boosting using confidence-rated predictions
- Schapire, Singer
- 1999
Citation Context ...periments on several benchmark UCI data sets show MBoost performs at least as well as any single model, any boosted single model, and sometimes outperforms them both [18]. MBoost is based on AdaBoost [15, 8], an ensemble learning technique that iteratively constructs an ensemble of hypotheses by applying a weak learner repeatedly on different distributions over data. In AdaBoost, distributions are chosen...

58 | Ensemble selection from libraries of models
- Caruana, Niculescu-Mizil, et al.
- 2004
Citation Context ...are it to baseline methods in Section 5, and then we discuss the results in Section 6. 2. MBOOST Ensemble learning methods have been empirically shown to be more powerful than any single method alone [4]. Boosting [14] is a particularly popular ensemble technique with strong theoretical and empirical support. For our time series classification algorithm, we use MBoost [18], an ensemble algorithm desi...

30 | What HMMs can do
- Bilmes
- 2006
Citation Context ...learning the model structure, estimating the proper number of states, using different output distributions, using different duration models, and using different parameter estimation algorithms (e.g., [2, 12, 3, 16, 1]). Including some of these more sophisticated methods would likely improve results, but further evaluation is needed to verify this intuition and measure the trade-off between improved accuracy and run ...

24 | Multiclass adaboost
- Zhu, Rosset, et al.
- 2005
Citation Context ...C(x) = arg max_y Σ_{m=1}^{M} α^(m) · I(T^(m)(x) = y). The “stagewise additive modeling using a multiclass exponential loss function” (SAMME) algorithm developed by Zhu, Rosset, Zou, and Hastie [19] achieves the goal of direct multiclass boosting. Zhu et al. provide a statistical justification for their modification to the original AdaBoost algorithm by noting the relationship to the exponential...
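
A sketch of SAMME as described above: the only changes from binary AdaBoost are the extra log(K − 1) term in α, so a weak learner merely has to beat random guessing (accuracy 1/K), and the weighted multiclass vote C(x) = arg max_y Σ_m α^(m) · I(T^(m)(x) = y). The interval_stump weak learner and the toy data are hypothetical illustrations, not from the paper.

```python
import numpy as np

def samme(X, y, weak_learn, n_classes, rounds=10):
    """SAMME sketch: multiclass AdaBoost with labels in {0..K-1}."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        h = weak_learn(X, y, w)
        miss = h(X) != y
        err = w[miss].sum()
        if err == 0.0 or err >= 1.0 - 1.0 / n_classes:
            break                               # worse than random guessing
        alpha = np.log((1 - err) / err) + np.log(n_classes - 1)
        w *= np.exp(alpha * miss)               # up-weight the mistakes
        w /= w.sum()
        ensemble.append((alpha, h))

    def classify(Xq):
        votes = np.zeros((len(Xq), n_classes))
        for a, h in ensemble:
            votes[np.arange(len(Xq)), h(Xq)] += a
        return votes.argmax(axis=1)             # weighted multiclass vote
    return classify

def interval_stump(X, y, w):
    # Hypothetical weak learner: split at a threshold and predict the
    # weighted-majority class on each side.
    best = (np.inf, None, None, None)
    classes = np.unique(y)
    for t in np.unique(X):
        left, right = X < t, X >= t
        cl = max(classes, key=lambda c: w[left & (y == c)].sum())
        cr = max(classes, key=lambda c: w[right & (y == c)].sum())
        pred = np.where(X < t, cl, cr)
        err = w[pred != y].sum()
        if err < best[0]:
            best = (err, t, cl, cr)
    _, t, cl, cr = best
    return lambda Xq, t=t, cl=cl, cr=cr: np.where(Xq < t, cl, cr)

# Toy three-class problem that no single two-way split solves:
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0, 0, 1, 1, 2, 2])
clf = samme(X, y, interval_stump, n_classes=3)
```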

17 | Three myths about dynamic time warping
- Ratanamahatana, Keogh
Citation Context ...ics. In all of our experiments, the L1 metric was used. Previous investigations have shown that unconstrained DTW will often lead to pathological warping, which can reduce the accuracy of this metric [13]. Methods to constrain the warping path include the Sakoe-Chiba band, the Itakura parallelogram, and concatenation of local constraints. In an effort to support a wide range of domains yet still benef...
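
The Sakoe-Chiba band mentioned above restricts the warping path to a corridor around the diagonal, which rules out the pathological warpings the context describes. A minimal sketch with an L1 local metric (matching the experiments' stated choice), assuming a fixed band width:

```python
import numpy as np

def dtw_band(a, b, band):
    """DTW distance with an L1 local cost and a Sakoe-Chiba band:
    the warping path may deviate at most `band` cells from the diagonal."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)     # cells outside the band stay inf
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo, hi = max(1, i - band), min(m, i + band)
        for j in range(lo, hi + 1):
            cost = abs(a[i - 1] - b[j - 1])        # L1 local metric
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

# Toy sequences: b repeats one sample of a, so a banded warp aligns
# them perfectly.
d = dtw_band([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 1.0, 2.0, 3.0], band=2)  # → 0.0
```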

12 | Asymmetrically boosted HMM for speech reading
- Pei, Essa, et al.
- 2004
Citation Context ...e also used by the 1D ensemble classifier described next. 1D Ensembles 1D ensembles use AdaBoost to combine a set of decision stumps in order to classify data. The method was introduced by Yin et al. [17] and extended to use probabilistic voting by Lester et al. [10]. These models act like soft decision trees, essentially selecting a different feature and splitting location during each round of boosti...

11 | Fast state discovery for HMM model selection and learning
- Siddiqi, Gordon, et al.
- 2007
Citation Context ...learning the model structure, estimating the proper number of states, using different output distributions, using different duration models, and using different parameter estimation algorithms (e.g., [2, 12, 3, 16, 1]). Including some of these more sophisticated methods would likely improve results, but further evaluation is needed to verify this intuition and measure the trade-off between improved accuracy and run ...

9 | A discriminative training algorithm for hidden Markov models
- Yishai, Burshtein
- 2004

1 | http://www.cs.ucr.edu/~eamonn/time_series_data
- Keogh, Xi, et al.
- 2007
Citation Context ...iers with only a subset of the features. 5. EMPIRICAL EVALUATION To evaluate our time series classification algorithm, we used the 20 data sets provided on the UCR Time Series Classification web page [9]. Figure 1 shows the mean error rate of our algorithm along with error bars representing one standard deviation (blue), the minimum error achieved on any single run (red), and the best error rate for ...

1 | Managing domain knowledge and multiple models with boosting
- Zang, Isbell
- 2007
Citation Context ...than any single method alone [4]. Boosting [14] is a particularly popular ensemble technique with strong theoretical and empirical support. For our time series classification algorithm, we use MBoost [18], an ensemble algorithm designed for boosting multiple weak learners. MBoost provides two primary advantages. First, it explicitly supports multiple weak learners and formalizes the notion of using th...