## Discriminative learning of mixture of Bayesian Network Classifiers for sequence classification (2006)

Venue: CVPR 2006

Citations: 4 (0 self)

### BibTeX

@INPROCEEDINGS{Kim06discriminativelearning,
  author    = {Minyoung Kim},
  title     = {Discriminative learning of mixture of Bayesian Network Classifiers for sequence classification},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2006},
  pages     = {268--275},
  publisher = {IEEE}
}

### Abstract

A mixture of Bayesian Network Classifiers (BNCs) has the potential to yield classification and generative performance superior to that of a single BNC model. We introduce novel discriminative learning methods for mixtures of BNCs. Unlike a single BNC model, where discriminative learning resorts to a gradient search, we can exploit the properties of a mixture to alleviate the complex learning task. The proposed method adds mixture components recursively via functional gradient boosting while maximizing the conditional likelihood. This method is highly efficient, as it reduces to generative learning of a base BNC model on weighted data. The proposed approach is particularly suited to sequence classification problems, where the kernels in the base model are usually too complex for effective gradient search. We demonstrate the improved classification performance of the proposed methods in an extensive set of evaluations on time-series sequence data, including human motion classification problems.
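The abstract's core recipe can be sketched in a few lines: at each round, fit a base generative model by (weighted) maximum likelihood, add it to the mixture, and reweight the data toward examples the current mixture classifies with low confidence. The sketch below is a minimal illustration under loud assumptions: the base "BNC" is replaced by a diagonal-Gaussian class-conditional model, the mixture weights are left uniform, and the reweighting rule (emphasizing examples by `1 − p(y|x)`) is one common functional-gradient-style choice, not necessarily the paper's exact update.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_weighted_gaussian(X, y, w):
    # ML fit of class-conditional diagonal Gaussians on weighted data;
    # a stand-in for the paper's base BNC (illustrative assumption).
    params = {}
    for c in (0, 1):
        m = y == c
        wc = w[m] / w[m].sum()
        mu = wc @ X[m]
        var = wc @ (X[m] - mu) ** 2 + 1e-6
        params[c] = (mu, var, w[m].sum() / w.sum())
    return params

def joint_lik(params, X, c):
    mu, var, prior = params[c]
    ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
    return prior * np.exp(ll)

def posterior(mixture, X):
    # p(y=1 | x) under the additive mixture sum_k alpha_k p_k(x, y)
    p0 = sum(a * joint_lik(p, X, 0) for a, p in mixture)
    p1 = sum(a * joint_lik(p, X, 1) for a, p in mixture)
    return p1 / (p0 + p1 + 1e-300)

def boost_mixture(X, y, rounds=5):
    w = np.ones(len(y)) / len(y)
    mixture = []
    for _ in range(rounds):
        comp = fit_weighted_gaussian(X, y, w)  # generative step on weighted data
        mixture.append((1.0, comp))            # uniform alpha; a line search
        post = posterior(mixture, X)           # on the CLL would refine this
        py = np.where(y == 1, post, 1 - post)
        w = (1 - py) + 1e-12                   # emphasize low-confidence examples
        w /= w.sum()
    return mixture

# toy two-class data
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
mix = boost_mixture(X, y)
acc = np.mean((posterior(mix, X) > 0.5) == (y == 1))
```

The key efficiency claim is visible in the loop: each round only requires a generative (ML) fit on reweighted data, with no gradient search over the base model's parameters.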

### Citations

2308 | A decision-theoretic generalization of online learning and an application to boosting
- Freund, Schapire
- 1997
Citation Context: ... best for generalization. The maximum number of iterations of the boosting algorithms was set to 10. For the algorithms based on AdaBoost (i.e. BML, MixBML, and MixCML), we applied AdaBoost.M1 of [2]. The test error is shown in Table 2. DTW shows excellent classification performance; however, its drawback is that the computational complexity is quadratic in the number of sequences. On the ...
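The AdaBoost.M1 scheme applied above can be sketched as follows: each round fits a base learner on reweighted data, then shrinks the weights of correctly classified examples by `beta = eps / (1 - eps)`. This sketch assumes binary labels for brevity (M1 itself handles multiclass), and uses a decision stump as an illustrative base learner rather than the BNCs boosted in the paper.

```python
import numpy as np

def stump_fit(X, y, w):
    # Weighted decision stump base learner (illustrative choice).
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = (s * (X[:, j] - t) > 0).astype(int)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, t, s)
    _, j, t, s = best
    return lambda Z: (s * (Z[:, j] - t) > 0).astype(int)

def adaboost_m1(X, y, base_fit, rounds=10):
    # AdaBoost.M1 [2]: reweight data each round, fit the base learner,
    # and combine hypotheses by a weighted vote.
    n = len(y)
    w = np.ones(n) / n
    ensemble = []
    for _ in range(rounds):
        h = base_fit(X, y, w)
        miss = h(X) != y
        eps = max(w[miss].sum(), 1e-10)
        if eps >= 0.5:                   # M1 requires base error < 1/2
            break
        beta = eps / (1 - eps)
        w = np.where(miss, w, w * beta)  # shrink weights of correct examples
        w /= w.sum()
        ensemble.append((np.log(1 / beta), h))

    def predict(Z):
        votes = np.zeros((len(Z), 2))
        for a, h in ensemble:
            votes[np.arange(len(Z)), h(Z)] += a
        return votes.argmax(axis=1)
    return predict

X = np.array([[0.], [1.], [2.], [3.], [4.], [5.]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = adaboost_m1(X, y, stump_fit)
```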

589 | Bayesian network classifiers
- Friedman, Geiger, et al.
- 1997
Citation Context: ...y, it has been shown that applying a generative model (e.g. a Bayesian Network) to a classification task yields performance comparable to sophisticated discriminative classifiers such as SVMs and C4.5 [4]. A model of this class, the Bayesian Network Classifier (BNC), can be used in a wide range of applications including speech recognition and motion time-series classification [1]. Instead of the tradit...

564 | Greedy function approximation: A gradient boosting machine
- Friedman
- 2000
Citation Context: ...e boosting algorithm for discriminative mixture learning is based on the functional gradient optimization of convex additive models. While similar gradient approaches have been introduced in the past [3, 11], they only provided heuristic methods for the component search or did not focus on mixtures of generative models. In [14], a mixture fitting problem, reduced to the joint log-likelihood cost function...

446 | Real-Time American Sign Language Recognition Using Desk and Wearable Computer Based Video
- Starner
- 1998
Citation Context: ... disadvantage. In a different set of approaches, generative models can be used as prototypes for sequence clusters. The HMM is often used in this context in speech recognition [9], gesture recognition [22, 18], and gait recognition [5, 1]. The approach in [1] is closely related to the base BNC model in our paper; however, it is trained to optimize a non-discriminative ML criterion. Prior approaches to estima...

430 | Dynamic programming algorithm optimization for spoken word recognition
- Sakoe, Chiba
- 1978
Citation Context: ... x-coordinate of the centroid of the right hand, and the same length 150. This time-series dataset is a typical example where 1-NN with either a naive Euclidean distance or a DTW with small Sakoe-Chiba [17] band-size constraints works extremely well. Our experimental setting is as follows: 20 sequences (10 from each class) were randomly chosen as the training set, and the remaining 180 for the test se...
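The Sakoe-Chiba band mentioned above restricts the DTW warping path to a corridor around the diagonal, which both speeds up the computation and regularizes the alignment. A minimal sketch for 1-D sequences (absolute difference as the local cost is an assumption; squared difference is equally common):

```python
import numpy as np

def dtw_band(a, b, band=10):
    # DTW distance with a Sakoe-Chiba band [17]: the warping path may
    # deviate at most `band` steps from the diagonal |i - j| <= band.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Even with the band, the distance is computed per pair of sequences, which is the source of the quadratic-in-the-number-of-sequences cost noted in the surrounding text.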

397 | Exploiting generative models in discriminative classifiers
- Jaakkola, Haussler
- 1998
Citation Context: ...n used to estimate Euclidean distances between sequences with unequal lengths. Recently, in [15], the constrained DTW was proposed to improve the classification performance on time-series data. In [7] the kernel of two sequences was defined as the inner product of their Fisher scores with respect to the underlying generative model. Even though these methods may achieve high accuracies, the computa...
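The Fisher-kernel idea of [7] can be sketched with a deliberately simple generative model: score each sequence by the gradient of its log-likelihood with respect to the model parameters, then take inner products of those scores. Here the generative model is an i.i.d. diagonal Gaussian over per-frame observations, an illustrative stand-in for the HMM used in practice; `mu` and `var` are assumed to be pre-fit ML parameters.

```python
import numpy as np

def fisher_score(seq, mu, var):
    # Gradient of the sequence log-likelihood w.r.t. (mu, var) under an
    # i.i.d. diagonal-Gaussian observation model.
    d_mu = np.sum((seq - mu) / var, axis=0)
    d_var = np.sum(((seq - mu) ** 2 - var) / (2 * var ** 2), axis=0)
    return np.concatenate([d_mu, d_var])

def fisher_kernel(s1, s2, mu, var):
    # Kernel = inner product of Fisher scores; a full treatment would also
    # whiten the scores by the inverse Fisher information matrix.
    return float(fisher_score(s1, mu, var) @ fisher_score(s2, mu, var))
```

Note that sequences of different lengths map to fixed-length score vectors, which is what lets a kernel classifier consume variable-length time series.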

145 | Parametric hidden Markov models for gesture recognition
- Wilson, Bobick
- 1999
Citation Context: ... disadvantage. In a different set of approaches, generative models can be used as prototypes for sequence clusters. The HMM is often used in this context in speech recognition [9], gesture recognition [22, 18], and gait recognition [5, 1]. The approach in [1] is closely related to the base BNC model in our paper; however, it is trained to optimize a non-discriminative ML criterion. Prior approaches to estima...

142 | Functional gradient techniques for combining hypotheses
- Mason, Baxter, et al.
- 1999
Citation Context: ...e boosting algorithm for discriminative mixture learning is based on the functional gradient optimization of convex additive models. While similar gradient approaches have been introduced in the past [3, 11], they only provided heuristic methods for the component search or did not focus on mixtures of generative models. In [14], a mixture fitting problem, reduced to the joint log-likelihood cost function...

108 | A probabilistic distance measure for hidden Markov models
- Juang, Rabiner
- 1985
Citation Context: ...interpretation is a major disadvantage. In a different set of approaches, generative models can be used as prototypes for sequence clusters. The HMM is often used in this context in speech recognition [9], gesture recognition [22, 18], and gait recognition [5, 1]. The approach in [1] is closely related to the base BNC model in our paper; however, it is trained to optimize a non-discriminative ML criter...

71 | Large scale discriminative training for speech recognition
- Woodland, Povey
- 2000
Citation Context: ... a conditional likelihood (CML) is known to achieve better classification performance. For instance, in speech recognition, the extended Baum-Welch algorithm for HMM kernels is used to approximate CML [23]. Unfortunately, the CML optimization problem is, in general, complex with non-unique ...

60 | The UCR Time Series Data Mining Archive [http://www.cs.ucr.edu/~eamonn/TSDMA/index.html]
- Keogh, Folias
- 2002
Citation Context: ...nerative counterpart. 4.2. Control Chart synthetic data. The control chart dataset contains 600 time-series sequences synthetically generated by a statistical process. The dataset is obtained from [10]. Table 2. Test error (%) on Control Chart dataset: ML 9.31 ± 3.79, BML 5.67 ± 2.50, MixBML 5.51 ± 2.36, MixCML 4.76 ± 2.16; BoostML 8.07 ± 2.75, BoostCML 4.11 ± 1.44, 1-NN DTW 2.73 ± 0.93, 1-NN Euc. 7.93 ± 1...

59 | Making time-series classification more accurate using learned constraints
- Ratanamahatana, Keogh
- 2004
Citation Context: ... which is then delivered to discriminative classifiers such as kNN or SVMs. Dynamic Time Warping (DTW) is often used to estimate Euclidean distances between sequences with unequal lengths. Recently, in [15], the constrained DTW was proposed to improve the classification performance on time-series data. In [7] the kernel of two sequences was defined as the inner product of their Fisher scores with re...

40 | Discovering clusters in motion time-series data
- Alon, Sclaroff, et al.
Citation Context: ... such as SVMs and C4.5 [4]. A model of this class, the Bayesian Network Classifier (BNC), can be used in a wide range of applications including speech recognition and motion time-series classification [1]. Instead of the traditional Maximum Likelihood (ML) learning that fits a BNC model to data, maximizing a conditional likelihood (CML) is known to achieve better classification performance. For instance...

26 | Individual Recognition from Periodic Activity Using Hidden Markov Models
- He, Debrunner
- 2000
Citation Context: ...et of approaches, generative models can be used as prototypes for sequence clusters. The HMM is often used in this context in speech recognition [9], gesture recognition [22, 18], and gait recognition [5, 1]. The approach in [1] is closely related to the base BNC model in our paper; however, it is trained to optimize a non-discriminative ML criterion. Prior approaches to estimation of mixtures of Bayesian ...

25 | Learning mixtures of DAG models
- Thiesson, Meek, et al.
- 1997
Citation Context: ...ed to the base BNC model in our paper; however, it is trained to optimize a non-discriminative ML criterion. Prior approaches to estimation of mixtures of Bayesian Networks have emerged in recent years [21, 16, 12]. Our recursive boosting algorithm for discriminative mixture learning is based on the functional gradient optimization of convex additive models. While similar gradient approaches have been introduce...

20 | Performance analysis of time-distance gait parameters under different speeds
- Tanawongsuwan, Bobick
- 2003
Citation Context: ... different speeds [19, 20]. For each of 15 subjects, and for each of 4 different walking speeds (0.7 m/s, 1.0 m/s, 1.3 m/s, 1.6 m/s), 3D motion capture data of 22 marked points (as depicted in [19]) were recorded in a sophisticated ...

16 | Efficient discriminative learning of Bayesian network classifiers via boosted augmented naive Bayes
- Jing, Pavlović, et al.
- 2005
Citation Context: ...cation performance, the gradient search makes standard approaches computationally demanding. Instead of working on a single BNC model, one may benefit from an ensemble-based approach. For example, in [8] AdaBoost of [2] is successfully applied to parameter boosting of a set of BNCs to minimize the exponential loss. However, the resulting model is not a generative model, which may limit its domain of a...

8 | Model-based motion clustering using boosted mixture modeling
- Pavlović
- 2004
Citation Context: ...r, in this paper, we focus on a mixture of BNCs. A mixture model has the potential to yield classification performance superior to a single BNC model, as well as a rich density estimator. For instance, [14] formulated a recursive mixture-based approach to motion clustering. The recursive approach has benefits such as optimal order estimation and insensitivity to the initial parameters. However, its ai...

5 | Staged mixture modelling and boosting
- Meek, Thiesson, et al.
- 2002
Citation Context: ...ed to the base BNC model in our paper; however, it is trained to optimize a non-discriminative ML criterion. Prior approaches to estimation of mixtures of Bayesian Networks have emerged in recent years [21, 16, 12]. Our recursive boosting algorithm for discriminative mixture learning is based on the functional gradient optimization of convex additive models. While similar gradient approaches have been introduce...

4 | Characteristics of Time-Distance Gait Parameters across Speeds
- Tanawongsuwan, Bobick
- 2003
Citation Context: ... different speeds [19, 20]. For each of 15 subjects, and for each of 4 different walking speeds (0.7 m/s, 1.0 m/s, 1.3 m/s, 1.6 m/s), 3D motion capture data of 22 marked points (as depicted in [19]) were recorded in a sophisticated ...

2 | MMIHMM: maximum mutual information hidden Markov models
- Oliver, Garg
Citation Context: ... cost functional, an appropriate cost model for the classification task. There has been considerable research that directly learns HMMs discriminatively via the maximum mutual information objective [13, 23]. Its major drawback is the computational overhead due to the gradient-based search. 4. Experiments. In this section, we demonstrate the classification performance of the proposed methods. We focus on ...

2 | Boosting density estimation
- Rosset, Segal
- 2002
Citation Context: ...ed to the base BNC model in our paper; however, it is trained to optimize a non-discriminative ML criterion. Prior approaches to estimation of mixtures of Bayesian Networks have emerged in recent years [21, 16, 12]. Our recursive boosting algorithm for discriminative mixture learning is based on the functional gradient optimization of convex additive models. While similar gradient approaches have been introduce...