## Fast state discovery for HMM model selection and learning (2007)

Venue: In Proc. Int’l Conference on Artificial Intelligence and Statistics

Citations: 11 (3 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Siddiqi07faststate,
  author    = {Sajid M. Siddiqi and Geoffrey J. Gordon and Andrew W. Moore},
  title     = {Fast state discovery for HMM model selection and learning},
  booktitle = {Proc. Int’l Conference on Artificial Intelligence and Statistics},
  year      = {2007}
}
```

### Abstract

Choosing the number of hidden states and their topology (model selection) and estimating model parameters (learning) are important problems for Hidden Markov Models. This paper presents a new state-splitting algorithm that addresses both of these problems. The algorithm models more information about the dynamic context of a state during a split, enabling it to discover underlying states more effectively. Compared to previous top-down methods, the algorithm also touches a smaller fraction of the data per split, leading to faster model search and selection. Because of its efficiency and ability to avoid local minima, the state-splitting approach is a good way to learn HMMs even when the desired number of states is known beforehand. We compare our approach to previous work on synthetic data as well as several real-world data sets from the literature, revealing significant improvements in efficiency and test-set likelihoods. We also compare to previous algorithms on a sign-language recognition task, with positive results.
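The search described in the abstract can be sketched as a greedy loop: repeatedly try splitting each current state, keep the best candidate, and stop once a penalized score such as BIC no longer improves. The sketch below is a schematic reconstruction, not the paper's algorithm; `refit_with_split` is a hypothetical hook standing in for split design plus partial EM, and the parameter count assumes a discrete-output HMM.

```python
import math

def bic(loglik, n_params, n_obs):
    # BIC in "higher is better" form: log-likelihood minus complexity penalty.
    return loglik - 0.5 * n_params * math.log(n_obs)

def n_hmm_params(n_states, n_symbols):
    # Free parameters of a discrete-output HMM:
    # initial distribution + transition matrix + emission matrix.
    return (n_states - 1) + n_states * (n_states - 1) + n_states * (n_symbols - 1)

def greedy_state_splitting(refit_with_split, n_symbols, n_obs, max_states=20):
    """Grow the state space one split at a time, guided by BIC.

    refit_with_split(n_states, split_state) -> log-likelihood of the model
    obtained by splitting `split_state` and re-optimizing (hypothetical hook).
    """
    n_states = 1
    best_score = bic(refit_with_split(1, None), n_hmm_params(1, n_symbols), n_obs)
    while n_states < max_states:
        # Try splitting each current state; keep the best-scoring candidate.
        cand = max(range(n_states), key=lambda s: refit_with_split(n_states + 1, s))
        ll = refit_with_split(n_states + 1, cand)
        score = bic(ll, n_hmm_params(n_states + 1, n_symbols), n_obs)
        if score <= best_score:
            break          # BIC stopped improving: keep the current model size
        n_states, best_score = n_states + 1, score
    return n_states
```

With a toy refit whose likelihood saturates as states are added, the loop stops as soon as the BIC penalty outweighs the likelihood gain from one more split.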

### Citations

650 | UCI repository of machine learning databases. Downloadable at http://www.ics.uci.edu/~mlearn/MLRepository.html - Hettich, Blake, et al. - 1998

363 | A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition - Rabiner - 1989

Citation Context: ...idden Markov Models (HMMs) are a popular tool for modeling the statistical properties of such sequences, which are ubiquitous in the real world. HMMs have been used extensively in speech recognition (Rabiner, 1989), bioinformatics (Krogh et al., 1994), information extraction (Seymore et al., 1999) and other areas. There has been extensive work on learning the parameters of a fixed-topology HMM. Several algorit...

279 | Visual recognition of American Sign Language using Hidden Markov models - Starner, Pentland - 1995

Citation Context: ...is carried out by scoring a test sequence with each HMM, and the sequence is labeled with the class of the highest-scoring HMM. One such classification problem is automatic sign-language recognition (Starner & Pentland, 1995). We test the effectiveness of our automatically learned HMMs at classification of Australian sign language using the AUSL dataset (Kadous, 2002). The data consists of sensor readings from a pair of ...
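The classification scheme in this excerpt, scoring a sequence with each class's HMM and labeling it with the highest-scoring class, can be sketched in plain Python. This is a generic illustration rather than the paper's code: the scaled forward pass is the standard way to compute log P(O | λ) without underflow, and the toy per-class models below are made up for demonstration.

```python
import math

def forward_loglik(pi, A, B, obs):
    """log P(O | lambda) for a discrete-output HMM via the scaled forward pass."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:  # recursion: alpha_t(i) = sum_j alpha_{t-1}(j) A[j][i] * B[i][o_t]
            alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                     for i in range(n)]
        c = sum(alpha)              # per-step rescaling prevents underflow
        loglik += math.log(c)
        alpha = [a / c for a in alpha]
    return loglik

def classify(obs, models):
    """Label a sequence with the class of the highest-scoring HMM."""
    return max(models, key=lambda cls: forward_loglik(*models[cls], obs))

# Hypothetical per-class models (pi, A, B): sticky transitions, contrasting emissions.
models = {
    "hello": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]]),
    "world": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.2, 0.8], [0.2, 0.8]]),
}
```

For example, `classify([0, 0, 0, 0], models)` picks the model that prefers emitting symbol 0, since its forward likelihood on that sequence is higher.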

154 | Learning hidden Markov model structure for information extraction - Seymore, McCallum, et al. - 1999

Citation Context: ...operties of such sequences, which are ubiquitous in the real world. HMMs have been used extensively in speech recognition (Rabiner, 1989), bioinformatics (Krogh et al., 1994), information extraction (Seymore et al., 1999) and other areas. There has been extensive work on learning the parameters of a fixed-topology HMM. Several algorithms ...

152 | Estimating the dimension of a model. Annals of Statistics - Schwarz - 1978

Citation Context: ...hood P(O, Q | λ), where Q is a sequence of hidden states that corresponds to the observed data sequence. To determine the stopping point for state-splitting, we use the Bayesian Information Criterion (Schwarz, 1978), or BIC score, which asymptotically converges to the true posterior probability of the model assuming an uninformative prior.
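For reference, the BIC score mentioned in this excerpt penalizes the maximized log-likelihood by model complexity. In "higher is better" form, with k free parameters and T observations (standard notation, not copied from the paper):

```latex
\mathrm{BIC}(\lambda) \;=\; \log P(O \mid \hat{\lambda}) \;-\; \frac{k}{2}\,\log T
```

The (k/2) log T penalty is what eventually halts state-splitting: each split adds parameters, so the likelihood gain must exceed the added penalty for the split to be accepted.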

91 | Best-first model merging for hidden Markov model induction - Stolcke, Omohundro - 1994

Citation Context: ...igation is needed in this area to see if these two things hold true in other sequence classification domains. Previous work for finding the dimensionality of discrete-valued hidden variables in HMMs (Stolcke & Omohundro, 1994) and other Bayesian Networks (Elidan & Friedman, 2001) has demonstrated that hard-updates model selection algorithms can yield much greater efficiency than soft-updates methods without a large loss of...

63 | A hidden Markov model that finds genes in E. coli DNA - Krogh, Haussler - 1994

Citation Context: ...popular tool for modeling the statistical properties of such sequences, which are ubiquitous in the real world. HMMs have been used extensively in speech recognition (Rabiner, 1989), bioinformatics (Krogh et al., 1994), information extraction (Seymore et al., 1999) and other areas. There has been extensive work on learning the parameters of a fixed-topology HMM. Several algorithms ...

51 | A data-driven approach to quantifying natural human motion - Ren, Patrick, et al. - 2005

Citation Context: ...at has appeared in previous work on sequential data models. These are Robot, Mocap, Mlog, AUSL and Vowel. Robot (Howard & Roy, 2003) contains laser readings from a Pioneer robot moving indoors. Mocap (Ren et al., 2005) contains motion capture data from people performing various actions. AUSL and Vowel are from the UCI KDD archive ...

50 | HMM topology design using maximum likelihood successive state splitting. Computer Speech and Language - Ostendorf, Singer - 1997

Citation Context: ...ning two candidates with full EM is also inefficient, especially when they may not be the best candidates, as our empirical evaluations will show. ML-SSS: Maximum-Likelihood Successive State Splitting (Ostendorf & Singer, 1997) is designed to learn HMMs that model contextual (observation density) and temporal (transition model) variation in phones for continuous speech recognition systems. ML-SSS incrementally builds an HM...

33 | The robotics data set repository (Radish) - Howard, Roy - 2003

Citation Context: ...We choose a range of real-world data that has appeared in previous work on sequential data models. These are Robot, Mocap, Mlog, AUSL and Vowel. Robot (Howard & Roy, 2003) contains laser readings from a Pioneer robot moving indoors. Mocap (Ren et al., 2005) contains motion capture data from people performing various actions. AUSL and Vowel are from the UCI KDD archive ...

33 | Temporal Classification: Extending the Classification Paradigm to Multivariate Time Series (PhD thesis) - Kadous - 2002

Citation Context: ...lem is automatic sign-language recognition (Starner & Pentland, 1995). We test the effectiveness of our automatically learned HMMs at classification of Australian sign language using the AUSL dataset (Kadous, 2002). The data consists of sensor readings from a pair of Flock instrumented gloves, for 27 instances each of 95 distinct words. Each instance is roughly 55 timesteps. We retained the (x, y, z, roll, pitch, y...

24 | Learning the dimensionality of hidden variables - Elidan, Friedman - 2001

Citation Context: ...s hold true in other sequence classification domains. Previous work for finding the dimensionality of discrete-valued hidden variables in HMMs (Stolcke & Omohundro, 1994) and other Bayesian Networks (Elidan & Friedman, 2001) has demonstrated that hard-updates model selection algorithms can yield much greater efficiency than soft-updates methods without a large loss of accuracy. To our knowledge, however, this is the firs...

21 | Temporal pattern generation using hidden Markov model based unsupervised classification - Li, Biswas - 1999

19 | Acoustic Modeling of Phoneme Units for Continuous Speech Recognition - Ney - 1990

Citation Context: ...) task, but we can afford this expense since we only have to do it for one candidate model. In V-STACS, the likelihood is approximated by the Viterbi path likelihood as in the Viterbi approximation (Ney, 1990). This avoids O(TN²) operations in V-STACS since the Viterbi path can be updated efficiently after split design.
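The Viterbi approximation in this excerpt replaces the summed likelihood P(O | λ) with the probability of the single best state path, log P(O, Q* | λ). Below is a generic log-space implementation, not the paper's incremental-update variant; it assumes all probabilities are strictly positive so that logarithms are defined.

```python
import math

def viterbi(pi, A, B, obs):
    """Best state path Q* and log P(O, Q* | lambda) for a discrete-output HMM.
    Assumes strictly positive probabilities (plain log-space dynamic program)."""
    n = len(pi)
    logp = [math.log(pi[i] * B[i][obs[0]]) for i in range(n)]
    back = []                              # back[t][i] = best predecessor of state i
    for t in range(1, len(obs)):
        prev, logp, ptr = logp, [], []
        for i in range(n):
            j = max(range(n), key=lambda k: prev[k] + math.log(A[k][i]))
            logp.append(prev[j] + math.log(A[j][i] * B[i][obs[t]]))
            ptr.append(j)
        back.append(ptr)
    state = max(range(n), key=lambda i: logp[i])
    path = [state]
    for ptr in reversed(back):             # follow back-pointers from the end
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, max(logp)
```

Per the excerpt, V-STACS uses this path likelihood in place of the full forward sum, and the best path can be updated incrementally after a split rather than recomputed from scratch.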

13 | Fast inference and learning in large-state-space HMMs - Siddiqi, Moore - 2005

Citation Context: ...We also use AUSL for measuring classification accuracy. Mlog was previously used in a paper on large HMMs (Siddiqi & Moore, 2005). We first evaluate performance in learning models of predetermined size. In Table 1 we show test-set log-likelihoods normalized by data set size along with ru...

12 | Fast algorithms for large state space HMMs with applications to web usage analysis - Felzenszwalb, Huttenlocher

1 | Towards removing artificial landmarks for autonomous exploration in structured environments - Sharma, Morales, et al. - 2005

Citation Context: ...as speech recognition, handwriting recognition, financial modeling, bioinformatics and even domains where real-valued hidden-variable models are currently the norm, such as mobile robot localization (Sharma et al., 2005). There are several possibilities for building on this work, both in HMMs as well as in general Bayesian Networks. As massive data streams become increasingly common, one emerging goal is to be able ...