Results 1  10
of
70
Learning in graphical models
 STATISTICAL SCIENCE
, 2004
"... Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve largescale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for ..."
Abstract

Cited by 655 (10 self)
 Add to MetaCart
(Show Context)
Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve largescale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for approaching these problems, and indeed many of the models developed by researchers in these applied fields are instances of the general graphical model formalism. We review some of the basic ideas underlying graphical models, including the algorithmic ideas that allow graphical models to be deployed in largescale data analysis problems. We also present examples of graphical models in bioinformatics, errorcontrol coding and language processing.
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have bee ..."
Abstract

Cited by 598 (3 self)
 Add to MetaCart
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs
and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linearGaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing
Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T 3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of
applying RaoBlackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization
and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
Factored language models and generalized parallel backoff
 in Proceedings of HLT/NACCL, 2003
"... We introduce factored language models (FLMs) and generalized parallel backoff (GPB). An FLM represents words as bundles of features (e.g., morphological classes, stems, datadriven clusters, etc.), and induces a probability model covering sequences of bundles rather than just words. GPB extends sta ..."
Abstract

Cited by 94 (13 self)
 Add to MetaCart
We introduce factored language models (FLMs) and generalized parallel backoff (GPB). An FLM represents words as bundles of features (e.g., morphological classes, stems, datadriven clusters, etc.), and induces a probability model covering sequences of bundles rather than just words. GPB extends standard backoff to general conditional probability tables where variables might be heterogeneous types, where no obvious natural (temporal) backoff order exists, and where multiple dynamic backoff strategies are allowed. These methodologies were implemented during the JHU 2002 workshop as extensions to the SRI language modeling toolkit. This paper provides initial perplexity results on both CallHome Arabic and on Penn Treebank Wall Street Journal articles. Significantly, FLMs with GPB can produce bigrams with significantly lower perplexity, sometimes lower than highlyoptimized baseline trigrams. In a multipass speech recognition context, where bigrams are used to create firstpass bigram lattices or Nbest lists, these results are highly relevant. 1
A study of interspeaker variability in speaker verification
 IEEE Trans. Audio, Speech and Language Processing
, 2008
"... Abstract — We propose a new approach to the problem of estimating the hyperparameters which define the interspeaker variability model in joint factor analysis. We tested the proposed estimation technique on the NIST 2006 speaker recognition evaluation data and obtained 10–15 % reductions in error r ..."
Abstract

Cited by 74 (11 self)
 Add to MetaCart
(Show Context)
Abstract — We propose a new approach to the problem of estimating the hyperparameters which define the interspeaker variability model in joint factor analysis. We tested the proposed estimation technique on the NIST 2006 speaker recognition evaluation data and obtained 10–15 % reductions in error rates on the core condition and the extended data condition (as measured both by equal error rates and the NIST detection cost function). We show that when a large joint factor analysis model is trained in this way and tested on the core condition, the extended data condition and the crosschannel condition, it is capable of performing at least as well as fusions of multiple systems of other types. (The comparisons are based on the best results on these tasks that have been reported in the literature.) In the case of the crosschannel condition, a factor analysis model with 300 speaker factors and 200 channel factors can achieve equal error rates of less than 3.0%. This is a substantial improvement over the best results that have previously been reported on this task. Index Terms — Speaker verification, Gaussian mixture model, speaker factors, channel factors
LANDMARKBASED SPEECH RECOGNITION: REPORT OF THE 2004 Johns Hopkins Summer Workshop
, 2005
"... ..."
Multimodal integration for meeting group action segmentation and recognition
 MLMI 2005, 2nd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms
, 2006
"... Abstract. We address the problem of segmentation and recognition of sequences of multimodal human interactions in meetings. These interactions can be seen as a rough structure of a meeting, and can be used either as input for a meeting browser or as a first step towards a higher semantic analysis of ..."
Abstract

Cited by 20 (9 self)
 Add to MetaCart
(Show Context)
Abstract. We address the problem of segmentation and recognition of sequences of multimodal human interactions in meetings. These interactions can be seen as a rough structure of a meeting, and can be used either as input for a meeting browser or as a first step towards a higher semantic analysis of the meeting. A common lexicon of multimodal group meeting actions, a shared meeting data set, and a common evaluation procedure enable us to compare the different approaches. We compare three different multimodal feature sets and four modelling infrastructures: a higher semantic feature approach, multilayer HMMs, a multistream DBN, as well as a multistream mixedstate DBN for disturbed data. 1
Discriminative models for speech recognition
 In Information Theory and Applications Workshop
, 1997
"... Abstract — The vast majority of automatic speech recognition systems use Hidden Markov Models (HMMs) as the underlying acoustic model. Initially these models were trained based on the maximum likelihood criterion. Significant performance gains have been obtained by using discriminative training crit ..."
Abstract

Cited by 16 (6 self)
 Add to MetaCart
(Show Context)
Abstract — The vast majority of automatic speech recognition systems use Hidden Markov Models (HMMs) as the underlying acoustic model. Initially these models were trained based on the maximum likelihood criterion. Significant performance gains have been obtained by using discriminative training criteria, such as maximum mutual information and minimum phone error. However, the underlying acoustic model is still generative, with the associated constraints on the state and transition probability distributions, and classification is based on Bayes ’ decision rule. Recently, there has been interest in examining discriminative, or direct, models for speech recognition. This paper briefly reviews the forms of discriminative models that have been investigated. These include maximum entropy Markov models, hidden conditional random fields and conditional augmented models. The relationships between the various models and issues with applying them to large vocabulary continuous speech recognition will be discussed. I.
A bidirectional targetfiltering model of speech coarticulation and reduction: Twostage implementation for phonetic recognition
 IEEE Trans. Audio, Speech, Lang. Process
, 2006
"... Abstract—A structured generative model of speech coarticulation and reduction is described with a novel twostage implementation. At the first stage, the dynamics of formants or vocal tract resonances (VTRs) in fluent speech is generated using prior information of resonance targets in the phone seq ..."
Abstract

Cited by 8 (6 self)
 Add to MetaCart
(Show Context)
Abstract—A structured generative model of speech coarticulation and reduction is described with a novel twostage implementation. At the first stage, the dynamics of formants or vocal tract resonances (VTRs) in fluent speech is generated using prior information of resonance targets in the phone sequence, in absence of acoustic data. Bidirectional temporal filtering with finiteimpulse response (FIR) is applied to the segmental target sequence as the FIR filter’s input, where forward filtering produces anticipatory coarticulation and backward filtering produces regressive coarticulation. The filtering process is shown also to result in realistic resonancefrequency undershooting or reduction for fastrate and loweffort speech in a contextually assimilated manner. At the second stage, the dynamics of speech cepstra are predicted analytically based on the FIRfiltered and speakeradapted VTR targets, and the prediction residuals are modeled by Gaussian random variables with trainable parameters. The combined system of these two stages, thus, generates correlated and causally related VTR and cepstral dynamics, where phonetic reduction is represented explicitly in the hidden resonance space and implicitly in the observed cepstral space. We present details of model simulation demonstrating quantitative effects of speaking rate and segment duration on the magnitude of reduction, agreeing closely with experimental measurement results in the acousticphonetic literature. This twostage model is implemented and applied to the TIMIT phonetic recognition task. Using thebest ( = 2000) rescoring paradigm, the new model, which contains only contextindependent parameters, is shown to significantly reduce the phone error rate of a standard hidden Markov model (HMM) system under the same experimental conditions. Index Terms—Cepstral dynamics, contextual assimilation, filtering of targets, formant dynamics, longspan context dependence, phonetic recognition, phonetic reduction, resonances, TIMIT. I.