Results 11–20 of 201
Graphical models and automatic speech recognition
 Mathematical Foundations of Speech and Language Processing
, 2003
Abstract

Cited by 72 (13 self)
Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition. This paper first provides a brief overview of graphical models and their uses as statistical models. It is then shown that the statistical assumptions behind many pattern recognition techniques commonly used as part of a speech recognition system can be described by a graph – this includes Gaussian distributions, mixture models, decision trees, factor analysis, principal component analysis, linear discriminant analysis, and hidden Markov models. Moreover, this paper shows that many advanced models for speech recognition and language processing can also be simply described by a graph, including many at the acoustic, pronunciation, and language-modeling levels. A number of speech recognition techniques born directly out of the graphical-models paradigm are also surveyed. Additionally, this paper includes a novel graphical analysis regarding why derivative (or delta) features improve hidden Markov model-based speech recognition by improving structural discriminability. It also includes an example where a graph can be used to represent language model smoothing constraints. As will be seen, the space of models describable by a graph is quite large. A thorough exploration of this space should yield techniques that ultimately will supersede the hidden Markov model.
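As one concrete instance of the inference these graphs support, here is a minimal hidden Markov model forward recursion (a sum-product over hidden state paths); the two-state model and all probability values are invented toy numbers, not taken from the paper.

```python
# Minimal sketch: forward algorithm for a 2-state HMM, one of the graphical
# models the survey places in this framework. All probabilities are toy values.

def forward(obs, pi, trans, emit):
    """Return p(obs) by summing over hidden state paths (sum-product)."""
    alpha = [pi[s] * emit[s][obs[0]] for s in range(len(pi))]
    for o in obs[1:]:
        alpha = [
            sum(alpha[sp] * trans[sp][s] for sp in range(len(pi))) * emit[s][o]
            for s in range(len(pi))
        ]
    return sum(alpha)

pi = [0.6, 0.4]                    # initial state distribution (invented)
trans = [[0.7, 0.3], [0.4, 0.6]]   # state transition matrix (invented)
emit = [[0.9, 0.1], [0.2, 0.8]]    # emission probabilities over 2 symbols
p = forward([0, 1, 0], pi, trans, emit)
```

The same recursion is what generic graphical-model inference specializes to on the HMM's chain-structured graph.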
Efficient Training of Conditional Random Fields
, 2002
Abstract

Cited by 65 (2 self)
This thesis explores a number of parameter estimation techniques for conditional random fields, a recently introduced probabilistic model for labelling and segmenting sequential data. Theoretical and practical disadvantages of the training techniques reported in current literature on CRFs are discussed. We hypothesise that general numerical optimisation techniques result in improved performance over iterative scaling algorithms for training CRFs. Experiments run on a subset of a well-known text chunking data set confirm that this is indeed the case. This is a highly promising result, indicating that such parameter estimation techniques make CRFs a practical and efficient choice for labelling sequential data, as well as a theoretically sound and principled probabilistic framework.
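The hypothesis being tested, general-purpose gradient methods rather than iterative scaling, can be sketched on a much smaller cousin of the CRF: a single-token conditional log-linear (maxent) classifier trained by plain gradient ascent. The data, step size, and iteration count below are invented toy choices, not the thesis's experiments.

```python
import numpy as np

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy feature vectors
y = np.array([0, 1, 1])                              # toy labels in {0, 1}
W = np.zeros((2, 2))                                 # one weight row per label

def cond_log_lik(W):
    """Conditional log-likelihood sum_i log p(y_i | x_i; W)."""
    scores = X @ W.T
    log_Z = np.log(np.exp(scores).sum(axis=1))
    return float((scores[np.arange(len(y)), y] - log_Z).sum())

before = cond_log_lik(W)
for _ in range(200):                                 # plain gradient ascent
    p = np.exp(X @ W.T)
    p /= p.sum(axis=1, keepdims=True)
    # gradient = empirical feature counts minus model-expected counts
    W += 0.5 * (np.eye(2)[y] - p).T @ X
after = cond_log_lik(W)
```

The same "empirical minus expected features" gradient drives linear-chain CRF training, where the expectations require forward-backward rather than a single softmax.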
Exponential Priors for Maximum Entropy Models
 In Proceedings of the Annual Meeting of the Association for Computational Linguistics
, 2003
Abstract

Cited by 65 (0 self)
this paper. Finally, thanks to Stan Chen and Roni Rosenfeld: our derivation for Exponential priors closely follows the text of their derivation for Gaussian priors.
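The snippet preserves only acknowledgements, so the following is a loudly hypothetical sketch of the general contrast the title suggests, on a 1-D toy objective that is not the paper's model: a Gaussian prior shrinks a MAP weight multiplicatively, while an exponential prior on a nonnegative weight subtracts a constant and can drive the weight exactly to zero.

```python
# Toy 1-D MAP estimates under two priors (illustration only, not the paper's
# derivation). Likelihood term: -(w - mu)^2 / 2.

def map_gaussian(mu, alpha):
    """argmax_w -(w - mu)^2 / 2 - alpha * w^2 / 2  (Gaussian/L2 prior)."""
    return mu / (1.0 + alpha)

def map_exponential(mu, alpha):
    """argmax_{w >= 0} -(w - mu)^2 / 2 - alpha * w  (exponential/L1 prior)."""
    return max(0.0, mu - alpha)
```

The soft-thresholding behaviour of `map_exponential` is why such priors tend to produce sparse maximum entropy models.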
Algorithms For Bigram And Trigram Word Clustering
 Speech Communication
, 1995
Abstract

Cited by 65 (0 self)
This paper presents and analyzes improved algorithms for clustering bigram and trigram word equivalence classes, and their respective results: 1) We give a detailed time complexity analysis of bigram clustering algorithms. 2) We present an improved implementation of bigram clustering so that large corpora (38 million words and more) can be clustered within a small number of days or even hours. 3) We extend the clustering approach from bigrams to trigrams. 4) We present experimental results on a 38 million word training corpus.
1. INTRODUCTION
Word equivalence classes are a method for improving under-trained word M-gram language models [1], [2], [4]. Words are grouped into classes, and each word belongs to only one such class. Thus, if a word pair is not seen in training, it is quite likely that the corresponding class pair is seen. For bigram and trigram class models, we have the equations $p(w_n \mid w_{n-1}) = p_0(w_n \mid G(w_n)) \cdot p_1(G(w_n) \mid G(w_{n-1}))$ (1) and $p(w_n \mid w_{n-2}, w_{n-1}) = \ldots$
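The class bigram of equation (1), p(w_n | w_{n-1}) = p0(w_n | G(w_n)) · p1(G(w_n) | G(w_{n-1})), can be sketched directly; the class map `G` and both probability tables below are invented toy values, not trained counts.

```python
# Toy class-based bigram model in the spirit of equation (1).
G = {"the": "DET", "a": "DET", "cat": "N", "dog": "N"}   # word -> class (toy)
p0 = {("cat", "N"): 0.5, ("dog", "N"): 0.5,              # p0(word | class)
      ("the", "DET"): 0.7, ("a", "DET"): 0.3}
p1 = {("N", "DET"): 0.9, ("DET", "N"): 0.8,              # p1(class | prev class)
      ("N", "N"): 0.1, ("DET", "DET"): 0.2}

def class_bigram(w, w_prev):
    """p(w | w_prev) factored through word classes, as in equation (1)."""
    return p0[(w, G[w])] * p1[(G[w], G[w_prev])]
```

Because the unseen word pair only needs its class pair to have been seen, the tables stay small even for large vocabularies.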
A bit of progress in language modeling — extended version
, 2001
Abstract

Cited by 57 (1 self)
1.1 Overview Language modeling is the art of determining the probability of a sequence of words. This is useful in a large variety of areas including speech recognition, ...
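The "probability of a sequence of words" decomposes by the chain rule; under a bigram approximation it is a product of conditional word probabilities. The tiny probability table below is invented for illustration only.

```python
# Chain-rule sequence probability with a bigram approximation (toy table).
bigram = {("<s>", "speech"): 0.2,
          ("speech", "recognition"): 0.5,
          ("recognition", "</s>"): 0.4}

def sequence_prob(words):
    """p(w_1..w_n) ~= product of p(w_i | w_{i-1}) under the bigram assumption."""
    prob, prev = 1.0, "<s>"
    for w in words + ["</s>"]:
        prob *= bigram[(prev, w)]
        prev = w
    return prob
```

Real language models differ mainly in how they estimate and smooth these conditional probabilities, which is the subject of the paper.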
Learning Implicit User Interest Hierarchy for Context in Personalization
 In Proc. of International Conference on Intelligent User Interfaces (IUI)
, 2003
Abstract

Cited by 56 (4 self)
To provide a more robust context for personalization, we desire to extract a continuum of general (long-term) to specific (short-term) interests of a user. Our proposed approach is to learn a user interest hierarchy (UIH) from a set of web pages visited by a user. We devise a divisive hierarchical clustering (DHC) algorithm to group words (topics) into a hierarchy where more general interests are represented by a larger set of words. Each web page can then be assigned to nodes in the hierarchy for further processing in learning and predicting interests. This approach is analogous to building a subject taxonomy for a library catalog system and assigning books to the taxonomy. Our approach does not need user involvement and learns the UIH "implicitly." Furthermore, it allows the original objects, web pages, to be assigned to multiple topics (nodes in the hierarchy). In this paper, we focus on learning the UIH from a set of visited pages. We propose a few similarity functions and dynamic threshold-finding methods, and evaluate the resulting hierarchies according to their meaningfulness and shape.
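A minimal sketch of divisive hierarchical clustering in the spirit described: recursively split a word set into connected components of a similarity graph, tightening the threshold at each level. The `components`/`dhc` helpers, the fixed threshold schedule, and all similarity values are invented here; the paper's actual similarity functions and dynamic threshold-finding are more involved.

```python
def components(words, sim, t):
    """Connected components of the graph whose edges have sim >= t."""
    remaining, comps = set(words), []
    while remaining:
        stack = [remaining.pop()]
        comp = set(stack)
        while stack:
            w = stack.pop()
            linked = {v for v in remaining if sim(w, v) >= t}
            remaining -= linked
            comp |= linked
            stack.extend(linked)
        comps.append(sorted(comp))
    return comps

def dhc(words, sim, t, step=0.2):
    """Split a word set at threshold t; children are clusters at t + step."""
    node = {"words": sorted(words), "children": []}
    if len(node["words"]) < 2 or t > 1.0:
        return node
    comps = components(words, sim, t)
    if len(comps) == 1:            # no split here: retry a stricter threshold
        return dhc(words, sim, t + step, step)
    node["children"] = [dhc(c, sim, t + step, step) for c in comps]
    return node

# invented pairwise word similarities
pairs = {frozenset(p): s for p, s in [
    (("ai", "ml"), 0.9), (("web", "css"), 0.8), (("ai", "web"), 0.2),
    (("ml", "web"), 0.1), (("ai", "css"), 0.1), (("ml", "css"), 0.1)]}
sim = lambda a, b: pairs.get(frozenset((a, b)), 0.0)
tree = dhc(["ai", "ml", "web", "css"], sim, t=0.3)
```

General interests sit near the root (larger word sets), specific ones in the leaves, matching the paper's long-term/short-term continuum.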
Maximum Entropy Techniques for Exploiting Syntactic, Semantic and Collocational Dependencies in Language Modeling
Abstract

Cited by 56 (11 self)
A new statistical language model is presented which combines collocational dependencies with two important sources of long-range statistical dependence: the syntactic structure and the topic of a sentence. These dependencies or constraints are integrated using the maximum entropy technique. Substantial improvements are demonstrated over a trigram model in both perplexity and speech recognition accuracy on the Switchboard task. A detailed analysis of the performance of this language model is provided in order to characterize the manner in which it performs better than a standard N-gram model. It is shown that topic dependencies are most useful in predicting words which are semantically related by the subject matter of the conversation. Syntactic dependencies on the other hand are found to be most helpful in positions where the best predictors of the following word are not within N-gram range due to an intervening phrase or clause. It is also shown that these two methods ind...
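Perplexity, the evaluation figure the abstract reports improvements in, is the exponential of the average negative log-probability per word; the per-word probabilities below are invented toy values, not the paper's Switchboard numbers.

```python
import math

def perplexity(word_probs):
    """exp of the average negative log-probability per word."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))

# toy per-word probabilities assigned by some model to a 4-word test string
pp = perplexity([0.1, 0.2, 0.05, 0.1])
```

Equivalently, perplexity is the inverse geometric mean of the word probabilities, so a lower value means the model spreads less probability mass away from the observed words.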
Probabilistic Syntax
, 2002
Abstract

Cited by 50 (1 self)
...istic methods for syntax, just as for a long time McCarthy and Hayes (1969) discouraged exploration of probabilistic methods in Artificial Intelligence. Among his arguments were that: (i) Probabilistic models wrongly mix in world knowledge (New York occurs more in text than Dayton, Ohio, but for no linguistic reason), (ii) Probabilistic models don't model grammaticality (neither Colorless green ideas sleep furiously nor Furiously sleep ideas green colorless have previously been uttered – and hence must be estimated to have probability zero, Chomsky wrongly assumes – but the former is grammatical while the latter is not), and (iii) Use of probabilities does not meet the goal of describing the mind-internal I-language as opposed to the observed-in-the-world E-language. This chapter is not meant to be a detailed critique of Chomsky's arguments – Abney (1996) provides a survey and a rebuttal, and Pereira (2000) has further useful discussion – but some of these concerns are still important ...
Error-responsive feedback mechanisms for speech recognizers
, 1997
Abstract

Cited by 49 (4 self)
This thesis is about modeling, analyzing, and predicting errorful behavior in large vocabulary continuous speech recognition systems. Because today's state-of-the-art recognizers are not designed to be situated naturally in an error feedback loop, they are ill-positioned for inclusion in multimodal interfaces, multimedia databases, and other interesting applications. I make improvements to the current approach to predicting and analyzing error behaviors, which is currently based only on the measurement of word error rate. The speech recognizer's functionality is extended to include confidence annotations, which are "meta-level" markings that indicate how certain the recognizer is that it has decoded its input correctly. This is accomplished by feeding externally defined error conditions back to the recognizer. Error feedback enables the construction of statistical models that map measurements of the recognizer's internal states and behaviors to externally defined error conditions.
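One way to read "statistical models that map internal measurements to error conditions" is a probabilistic classifier over recognizer-internal features; the logistic form, the feature names, and the weights below are all assumptions for illustration, not the thesis's actual models.

```python
import math

def confidence(features, weights, bias):
    """Logistic confidence score in (0, 1) from recognizer-internal features."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# hypothetical features: [acoustic score margin, n-best agreement]
c = confidence([1.2, 0.8], weights=[1.5, 2.0], bias=-2.0)
```

In the feedback-loop framing, the externally defined error labels supply the training targets for such a model, and the resulting score becomes the "meta-level" annotation attached to each decoded word.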