Results 11–20 of 21
Interactive Feature Induction And Logistic Regression For Whole Sentence Exponential Language Models
In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, 1999
Cited by 6 (4 self)
Abstract
Whole sentence exponential language models directly model the probability of an entire sentence using arbitrary computable properties of that sentence. We present an interactive methodology for feature induction, and demonstrate it in the simple but common case of a trigram baseline, focusing on features that capture the linguistic notion of semantic coherence. We then show how parametric regression can be used in this setup to efficiently estimate the model's parameters, whereas nonparametric regression can be used to construct more powerful exponential models from the raw features.
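The model described above scores a sentence s as p(s) ∝ p0(s) · exp(sum_i λ_i f_i(s)), where p0 is the baseline trigram model and the f_i are arbitrary sentence features. A minimal sketch of the scoring and of an importance-sampling estimate of the normalizer, with illustrative feature definitions that are not the paper's actual features:

```python
import math

# Illustrative binary sentence features; the paper's features instead
# capture notions such as semantic coherence.
def f_length_gt_5(sent):
    return 1.0 if len(sent) > 5 else 0.0

def f_has_repeat(sent):
    return 1.0 if len(set(sent)) < len(sent) else 0.0

FEATURES = [f_length_gt_5, f_has_repeat]

def unnormalized_score(sent, baseline_logprob, lambdas):
    """Whole-sentence exponential model (log domain):
    log p(s) = log p0(s) + sum_i lambda_i * f_i(s) - log Z."""
    return baseline_logprob + sum(l * f(sent) for l, f in zip(lambdas, FEATURES))

def estimate_log_z(samples, lambdas):
    """Importance-sampling estimate of log Z using sentences drawn from
    the baseline p0, since Z = E_{p0}[exp(sum_i lambda_i * f_i(s))]."""
    weights = [math.exp(sum(l * f(s) for l, f in zip(lambdas, FEATURES)))
               for s in samples]
    return math.log(sum(weights) / len(weights))
```

With all weights at zero the model reduces to the baseline, so the normalizer estimate is exactly zero in the log domain.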
Minimum Classification Error Training In Exponential Language Models
2000
Cited by 5 (1 self)
Abstract
Minimum Classification Error (MCE) training is difficult to apply to language modeling due to the inherent scarcity of training data (N-best lists). However, a whole-sentence exponential language model is particularly suitable for MCE training, because it can use a relatively small number of powerful features to capture global sentential phenomena. We review the model, discuss feature induction, find features in both the Broadcast News and Switchboard domains, and build an MCE-trained model for the latter. Our experiments show that even models with relatively few features are prone to overfitting and are sensitive to the initial parameter setting, leading us to examine alternative weight optimization criteria and search algorithms.
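The MCE criterion referred to above is commonly implemented as a sigmoid of a misclassification measure computed over the N-best list; a hedged sketch under that standard formulation (parameter names are illustrative):

```python
import math

def mce_loss(correct_score, competitor_scores, gamma=1.0):
    """Smoothed MCE loss for one N-best list: a sigmoid of the
    misclassification measure d = (best competitor score) - (correct score).
    The loss is near 0 when the correct hypothesis wins by a margin and
    near 1 when a competitor wins; gamma controls the sigmoid steepness."""
    d = max(competitor_scores) - correct_score
    return 1.0 / (1.0 + math.exp(-gamma * d))
```

Summing this loss over utterances gives a differentiable surrogate for the sentence error rate, which is what makes gradient-based weight optimization possible.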
Just how good is maximum entropy? An empirical investigation using ensembles of MEMD models for attribute-value grammars
Cited by 3 (0 self)
Abstract
Maximum entropy has been argued, on theoretical grounds, to be the principled way to estimate models that are only partially determined by some set of empirically observed constraints. However, such arguments hinge upon large-sample behaviour, and it is unclear how well maximum entropy performs when this assumption is violated by small samples. Within the maximum entropy / minimum divergence (MEMD) framework, and when operating in the domain of parse selection, we estimate lower and upper bounds on the performance of such models. Maximum entropy, even when samples are small, is shown to produce models near the upper bound. In addition to prediction using single models, we also investigate how well maximum entropy compares with ensembles of MEMD models. Maximum entropy is found to be competitive with such ensembles. Since ensemble learning requires substantially more computational resources than single model learning, yet delivers similar results to maximum entropy, this is a useful finding.
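MEMD models of the kind used here for parse selection are log-linear: each candidate parse is scored by exp(sum_i λ_i f_i(parse)) and the scores are normalized over the candidate set for the sentence. A minimal sketch (feature vectors and weights are illustrative):

```python
import math

def parse_posteriors(feature_vecs, lambdas):
    """Log-linear parse selection: P(parse_k | sentence) is proportional
    to exp(sum_i lambda_i * f_i(parse_k)), normalized over candidates."""
    scores = [sum(l * f for l, f in zip(lambdas, fv)) for fv in feature_vecs]
    m = max(scores)  # subtract the max before exponentiating, for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

Selecting the parse with the highest posterior is then a simple argmax; the normalization only matters during estimation.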
A Web Recommendation System Based on Maximum Entropy
In Proc. IEEE International Conference on Information Technology: Coding and Computing, Las Vegas, 2005
Cited by 2 (0 self)
Abstract
We propose a Web recommendation system based on a maximum entropy model. Under the maximum entropy principle, we can combine multiple levels of knowledge about users' navigational behavior in order to automatically generate the most effective recommendations for new users with similar profiles. This knowledge includes page-level statistics about users' historically visited pages and the aggregate usage patterns discovered through Web usage mining. In particular, we use a Web mining framework based on Probabilistic Latent Semantic Analysis to discover the underlying interests of Web users as well as temporal changes in these interests. Our experiments show that our recommendation system achieves better accuracy than standard approaches, while providing a better interpretation of Web users' diverse navigational behavior.
Using Perfect Sampling in Parameter Estimation of a Whole Sentence Maximum Entropy Language Model
2000
Cited by 1 (0 self)
Abstract
The Maximum Entropy (ME) principle is an appropriate framework for combining information of a diverse nature from several sources into the same language model. In order to incorporate long-distance information into the ME framework in a language model, a Whole Sentence Maximum Entropy Language Model (WSME) can be used. Until now, Markov chain Monte Carlo (MCMC) sampling techniques have been used to estimate the parameters of the WSME model. In this paper, we propose the application of another sampling technique: Perfect Sampling (PS). The experiments showed a reduction of 30% in the perplexity of the WSME model over the trigram model and a reduction of 2% over the WSME model trained with MCMC.
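Parameter estimation for a WSME model requires feature expectations under the model itself, which is what the sampling is for. One standard approach, assumed in this sketch, is an independence Metropolis-Hastings sampler whose proposals are drawn from the baseline model; the single-feature setup and names are illustrative, and Perfect Sampling would replace this chain with one whose draws are exact:

```python
import math
import random

def feature_expectation_mh(proposals, feat, lam, seed=0):
    """Independence Metropolis-Hastings sketch for a whole-sentence ME
    model with one feature. `proposals` are states drawn from the
    baseline p0; since the target is p(s) proportional to
    p0(s) * exp(lam * feat(s)), the acceptance ratio reduces to
    exp(lam * (feat(candidate) - feat(current))). Returns the chain's
    estimate of E_p[feat]."""
    rng = random.Random(seed)
    current = proposals[0]
    total = 0.0
    for cand in proposals[1:]:
        ratio = math.exp(lam * (feat(cand) - feat(current)))
        if rng.random() < min(1.0, ratio):
            current = cand
        total += feat(current)
    return total / (len(proposals) - 1)
```

With lam = 0 every proposal is accepted and the estimate is just the baseline mean of the feature; a positive lam biases the chain toward states where the feature fires.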
Improvement of a Whole Sentence Maximum Entropy Language Model Using Grammatical Features
2001
Cited by 1 (0 self)
Abstract
In this paper, we propose adding long-term grammatical information to a Whole Sentence Maximum Entropy Language Model (WSME) in order to improve the performance of the model. The grammatical information was added to the WSME model as features obtained from a Stochastic Context-Free Grammar. Finally, experiments using a part of the Penn Treebank corpus were carried out and significant improvements were achieved.
Graph Model Selection using Maximum Likelihood
Abstract
In recent years, there has been a proliferation of theoretical graph models, e.g., preferential attachment and small-world models, motivated by real-world graphs such as the Internet topology. To address the natural question of which model is best for a particular data set, we propose a model selection criterion for graph models. Since each model is in fact a probability distribution over graphs, we suggest using Maximum Likelihood to compare graph models and select their parameters. Interestingly, for the case of graph models, computing likelihoods is a difficult algorithmic task. However, we design and implement MCMC algorithms for computing the maximum likelihood for four popular models: a power-law random graph model, a preferential attachment model, a small-world model, and a uniform random graph model. We hope that this novel use of ML will objectify comparisons between graph models.
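For the simplest of the four models, the uniform random graph, the likelihood is available in closed form, which illustrates the maximum-likelihood comparison the abstract proposes without any MCMC. A sketch under the standard G(n, p) formulation (the helper names are ours):

```python
import math

def gnp_log_likelihood(n, m, p):
    """Log-likelihood of an n-node graph with m edges under the uniform
    random graph model G(n, p): each of the C(n, 2) possible edges is
    present independently with probability p."""
    pairs = n * (n - 1) // 2
    return m * math.log(p) + (pairs - m) * math.log(1.0 - p)

def gnp_mle(n, m):
    """The maximum-likelihood estimate of p is the observed edge
    density m / C(n, 2)."""
    return m / (n * (n - 1) // 2)
```

Comparing models then amounts to evaluating each model's (maximized) likelihood on the same observed graph; for the other three models this evaluation is exactly the hard part the paper's MCMC algorithms address.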
Structured Discriminative Models for Speech Recognition
Special Issue on Fundamental Technologies in Modern Speech Recognition
Abstract
classify structured sequence data, where the label sequences (sentences) must be inferred from the observation sequences (the acoustic waveform). The sequential nature of the task is one of the reasons why generative classifiers, based on combining hidden Markov model (HMM) acoustic models and N-gram language models using Bayes' rule, have become the dominant technology used in ASR. Conversely, the machine learning and natural language processing (NLP) research areas are increasingly dominated by discriminative approaches, where the class posteriors are directly modelled. This paper describes recent work in the area of structured discriminative models for ASR. To handle continuous, variable-length observation sequences, the approaches applied to NLP tasks must be modified. This paper discusses a variety of approaches for applying structured discriminative models to ASR, both from the current literature and possible future approaches. We concentrate on the structured models themselves, the descriptive features of observations commonly used within the models, and various options for optimizing the parameters of the models.
Hidden Markov Model: A Dynamic Bayesian Network
2012
Abstract
– generative models and speech production
– discriminative models and features