Results 1 - 3 of 3
A comparative study on language model adaptation techniques using new evaluation metrics
- Proc. HLT/EMNLP, 2005
Cited by 5 (2 self)
This paper presents comparative experimental results on four techniques of language model adaptation, including a maximum a posteriori (MAP) method and three discriminative training methods (the boosting algorithm, the average perceptron, and the minimum sample risk method), on the task of Japanese Kana-Kanji conversion. We evaluate these techniques beyond simply using the character error rate (CER): the CER results are interpreted using a metric of domain similarity between background and adaptation domains, and are further evaluated by correlating them with a novel metric for measuring the side effects of adapted models. Using these metrics, we show that the discriminative methods are superior to a MAP-based method not only in achieving larger CER reduction, but also in being more robust against the similarity of background and adaptation domains, and in achieving larger CER reduction with fewer side effects.
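The abstract above does not spell out how CER is computed. As a rough illustration only, the sketch below assumes the common definition: the character-level edit distance between a reference string and a system output, divided by the reference length. The function names are hypothetical and are not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Assumed CER definition: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this assumed definition, a four-character reference with one wrong character in the output gives a CER of 0.25.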
Language modeling structures in audio transcription for retrieval of historical speeches
- In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2004
Cited by 1 (1 self)
In this paper we apply speech recognition to automatic transcript generation for spoken document retrieval. The transcripts are used to compute an index for an archive of historical speeches and to make the index, speech, and transcripts available for query-based retrieval and browsing. In addition to acoustic variability, the task is challenging because it covers a broad spectrum of speaking styles and language use. Language modeling is important for speech recognition because it determines the prior probabilities of the competing word and sentence candidates in decoding. Various large text corpora are available in electronic format for language model training, but the open question is which data to include, and how, to improve the audio transcripts for this task. In this work we compare large overall language models to focused ones trained on selected subsets of the data, and to combinations of both. With respect to the potential index terms, improvements were obtained for transcripts that did not fit well within the scope of the large overall language model.
U N I V E R S
This project considers a number of methods for instance/example selection in training data for language models, with the most promising being experimented with and evaluated via hypothesis testing. The most successful, an expansion of the perplexity-based work of Roger Moore, was selected for further development due to its good test results and its ability to locate related sentences. A number of possible filter methods were produced to improve the performance and results of that method. Each of these filters was tested, returning a decrease in data size of between 2.6% and 75%. The best-performing of these filters, with a decrease in data of 57%, was then selected, and after some fine-tuning a combination of it and the original method was tested to gauge its full abilities. The results show that the combination of methods formed a scalable solution to the problem, producing datasets with on average 48% lower perplexity than a baseline approach. The additional optimization features were shown to reduce the running time by between 50% and 60%.
Acknowledgements
Many thanks to my supervisor Miles Osbourne for his advice and guidance, and to my colleagues, whose opinions helped me gain a full perspective on my work. Also to my proofreaders for dealing with countless unnecessary commas.
Declaration
I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

