Unlimited vocabulary speech recognition for agglutinative languages
- In Proc. HLT-NAACL, 2006
Cited by 8 (4 self)
It is practically impossible to build a word-based lexicon for speech recognition in agglutinative languages that would cover all the relevant words. The problem is that words are generally built by concatenating several prefixes and suffixes to the word roots. Together with compounding and inflections this leads to millions of different, but still frequent word forms. Due to inflections, ambiguity and other phenomena, it is also not trivial to automatically split the words into meaningful parts. Rule-based morphological analyzers can perform this splitting, but due to the handcrafted rules, they also suffer from an out-of-vocabulary problem. In this paper we apply a recently proposed fully automatic and rather language- and vocabulary-independent way to build subword lexica for three different agglutinative languages. We demonstrate the language portability as well by building a successful large-vocabulary speech recognizer for each language and show superior recognition performance compared to the corresponding word-based reference systems.
Unsupervised segmentation of words into morphemes – Challenge 2005: An introduction and evaluation report
- In Proc. of 2nd Pascal Challenges Workshop, 2006
Unsupervised segmentation of words into morphemes – Morpho Challenge 2005: Application to automatic speech recognition
- In Proc. ICSLP, 2006
Cited by 3 (3 self)
Within the EU Network of Excellence PASCAL, a challenge was organized to design a statistical machine learning algorithm that segments words into the smallest meaning-bearing units of language, morphemes. Ideally, these are basic vocabulary units suitable for different tasks, such as speech and text understanding, machine translation, information retrieval, and statistical language modeling. Twelve research groups participated in the challenge and submitted segmentation results obtained by their algorithms. In this paper, we evaluate the application of these segmentation algorithms to large-vocabulary speech recognition using statistical n-gram language models based on the proposed word segments instead of entire words. Experiments were done for two agglutinative and morphologically rich languages: Finnish and Turkish. We also investigate combining various segmentations to improve the performance of the recognizer. Index Terms: speech recognition, language modelling, morphemes, unsupervised learning.
Higher Order Statistics in Play-out Analysis
Cited by 1 (1 self)
Playing out the game from the current state to the end many times at random provides statistics that can be used for selecting the best move. This play-out analysis has proved to work well in games such as Backgammon, Bridge, and Go. This paper introduces a method that selects relevant patterns of moves to collect higher-order statistics. This can be used to improve the quality of the play-outs. Play-out analysis avoids the horizon effect of regular game-tree search. The proposed method should be especially effective when the game can be decomposed into a number of subgames. The game of Y is a two-player board game played on a graph, with the task of connecting three edges of the graph together. Preliminary experiments on Y did not yet show significant improvement over the first-order approach, but a door has been opened for further improvement. The game of Y might prove to be a good testbed for machine learning.
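As a rough illustration of the first-order play-out analysis this abstract describes, the following minimal Python sketch estimates a win rate for each legal move from uniformly random play-outs and picks the move with the highest estimate. The toy game (take 1 to 3 stones from a pile; whoever takes the last stone wins) and all function names are assumptions made for this example; it is not the game of Y, and it does not implement the paper's higher-order pattern statistics.

```python
import random


def legal_moves(pile):
    """Toy game (an assumption, not the game of Y): take 1-3 stones."""
    return [n for n in (1, 2, 3) if n <= pile]


def random_playout(pile, player_to_move, rng):
    """Play uniformly random moves to the end; return the winner (0 or 1).

    The player who takes the last stone wins. Requires pile > 0.
    """
    while True:
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:
            return player_to_move
        player_to_move = 1 - player_to_move


def playout_statistics(pile, n_playouts=200, seed=0):
    """Estimate player 0's win rate for each legal first move."""
    rng = random.Random(seed)
    stats = {}
    for move in legal_moves(pile):
        remaining = pile - move
        if remaining == 0:
            # Taking the last stone wins immediately.
            stats[move] = 1.0
            continue
        # After our move it is the opponent's (player 1's) turn.
        wins = sum(
            random_playout(remaining, 1, rng) == 0 for _ in range(n_playouts)
        )
        stats[move] = wins / n_playouts
    return stats


def best_move(pile, n_playouts=200, seed=0):
    """Select the move with the highest estimated win rate."""
    stats = playout_statistics(pile, n_playouts, seed)
    return max(stats, key=stats.get)
```

With a pile of 3, taking all three stones wins outright, so `best_move(3)` selects that move; for larger piles the estimates depend on the random play-outs and need not agree with optimal play.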
Spoken commands in a Smart Home: An iterative approach to the Sphinx Algorithm
An algorithm for decoding commands spoken in an intelligent environment through iterative vocabulary reduction is presented. Current research in the field of speech recognition focuses primarily on the optimization of algorithms for single-pass decoding using large vocabularies. While this is ideal for processing conversational speech, alternative methods should be explored for different domains of speech, specifically commands issued verbally in an intelligent environment. Such commands have both an explicitly defined structure and a vocabulary limited to valid task descriptions. We propose that a multiple-pass context-driven decoding scheme utilizing dictionary pruning yields improved accuracy; this occurs when one deals with command structure and a reduced vocabulary. Each iteration incorporates the hypothesis of the previous into its decoding scheme by removing unlikely words from the current language model. We have applied this decoding method to a comprehensive set of spoken commands through the use of Sphinx-4, an Automatic Speech Recognition (ASR) engine using the Hidden Markov Model (HMM). When decoding via HMM, multiple previous states are used to determine the current state, thus utilizing context to aid in intelligent recognition. Our results show that within a fixed domain, multiple-pass decoding yields recognition accuracy. Further research must be conducted to optimize practical context-driven decoding and to apply the method to larger domains, primarily those of intelligent environments.
Vocabulary Decomposition for Estonian Open Vocabulary Speech Recognition
Speech recognition in many morphologically rich languages suffers from a very high out-of-vocabulary (OOV) ratio. Earlier work has shown that vocabulary decomposition methods can practically solve this problem for a subset of these languages. This paper compares various vocabulary decomposition approaches to open vocabulary speech recognition, using Estonian speech recognition as a benchmark. Comparisons are performed utilizing large models of 60000 lexical items and smaller vocabularies of 5000 items. A large vocabulary model based on a manually constructed morphological tagger is shown to give the lowest word error rate, while the unsupervised morphology discovery method Morfessor Baseline gives marginally weaker results. Only the Morfessor-based approach is shown to adequately scale to smaller vocabulary sizes.
Robust Automatic Speech Recognition Using Acoustic Model Adaptation Prior to Missing Feature Reconstruction
When speech recognition is used in real-world environments, simultaneous speaker and environmental adaptation and compensation for time-varying noise effects are needed. Noise compensation methods like missing feature reconstruction should be combined with adaptation methods like constrained maximum likelihood linear regression (CMLLR). This is only straightforward if reconstruction is used prior to CMLLR. In this work, reconstruction is modified so that we can estimate CMLLR transformations prior to reconstruction. The new approach is evaluated on large vocabulary speech data recorded in noisy public and car environments and compared to using reconstruction prior to CMLLR estimation. The results suggest the noise environment determines which approach is better. Using adaptation prior to reconstruction has the better performance when evaluated on data from public environments. The relative reductions in letter error rate were 47–50% compared to the baseline and 13–19% compared to using either adaptation or reconstruction alone.
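To make the reported metric concrete, here is a small Python sketch of how a letter error rate and a relative reduction are commonly computed. It assumes the usual definition (character-level edit distance divided by the reference length); the abstract does not state the paper's exact scoring rules, for example how spaces are treated, and the function names are invented for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def letter_error_rate(ref, hyp):
    """Character-level errors divided by the number of reference letters."""
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Fraction by which the error rate dropped relative to the baseline."""
    return (baseline - improved) / baseline
```

For instance, a drop from a hypothetical baseline error rate of 0.20 to 0.10 is a relative reduction of 0.5, that is, 50%; these numbers are illustrative and not taken from the paper.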
Speaker-Based Segmentation and Adaptation in Automatic Speech Recognition
- 2007
in the projects New adaptive and learning methods in speech recognition and New methods and applications for speech technology. I thank professor Erkki Oja for supervising the thesis. I thank my instructor docent Mikko Kurimo for the opportunity to work in the speech group and for the valuable advice he has given. This work would not have been possible without the prior work done in the speech group, and thus, I have the current and former speech group members to thank. I would like to take this opportunity to especially thank Janne Pylkkönen, Teemu Hirsimäki and Vesa Siivola who have helped me with all those various problems that I have encountered during my time in the laboratory. Also, Kalle Palomäki is to thank for the time he has taken to read this thesis and for his comments that helped to improve the work. I thank Tommi, and I thank my friends who shared a cup of coffee with me when I needed their kind words for encouragement.

