Speaker Independent Continuous Speech Recognition Using An Acoustic-Phonetic Italian Corpus
- in Proc. of ICSLP, 1994
Cited by 40 (28 self)
The objective of this paper is to describe the activity that is being carried out at IRST laboratories for the development of an HMM-based speaker independent continuous speech recognition system for the Italian language. The recognition system is trained and tested using the acoustic-phonetic continuous speech portion of the APASCI corpus. Acoustic modeling is based on the use of Continuous Density HMMs with Gaussian mixture observation densities. As a baseline, a set of 38 Context Independent Units was evaluated using different numbers of mixture components. Then, two other classes of Context Dependent Unit sets were considered, which provide different performance and system complexity. Performance, expressed in terms of Phone loop recognition accuracy and Word loop recognition accuracy, shows an improvement over the baseline for both of these classes of unit sets.
Automatic classification of prosodically marked phrase boundaries in German
- 1993
Cited by 13 (5 self)
A large corpus has been created automatically and read by 100 speakers. Phrase boundaries were labeled in the sentences automatically during sentence generation. Perception experiments on a subset of 500 utterances showed a high agreement between the automatically generated boundary markers and the ones perceived by listeners. Gaussian distribution and polynomial classifiers were trained on a set of prosodic features computed from the speech signal using the automatically generated boundary markers. Comparing the classification results with the judgments of the listeners yielded a recognition rate of 87%. A combination with stochastic language models improved the recognition rate to 90%. We found that the pause and the durational features are most important for the classification, but that the influence of F0 is not negligible.
High Performance ‘General Purpose’ Phonetic Recognition for Italian
- In Proceedings ICSLP-2000, International Conference on Spoken Language Processing, 2000
Cited by 5 (5 self)
The development of a speaker independent “general purpose” phonetic recognizer for Italian is described. The CSLU Toolkit was used to develop and implement the system. The recognizer, based on a frame-based hybrid HMM/ANN architecture trained on context-dependent categories to account for coarticulatory variation, recognizes 38 different phonemes (not including silence or closures), and can distinguish between stressed and unstressed vowels as well as open and closed vowels. The APASCI corpus, containing nearly 2500 sentences read by 100 speakers, where the sentences have been designed to maximize the number of phonemes occurring in different contexts, was used for training and testing. As of the time of this writing, a phoneme-level accuracy of 82.90% on the development set and of 80.53% on the test set has been obtained. This level of accuracy is much greater than on a similar English-language corpus (with state-of-the-art performance of slightly better than 70%) and it represents the best performance obtained so far on this corpus.
Automatic Segmentation of Speech at the Phonetic Level
- in Structural, Syntactic, and Statistical Pattern Recognition
Cited by 2 (1 self)
A complete automatic speech segmentation technique has been studied in order to eliminate the need for manually segmented sentences. The goal is to fix the phoneme boundaries using only the speech waveform and the phonetic sequence of the sentences. The phonetic boundaries are established using a Dynamic Time Warping algorithm that uses the a posteriori probabilities of each phonetic unit given the acoustic frame. These a posteriori probabilities are calculated by combining the probabilities of acoustic classes, which are obtained from a clustering procedure on the feature space, and the conditional probabilities of each acoustic class with respect to each phonetic unit. The usefulness of the approach presented here is that manually segmented data is not needed in order to train acoustic models. The results of the obtained segmentation are similar to those obtained using the HTK toolkit with the “flat-start” option activated. Finally, results using Artificial Neural Networks and manually segmented data are also reported for comparison purposes.
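The segmentation idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact combination rule, so the code assumes a Bayes-style mixture, P(unit | frame) ∝ P(unit) · Σ_c p(frame | class c) · P(class c | unit), and it substitutes a simple monotonic dynamic-programming alignment for the paper's Dynamic Time Warping step. The function names `unit_posteriors` and `align`, the uniform prior, and the toy numbers are all hypothetical.

```python
import numpy as np

def unit_posteriors(class_lik, class_given_unit, unit_prior=None):
    """Combine acoustic-class scores into P(unit | frame).

    class_lik:        (T, C) likelihood of each frame under each acoustic class
    class_given_unit: (U, C) P(class | unit), each row sums to 1
    unit_prior:       (U,)   P(unit); uniform if omitted (an assumption)
    """
    n_units = class_given_unit.shape[0]
    if unit_prior is None:
        unit_prior = np.full(n_units, 1.0 / n_units)
    unit_lik = class_lik @ class_given_unit.T          # (T, U)
    joint = unit_lik * unit_prior
    return joint / joint.sum(axis=1, keepdims=True)    # rows sum to 1

def align(post, seq):
    """Monotonic alignment of T frames to the phone sequence `seq`.

    Returns the start frame of each phone on the path that maximizes the
    summed log posterior; every phone receives at least one frame.
    """
    T, N = post.shape[0], len(seq)
    if N > T:
        raise ValueError("more phones than frames")
    logp = np.log(post[:, seq] + 1e-12)                # (T, N)
    score = np.full((T, N), -np.inf)
    advance = np.zeros((T, N), dtype=bool)             # True: phone n starts at t
    score[0, 0] = logp[0, 0]
    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1, n]
            move = score[t - 1, n - 1] if n > 0 else -np.inf
            advance[t, n] = move > stay
            score[t, n] = max(stay, move) + logp[t, n]
    # Backtrace from the last frame of the last phone.
    starts = [0] * N
    n = N - 1
    for t in range(T - 1, 0, -1):
        if advance[t, n]:
            starts[n] = t
            n -= 1
    return starts

# Toy data: 6 frames, 2 acoustic classes, 2 phonetic units.
class_lik = np.array([[1.0, 0.1]] * 3 + [[0.1, 1.0]] * 3)
class_given_unit = np.array([[0.9, 0.1], [0.1, 0.9]])
post = unit_posteriors(class_lik, class_given_unit)
starts = align(post, [0, 1])
print(starts)
```

In this toy case the first three frames favor unit 0 and the last three favor unit 1, so the alignment places the boundary at frame 3. Only the phone sequence and frame-level scores are needed, which mirrors the abstract's point that no manually segmented data is required.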
Automatic Classification Of Prosodically Marked Phrase Boundaries In German
- 1994
A large corpus has been created automatically and read by 100 speakers. Phrase boundaries were labeled in the sentences automatically during sentence generation. Perception experiments on a subset of 500 utterances showed a high agreement between the automatically generated boundary markers and the ones perceived by listeners. Gaussian distribution and polynomial classifiers were trained on a set of prosodic features computed from the speech signal using the automatically generated boundary markers. Comparing the classification results with the judgments of the listeners yielded a recognition rate of 87%. A combination with stochastic language models improved the recognition rate to 90%. We found that the pause and the durational features are most important for the classification, but that the influence of F0 is not negligible. 1. INTRODUCTION A successful automatic detection of phrase boundaries can be used to rescore the n-best sentence hypotheses computed by a word recognizer [...
Towards Automatic Word Segmentation of Dialect Speech
This paper is about the creation of a digital dialect database, and the focus is on automatic word segmentation. Automatic word segmentation has been studied by several research groups during the last two decades. However, the task we are faced with differs in several respects from previous ones. For instance, in our case we are dealing with recordings of interviews containing spontaneous dialect speech and ‘enriched’ (quasi-phonetic) orthographic transcriptions (instead of ‘normal’ orthographic transcriptions, which are usually available). Furthermore, the nature of the task requires that the word segmentation procedure can be adapted for each interview.

