A Probabilistic Framework For Segment-Based Speech Recognition
2003
Cited by 108 (33 self)
Most current speech recognizers use an observation space based on a temporal sequence of measurements extracted from fixed-length "frames" (e.g., Mel-cepstra). Given a hypothetical word or sub-word sequence, the acoustic likelihood computation always involves all observation frames, though the mapping between individual frames and internal recognizer states will depend on the hypothesized segmentation. There is another type of recognizer whose observation space is better represented as a network, or graph, where each arc in the graph corresponds to a hypothesized variable-length segment that is represented by a fixed-dimensional "feature". In such feature-based recognizers, each hypothesized segmentation will correspond to a segment sequence, or path, through the overall segment graph that is associated with a subset of all possible feature vectors in the total observation space. In this work we examine a maximum a posteriori decoding strategy for feature-based recognizers and develop a normalization criterion useful for a segment-based Viterbi or A* search. Experiments are reported for both phonetic and word recognition.
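The abstract does not spell out the normalization criterion. A minimal sketch, assuming one common choice for segment-based recognizers: each segment is scored by the log-likelihood ratio between a unit model and a catch-all "anti-unit" model, so that paths covering different numbers of segments (and hence different subsets of features) can be compared directly. The arc format, the `best_segment_path` name, and the scores are illustrative assumptions, and no language model is included.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical arc format: (start_boundary, end_boundary, label_scores, anti_score)
# label_scores[u] stands for log p(x_seg | u); anti_score for log p(x_seg | anti-unit).
Arc = Tuple[int, int, Dict[str, float], float]


def best_segment_path(arcs: List[Arc], final: int) -> Tuple[float, List[Tuple[int, int, str]]]:
    """Viterbi search over a segment graph whose nodes are boundaries 0..final.

    Each arc contributes the normalized score log p(x|u) - log p(x|anti)
    for its best label u, so paths with different numbers of segments
    are compared on a common footing.
    """
    best = {0: 0.0}  # best accumulated score at each reachable boundary
    back: Dict[int, Tuple[int, int, str]] = {}  # node -> best incoming (start, end, label)
    # Arcs always move forward (start < end), so visiting them in order of
    # start boundary finalizes each node before its outgoing arcs are used.
    for start, end, scores, anti in sorted(arcs, key=lambda arc: arc[0]):
        if start not in best:
            continue  # no path reaches this arc's start boundary
        label, raw = max(scores.items(), key=lambda kv: kv[1])
        candidate = best[start] + (raw - anti)
        if candidate > best.get(end, -math.inf):
            best[end] = candidate
            back[end] = (start, end, label)
    if final not in best:
        raise ValueError("no path reaches the final boundary")
    path = []
    node = final
    while node != 0:  # follow backpointers from the last boundary
        start, end, label = back[node]
        path.append((start, end, label))
        node = start
    path.reverse()
    return best[final], path
```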
Heterogeneous Acoustic Measurement And Multiple Classifiers For Speech Recognition
1998
Cited by 29 (1 self)
The acoustic-phonetic modeling component of most current speech recognition systems calculates a small set of homogeneous frame-based measurements at a single, fixed time-frequency resolution. This thesis presents evidence indicating that recognition performance can be significantly improved through a contrasting approach using more detailed and more diverse acoustic measurements, which we refer to as heterogeneous measurements.
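The abstract does not say how the multiple classifiers are combined. As an illustrative assumption only, the sketch below averages each classifier's per-class log posteriors (a geometric mean of the posteriors) and picks the best class; the function names and the toy scores are hypothetical.

```python
import math
from typing import Dict, List, Optional


def combine_log_posteriors(outputs: List[Dict[str, float]],
                           weights: Optional[List[float]] = None) -> Dict[str, float]:
    """Weighted average of per-class log posteriors from several classifiers."""
    if weights is None:
        weights = [1.0 / len(outputs)] * len(outputs)  # equal weights by default
    classes = outputs[0].keys()
    return {c: sum(w * out[c] for w, out in zip(weights, outputs)) for c in classes}


def classify(outputs: List[Dict[str, float]],
             weights: Optional[List[float]] = None) -> str:
    """Return the class with the highest combined score."""
    combined = combine_log_posteriors(outputs, weights)
    return max(combined, key=combined.get)


# Toy outputs of three hypothetical classifiers, each using a different measurement set.
outputs = [
    {"p": math.log(0.60), "b": math.log(0.40)},
    {"p": math.log(0.30), "b": math.log(0.70)},
    {"p": math.log(0.55), "b": math.log(0.45)},
]
print(classify(outputs))  # prints "b"
```

Only the first and third classifiers favor "p", yet the combination picks "b" because the second classifier is much more confident; a majority vote would have chosen "p".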
Recent Improvements In An Approach To Segment-Based Automatic Language Identification
- In Proceedings of the 1994 International Conference on Spoken Language Processing, 1994
Cited by 15 (2 self)
In 1993, a segment-based system for Automatic Language Identification (ALI) was developed and introduced. The system incorporates phonetic, acoustic, and prosodic information within a probabilistic framework. The original system was trained and tested using the OGI MultiLanguage Telephone Speech Corpus and achieved an accuracy of 57.3% in identifying the language of test utterances from the OGI corpus. Recent improvements to the system have included the addition of channel normalization during preprocessing, the utilization of the recently transcribed utterances from the OGI corpus for phonetic recognition training, the use of mixture Gaussian density functions for the modeling of prosodic information, and the development of a hill-climbing optimization procedure for determining the scaling factors used when combining the scores from different models. The current system has achieved an accuracy of 79.7% in identifying the language of test utterances.
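The hill-climbing search for the model scaling factors can be sketched as a coordinate search on a development set. The data layout, the step schedule, and the `hill_climb` name are assumptions, not the procedure from the paper: each step nudges one weight up or down, keeps the change only if identification accuracy improves, and halves the step when nothing helps.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical dev-set format: (true_language, {language: [score from model 0, model 1, ...]})
Utterance = Tuple[str, Dict[str, Sequence[float]]]


def accuracy(weights: Sequence[float], data: List[Utterance]) -> float:
    """Fraction of utterances whose weighted score sum picks the true language."""
    correct = 0
    for truth, scores in data:
        guess = max(scores, key=lambda lang: sum(w * s for w, s in zip(weights, scores[lang])))
        correct += guess == truth
    return correct / len(data)


def hill_climb(data: List[Utterance], n_models: int,
               step: float = 0.5, min_step: float = 1e-3) -> Tuple[List[float], float]:
    """Coordinate hill-climbing on the scaling factor of each model's score."""
    weights = [1.0] * n_models
    best = accuracy(weights, data)
    while step >= min_step:
        improved = False
        # Weight 0 stays at 1.0: scaling all weights by a positive constant
        # does not change the winning language.
        for k in range(1, n_models):
            for delta in (step, -step):
                trial = list(weights)
                trial[k] += delta
                acc = accuracy(trial, data)
                if acc > best:  # keep only strict improvements
                    weights, best, improved = trial, acc, True
        if not improved:
            step /= 2  # refine the search once no move helps
    return weights, best
```

Like any local search, this can stall on an accuracy plateau, so the starting weights and initial step size matter.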
Near-Miss Modeling: A Segment-Based Approach to Speech Recognition
1998
Cited by 15 (0 self)
Currently, most approaches to speech recognition are frame-based in that they represent speech as a temporal sequence of feature vectors. Although these approaches have been successful, they cannot easily incorporate complex modeling strategies that may further improve speech recognition performance. In contrast, segment-based approaches represent speech as a temporal graph of feature vectors and facilitate the incorporation of a wide range of modeling strategies. However, difficulties in segment-based recognition have impeded the realization of potential advantages in modeling. This thesis ...
Phonological Parsing for Bi-directional Letter-to-Sound/Sound-to-Letter Generation
- Journal of Speech Communication, 1995
Cited by 14 (2 self)
In this paper, we describe a reversible letter-to-sound/sound-to-letter generation system based on an approach which combines a rule-based formalism with data-driven techniques. We adopt a probabilistic parsing strategy to provide a hierarchical lexical analysis of a word, including information such as morphology, stress, syllabification, phonemics and graphemics. Long-distance constraints are propagated by enforcing local constraints throughout the hierarchy. Our training and testing corpora are derived from the high-frequency portion of the Brown Corpus (10,000 words), augmented with markers indicating stress and word morphology. We evaluated our performance on an unseen test set. The percentages of nonparsable words for letter-to-sound and sound-to-letter generation were 6% and 5%, respectively. Of the remaining words, our system achieved a word accuracy of 71.8% and a phoneme accuracy of 92.5% for letter-to-sound generation, and a word accuracy of 55.8% and a letter accuracy of 89.4% for sound-to-letter generation. We also compared our hierarchical approach with an alternative, single-layer approach to demonstrate how the hierarchy provides a parsimonious description of English orthographic-phonological regularities, while simultaneously attaining competitive generation accuracy.
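The word, phoneme, and letter accuracies quoted above are not defined in the abstract. A sketch under a common convention, assumed here: a word counts as correct only if its whole output string matches the reference, and unit (phoneme or letter) accuracy is (N - S - D - I)/N, which equals one minus the total edit distance divided by the number of reference units. The function names are illustrative.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # delete r
                           cur[j - 1] + 1,           # insert h
                           prev[j - 1] + (r != h)))  # substitute (or match)
        prev = cur
    return prev[-1]


def corpus_accuracies(pairs: List[Tuple[Sequence[str], Sequence[str]]]) -> Tuple[float, float]:
    """Return (word accuracy, unit accuracy) over (reference, output) pairs."""
    word_acc = sum(list(ref) == list(out) for ref, out in pairs) / len(pairs)
    total_units = sum(len(ref) for ref, _ in pairs)
    total_edits = sum(edit_distance(ref, out) for ref, out in pairs)
    unit_acc = 1.0 - total_edits / total_units  # (N - S - D - I) / N
    return word_acc, unit_acc
```

A single wrong phoneme makes the whole word count as an error, which is why word accuracy sits well below phoneme or letter accuracy.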
Automatic Language Identification Using a Segment-Based Approach
- Proc. Eurospeech, 1993
Cited by 14 (1 self)
Automatic Language Identification (ALI) is the problem of automatically identifying the language of an utterance through the use of a computer. In 1977, House and Neuburg proposed an approach to ALI which focused on the phonotactic constraints of different languages. Their work suggested that simple language models could be used effectively for language identification if an accurate phonetic representation of an utterance could be obtained from the acoustic signal. Our research uses House and Neuburg's ideas as the starting point for a new segment-based approach to ALI. To develop a solid theoretical basis for the design of an ALI system, a formal probabilistic framework has been developed. This framework uses House and Neuburg's ideas as its foundation but also utilizes additional information that may be useful for ALI. Specifically, phonotactic, acoustic and prosodic information are all incorporated into the framework, which provides the structure for the segment-based system. To ...
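House and Neuburg's idea, that simple phonotactic models can identify a language once a phonetic representation is available, can be sketched with one smoothed phone bigram model per language. The add-one smoothing, the `PhoneBigram` class, and the toy data are illustrative assumptions rather than the models used in the paper.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Sequence


class PhoneBigram:
    """Add-one-smoothed phone bigram model for one language (illustrative)."""

    def __init__(self, sequences: Iterable[Sequence[str]], inventory: Iterable[str]):
        # Possible successors: every phone plus the end-of-utterance marker.
        self.n_successors = len(set(inventory)) + 1
        self.bigrams: Counter = Counter()
        self.contexts: Counter = Counter()
        for seq in sequences:
            padded = ["<s>", *seq, "</s>"]
            for a, b in zip(padded, padded[1:]):
                self.bigrams[(a, b)] += 1
                self.contexts[a] += 1

    def log_prob(self, seq: Sequence[str]) -> float:
        """Log probability of a phone string, including start and end transitions."""
        padded = ["<s>", *seq, "</s>"]
        return sum(math.log((self.bigrams[(a, b)] + 1) /
                            (self.contexts[a] + self.n_successors))
                   for a, b in zip(padded, padded[1:]))


def identify(phones: Sequence[str], models: Dict[str, PhoneBigram]) -> str:
    """Pick the language whose phonotactic model best explains the phone string."""
    return max(models, key=lambda lang: models[lang].log_prob(phones))
```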
Segment-Based Automatic Language Identification
1997
Cited by 10 (1 self)
This paper discusses the formulation, development and analysis of a segment-based approach to the Automatic Language Identification (LID) problem. This system utilizes phonotactic, acoustic-phonetic and prosodic information within a unified probabilistic framework. The implementation of this framework allows the relative contributions of different sources of information to be determined empirically, as well as providing the mechanism for combining them within one system. The system has been evaluated using the OGI Multi-Language Telephone Speech Corpus, and the results are competitive with other current LID systems. The results have also indicated that, while the phonotactic information of a spoken utterance is the most useful information for LID, acoustic-phonetic and prosodic information can be useful for increasing a system's accuracy, especially when the utterance is short.
Automatically Generated Word Pronunciations From Phoneme Classifier Output
- In Proceedings of ICASSP 1993, volume 2, 1993
Cited by 8 (0 self)
We describe an automatic procedure for modeling alternate pronunciations of words produced by different talkers. The research compared recognition performance on forty city and state names using three different representations of each word. In the first case, the expected pronunciation(s) of each word was produced by an expert. In the second case, a dynamic programming algorithm was used to create a pronunciation network for each word by combining phonetic transcriptions of ten utterances of the word produced by human labelers. The third case was identical to the second, except that the phonetic labels were provided automatically by a phonetic recognition algorithm. On a test set of words produced by new speakers, equivalent recognition performance was observed for the pronunciation networks derived from human and machine labels, and both produced superior performance to that obtained with the pronunciations produced by the expert.
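The abstract does not give the details of the dynamic programming step. A simplified sketch, assuming a pivot-based scheme: every transcription is aligned to the first one by minimum edit distance, and each pivot position (and each gap between positions) collects the alternatives seen across utterances, with an empty tuple marking a skippable slot. The function names and the phone strings are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Set, Tuple


def align_to_pivot(pivot: Sequence[str], seq: Sequence[str]) -> Tuple[List[str], List[Tuple[str, ...]]]:
    """Minimum-edit alignment of seq against pivot.

    Returns (per_position, gaps): per_position[i] is the phone of seq aligned
    to pivot[i] ("" if seq skips it); gaps[g] holds the phones of seq inserted
    just before pivot[g], with gaps[len(pivot)] after the last pivot phone.
    """
    n, m = len(pivot), len(seq)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (pivot[i - 1] != seq[j - 1]))
    per_position = [""] * n
    gaps: List[List[str]] = [[] for _ in range(n + 1)]
    i, j = n, m
    while i > 0 or j > 0:  # trace one optimal alignment back from the end
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (pivot[i - 1] != seq[j - 1]):
            per_position[i - 1] = seq[j - 1]  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            i -= 1  # pivot phone deleted in seq
        else:
            gaps[i].insert(0, seq[j - 1])  # seq phone inserted before pivot[i]
            j -= 1
    return per_position, [tuple(g) for g in gaps]


def build_network(transcriptions: List[Sequence[str]]) -> List[Set[Tuple[str, ...]]]:
    """Slots alternate gap / pivot position; each slot is a set of phone tuples."""
    pivot = transcriptions[0]
    slots: List[Set[Tuple[str, ...]]] = [set() for _ in range(2 * len(pivot) + 1)]
    for seq in transcriptions:
        per_position, gaps = align_to_pivot(pivot, seq)
        for g, inserted in enumerate(gaps):
            slots[2 * g].add(inserted)
        for i, phone in enumerate(per_position):
            slots[2 * i + 1].add((phone,) if phone else ())
    return slots


def expansions(slots: List[Set[Tuple[str, ...]]]) -> Set[Tuple[str, ...]]:
    """All phone strings the network admits (exponential; for small demos only)."""
    return {tuple(p for part in choice for p in part) for choice in product(*slots)}
```

A real network would also keep counts or probabilities on each alternative; this sketch records only which alternatives occur, so it can admit combinations (such as two variants co-occurring) that no single labeler produced.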
Providing Sublexical Constraints For Word Spotting Within The Angie Framework
- In Proc. Eurospeech '97
Cited by 7 (2 self)
We describe our recent work in implementing a word-spotting system based on the ANGIE framework and the effects of varying the nature of the sublexical constraints placed upon the word-spotter's filler model. ANGIE is a framework for modelling speech where the morphological and phonological substructures of words are jointly characterized by a context-free grammar and are represented in a multi-layered hierarchical structure. In this representation, the upper layers capture syllabification, morphology, and stress, the preterminal layer represents phonemics, and the bottom terminal categories are the phones. ANGIE provides a flexible framework where we can explore the effects of sublexical constraints within a word-spotting environment. Our experiments with spotting city names in ATIS validate the intuition that increasing the constraints present in the model improves performance, from 85.3 FOM for a phone bigram to 89.3 FOM for a word lexicon. They also empirically strengthen our belief...
The Use of Speaker Correlation Information for Automatic Speech Recognition
1998
Cited by 7 (3 self)
This dissertation addresses the independence of observations assumption which is typically made by today's automatic speech recognition systems. This assumption ignores within-speaker correlations which are known to exist. The assumption clearly damages the recognition ability of standard speaker-independent systems, as can be seen from the severe drop in performance that systems exhibit between their speaker-dependent mode and their speaker-independent mode. The typical solution to this problem is to apply speaker adaptation to the models of the speaker-independent system. This approach is examined in this thesis with the explicit goal of improving the rapid adaptation capabilities of the system by incorporating within-speaker correlation information into the adaptation process. This is achieved through the creation of an adaptation technique called reference speaker weighting and through the development of a speaker clustering technique called speaker cluster weighting. However, speaker adaptation is just one way in which the independence assumption can be attacked. This dissertation also introduces a novel speech recognition technique called consistency modeling. This technique utilizes a priori knowledge about the within-speaker correlations which exist between different phonetic events for the purpose of incorporating speaker constraints into a speech recognition system without explicitly applying speaker adaptation. These new techniques are implemented within a segment-based speech recognition system, and evaluation results are reported on the DARPA Resource Management recognition task.
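Reference speaker weighting is named but not defined in the abstract. A minimal sketch of one standard formulation, assumed here: each adapted class mean is a weighted combination of the reference speakers' means, and the weights are fit to the adaptation frames by least squares (the maximum-likelihood solution under identity covariances) subject to the weights summing to one. The array layout and function names are illustrative; NumPy is used for the linear algebra.

```python
import numpy as np


def rsw_weights(ref_means: np.ndarray, frames: np.ndarray, classes) -> np.ndarray:
    """Least-squares reference speaker weights constrained to sum to one.

    ref_means: (R, C, D) mean vector of each class c for each reference speaker r.
    frames:    (T, D) adaptation observations; classes: length-T class indices.
    """
    n_refs = ref_means.shape[0]
    # For each frame, the D x R matrix whose columns are the reference means of its class.
    A = np.concatenate([ref_means[:, c, :].T for c in classes], axis=0)  # (T*D, R)
    y = np.asarray(frames, dtype=float).reshape(-1)                      # (T*D,)
    # Lagrangian (KKT) system for: minimize ||y - A w||^2 subject to sum(w) = 1.
    kkt = np.zeros((n_refs + 1, n_refs + 1))
    kkt[:n_refs, :n_refs] = 2.0 * A.T @ A
    kkt[:n_refs, n_refs] = 1.0
    kkt[n_refs, :n_refs] = 1.0
    rhs = np.concatenate([2.0 * A.T @ y, [1.0]])
    # np.linalg.solve raises LinAlgError if this system is singular, which can
    # happen when the adaptation data do not distinguish the reference speakers.
    return np.linalg.solve(kkt, rhs)[:n_refs]


def adapted_means(ref_means: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted combination of the reference speakers' class means, shape (C, D)."""
    return np.tensordot(weights, ref_means, axes=1)
```

Because only R weights are estimated, a handful of adaptation frames can move every class mean at once, including classes never observed in the adaptation data, which is the appeal of this style of rapid adaptation.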

