Pronunciation Modeling Using a Finite-State Transducer Representation
in Proc. ISCA Tutorial and Research Workshop on Pronunciation Modeling and Lexicon Adaptation, 2002
Cited by 23 (5 self)
The MIT SUMMIT speech recognition system models pronunciation using a phonemic baseform dictionary along with rewrite rules for modeling phonological variation and multi-word reductions. Each pronunciation component is encoded within a finite-state transducer (FST) representation whose transition weights can be probabilistically trained using a modified EM algorithm for finite-state networks. This paper explains the modeling approach we use and the details of its realization. We demonstrate the benefits and weaknesses of the approach both conceptually and empirically using the recognizer for our JUPITER weather information system. Our experiments show that using phonological rewrite rules within our system reduces word error rates by 4% to 8%, depending on the test set, compared with a system that uses no phonological rewrite rules.
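To make the rule-expansion idea concrete, here is a minimal Python sketch that expands a phonemic baseform into weighted surface pronunciations using one ordered, context-dependent rewrite rule. It is a plain-Python stand-in, not SUMMIT's FST encoding: the baseform dictionary, the phone labels, the rule `flap_rule`, and its 0.7/0.3 weights are all hypothetical (the paper trains such weights with EM rather than setting them by hand), and rule contexts are checked against the phonemic baseform only.

```python
from itertools import product

# Hypothetical phonemic baseforms; phone labels are illustrative only.
BASEFORMS = {
    "water": ["w", "ao", "t", "er"],
    "butter": ["b", "ah", "t", "er"],
}
VOWELS = {"aa", "ae", "ah", "ao", "eh", "er", "ih", "iy", "uw"}

def flap_rule(phone, left, right):
    """Hypothetical rule: intervocalic /t/ surfaces as [dx] (flap) or [t]."""
    if phone == "t" and left in VOWELS and right in VOWELS:
        return [("dx", 0.7), ("t", 0.3)]  # made-up weights, not trained
    return None  # rule does not apply in this context

def expand(phones, rules):
    """Map each surface pronunciation (a tuple) to its probability."""
    slots = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i + 1 < len(phones) else None
        alts = [(phone, 1.0)]  # default: the phoneme surfaces unchanged
        for rule in rules:
            result = rule(phone, left, right)
            if result is not None:
                alts = result  # ordered rules: the first match wins
                break
        slots.append(alts)
    variants = {}
    for combo in product(*slots):
        pron = tuple(p for p, _ in combo)
        prob = 1.0
        for _, w in combo:
            prob *= w
        variants[pron] = variants.get(pron, 0.0) + prob
    return variants

print(expand(BASEFORMS["water"], [flap_rule]))
```

Each variant's probability is the product of its per-phone choices. In an FST realization the same weights would sit on transitions, and composing a rule transducer with the lexicon would typically yield these variants without enumerating them explicitly.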
The MIT finite-state transducer toolkit for speech and language processing
in Proc. ICSLP, 2004
Cited by 12 (0 self)
We present the MIT Finite-State Transducer Toolkit and briefly describe research that it has benefited. The toolkit is a collection of command-line tools and an associated C++ API for manipulating finite-state transducers (FSTs) and acceptors (FSAs). It is designed to support research through its flexibility while remaining efficient enough for computationally demanding real-world applications such as automatic speech recognition. The toolkit supports the construction, combination, optimization, and training of weighted FSTs and FSAs, and as such is useful in many areas of human language technology.
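One of the combination operations such a toolkit provides is composition. The sketch below is not the toolkit's C++ API or command-line interface; it is a hypothetical, minimal Python illustration of epsilon-free weighted composition in the tropical semiring (costs add along a path, and the cheapest path wins), plus a Dijkstra shortest-path search to read out the best result. The class and function names are illustrative only.

```python
import heapq
import itertools
from collections import defaultdict

class FST:
    """Weighted FST; arcs are given as (src, in_label, out_label, cost, dst)."""
    def __init__(self, start, finals, arcs):
        self.start = start
        self.finals = dict(finals)      # state -> final cost
        self.arcs = defaultdict(list)   # state -> [(in, out, cost, dst)]
        for src, i, o, w, dst in arcs:
            self.arcs[src].append((i, o, w, dst))

def compose(a, b):
    """Epsilon-free composition: a's output labels must match b's inputs."""
    start = (a.start, b.start)
    stack, seen, arcs, finals = [start], {start}, [], {}
    while stack:
        qa, qb = stack.pop()
        if qa in a.finals and qb in b.finals:
            finals[(qa, qb)] = a.finals[qa] + b.finals[qb]
        for ia, oa, wa, da in a.arcs[qa]:
            for ib, ob, wb, db in b.arcs[qb]:
                if oa == ib:
                    dst = (da, db)
                    arcs.append(((qa, qb), ia, ob, wa + wb, dst))
                    if dst not in seen:
                        seen.add(dst)
                        stack.append(dst)
    return FST(start, finals, arcs)

def shortest_path(fst):
    """Dijkstra (non-negative costs); returns (cost, inputs, outputs)."""
    tie = itertools.count()  # tie-breaker so heap never compares states
    heap = [(0.0, next(tie), fst.start, (), (), False)]
    done = set()
    while heap:
        cost, _, q, ins, outs, is_final = heapq.heappop(heap)
        if is_final:
            return cost, ins, outs
        if q in done:
            continue
        done.add(q)
        if q in fst.finals:
            heapq.heappush(heap, (cost + fst.finals[q], next(tie), q, ins, outs, True))
        for i, o, w, d in fst.arcs[q]:
            if d not in done:
                heapq.heappush(heap, (cost + w, next(tie), d, ins + (i,), outs + (o,), False))
    return None

# Toy machines: a maps "a" to "x" or "y"; b maps those to "X" or "Y".
a = FST(0, {1: 0.0}, [(0, "a", "x", 1.0, 1), (0, "a", "y", 0.5, 1)])
b = FST(0, {1: 0.0}, [(0, "x", "X", 0.0, 1), (0, "y", "Y", 2.0, 1)])
print(shortest_path(compose(a, b)))
```

Here the path through "y" is cheaper in `a` alone, but composition adds `b`'s costs, so the cheapest end-to-end mapping goes through "x". A production toolkit would also handle epsilon labels, which this sketch deliberately omits.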
Modelling Phonological Rules through Linguistic Hierarchies
2002
Cited by 8 (4 self)
This paper describes our research aimed at acquiring a generalized probability model for alternative phonetic realizations in conversational speech. The approach begins with the application of a set of ordered context-dependent phonological rules, applied to the baseforms in the recognizer's lexicon. The probability model is acquired by observing specific realizations expressed in a large training corpus. A set of context-free rules represents words in terms of a substructure that can then generalize context-dependent probabilities to other words that share the same sub-word context. The model is designed to capture phonetic predictions based on local phonemic, morphologic, and syllabic contexts, thus permitting training on corpora whose lexicon is divergent from that of the intended application. The training corpus consisted of a large set of Jupiter weather-domain speech data [9] augmented with a much smaller set of Mercury flight-domain data [20]. The baseline system utilized the same set of phonological rules for lexical expansion, but with no probability modelling for alternate pronunciations. We evaluated on a test set of utterances exclusively from the flight domain. Using this approach, we achieved a 12.6% reduction in speech understanding error rate on the test set.
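The key idea, that probabilities learned in one word can transfer to another word sharing the same sub-word context, can be sketched with simple relative-frequency counting. This is a hypothetical simplification: the paper derives contexts from a hierarchy of context-free rules, whereas here each context is flattened into a single made-up label (such as "ambisyllabic"), the training triples are invented, and unseen (context, phoneme) pairs are assumed to surface unchanged.

```python
from collections import Counter, defaultdict

def train(observations):
    """Relative-frequency estimate of P(phone | context, phoneme).

    observations: iterable of (context, phoneme, realized_phone) triples,
    e.g. taken from aligned training data (assumed to be given).
    """
    counts = defaultdict(Counter)
    for ctx, phoneme, phone in observations:
        counts[(ctx, phoneme)][phone] += 1
    model = {}
    for key, c in counts.items():
        total = sum(c.values())
        model[key] = {phone: n / total for phone, n in c.items()}
    return model

def score(model, aligned):
    """Probability of one word variant, given per-phoneme contexts."""
    p = 1.0
    for ctx, phoneme, phone in aligned:
        dist = model.get((ctx, phoneme))
        if dist is None:
            # Unseen (context, phoneme): assume the canonical phone only.
            p *= 1.0 if phone == phoneme else 0.0
        else:
            p *= dist.get(phone, 0.0)
    return p

# Hypothetical training data, e.g. /t/ in words such as "water" and "butter".
obs = [("ambisyllabic", "t", "dx")] * 3 + [("ambisyllabic", "t", "t")]
model = train(obs)

# "city" never occurs in training, but its /t/ shares the same context.
city = [("onset", "s", "s"), ("nucleus", "ih", "ih"),
        ("ambisyllabic", "t", "dx"), ("nucleus", "iy", "iy")]
print(score(model, city))
```

Because the model is keyed on context rather than on the word, a flight-domain word absent from the weather-domain training data can still receive a flapping probability, which is the kind of cross-lexicon generalization the abstract describes.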
Towards Formal Structural Representation of Spoken Language: An Evolving Transformation System (ETS) Approach
2005
Cited by 5 (0 self)
Speech recognition has been a very active area of research over the past twenty years. Despite evident progress, practitioners in the field generally agree that the performance of current speech recognition systems is rather suboptimal and that new approaches are needed. The motivation behind this research is the observation that the notion of representation of objects and concepts, once considered central in the early days of pattern recognition, has been largely marginalised by the advent of statistical approaches. Because the predominantly statistical approach to speech recognition relies on numeric, feature-vector-based representations, the classes inductively discovered from real data using decision-theoretic techniques have little meaning outside the statistical framework: decision surfaces and probability distributions are difficult to analyse linguistically. Because of this latter limitation, it is doubtful that numeric representations can bridge the gap between speech recognition and linguistic research. This thesis investigates an alternative, structural approach to spoken language representation and categorisation ...
Corpus-based unit selection for natural-sounding speech synthesis
2003
Cited by 2 (0 self)
Speech synthesis is an automatic encoding process, carried out by machine, through which symbols conveying linguistic information are converted into an acoustic waveform. Over the past decade or so, a trend toward a non-parametric, corpus-based approach has focused on using real human speech as source material for producing novel, natural-sounding speech. This work proposes a communication-theoretic formulation in which unit selection is a noisy channel: an input sequence of symbols passes through it, and an output sequence emerges, possibly corrupted because of the limited coverage of the corpus. The penalty of approximation is quantified by substitution and concatenation costs, which grade which unit contexts are interchangeable and where concatenations are not perceptible. These costs are semi-automatically derived from data and are found to agree with acoustic-phonetic knowledge.
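Given substitution and concatenation costs, choosing units amounts to finding the candidate sequence with the lowest total cost. The sketch below does this with a plain Viterbi dynamic program; it is not presented as the thesis's search implementation, and the `Unit` record, the toy cost functions (mismatched left context costs 1.0, a join between non-contiguous corpus units costs 0.5), and the candidate data are all hypothetical, whereas the thesis derives its costs semi-automatically from data.

```python
from collections import namedtuple

# A corpus unit: its phone, its left phonetic context, source utterance, index.
Unit = namedtuple("Unit", "phone left utt pos")

def sub_cost(target, unit):
    """Hypothetical substitution cost: penalize a mismatched left context."""
    return 0.0 if unit.left == target[1] else 1.0

def concat_cost(prev, unit):
    """Hypothetical concatenation cost: free if contiguous in the corpus."""
    contiguous = prev.utt == unit.utt and unit.pos == prev.pos + 1
    return 0.0 if contiguous else 0.5

def select_units(targets, candidates):
    """Viterbi search for the unit sequence with the least total cost."""
    # Each entry: (best cost so far, path of units ending in this candidate).
    prev = [(sub_cost(targets[0], u), [u]) for u in candidates[0]]
    for t in range(1, len(targets)):
        cur = []
        for u in candidates[t]:
            cost, path = min(
                ((c + concat_cost(p[-1], u), p) for c, p in prev),
                key=lambda entry: entry[0])
            cur.append((cost + sub_cost(targets[t], u), path + [u]))
        prev = cur
    return min(prev, key=lambda entry: entry[0])

# Targets are (phone, required left context) pairs for the word "cat".
targets = [("k", "#"), ("ae", "k"), ("t", "ae")]
candidates = [
    [Unit("k", "#", "u1", 0), Unit("k", "s", "u2", 3)],
    [Unit("ae", "k", "u2", 4), Unit("ae", "b", "u3", 1)],
    [Unit("t", "ae", "u2", 5), Unit("t", "ae", "u4", 7)],
]
cost, path = select_units(targets, candidates)
print(cost, [(u.utt, u.pos) for u in path])
```

The search trades the two costs against each other: it accepts one non-contiguous join (from utterance u1 into u2) to obtain a "k" with the correct left context, then stays within u2 where the joins are free.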

