Results 1 - 10 of 61
Probabilistic independence networks for hidden Markov probability models
, 1997
Cited by 155 (13 self)
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been developed relatively independently in these research communities. In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper contains a self-contained review of the basic principles of PINs. It is shown that the well-known forward-backward (F-B) and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs. Furthermore, the existence of inference and estimation algorithms for more general graphical models provides a set of analysis tools for HMM practitioners who wish to explore a richer class of HMM structures. Examples of relatively complex models to handle sensor fusion and coarticulation in speech recognition are introduced and treated within the graphical model framework to illustrate the advantages of the general approach.
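For readers unfamiliar with the two HMM algorithms named in this abstract, the following is a minimal sketch of the textbook forward recursion (the "forward" half of forward-backward) and the Viterbi recursion on a discrete HMM. It is not taken from the paper and does not show the general PIN inference algorithms; the two-state model, its probabilities, and all function and variable names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Dist = Dict[str, float]
Table = Dict[str, Dist]


def forward(obs: List[str], states: List[str],
            start_p: Dist, trans_p: Table, emit_p: Table) -> float:
    """Likelihood of obs, summing over all hidden state paths."""
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


def viterbi(obs: List[str], states: List[str],
            start_p: Dist, trans_p: Table, emit_p: Table
            ) -> Tuple[float, List[str]]:
    """Probability and state sequence of the single most likely path."""
    # delta[s] = probability of the best path ending in state s
    delta = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    back: List[Dict[str, str]] = []
    for o in obs[1:]:
        prev = delta
        pointers: Dict[str, str] = {}
        delta = {}
        for s in states:
            best = max(states, key=lambda r: prev[r] * trans_p[r][s])
            pointers[s] = best
            delta[s] = prev[best] * trans_p[best][s] * emit_p[s][o]
        back.append(pointers)
    # Trace the back-pointers from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return delta[last], path


# Hypothetical two-state model (all numbers are illustrative assumptions).
STATES = ["A", "B"]
START = {"A": 0.6, "B": 0.4}
TRANS = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
EMIT = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}

if __name__ == "__main__":
    print(forward(["x", "y"], STATES, START, TRANS, EMIT))
    print(viterbi(["x", "y"], STATES, START, TRANS, EMIT))
```

The two functions differ only in replacing the sum over predecessor states with a max (plus back-pointers), which is the kind of structural similarity that makes it plausible to view both as instances of one more general inference scheme.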
Functional Phonology -- Formalizing the interactions between articulatory and perceptual drives
, 1998
Speaking In Shorthand -- A Syllable-Centric Perspective For Understanding Pronunciation Variation
, 1998
Cited by 93 (12 self)
Current-generation automatic speech recognition (ASR) systems model spoken discourse as a linear sequence of words and phones. Because it is unusual for every phone within a word to be pronounced in a standard ("canonical") way, ASR systems often depend on a multi-pronunciation lexicon to match an acoustic sequence with a lexical unit. Since there are, in practice, many different ways for a word to be pronounced, this standard approach adds a layer of complexity and ambiguity to the decoding process which, if modified, could potentially improve recognition performance. Systematic analysis of pronunciation variation in a corpus of spontaneous English discourse (Switchboard) demonstrates that the variation observed is systematic at the level of the syllable. Syllabic onsets are realized in canonical form far more frequently than either coda or nuclear constituents. Prosodic stress also plays an important role in pronunciation. The governing mechanism is likely to involve the informationa...
Speech sound acquisition, coarticulation, and rate effects in a neural network model of speech production
- Psychological Review
, 1995
Cited by 52 (21 self)
This article describes a neural network model of speech motor skill acquisition and speech production that explains a wide range of data on variability, motor equivalence, coarticulation, and rate effects. Model parameters are learned during a babbling phase. To explain how infants learn language-specific variability limits, speech sound targets take the form of convex regions, rather than points, in orosensory coordinates. Reducing target size for better accuracy during slower speech leads to differential effects for vowels and consonants, as seen in experiments previously used as evidence for separate control processes for the 2 sound types. Anticipatory coarticulation arises when targets are reduced in size on the basis of context; this generalizes the well-known look-ahead model of coarticulation. Computer simulations verify the model's properties. The primary goal of the modeling work described in this article is to provide a coherent theoretical framework that provides explanations for a wide range of data concerning the articulator movements used by humans to produce speech sounds. This is carried out by formulating a model that transforms strings of phonemes into continuous articulator movements for
Control of Spectral Dynamics in Concatenative Speech Synthesis
- IEEE Trans. Speech and Audio Processing
, 2001
Cited by 29 (1 self)
Current speech synthesis methods based on the concatenation of waveform units can produce highly intelligible speech capturing the identity of a particular speaker. However, the quality of concatenated speech often suffers from discontinuities between the acoustic units, due to contextual differences and variations in speaking style across the database. In this paper, we present methods to spectrally modify speech units in a concatenative synthesizer to correspond more closely to the acoustic transitions observed in natural speech. First, a technique called "unit fusion" is proposed to reduce spectral mismatch between units. In addition to concatenation units, a second, independent tier of units is selected that defines the desired spectral dynamics at concatenation points. Both unit tiers are "fused" to obtain natural transitions throughout the synthesized utterance. The unit fusion method is further extended to control the perceived degree of articulation of concatenated units. In the...
The Elements of Functional Phonology
Cited by 23 (6 self)
Phonological structures and processes are determined by the functional principles of minimization of articulatory effort and maximization of perceptual contrast. We can solve many hitherto controversial issues if we are aware of the different roles of articulation and perception in phonology. Traditionally separate devices like the segment, spreading, licensing, underspecification, feature geometry, and OCP effects, are surface phenomena created by the interaction of more fundamental principles.
Phonetically Driven Phonology: The Role of Optimality Theory and Inductive Grounding
- PROCEEDINGS OF THE 1996 MILWAUKEE CONFERENCE ON FORMALISM AND FUNCTIONALISM IN LINGUISTICS. [RUTGERS OPTIMALITY ARCHIVE 158] JUN, JONGHO
, 1997
Cited by 19 (1 self)
Functionalist phonetic literature has shown how the phonologies of human languages are arranged to facilitate ease of articulation and perception. The explanatory force of phonological theory is greatly increased if it can directly access these research results. There are two formal mechanisms that together can facilitate the link-up of formal to functional work. As others have noted, Optimality Theory, with its emphasis on directly incorporating principles of markedness, can serve as part of the bridge. Another mechanism is proposed here: an algorithm for inductive grounding permits the language learner to access the knowledge gained from experience in articulation and perception, and form from it the appropriate set of formal phonological constraints.
Gradient Well-Formedness in Optimality Theory
, 1998
Cited by 15 (5 self)
A minor modification in the framework of Optimality Theory (Prince and Smolensky 1993) is suggested which enables it to model phenomena where consultant intuitions are gradient, falling somewhere between complete well-formedness and complete ill-formedness. The proposal consists of assigning to certain constraints bands of values along a reified continuum of constraint strictness. When a particular form can be generated only by assigning a constraint a strictness value within a designated “fringe” of the strictness band, the grammar generates the form marked with an intermediate degree of well-formedness. The proposal is tested against data involving light and dark /l/ in American English, using a set of gradient intuitions obtained from ten native speaker consultants. A
Stochastic Suprasegmentals - Relationships between Redundancy, Prosodic Structure and Care of Articulation in Spontaneous Speech
, 2000
Predictability Effects on Durations of Content and Function Words in Conversational English
- JOURNAL OF MEMORY AND LANGUAGE
Cited by 10 (1 self)
In a regression study of conversational speech, we show that frequency, contextual predictability and repetition have separate contributions to word duration, despite their substantial correlations. Moreover, content- and function-word durations are affected differently by their frequency and predictability. Content words are shorter when more frequent, and shorter when repeated, while function words are not so affected. Function words have shorter pronunciations, after controlling for frequency and predictability. While both content and function words are strongly affected by predictability from the word following them, sensitivity to predictability from the preceding word is largely limited to very frequent function words. The results support the view that content and function words are accessed differently in production. We suggest a lexical-access-based model of our results, in which frequency or repetition lead to shorter or longer word durations by causing faster or slower lexical access, mediated by a general mechanism that coordinates the pace of higher-level planning and the execution of the articulatory plan.

