A hierarchical duration model for speech recognition based on the ANGIE framework
- in Proc. Eurospeech '97
, 1999
Cited by 27 (3 self)
This paper presents a hierarchical duration model applied to enhance speech recognition. The model is based on the novel ANGIE framework which is a flexible unified sublexical representation designed for speech applications. This duration model captures duration phenomena operating at the phonological, phonemic, syllabic and morphological levels. At the core of the modelling scheme is a hierarchical normalization procedure performed on the ANGIE parse structure. From this, we derive a robust measure for the rate of speech. The model uses two sets of statistical models - a first set based on relative duration between sublexical units and a second set based on absolute duration that has been normalized with respect to the speaking rate. We have used this paradigm to explore some speech timing phenomena such as the secondary effects on relative duration due to variations in speaking rate, the characteristics of anomalously slow words, and prepausal lengthening effects. Finally, we successfully demonstrate the utility of durational information for recognition applications. In phonetic recognition, we achieve a relative improvement of up to 7.7% by incorporating our model over and above a standard phone duration model, and similarly, in a word spotting task, an improvement from $9.3 to 91.6 (FOM) has resulted. © 1999 Elsevier Science B.V. All rights reserved.
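The abstract does not spell out the ANGIE normalization, but the general idea of deriving a speaking-rate measure and using it to normalize durations, alongside relative durations between units, can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's method: the speaking rate is taken here as the ratio of a word's observed duration to the sum of per-unit mean durations, and the function names and the `mean_durations` table are hypothetical.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, float]  # (unit label, duration in seconds)


def speaking_rate(segments: List[Segment], mean_durations: Dict[str, float]) -> float:
    """Observed / expected word duration (> 1 means slower than average).

    Assumption: the expected duration is the sum of hypothetical per-unit means.
    """
    observed = sum(d for _, d in segments)
    expected = sum(mean_durations[label] for label, _ in segments)
    return observed / expected


def normalize_durations(segments: List[Segment], mean_durations: Dict[str, float]) -> List[Segment]:
    """Absolute durations divided by the word-level speaking rate."""
    rate = speaking_rate(segments, mean_durations)
    return [(label, d / rate) for label, d in segments]


def relative_durations(segments: List[Segment]) -> List[Segment]:
    """Each unit's share of the total word duration (shares sum to 1)."""
    total = sum(d for _, d in segments)
    return [(label, d / total) for label, d in segments]
```

For example, a word whose segments last 0.08, 0.15 and 0.07 s against hypothetical means of 0.06, 0.12 and 0.06 s has a rate of 1.25, so its 0.15 s vowel normalizes to 0.12 s. Keeping the relative and rate-normalized sets separate mirrors the two sets of statistical models the abstract describes.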
Leading up the lexical garden-path: Segmentation and ambiguity in spoken word recognition
- Journal of Experimental Psychology: Human Perception and Performance
, 2002
Cited by 18 (3 self)
Two gating studies, a forced-choice identification study and 2 series of cross-modal repetition priming experiments, traced the time course of recognition of words with onset embeddings (captain) and short words in contexts that match (cap tucked) or mismatch (cap looking) with longer words. Results suggest that acoustic differences in embedded syllables assist the perceptual system in discriminating short words from the start of longer words. The ambiguity created by embedded words is therefore not as severe as predicted by models of spoken word recognition based on phonemic representations. These additional acoustic cues combine with post-offset information in identifying onset-embedded words in connected speech. An important problem in the perception of connected speech is segmentation: how listeners divide the speech stream into individual lexical units or words. Words in fluent speech are not separated by silence in the same way that printed words are divided by blank spaces, yet connected speech is perceived as a sequence of individual words. This perceptual experience clearly reflects acquired language-specific knowledge, because listeners do not have the
The Computational Processing of Intonational Prominence: A Functional Prosody Perspective
, 1997
Cited by 16 (2 self)
Intonational prominence, or accent, is a fundamental prosodic feature that is said to contribute to discourse meaning. This thesis outlines a new, computational theory of the discourse interpretation of prominence, from a FUNCTIONAL PROSODY perspective. Functional prosody makes the following two important assumptions: first, there is an aspect of prominence interpretation that centrally concerns discourse processes, namely the discourse focusing nature of prominence; and second, the role of prominence in language processing in general, and discourse processing in particular, is not essentially separate from the processing of other grammatical, nonprosodic information. This thesis develops a computational theory of prominence interpretation by explaining how prominence serves as an inference cue in discourse processing. Prominence signals changes in the attentional status of entities in a discourse model, while nonprominence signals that the realized entities are already in discourse fo...
Against formal phonology
- Language
, 2005
Cited by 16 (10 self)
Chomsky and Halle (1968) and many formal linguists rely on the notion of a universally available phonetic space defined in discrete time. This assumption plays a central role in phonological theory. Discreteness at the phonetic level guarantees the discreteness of all other levels of language. But decades of phonetics research demonstrate that there exists no universal inventory of phonetic objects. We discuss three kinds of evidence: first, phonologies differ incommensurably; second, some phonetic characteristics of languages depend on intrinsically temporal patterns; and third, some linguistic sound categories within a language are different from each other despite a high degree of overlap that precludes distinctness. Linguistics has mistakenly presumed that speech can always be spelled with letter-like tokens. A variety of implications of these conclusions for research in phonology are discussed. The generative paradigm of language description (Chomsky 1964, 1965, Chomsky & Halle 1968) has dominated linguistic thinking in the United States for many years. Its specific claims about the phonetic basis of linguistic analysis still provide the cornerstone of most linguistic research. Many criticisms have been raised against the phonetic claims of the Sound pattern of English (Chomsky & Halle 1968), some from early on
Neural network processing of natural language: I. Sensitivity to serial, temporal and abstract structure of language in the infant
, 2000
Temporal Properties of Spontaneous Speech -- A Syllable-Centric Perspective
Cited by 12 (3 self)
Temporal properties associated with the speech signal are potentially important for understanding spoken language. Five hours of spontaneous American English dialogue material (from the SWITCHBOARD corpus) were hand-labeled and segmented at the phonetic-segment level; a forty-five-minute subset was also manually annotated (at the syllabic level) with respect to stress accent. Statistical analysis of the corpus indicates that much of the temporal variation observed at the syllabic and phonetic-segment levels can be accounted for in terms of two basic parameters: (1) stress-accent pattern and (2) position of the segment within the syllable. Segments are generally longest in heavily accented syllables and shortest in syllables without accent. However, the magnitude of accent's impact on duration varies as a function of syllable position. Duration of the nucleus is heavily affected by accent level (heavily accented nuclei are, on average, twice as long as their unaccented counterparts), while the duration of the onset is also significantly affected but to a lesser degree. In contrast, accent has relatively little impact on the duration of the coda. This pattern of durational variation is incommensurate with segmental models, but rather implies the importance of syllable structure (and stress accent) for understanding spoken language.
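The two-factor analysis above amounts to grouping segment durations by stress-accent level and syllable position and comparing group means. A minimal sketch, assuming a hypothetical record format of `(accent, position, duration)` tuples with accent labels such as `"heavy"` and `"none"`; the SWITCHBOARD annotations themselves are not reproduced here, and the labels are illustrative.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (accent level, syllable position, duration in seconds)
Record = Tuple[str, str, float]


def mean_duration_by_cell(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Mean duration for each (accent level, syllable position) cell."""
    cells: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for accent, position, duration in records:
        cells[(accent, position)].append(duration)
    return {cell: mean(durs) for cell, durs in cells.items()}


def accent_effect(means: Dict[Tuple[str, str], float], position: str,
                  accented: str = "heavy", unaccented: str = "none") -> float:
    """Ratio of accented to unaccented mean duration at one syllable position."""
    return means[(accented, position)] / means[(unaccented, position)]
```

Under this framing, the reported pattern would show up as a ratio near 2 for `"nucleus"`, a smaller ratio for `"onset"`, and a ratio near 1 for `"coda"`.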
Modeling Segmental Duration In German Text-To-Speech Synthesis
, 1996
Cited by 11 (4 self)
This paper reports on the construction of a model for segmental duration in German. The model predicts the durations of speech sounds in various textual, prosodic, and segmental contexts. It has system [18, 12]. The construction of the duration system was made efficient by the use of an interactive statistical analysis package that incorporates the approach outlined in [23]. The results are stored in tables in a format that can be directly interpreted by the TTS duration module. Tables are constructed in two phases: inferential-statistical analysis of the speech corpus, and parameter estimation. The overall correlation between observed and predicted segmental durations is .896.
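The overall correlation of .896 summarizes how closely predicted segmental durations track observed ones. Assuming it is a Pearson product-moment coefficient (the abstract does not say which measure was used), it can be computed as below; `observed` and `predicted` stand for hypothetical lists of per-segment durations.

```python
import math
from typing import Sequence


def pearson_r(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted durations."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    n = len(observed)
    mx = sum(observed) / n
    my = sum(predicted) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(observed, predicted))
    sx = math.sqrt(sum((x - mx) ** 2 for x in observed))
    sy = math.sqrt(sum((y - my) ** 2 for y in predicted))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```

On Python 3.10 and later, `statistics.correlation` gives the same coefficient; the explicit version just makes the covariance-over-spread formula visible.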
Stochastic Suprasegmentals - Relationships between Redundancy, Prosodic Structure and Care of Articulation in Spontaneous Speech
, 2000
Characterisation of Rhythmic Patterns for Text-to-Speech Synthesis
, 1994
Cited by 9 (4 self)
This article proposes an alternative rhythmic unit to the syllable: the inter-Perceptual Center group (IPCG). This group is delimited by events which can be detected using only acoustic correlates [29]. The rhythmic patterns of French are described using this characterisation: we show that the realisation of accents is gradual over the trailed accentual group and that this gradual lengthening is needed for perception. A model for distributing the IPCG duration among its segmental constituents, incorporating automatic generation of pauses (their emergence and duration) according to speech rate, is then described.
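The abstract does not give the rule for distributing an IPCG's duration among its segments. Purely as an illustrative assumption, one simple scheme shares the group's target duration among its segments in proportion to hypothetical intrinsic segment durations, after setting aside any pause duration supplied by the caller. How the pause itself is decided from speech rate is left out, since the source does not describe it.

```python
from typing import Dict, List, Tuple


def distribute_group_duration(target: float, segments: List[str],
                              intrinsic: Dict[str, float],
                              pause: float = 0.0) -> List[Tuple[str, float]]:
    """Split `target` seconds among `segments` in proportion to intrinsic durations.

    Hypothetical scheme: `pause` seconds are removed from the target first and
    returned as a trailing ("_pause", duration) item when positive.
    """
    if pause < 0 or pause >= target:
        raise ValueError("pause must be non-negative and shorter than the target")
    speech = target - pause
    total = sum(intrinsic[s] for s in segments)
    result = [(s, speech * intrinsic[s] / total) for s in segments]
    if pause > 0:
        result.append(("_pause", pause))
    return result
```

A proportional split keeps each segment's relative weight fixed. A model of gradual accentual lengthening like the one the article describes would instead need position-dependent weights.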
Pauses and the Temporal Structure of Speech
- In
, 1994
Cited by 9 (3 self)
cted to lead to important further improvements in speech synthesis by rendering it even more "fluent," more "human-like," and probably also quite a bit more intelligible. 42 Zellner Endowing speech synthesis with prosodic parameters means that intonation, stress, syllabic length and speech rate have to be generated on the basis of textual material. It is therefore important to consider how temporal phenomena occur in human speech, and how they relate to the textual material from which they are generated. At the level of the acoustic signal, high-level temporal parameters are translated not only into corresponding low-level durational variations, but also into modifications of fundamental frequency and intensity. A second consideration thus concerns the relationship between the temporal phenomena postulated at the prosodic level and the precise acoustic implementation of these phenomena. As will be seen later in the chapter, segment or syllable durations and pause phe

