Results 1 - 10 of 20
The motor theory of speech perception revised
- Cognition, 1985
Cited by 104 (0 self)
A motor theory of speech perception, initially proposed to account for results of early experiments with synthetic speech, is now extensively revised to accommodate recent findings, and to relate the assumptions of the theory to those that might be made about other perceptual modes. According to the revised theory, phonetic information is perceived in a biologically distinct system, a ‘module’ specialized to detect the intended gestures of the speaker that are the basis for phonetic categories. Built into the structure of this module is the unique but lawful relationship between the gestures and the acoustic patterns in which they are variously overlapped. In consequence, the module causes perception of phonetic structure without translation from preliminary auditory impressions. Thus, it is comparable to such other modules as the one that enables an animal to localize sound. Peculiar to the phonetic module are the relation between perception and production it incorporates and the fact that it must compete with other modules for the same stimulus variations.
Networks and Places
- Social Relations in the Urban Setting. (with
, 1977
Cited by 9 (0 self)
cues of voiced and voiceless plosives for determining
The Atoms of Phonological Representation: Gestures, Coordination and Perceptual Features in Consonant Cluster Phonotactics
- Johns Hopkins University, 2003
Cited by 4 (0 self)
The central goal of this dissertation is to investigate the roles and interaction of articulatory, perceptual, and temporal elements in the phonological component of the grammar. This inquiry extends both to the input representations that are submitted to a phonological grammar, and to the constraints in the grammar. In order to adequately account for both production data and data from language typology, two elements must be integrated into the phonological component alongside articulatory gestures: perceptual features, which play an important role in determining phonotactic patterns, and gestural coordination, which establishes whether and how adjacent gestures are related to one another. This dissertation reports three experiments on the production of word-initial consonant clusters; such clusters are an appropriate environment for investigating how perception, articulation, and coordination interact in the phonology. The first experiment is an acoustic study of the production by native English speakers of Czech-possible consonant clusters (e.g. fkale, zbano, vnodi). Results show that speakers are more
Can automatic speech recognition learn more from human speech perception
- Trends in Speech Technology, 2005
Cited by 2 (0 self)
Although a great deal of progress has been made during the last two decades in automatic speech recognition (ASR), the performance of these ASR systems, as measured by word recognition and concept understanding error rates, is still much worse than that achieved by humans, even for carefully read and articulated speech in quiet conditions. This performance gap (between machines and humans) increases even more in noisy conditions and for conversational speech. Steadily increasing computational speed and computer memory tend to impose fewer and fewer constraints on the types and the amount of recognition processing that can be brought to bear on a particular recognition task. In spite of the increased computation and memory, the state-of-the-art technology in automatic speech recognition appears to have reached a plateau in the past few years. New techniques and principles need to be invented or applied in order to substantially reduce the current performance gap in speech recognition between humans and machines. This paper presents some ideas intended to stimulate further research on applying knowledge and principles derived from studies of human speech perception to automatic speech recognition. Although the mechanisms of human speech perception (HSP) are not fully understood, some findings from neuroscience, physiology, cognitive science and psychology could potentially lead to new understanding and thereby stimulate the development of new techniques and architectures for automatic speech recognition that, eventually, will bridge and reduce the performance gap between machines and humans.
Auditory Features For Human Communication Of Stop Consonants Under Fullband And Low-Pass Conditions
Cited by 1 (1 self)
A set of auditorily-formulated features for PLACE discrimination in stop consonants, uncovered in extensive experiments with natural and edited sounds, are now being modeled using fuzzy logic and being applied to large databases of monosyllabic and spelled-letter speech sounds, in various languages, in full-band and low-pass conditions. The rationale is that any valid model of human communication should replicate the human listener "feats" of very good (albeit not perfect) discrimination of stop PLACE even from speakers of different languages, and of "graceful degradation" when faced with markedly low-pass filtered sounds (e.g., telephone-like). This paper reports mainly on fuzzy-logical models, expressing known auditory phenomena, that evaluate the high-frequency content of the burst+aspiration segment in stop consonants, which provides a powerful cue for discrimination of DENTAL consonants. This evaluation is robust to mild variations of the frequency response curve such as those ...
P. Perrier Control and representations in speech production
Cited by 1 (0 self)
In this paper the issue of the nature of the representations of the speech production task in the speaker's brain is addressed in a production-perception interaction framework. Since speech is produced to be perceived, it is hypothesized that its production is associated for the speaker with the generation of specific physical characteristics that are for the listeners the objects of speech perception. Hence, in the first part of the paper, four reference theories of speech perception are presented, in order to guide and to constrain the search for possible correlates of the speech production task in the physical space: the Acoustic Invariance Theory, the Adaptive Variability Theory, the Motor Theory and the Direct-Realist Theory. Possible interpretations of these theories in terms of representations of the speech production task are proposed and analyzed. In a second part, a few selected experimental studies are presented, which shed some light on this issue. In the conclusion, on the basis of the joint analysis of theoretical and experimental aspects presented in the paper, it is proposed that representations of the speech production task are multimodal, and that a hierarchy exists among the different modalities, the acoustic modality having the highest level of priority. It is also suggested that these representations are not associated with invariant characteristics, but with regions of the acoustic, orosensory and motor control spaces.
THE INDETERMINACY/ATTESTATION MODEL OF METATHESIS
This paper addresses three key observations relating to crosslinguistic patterns of metathesis. First, the order of sounds resulting from metathesis can differ from language to language such that a similar combination of sounds can be realized in one order in one language, but in the reverse order in another language. Second, for some sound combinations, only one order is commonly attested as the result of metathesis, while for other combinations, either order can be observed. Third, the acoustic/auditory cues to the identification of the sequence resulting from metathesis are often better than those of the expected, yet nonoccurring, order. These patterns receive a straightforward explanation when we consider the phonetic nature of the sounds involved as well as the speaker/hearer’s knowledge of native sound patterns and their frequency of occurrence. Neither factor alone is sufficient to provide a predictive account of metathesis. This study shows, however, that by taking into account both factors, we are able to understand why certain sound combinations tend to undergo metathesis, why others are common results of metathesis, why patterns of metathesis differ across languages, and, importantly, why metathesis occurs in the first place.
Auditory Features Underlying Cross-Language Human Capabilities In Stop Consonant Discrimination
For some phonemic distinctions human listeners exhibit a marked cross-language capability, in that they are capable of highly correct classification in relation to sounds (like CVs or VCVs) uttered by speakers of another language. This is particularly true regarding distinctions that are perceived in a more categorical fashion, like that of 3-way PLACE discrimination in stop consonants. It is plausible that the reason for this is a mostly common (across languages) auditory basis for human communication of this discrimination. Also, human communication of this discrimination is notably impervious to non-drastic variations in the frequency-transfer curve, which suggests that the relevant auditory features must have some inherent insensitivity to these variations. Models for two specialized auditory cells (onset cells with wide receptive fields, which can detect weak onsets synchronized across frequency, and sequence cells which detect frequency-ascending sequences composed of two onsets)...
Articulatory and Perceptual Aspects of Fricative-Stop Coarticulation: A Pilot Study
, 1996
This paper is concerned with the influence of a preceding fricative on the production and perception of a stop consonant. In a well-known series of experiments (Mann & Repp, 1981; Repp & Mann, 1981, 1982), Mann and Repp showed that, when preceded by a fricative, an ambiguous stop acoustically halfway between /t/ and /k/ is identified differently depending on the place of articulation of the fricative. Specifically, listeners tend to identify stops more frequently as velars following /s/ than following /ʃ/. This effect was shown to occur regardless of the presence/absence of a syllable boundary between the two consonants. It also appeared to decrease gradually in magnitude as a silence of increasing duration was inserted after the fricative, although it was still significant with silent gaps as long as 375 ms. As the listeners' responses were affected by both the specific acoustic structure of the fricative and its perceived category, it was suggested that the perceptual mechanism responsible for this effect was partly continuous and partly categorical, i.e. operated both before and after phonetic categorisation.

