A Linguistically Constrained Model of Short-Term Memory for Nonwords
1996
Cited by 32 (1 self)
In this paper we present a linguistically constrained model of the learning and recall of unfamiliar words in verbal short-term memory. All the words a mature speaker knows were once new to them, but normal speakers, even very young children, can often repeat a nonword after a single exposure (Gathercole & Baddeley, 1989; Gathercole & Adams, 1993). The apparent simplicity of this task disguises what may be a rather complex system dedicated to the solution of a specific problem: the need to represent and recall serially ordered verbal stimuli. Spoken words are spread over time, so that there is no point at which all of the information to be retained is concurrently present. The serial structure of the stimulus (e.g., the order of phonemes in a syllable) is therefore central to the identity of the stimulus and must be retained. Once spoken, the word is no longer present in the environment, and cannot be reexamined at will (unlike, say, a typical visual stimulus). In order to repeat or rehearse a novel input, a single-trial serial-order learning mechanism is needed. This mechanism must track the input in real time and have produced a representation capable of supporting rehearsal by the time the stimulus finishes. It is proposed in the current work that this remarkable ability underlies the development of more long-term lexical-phonological knowledge. As well as being of interest in its own right, the explication of this capacity is thus central to the understanding of language acquisition.
Joint processing of audio-visual information for the recognition of emotional expressions in human-computer interaction
2000
Cited by 27 (0 self)
Recent technological advances have enabled human users to interact with computers in ways previously unimaginable. Beyond the confines of the keyboard and mouse, new modalities to control the computer such as voice, gesture, and force-feedback are emerging. Among these, voice and vision are two natural modalities in human-to-human communication. Automatic speech recognition (ASR) technology has matured enough to allow users to dictate to a word processor or operate the computer using voice commands. Computer vision techniques have enabled the computer to see. Interacting with computers in these modalities is much more natural for people, and the progression is towards the kind of interaction that occurs between humans. Despite these advances, one necessary ingredient for natural interaction is still missing: emotions. Emotions play an important role in human-to-human communication and interaction, allowing people to express themselves beyond the verbal domain. The ability to understand human emotions is desirable for the computer in some applications such as computer-aided learning or user-friendly on-line help. This thesis addresses the problem of detecting human emotional expressions by
Exploring Prosody in Interaction Control
2005
Cited by 20 (14 self)
This paper investigates prosodic aspects of turn-taking in conversation with a view to improving the efficiency of identifying relevant places at which a machine can legitimately begin to talk to a human interlocutor. It examines the relationship between interaction control, the communicative function of which is to regulate the flow of information between interlocutors, and its phonetic manifestation. Specifically, the listener's perception of such interaction control phenomena is modelled. Algorithms for automatic online extraction of prosodic phenomena liable to be relevant for interaction control, such as silent pauses and intonation patterns, are presented and evaluated in experiments using Swedish Map Task data. We show that the automatically extracted prosodic features can be used to avoid many of the places where current dialogue systems run the risk of interrupting their users, and also to identify suitable places to take the turn.
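As a rough illustration of what online extraction of silent pauses can involve, the following minimal sketch marks stretches of low-energy frames as pauses. It is not the algorithm from the paper: the function name `find_silent_pauses`, the per-frame energy input, and the threshold and duration defaults are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def find_silent_pauses(
    frame_energies: List[float],
    frame_ms: float = 10.0,          # assumed frame step in milliseconds
    energy_threshold: float = 0.01,  # assumed silence threshold (arbitrary units)
    min_pause_ms: float = 200.0,     # assumed minimum duration of a pause
) -> List[Tuple[float, float]]:
    """Return (start_ms, end_ms) spans where energy stays below the threshold."""
    pauses = []
    run_start = None  # index of the first frame in the current low-energy run
    # A sentinel frame above any threshold closes a run still open at the end.
    for i, energy in enumerate(list(frame_energies) + [float("inf")]):
        if energy < energy_threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            duration_ms = (i - run_start) * frame_ms
            if duration_ms >= min_pause_ms:
                pauses.append((run_start * frame_ms, i * frame_ms))
            run_start = None
    return pauses
```

A dialogue system built along these lines would still need further cues, such as the intonation patterns mentioned in the abstract, before treating a detected pause as a suitable place to take the turn.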
TalkBack: a conversational answering machine
2003
Cited by 8 (2 self)
Current asynchronous voice messaging interfaces, like voicemail, fail to take advantage of our conversational skills. TalkBack restores conversational turn-taking to voicemail retrieval by dividing voice messages into smaller sections based on the most significant silent and filled pauses and pausing after each to record a response. The responses are composed into a reply, alternating with snippets of the original message for context. TalkBack is built into a digital picture frame; the recipient touches a picture of the caller to hear each segment of the message in turn. The minimal interface models synchronous interaction and facilitates asynchronous voice messaging. TalkBack can also present a voice-annotated slide show which it receives over the Internet.
Automatic Prosodic Prominence Detection in Speech Using Acoustic Features: an Unsupervised System
In Proceedings of Eurospeech 2003, 2003
Cited by 5 (0 self)
This paper presents work in progress on the automatic detection of prosodic prominence in continuous speech. Prosodic prominence involves two different phonetic features: pitch accents, connected with fundamental frequency (F0) movements and syllable overall energy, and stress, which exhibits a strong correlation with syllable nuclei duration and mid-to-high-frequency emphasis. By measuring these acoustic parameters it is possible to build an automatic system capable of correctly identifying prominent syllables, with an agreement with human-tagged data comparable to the inter-human agreement reported in the literature. This system does not require any training phase, additional information or annotation; it is not tailored to a specific set of data and can be easily adapted to different languages.
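One way to picture a training-free prominence detector is to normalise each acoustic parameter within an utterance and flag syllables whose combined score stands out. The sketch below does this with z-scores; it is an assumed illustration and not the decision rule used by the authors. The function names, the feature keys, and the threshold value are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def zscores(values: List[float]) -> List[float]:
    """Standardise values within one utterance; a constant feature maps to zeros."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def prominence_flags(
    syllables: List[Dict[str, float]],
    threshold: float = 0.5,  # assumed cut-off on the averaged z-score
) -> List[bool]:
    """Flag syllables whose averaged, normalised acoustic features exceed the threshold."""
    if not syllables:
        return []
    feature_names = sorted(syllables[0])  # every syllable is assumed to share these keys
    totals = [0.0] * len(syllables)
    for name in feature_names:
        for i, z in enumerate(zscores([s[name] for s in syllables])):
            totals[i] += z
    scores = [t / len(feature_names) for t in totals]
    return [score > threshold for score in scores]


# Hypothetical per-syllable measurements: nucleus duration (s), energy, F0 range (Hz).
utterance = [
    {"duration": 0.08, "energy": 0.20, "f0_range": 5.0},
    {"duration": 0.20, "energy": 0.90, "f0_range": 40.0},
    {"duration": 0.09, "energy": 0.25, "f0_range": 6.0},
]
flags = prominence_flags(utterance)  # only the second syllable is flagged
```

Because the normalisation is computed per utterance, nothing is learned from annotated data, which is the property the abstract emphasises.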
Authoring and Transcription Tools for Speech-Based Hypermedia Systems
In Proceedings of 1991 Conference American Voice I/O Society, 1991
Cited by 4 (1 self)
Authoring is usually one of the most difficult parts in the design and implementation of hypertext and hypermedia systems. This problem is exacerbated if the data to be presented by the system is speech, rather than text or graphics, because of the slow and serial nature of speech. This paper provides an overview of speech-only hypermedia, discusses the difficulties associated with authoring databases for such a system, and explores a variety of techniques to assist in the authoring process.

Speech-Only Hypermedia

Since the introduction of Hypercard for the Macintosh, the ideas behind hypertext systems have become commonplace. The addition of graphics, audio, still images, or video to such systems is helping to create a wealth of new hypermedia applications, but few of these systems take advantage of voice input or output. To create an end-user application, the raw source material must be assembled and structured as part of the "authoring" process. For example, the authoring of a videodisc...
Prosodic Prominence Detection in Speech
2003
Cited by 4 (0 self)
This paper presents work in progress on the automatic detection of prosodic prominence in continuous speech. Prosodic prominence involves two different phonetic features: pitch accents, connected with fundamental frequency (F0) movements and syllable overall energy, and stress, which exhibits a strong correlation with syllable nuclei duration and high-frequency emphasis. By measuring these acoustic parameters it is possible to build an automatic system capable of correctly identifying prominent syllables with an agreement with human-tagged data comparable with the inter-human agreement reported in the literature. These results were achieved without using any information apart from acoustic parameters.
Using Prosodic Features in Language Models for Meetings
Cited by 4 (0 self)
Prosody has been actively studied as an important knowledge source for speech recognition and understanding. In this paper, we are concerned with the question of exploiting prosody for language models to aid automatic speech recognition in the context of meetings. Using an automatic syllable detection algorithm, the syllable-based prosodic features are extracted to form the prosodic representation for each word. Two modeling approaches are then investigated. One is based on a factored language model, which directly uses the prosodic representation and treats it as a 'word'. Instead of direct association, the second approach provides a richer probabilistic structure within a hierarchical Bayesian framework by introducing an intermediate latent variable to represent similar prosodic patterns shared by groups of words. Fourfold cross-validation experiments on the ICSI Meeting Corpus show that exploiting prosody for language modeling can significantly reduce the perplexity, and also yields marginal reductions in word error rate.
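For readers unfamiliar with the evaluation measure, perplexity is the exponential of the average negative log-probability that a language model assigns to each word of a test text, so lower values mean the model predicts the text better. The sketch below computes it from a list of per-word probabilities; the function name and the input format are assumptions for illustration and do not reflect the paper's experimental code.

```python
import math
from typing import Sequence


def perplexity(word_probabilities: Sequence[float]) -> float:
    """Perplexity of a text given the model probability of each word in context."""
    if not word_probabilities:
        raise ValueError("at least one word probability is required")
    # Sum log-probabilities rather than multiplying probabilities to avoid underflow.
    log_sum = sum(math.log(p) for p in word_probabilities)
    return math.exp(-log_sum / len(word_probabilities))
```

A model that assigns probability 0.25 to every word has perplexity 4, which matches the intuition of choosing uniformly among four candidates at each step.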
Speech Recognition Using Acoustic Landmarks and Binary Phonetic Feature Classifiers
2003
Cited by 3 (0 self)
In spite of decades of research, Automatic Speech Recognition (ASR) is far from reaching the goal of performance close to Human Speech Recognition (HSR). One reason for the unsatisfactory performance of state-of-the-art ASR systems, which are based largely on Hidden Markov Models (HMMs), is the inferior acoustic modeling of low-level or phonetic-level linguistic information in the speech signal. An acoustic-phonetic approach to ASR, on the other hand, explicitly targets linguistic information in the speech signal. However, no acoustic-phonetic system exists that carries out large recognition tasks such as connected word or continuous speech recognition. We propose a probabilistic and statistical framework for connected word ASR based on the knowledge of acoustic phonetics. The proposed system is based on the idea of representing speech sounds by bundles of binary-valued articulatory phonetic features. The probabilistic framework requires only binary classifiers of phonetic features and the knowledge-based acoustic correlates of the features for the purpose of connected word speech recognition. We explore the use of Support Vector Machines (SVMs) for binary phonetic feature classification because SVMs offer favorable properties well suited to our recognition task. In the proposed method, probabilistic segmentation of speech is obtained using SVM-based classifiers of manner phonetic features. The linguistically motivated landmarks obtained in each segmentation are used for classification of source and place phonetic features. Probabilistic segmentation paths are constrained using Finite State Automata (FSA) for isolated or connected word recognition. The proposed method could overcome the disadvantages ...

