Fast Transcription of Unstructured Audio Recordings
Cited by 14 (10 self)
We introduce a new method for human-machine collaborative speech transcription that is significantly faster than existing transcription methods. In this approach, automatic audio processing algorithms are used to robustly detect speech in audio recordings and split speech into short, easy-to-transcribe segments. Sequences of speech segments are loaded into a transcription interface that enables a human transcriber to simply listen and type, obviating the need for manually finding and segmenting speech or explicitly controlling audio playback. As a result, playback stays synchronized to the transcriber’s speed of transcription. In evaluations using naturalistic audio recordings made in everyday home situations, the new method is up to 6 times faster than other popular transcription tools while preserving transcription quality. Index Terms: speech transcription, speech corpora
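The detect-and-split step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which detector the authors use, so this example substitutes a plain energy threshold; the function name `detect_speech_segments` and the parameters `threshold` and `max_frames` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def detect_speech_segments(
    energies: List[float], threshold: float, max_frames: int
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges judged to contain speech.

    Assumption: a frame counts as speech when its energy exceeds
    `threshold`. This is a stand-in for the detector in the paper,
    which the abstract does not specify.
    Runs longer than `max_frames` are cut so every segment stays short.
    The `end` index is exclusive.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # index of the first frame of the current speech run
    for i, energy in enumerate(energies):
        is_speech = energy > threshold
        if is_speech and start is None:
            start = i  # a new speech run begins
        elif start is not None and (not is_speech or i - start >= max_frames):
            segments.append((start, i))  # close the run at silence or length cap
            start = i if is_speech else None
    if start is not None:
        segments.append((start, len(energies)))  # close a run that reaches the end
    return segments
```

Each returned range could then be queued for a transcriber one after another, which is the property the abstract relies on to avoid manual seeking and segmenting.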
New Horizons in the Study of Child Language Acquisition
Cited by 4 (1 self)
Naturalistic longitudinal recordings of child development promise to reveal fresh perspectives on fundamental questions of language acquisition. In a pilot effort, we have recorded 230,000 hours of audio-video recordings spanning the first three years of one child’s life at home. To study a corpus of this scale and richness, current methods of developmental cognitive science are inadequate. We are developing new methods for data analysis and interpretation that combine pattern recognition algorithms with interactive user interfaces and data visualization. Preliminary speech analysis reveals surprising levels of linguistic fine-tuning by caregivers that may provide crucial support for word learning. Ongoing analyses of the corpus aim to model detailed aspects of the child’s language development as a function of learning mechanisms combined with lifetime experience. Plans to collect similar corpora from more children based on a transportable recording system are underway. Index Terms: language acquisition, rich longitudinal data, human-machine collaborative analysis, computational models
Evaluating Video Visualizations of Human Behavior
- Conference on Human Factors in Computing Systems, ACM CHI, 2011
Cited by 1 (1 self)
Previously, we presented Viz-A-Vis, a VIsualiZation of Activity through computer VISion [17]. Viz-A-Vis visualizes behavior as aggregate motion over observation space. In this paper, we present two complementary user studies of Viz-A-Vis measuring its performance and discovery affordances. First, we present a controlled user study aimed at comparatively measuring behavioral analysis preference and performance for observation and search tasks. Second, we describe a study with architects measuring discovery affordances and potential impacts on their work practices. We conclude: 1) Viz-A-Vis significantly reduced search time; and 2) it increased the number and quality of insightful discoveries.
LANGUAGE MODEL PARAMETER ESTIMATION USING USER TRANSCRIPTIONS
In limited data domains, many effective language modeling techniques construct models with parameters to be estimated on an in-domain development set. However, in some domains, no such data exist beyond the unlabeled test corpus. In this work, we explore the iterative use of the recognition hypotheses for unsupervised parameter estimation. We also evaluate the effectiveness of supervised adaptation using varying amounts of user-provided transcripts of utterances selected via multiple strategies. While unsupervised adaptation obtains 80% of the potential error reductions, it is outperformed by using only 300 words of user transcription. By transcribing the lowest confidence utterances first, we further obtain an effective word error rate reduction of 0.6%. Index Terms: speech recognition, language modeling, adaptation
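The "lowest confidence utterances first" selection strategy mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the tuple layout, the function name `select_for_transcription`, and the greedy handling of the word budget are all hypothetical.

```python
from typing import List, Tuple

# Assumed layout: (utterance id, recognizer hypothesis, confidence in [0, 1]).
Utterance = Tuple[str, str, float]


def select_for_transcription(
    utterances: List[Utterance], word_budget: int
) -> List[str]:
    """Pick utterance ids for manual transcription, lowest confidence first.

    Assumption: the cost of transcribing an utterance is estimated by the
    word count of its recognition hypothesis, and utterances that would
    exceed the remaining budget are skipped.
    """
    ranked = sorted(utterances, key=lambda u: u[2])  # least confident first
    chosen: List[str] = []
    used = 0
    for utt_id, hypothesis, _confidence in ranked:
        n_words = len(hypothesis.split())
        if used + n_words > word_budget:
            continue  # does not fit; try the next candidate
        chosen.append(utt_id)
        used += n_words
    return chosen
```

With a budget of 300 words, the figure quoted in the abstract, a call would look like `select_for_transcription(utterances, 300)`.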
CorpVis: An Online Emotional Speech Corpora Visualisation Interface
Our research in emotional speech analysis has led to the construction of several dedicated high quality, online corpora of natural emotional speech assets. The requirements for querying, retrieval and organization of assets based on both their metadata descriptors and their analysis data led to the construction of a suitable interface for data visualization and corpus management. The CorpVis interface is intended to assist collaborative work between several speech research groups working with us in this area, allowing online collaboration and distribution of assets to be performed. This paper details the current CorpVis interface into our corpora, and the work performed to achieve this.
eNTERFACE’10 Complete Project Proposal CoMediAnnotate: an usable multimodal annotation framework
Participating to the numediart research program on Digital Art Technologies
(2)
This project aims at improving the user experience regarding multimedia content annotation. We evaluated and compared current timeline-based annotation tools, so as to elicit user requirements. We address two issues: 1) adapting the user interface, by supporting more input modalities through a rapid prototyping tool and by offering alternative visualization techniques of temporal signals; and 2) covering more steps of the annotation workflow besides the task of annotation itself, notably recording multimodal signals. We developed input device components for the OpenInterface (OI) platform for rapid prototyping of multimodal interfaces: multitouch screen, jog wheels and pen-based solutions. We modified an annotation tool created with the Smart Sensor Integration (SSI) toolkit and componentized it in OI so as to bind its controls to different input devices. We produced mockup sketches towards a new design of an improved user interface for multimedia content annotation, and started developing a rough prototype using the Processing Development Environment. Our solution allows several prototypes to be produced by varying the interaction pipeline: changing input modalities and using either the initial GUI of the annotation tool, or the newly-designed one. We target usability testing to validate our solution and determine which combination of input modalities best suits given use cases.
H.5.1 Information Interfaces and Presentation: Multimedia
For mixed-initiative multimedia annotation systems, an effective dialogue between the system and user is critical. In order to inform the development of such dialogue, a clear insight into the impact of interruptions upon the perceptions of the user is required. We present preliminary results of an investigation into interruptions in the form of queries to the user. We show that a user can perceive differences between trivial and important queries. Whether a query is shown in or out of context, or at some opportune time, is also shown to have an impact on user perception of the system. Author Keywords: multimedia annotation, interruptions
VizKid: A Behavior Capture and Visualization System of Adult-Child Interaction
We present VizKid, a capture and visualization system for supporting the analysis of social interactions between two individuals. The development of this system is motivated by the need for objective measures of social approach and avoidance behaviors of children with autism. VizKid visualizes the position and orientation of an adult and a child as they interact with one another over an extended period of time. We report on the design of VizKid and its rationale.
The Birth of a Word
- Program in Media Arts and Sciences, 2013
A hallmark of a child’s first two years of life is their entry into language, from first productive word use around 12 months of age to the emergence of combinatorial speech in their second year. What is the nature of early language development and how is it shaped by everyday experience? This work builds from the ground up to study early word learning, characterizing vocabulary growth and its relation to the child’s environment. Our study is guided by the idea that the natural activities and social structures of daily life provide helpful learning constraints. We study this through analysis of the largest-ever corpus of one child’s everyday experience at home. Through the Human Speechome Project, the home of a family with a young child was outfitted with a custom audio-video recording system, capturing more than 200,000 hours of audio and video of daily life from birth to age three. The annotated subset of this data spans the child’s 9-24 month age range and contains more than 8 million words of transcribed speech, constituting a detailed record of both the child’s input and linguistic development. Such a comprehensive, naturalistic dataset presents new research opportunities but also requires new analysis approaches – questions must be operationalized to leverage the full scale of the data. We

