Further progress in meeting recognition: The ICSI-SRI spring 2005 speech-to-text evaluation system
- In Proceedings of the
, 2005
- Cited by 24 (11 self)
Abstract. We describe the development of our speech recognition system for the National Institute of Standards and Technology (NIST) Spring 2005 Meeting Rich Transcription (RT-05S) evaluation, highlighting improvements made since last year [1]. The system is based on the SRI-ICSI-UW RT-04F conversational telephone speech (CTS) recognition system, with meeting-adapted models and various audio preprocessing steps. This year’s system features better delay-sum processing of distant microphone channels and energy-based crosstalk suppression for close-talking microphones. Acoustic modeling is improved by virtue of various enhancements to the background (CTS) models, including added training data, decision-tree based state tying, and the inclusion of discriminatively trained phone posterior features estimated by multilayer perceptrons. In particular, we make use of adaptation of both acoustic models and MLP features to the meeting domain. For distant microphone recognition we obtained considerable gains by combining and cross-adapting narrow-band (telephone) acoustic models with broadband (broadcast news) models. Language models (LMs) were improved with the inclusion of new meeting and web data. In spite of a lack of training data, we created effective LMs for the CHIL lecture domain. Results are reported on RT-04S and RT-05S meeting data. Measured on RT-04S conference data, we achieved an overall improvement of 17% relative in both MDM and IHM conditions compared to last year’s evaluation system. Results on lecture data are comparable to the best reported results for that task.
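The abstract above mentions delay-sum processing of distant microphone channels but does not describe it. As a rough illustration only, the following is a minimal textbook delay-and-sum sketch, not the ICSI-SRI implementation. It assumes integer sample delays, equal-length channels, and a simple cross-correlation delay estimate; the function names `estimate_delay` and `delay_and_sum` are hypothetical.

```python
def estimate_delay(reference, signal, max_lag):
    """Return the lag (in samples) at which `signal` best matches `reference`.

    Assumption: both inputs are equal-length lists of floats.
    """
    n = len(reference)

    def score(lag):
        # Cross-correlation of reference[t] with signal[t + lag],
        # skipping samples that fall outside the signal.
        return sum(reference[t] * signal[t + lag]
                   for t in range(n) if 0 <= t + lag < n)

    return max(range(-max_lag, max_lag + 1), key=score)


def delay_and_sum(channels, delays):
    """Shift each channel by its delay, then average the aligned channels."""
    n = len(channels[0])
    out = [0.0] * n
    for sig, d in zip(channels, delays):
        for t in range(n):
            src = t + d
            if 0 <= src < n:
                out[t] += sig[src]
    return [v / len(channels) for v in out]


# Toy example: the second microphone hears the same click two samples later.
mic_a = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
lag = estimate_delay(mic_a, mic_b, max_lag=3)
enhanced = delay_and_sum([mic_a, mic_b], [0, lag])
```

Averaging after alignment reinforces the signal common to all channels, while uncorrelated noise tends to partially cancel. That is the general motivation for this kind of preprocessing.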
Segmenting meetings into agenda items by extracting implicit supervision from human note-taking
- In Proc. of IUI 2006
, 2007
- Cited by 12 (1 self)
Splitting a meeting into segments such that each segment contains discussions on exactly one agenda item is useful for tasks such as retrieval and summarization of agenda item discussions. However, accurate topic segmentation of meetings is a difficult task. In this paper, we investigate the idea of acquiring implicit supervision from human meeting participants to solve the segmentation problem. Specifically, we have implemented and tested a note-taking interface that gives value to users by helping them organize and retrieve their notes easily, but that also extracts a segmentation of the meeting based on note-taking behavior. We show that the segmentation so obtained achieves a Pk value of 0.212, which improves upon an unsupervised baseline by 45% relative and compares favorably with a current state-of-the-art algorithm. Most importantly, we achieve this performance without any features or algorithms in the classic sense. ACM Classification: H5.2 [Information interfaces and presentation]:
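For readers unfamiliar with the Pk value quoted above, here is a minimal sketch of how this segmentation error metric is commonly computed (lower is better). It assumes each unit of the meeting, for example an utterance, carries the index of the segment it belongs to, and it uses the common convention of setting the window size k to half the mean reference segment length. The function name `pk` and the toy data are illustrative and are not taken from the paper.

```python
def pk(reference, hypothesis, k=None):
    """Pk segmentation error: the fraction of probe pairs (i, i + k) on which
    reference and hypothesis disagree about "same segment or not".

    Assumption: both inputs are equal-length lists of segment indices,
    one entry per unit, with a distinct index for each segment.
    """
    n = len(reference)
    if n != len(hypothesis):
        raise ValueError("segmentations must cover the same units")
    if k is None:
        # Half the average reference segment length, at least 1.
        num_segments = 1 + sum(reference[i] != reference[i - 1]
                               for i in range(1, n))
        k = max(1, round(n / num_segments / 2))
    windows = n - k
    errors = sum(
        (reference[i] == reference[i + k]) != (hypothesis[i] == hypothesis[i + k])
        for i in range(windows)
    )
    return errors / windows


# Toy example: the reference has one boundary, the hypothesis has none.
ref = [0, 0, 0, 0, 1, 1, 1, 1]
hyp_none = [0] * 8
score_perfect = pk(ref, ref)
score_missed = pk(ref, hyp_none)
```

In this toy case the window size is 2, and two of the six probe pairs straddle the missed boundary, so the hypothesis with no boundaries scores 2/6.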
Recognition and understanding of meetings: The AMI and AMIDA projects
- Cited by 4 (0 self)
The AMI and AMIDA projects are concerned with the recognition and interpretation of multiparty meetings. Within these projects we have: developed an infrastructure for recording meetings using multiple microphones and cameras; released a 100-hour annotated corpus of meetings; developed techniques for the recognition and interpretation of meetings based primarily on speech recognition and computer vision; and developed an evaluation framework at both component and system levels. In this paper we present an overview of these projects, with an emphasis on speech recognition and content extraction. Index Terms: Meetings; speech recognition; AMI corpus; summarization; topic segmentation; evaluation
The SRI-ICSI spring 2007 meeting and lecture recognition system
- Proc. NIST Rich Transcription Workshop, Springer Lecture Notes in Computer Science, 2007
- Cited by 4 (0 self)
Abstract. We describe the latest version of the SRI-ICSI meeting and lecture recognition system, as was used in the NIST RT-07 evaluations, highlighting improvements made over the last year. Changes in the acoustic preprocessing include updated beamforming software for processing of multiple distant microphones, and various adjustments to the speech segmenter for close-talking microphones. Acoustic models were improved by the combined use of neural-net-estimated phone posterior features, discriminative feature transforms trained with fMPE-MAP, and discriminative Gaussian estimation using MPE-MAP, as well as model adaptation specifically to nonnative and non-American speakers. The net effect of these enhancements was a 14-16% relative error reduction on distant microphones, and a 16-17% error reduction on close-talking microphones. Also, for the first time, we report results on a new “coffee break” meeting genre, and on a new NIST metric designed to evaluate combined speech diarization and recognition.
The ICSI-SRI spring 2006 meeting recognition system
- in Proceedings of the Rich Transcription 2006 Spring Meeting Recognition Evaluation
, 2006
- Cited by 2 (2 self)
Abstract. We describe the development of the ICSI-SRI speech recognition
You Are What You Say: Using Meeting Participants’ Speech to Detect their Roles and Expertise
- Cited by 1 (0 self)
Our goal is to automatically detect the functional roles that meeting participants play, as well as the expertise they bring to meetings. To perform this task, we build decision tree classifiers that use a combination of simple speech features (speech lengths and spoken keywords) extracted from the participants’ speech in meetings. We show that this algorithm results in a role detection accuracy of 83% on unseen test data, where the random baseline is 33.3%. We also introduce a simple aggregation mechanism that combines evidence of the participants’ expertise from multiple meetings. We show that this aggregation mechanism improves the role detection accuracy from 66.7% (when aggregating over a single meeting) to 83% (when aggregating over 5 meetings).
Minimal-Impact Audio-Based Personal Archives
Collecting and storing continuous personal archives has become cheap and easy, but we are still far from creating a useful, ubiquitous memory aid. We view the inconvenience to the user of being ‘instrumented’ as one of the key barriers to the broader development and adoption of these technologies. Audio-only recordings, however, can have minimal impact, requiring only that a device the size and weight of a cellphone be carried somewhere on the person. We have conducted some small-scale experiments on collecting continuous personal recordings of this kind, and investigating how they can be automatically analyzed and indexed, visualized, and correlated with other minimal-impact, opportunistic data feeds (such as online calendars and digital photo collections). We describe our unsupervised segmentation and clustering experiments in which we can achieve good agreement with hand-marked environment/situation labels. We also discuss some of the broader issues raised by this kind of work including privacy concerns, and describe our future plans to address these and other questions.
Determining High Level Dialog Structure Without Requiring the Words
The potentially enormous audio resources now available both within organizations and on the Internet present a serious challenge to audio browsing technology. In this paper we outline a set of techniques that can be used to determine high level dialog structure without the requirement of resource-intensive, accent-dependent automatic speech recognition (ASR) technology. Using syllable finding algorithms based on band pass energy together with prosodic feature extraction, we show that a sub-lexical approach to prosodic analysis can outperform results based on ASR and even those based on a word alignment which requires a complete transcription. We consider how these techniques could be integrated into ASR technology and suggest a framework for extending this type of sub-lexical prosodic analysis.

