Results 1 - 10 of 18
Further progress in meeting recognition: The ICSI-SRI spring 2005 speech-to-text evaluation system
- In Proceedings of the
, 2005
Cited by 24 (11 self)
We describe the development of our speech recognition system for the National Institute of Standards and Technology (NIST) Spring 2005 Meeting Rich Transcription (RT-05S) evaluation, highlighting improvements made since last year [1]. The system is based on the SRI-ICSI-UW RT-04F conversational telephone speech (CTS) recognition system, with meeting-adapted models and various audio preprocessing steps. This year’s system features better delay-sum processing of distant microphone channels and energy-based crosstalk suppression for close-talking microphones. Acoustic modeling is improved by virtue of various enhancements to the background (CTS) models, including added training data, decision-tree based state tying, and the inclusion of discriminatively trained phone posterior features estimated by multilayer perceptrons (MLPs). In particular, we make use of adaptation of both acoustic models and MLP features to the meeting domain. For distant microphone recognition we obtained considerable gains by combining and cross-adapting narrow-band (telephone) acoustic models with broadband (broadcast news) models. Language models (LMs) were improved with the inclusion of new meeting and web data. In spite of a lack of training data, we created effective LMs for the CHIL lecture domain. Results are reported on RT-04S and RT-05S meeting data. Measured on RT-04S conference data, we achieved an overall improvement of 17% relative in both MDM and IHM conditions compared to last year’s evaluation system. Results on lecture data are comparable to the best reported results for that task.
Recent innovations in speech-to-text transcription at SRI-ICSI-UW
- IEEE Transactions on Audio, Speech & Language Processing
, 2006
Cited by 10 (2 self)
We summarize recent progress in automatic speech-to-text
Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition
- IEEE Transactions on Audio, Speech, and Language Processing
, 2012
Cited by 9 (1 self)
We propose a novel context-dependent (CD) model for large vocabulary speech recognition (LVSR) that leverages recent advances in using deep belief networks for phone recognition. We describe a pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output. The deep belief network pre-training algorithm is a robust and often helpful way to initialize deep neural networks generatively that can aid in optimization and reduce generalization error. We illustrate the key components of our model, describe the procedure for applying CD-DNN-HMMs to LVSR, and analyze the effects of various modeling choices on performance. Experiments on a challenging business search dataset demonstrate that CD-DNN-HMMs can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs, with an absolute sentence accuracy improvement of 5.8% and 9.2% (or relative error reduction of 16.0% and 23.2%) over the CD-GMM-HMMs trained using the minimum phone error rate (MPE) and maximum likelihood (ML) criteria, respectively. Index Terms: Speech recognition, deep belief network, context-dependent phone, LVSR, DNN-HMM, ANN-HMM
Leveraging sentence weights in a concept-based optimization framework for extractive meeting summarization
- in Proc. of Interspeech
, 2009
Cited by 7 (2 self)
We adopt an unsupervised concept-based global optimization framework for extractive meeting summarization, where a subset of sentences is selected to cover as many important concepts as possible. We propose to leverage sentence importance weights in this model. Three ways are introduced to combine the sentence weights within the concept-based optimization framework: selecting sentences for concept extraction, pruning unlikely candidate summary sentences, and using joint optimization of sentence and concept weights. Our experimental results on the ICSI meeting corpus show that our proposed methods can significantly improve the performance for both human transcripts and ASR output compared to the concept-based baseline approach, and this unsupervised approach achieves results comparable with those from supervised learning approaches presented in previous work. Index Terms: global optimization, sentence weights, meeting summarization
Building a highly accurate Mandarin speech recognizer
- In Proc. ASRU
, 2007
Cited by 6 (4 self)
We describe a highly accurate large-vocabulary continuous Mandarin speech recognizer, a collaborative effort among four research organizations. In particular, we build two acoustic models (AMs) with significant differences but similar accuracy for the purposes of cross adaptation and system combination. This paper elaborates on the main differences between the two systems, where one recognizer incorporates a discriminatively trained feature while the other utilizes a discriminative feature transformation. Additionally, we present an improved acoustic segmentation algorithm and topic-based language model (LM) adaptation. Coupled with increased acoustic training data, we reduced the character error rate (CER) of the DARPA GALE 2006 evaluation set from 18.4% to 15.3%. Index Terms: Mandarin, character error rates, multi-layer perceptrons, discriminative features, acoustic segmentation, LM adaptation, out-of-vocabulary.
Evaluating the effectiveness of features and sampling in extractive meeting summarization
- in Proc. of IEEE Spoken Language Technology (SLT)
, 2008
Cited by 5 (4 self)
Feature-based approaches are widely used in the task of extractive meeting summarization. In this paper, we analyze and evaluate the effectiveness of different types of features using Forward Feature Selection in an SVM classifier. In addition to features used in prior studies, we introduce topic-related features and demonstrate that these features are helpful for meeting summarization. We also propose a new way to resample the sentences based on their salience scores for model training and testing. The experimental results on both the human transcripts and recognition output, evaluated by the ROUGE summarization metrics, show that feature selection and data resampling help improve the system performance. Index Terms: meeting summarization, forward feature selection, resampling, TFIDF
Training and adapting MLP features for Arabic speech recognition
, 2009
Cited by 4 (1 self)
Features derived from Multi-Layer Perceptrons (MLPs) are becoming increasingly popular for speech recognition. This paper describes various schemes for applying these features to state-of-the-art Arabic speech recognition: the use of MLP features for short-vowel modelling in graphemic systems; rapid discriminative model training by standard PLP feature lattice re-use; and MLP feature adaptation using Linear Input Networks (LIN). Rapid training with MLP features, MLP-based short-vowel modelling, and LIN adaptation gave reductions in word error rate. However, significant improvements over explicit short-vowel modelling with standard multi-pass adaptation were not obtained, although they were useful in combination.
Advances in speech transcriptions at IBM under the DARPA EARS program
- IEEE Transactions on Audio, Speech, and Language Processing, accepted for publication
, 2000
Cited by 3 (1 self)
This paper describes the technical and system building advances made in IBM’s speech recognition technology over the course of the Defense Advanced Research Projects Agency (DARPA) Effective Affordable Reusable Speech-to-Text (EARS) program. At a technical level, these advances include the development of a new form of feature-based minimum phone error training (fMPE), the use of large-scale discriminatively trained full-covariance Gaussian models, the use of septaphone acoustic context in static decoding graphs, and improvements in basic decoding algorithms. At a system building level, the advances include a system architecture based on cross-adaptation and the incorporation of 2100 h of training data in every system component. We present results on English conversational telephony test data from the 2003 and 2004 NIST evaluations. The combination of technical advances and an order of magnitude more training data in 2004 reduced the error rate on the 2003 test set by approximately 21% relative, from 20.4% to 16.1%, over the most accurate system in the 2003 evaluation and produced the most accurate results on the 2004 test sets in every speed category. Index Terms: Discriminative training, Effective Affordable Reusable Speech-to-Text (EARS), finite-state transducer, full
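The "approximately 21% relative" figure in this abstract can be checked from the two error rates it quotes. The sketch below is only an illustration of that arithmetic; the function and variable names are hypothetical and do not come from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the reduction in error rate as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Error rates quoted in the abstract, in percent (2003 test set).
error_2003_best = 20.4   # most accurate system in the 2003 evaluation
error_2004_system = 16.1  # system described in the abstract

absolute = error_2003_best - error_2004_system
relative = relative_reduction(error_2003_best, error_2004_system)

print(f"absolute: {absolute:.1f} points, relative: {relative:.1%}")
# prints "absolute: 4.3 points, relative: 21.1%"
```

An absolute drop of 4.3 points against a 20.4% baseline is about 21.1% relative, which matches the rounded figure in the abstract.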
Graph-based Submodular Selection for Extractive Summarization
Cited by 3 (3 self)
We propose a novel approach for unsupervised extractive summarization. Our approach builds a semantic graph for the document to be summarized. Summary extraction is then formulated as optimizing submodular functions defined on the semantic graph. The optimization is theoretically guaranteed to be near-optimal under the framework of submodularity. Extensive experiments on the ICSI meeting summarization task on both human transcripts and automatic speech recognition (ASR) outputs show that the graph-based submodular selection approach consistently outperforms the maximum marginal relevance (MMR) approach, a concept-based approach using integer linear programming (ILP), and a recursive graph-based ranking algorithm using Google’s PageRank.
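As a rough illustration of why submodularity yields a near-optimality guarantee, the sketch below runs the classical greedy algorithm on a simple coverage objective. This is not the authors' graph-based objective, which the abstract does not specify; the concept-coverage function, the sentence IDs, and the concept sets are all assumptions made up for the example. For a monotone submodular function under a cardinality budget, greedy selection is known to reach at least a (1 - 1/e) fraction of the optimum.

```python
from typing import Dict, List, Set


def coverage(selected: List[str], concepts: Dict[str, Set[str]]) -> int:
    """Number of distinct concepts covered by the selected sentences.

    Set coverage is monotone and submodular: adding a sentence never
    lowers the score, and its gain shrinks as more is already covered.
    """
    covered: Set[str] = set()
    for sentence_id in selected:
        covered |= concepts[sentence_id]
    return len(covered)


def greedy_select(concepts: Dict[str, Set[str]], budget: int) -> List[str]:
    """Pick up to `budget` sentences, each time taking the largest gain."""
    selected: List[str] = []
    remaining = set(concepts)
    while remaining and len(selected) < budget:
        base = coverage(selected, concepts)

        def gain(sentence_id: str) -> int:
            return coverage(selected + [sentence_id], concepts) - base

        # Sorting makes tie-breaking deterministic (first ID wins).
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:
            break  # nothing left adds new concepts
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy input: sentence ID -> concepts it mentions.
concepts_by_sentence = {
    "s1": {"budget", "deadline"},
    "s2": {"deadline"},
    "s3": {"hiring", "budget", "vendor"},
    "s4": {"vendor"},
}

summary = greedy_select(concepts_by_sentence, budget=2)
print(summary, coverage(summary, concepts_by_sentence))
```

With a budget of two sentences, the greedy pass first takes `s3` (three new concepts) and then `s1` (one new concept), covering all four concepts in the toy input.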
The ICSI-SRI spring 2006 meeting recognition system
- in Proceedings of the Rich Transcription 2006 Spring Meeting Recognition Evaluation
, 2006
Cited by 2 (2 self)
We describe the development of the ICSI-SRI speech recognition

