Articulatory-feature-based confidence measures
Cited by 1 (0 self)
Confidence measures are computed to estimate the certainty that target acoustic units are spoken in specific speech segments. They are applied in tasks such as keyword verification or utterance verification. Because many confidence measures use the same set of models and features as in recognition, the resulting scores may not provide an independent measure of reliability. In this paper, we propose two articulatory feature (AF) based phoneme confidence measures that estimate the acoustic reliability based on the match in AF properties. While acoustic-based features, such as Mel-frequency cepstral coefficients (MFCC), are widely used in speech processing, some recent works have focused on linguistically based features, such as articulatory features, which relate directly to the human articulatory process and may better capture speech characteristics. The articulatory features can either replace or complement the acoustic-based features in speech processing. The proposed AF-based measures were evaluated, in comparison and in combination with the HMM-based scores, on phoneme and keyword verification tasks using children's speech collected for a computer-based English pronunciation learning project. To fully evaluate their usefulness, the proposed measures and combinations were evaluated on both native and non-native data, and under field test conditions that are mismatched with the training condition. The experimental results show that under the different environments, combinations of the AF scores with the HMM-based
AUTOMATIC SPEECH RECOGNITION AND INTRINSIC SPEECH VARIATION
This paper briefly reviews the state of the art on sources of speech variability in automatic speech recognition systems. It focuses on some variations within the speech signal that make the ASR task difficult. The variations detailed in the paper are intrinsic to the speech and affect the different levels of the ASR processing chain. For different sources of speech variation, the paper summarizes the current knowledge and highlights specific feature extraction or modeling weaknesses and current trends.
PLASER: Pronunciation Learning via Automatic Speech Recognition
- Proc. HLT-NAACL 2003 Workshop on Building Educational Applications using Natural Language Processing, 2003
PLASER is a multimedia tool with instant feedback designed to teach English pronunciation to high-school students in Hong Kong whose mother tongue is Cantonese Chinese. The objective is to teach correct pronunciation and not to assess a student's overall pronunciation quality. Major challenges related to speech recognition technology include: allowance for non-native accent, reliable and corrective feedback, and visualization of errors.
unknown title
Application of linguistic knowledge of language transfer to automatic speech recognition (ASR) technology can enhance mispronunciation detection performance in Computer-Aided Pronunciation Training (CAPT). This is achieved by pinpointing salient pronunciation errors made by second language learners. In this work, we propose to apply decision fusion for further improvement in mispronunciation detection performance. The decision from the linguistically-motivated detection, which applies language transfer knowledge, is used as the basis. Back-off to posterior-probability-based pronunciation scoring with phoneme-dependent thresholds is employed when the basis is “less-reliable”. Fusion can help combat problems such as incomplete coverage of linguistic knowledge as well as the imperfection of acoustic models in ASR. Our fusion strategy can maintain the diagnosis capability of the linguistically-motivated
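The back-off idea in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `fuse_decision`, the boolean reliability flag, the threshold values, and the default threshold are all hypothetical, and the abstract does not say how the "less-reliable" case is identified.

```python
from typing import Dict

# Hypothetical phoneme-dependent thresholds on the posterior-based score.
# The values are made up for illustration only.
THRESHOLDS: Dict[str, float] = {"th": 0.40, "r": 0.55}
DEFAULT_THRESHOLD = 0.50  # assumed fallback for phonemes without their own threshold


def fuse_decision(
    phoneme: str,
    basis_says_error: bool,
    basis_reliable: bool,
    posterior_score: float,
) -> bool:
    """Return True if the phoneme is flagged as mispronounced.

    The linguistically-motivated decision is used as the basis. When it is
    marked as less reliable, the decision backs off to comparing the
    posterior-based score with a phoneme-dependent threshold.
    """
    if basis_reliable:
        return basis_says_error
    threshold = THRESHOLDS.get(phoneme, DEFAULT_THRESHOLD)
    # A low posterior score is taken as evidence of mispronunciation.
    return posterior_score < threshold
```

For example, `fuse_decision("th", True, False, 0.9)` ignores the unreliable basis decision and accepts the phoneme, because its score is above the assumed threshold for "th".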
Speech technology for language tutoring
Language learners are known to perform best in one-on-one interactive situations in which they receive optimal corrective feedback. However, one-on-one tutoring by trained language instructors is costly and therefore not feasible for the majority of language learners. This particularly applies to oral proficiency, which requires intensive tutoring. Computer Assisted Language Learning (CALL) systems that make use of Automatic Speech Recognition (ASR) seem to offer new perspectives for language tutoring. In this paper we explain how.
DISCO: Development and Integration of Speech technology into Courseware for language learning
Recent research has shown that a properly designed ASR-based CALL system (Dutch-CAPT) was capable of detecting pronunciation errors and of providing comprehensible feedback on pronunciation. Since pronunciation is not the only skill required for speaking a second language, we explored the possibility of extending the Dutch-CAPT approach to other aspects of speaking proficiency like morphology and syntax. In this paper we explain how a number of errors in morphology and syntax that are common in spoken Dutch L2 could be addressed in an ASR-based CALL system. Finally, we present our new project in which corrective feedback will be provided on all three aspects of spoken proficiency: pronunciation, morphology and syntax. Index Terms: pronunciation training, CALL, ASR, error detection.
Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech
This paper focuses on identifying, extracting and evaluating features related to syntactic complexity of spontaneous spoken responses as part of an effort to expand the current feature set of an automated speech scoring system in order to cover additional aspects considered important in the construct of communicative competence. Our goal is to find effective features, selected from a large set of features proposed previously and some new features designed in analogous ways from a syntactic complexity perspective, that correlate well with human ratings of the same spoken responses, and to build automatic scoring models based on the most promising features by using machine learning methods. On human transcriptions with manually annotated clause and sentence boundaries, our best scoring model achieves an overall Pearson correlation with human rater scores of r=0.49 on an unseen test set, whereas correlations of models using sentence or clause boundaries from automated classifiers are around r=0.2.
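For readers unfamiliar with the evaluation metric quoted in this abstract, the Pearson correlation between model scores and human ratings can be computed as sketched below. The function name `pearson_r` and the sample scores are illustrative assumptions; they are not data from the paper.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up scores for four responses (not from the paper).
human_scores = [1, 2, 3, 4]
model_scores = [2, 1, 4, 3]
r = pearson_r(human_scores, model_scores)
```

A value near 1 means the model ranks and spaces responses much as the human raters do; a value near 0 means there is little linear relationship.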
Automatic Detection of Unnatural Word-Level Segments in Unit-Selection Speech Synthesis
We investigate the problem of automatically detecting unnatural word-level segments in unit selection speech synthesis. We use a large set of features, namely, target and join costs, language models, prosodic cues, energy and spectrum, and Delta Term Frequency Inverse Document Frequency (TF-IDF), and we report comparative results between different feature types and their combinations. We also compare three modeling methods based on Support Vector Machines (SVMs), Random Forests, and Conditional Random Fields (CRFs). We then discuss our results and present a comprehensive error analysis.
unknown title
processing. This chapter presents these and their combination, followed by some related technologies.
3.1 SPEECH PROCESSING
Modern speech technology is based on digital signal processing, probability theory and search algorithms. These techniques make it possible to perform significant data reduction for coding and transmission of speech signals, speech synthesis and automatic recognition of speech, speaker or language. In this section the state of the art is presented and related to realistic military applications.
3.1.1 Speech Coding
When digital systems became available, it was obvious that the transmission of digital signals was more efficient than the transmission of analogue signals. If analogue signals are transmitted under adverse conditions, it is not easy to reconstruct the received signal, because the possible signal values are not known in advance. Digital signals use discrete levels, which allows, within certain limits, the reconstruction of distorted signals. The first digital transmission systems were based on coding the waveform of the speech signal. This results in bit rates between 8000 and 64000 bps (bits per second). The higher the bit rate, the better the quality. Later, more advanced coding systems were used where basic properties of the speech were determined and encoded, resulting in a more efficient coding (bit rates
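The upper end of the bit-rate range quoted above follows from simple arithmetic on waveform coding: the bit rate is the sampling rate multiplied by the number of bits per sample. The sketch below assumes 8000 samples per second at 8 bits per sample, which is typical of telephone-band PCM; these particular figures and the function name `pcm_bit_rate` are assumptions, not taken from the excerpt.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate in bits per second for uncompressed single-channel waveform coding."""
    if sample_rate_hz <= 0 or bits_per_sample <= 0:
        raise ValueError("sample rate and bits per sample must be positive")
    return sample_rate_hz * bits_per_sample


# Assumed telephone-band parameters: 8 kHz sampling, 8 bits per sample.
telephone_rate = pcm_bit_rate(8000, 8)  # 64000 bits per second
```

Spending fewer bits on each sample lowers the rate proportionally, at the cost of coarser quantization.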

