Large vocabulary continuous speech recognition using linguistic features and constraints
, 2005
Automatic speech recognition (ASR) is a process of applying constraints, as encoded in the computer system (the recognizer), to the speech signal until ambiguity is satisfactorily resolved to the extent that only one sequence of words is hypothesized. Such constraints fall naturally into two categories. One deals with the ordering of words (syntax) and the organization of their meanings (semantics, pragmatics, etc.). The other governs how speech signals are related to words, a process often termed “lexical access”. This thesis studies the Huttenlocher-Zue lexical access model, its implementation in a modern probabilistic speech recognition framework, and its application to continuous speech from an open vocabulary. The Huttenlocher-Zue model advocates a two-pass lexical access paradigm. In the first pass, the lexicon is effectively pruned using broad linguistic constraints. In the original Huttenlocher-Zue model, the authors proposed six linguistic features motivated by the manner of pronunciation.
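The two-pass idea in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the six class names, the phoneme-to-class table, the toy lexicon, and all function names are assumptions chosen for illustration, and the second pass here is an exact phoneme match standing in for detailed acoustic-phonetic verification.

```python
from collections import defaultdict

# Hypothetical mapping from phoneme symbols to six broad manner classes.
# The class inventory is an assumption, not taken from the abstract.
BROAD_CLASS = {
    "p": "STOP", "b": "STOP", "t": "STOP", "d": "STOP", "k": "STOP", "g": "STOP",
    "m": "NASAL", "n": "NASAL",
    "s": "STRONG_FRIC", "z": "STRONG_FRIC", "sh": "STRONG_FRIC",
    "f": "WEAK_FRIC", "v": "WEAK_FRIC", "th": "WEAK_FRIC",
    "l": "LIQUID_GLIDE", "r": "LIQUID_GLIDE", "w": "LIQUID_GLIDE", "y": "LIQUID_GLIDE",
    "ae": "VOWEL", "ih": "VOWEL", "iy": "VOWEL", "aa": "VOWEL", "eh": "VOWEL",
}

# Toy lexicon (hypothetical): word -> phoneme sequence.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "pit": ["p", "ih", "t"],
    "man": ["m", "ae", "n"],
    "sit": ["s", "ih", "t"],
    "lip": ["l", "ih", "p"],
}


def broad_signature(phonemes):
    """Collapse a phoneme sequence into its broad-class sequence."""
    return tuple(BROAD_CLASS[p] for p in phonemes)


def build_cohorts(lexicon):
    """Index the lexicon by broad-class signature."""
    cohorts = defaultdict(list)
    for word, phones in lexicon.items():
        cohorts[broad_signature(phones)].append(word)
    return cohorts


def first_pass(cohorts, observed_classes):
    """Pass 1: keep only words whose broad-class signature matches."""
    return sorted(cohorts.get(tuple(observed_classes), []))


def second_pass(candidates, lexicon, observed_phonemes):
    """Pass 2 (stand-in): exact phoneme match within the pruned cohort."""
    return [w for w in candidates if lexicon[w] == list(observed_phonemes)]


cohorts = build_cohorts(LEXICON)
candidates = first_pass(cohorts, ["STOP", "VOWEL", "STOP"])
print(candidates)                                        # pruned cohort
print(second_pass(candidates, LEXICON, ["b", "ae", "t"]))  # final hypothesis
```

In this toy setting the first pass narrows six words to the three that share the stop-vowel-stop pattern, so the more expensive second pass only has to discriminate within that cohort.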
Differentiated Harmonic Feature Analysis on Music Information Retrieval For Instrument Recognition
Many different music recommendation systems help users find relevant items among the enormous number of digital music items, for a wide range of purposes. The quality of a recommendation system depends critically on what the system can understand about the music objects. This has stimulated research on automatic music information retrieval. Numerous approaches to feature extraction and selection have been proposed for instrument recognition. The Moving Picture Experts Group (MPEG) standardized a set of features based on digital audio content data for the purpose of interpreting the meaning of the information. The features investigated so far are intended to describe a frame window, the whole sound segment, or arbitrarily split bins of the sound segment. Sound vibration in the transient state is known to differ significantly from that in the quasi-steady state, and information in the transient state is important for instrument recognition by humans. However, the boundary of the transient state and the behavior of features within it have barely been investigated. We propose a differentiated analysis of harmonic features, with the boundary of the transient duration detected from the instantaneous fundamental frequency in each frame.
Feature-based Pronunciation Modeling for Speech Recognition
- In Proc. HLT/NAACL
, 2004
We present an approach to pronunciation modeling in which the evolution of multiple linguistic feature streams is explicitly represented.
SVM-HMM LANDMARK BASED SPEECH RECOGNITION
Support vector machines (SVMs) are trained to detect acoustic-phonetic landmarks, and to identify both the manner and place of articulation of the phones producing each landmark with high accuracy. The discriminant outputs of these SVMs are used as input features for a standard HMM-based ASR system. Using these SVM discriminant features significantly improves both phone and word recognition accuracy compared with an MFCC-based recognizer.
Blind Signal Separation (BSS) and Blind Audio Source
Pitch and timbre detection methods applicable to monophonic digital signals are common. Conversely, successful detection of multiple pitches and timbres in polyphonic time-invariant music signals remains a challenge. A review of these methods, sometimes called “Blind Signal Separation”, is presented in this paper. We analyze how musically trained human listeners overcome resonance, noise, and overlapping signals to identify and isolate which instruments are playing and then which pitch each instrument is playing. The part of the instrument and pitch recognition system presented in this paper that is responsible for identifying the dominant instrument from a base signal uses temporal features proposed by Wieczorkowska [1] in addition to the standard 11 MPEG7 features. After retrieving a semantic match for that dominant instrument from the database, it creates a resulting foreign set of features to form a new synthetic base signal which no longer bears the previously extracted dominant sound. The system may repeat this process until all recognizable dominant instruments are accounted for in the segment. The

