Results 11-20 of 62
Speech Recognition using Neural Networks
, 1995
Cited by 21 (0 self)
This thesis examines how artificial neural networks can benefit a large vocabulary, speaker independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their state-of-the-art performance, HMMs make a number of suboptimal modeling assumptions that limit their potential effectiveness. Neural networks avoid many of these assumptions, while they can also learn complex functions, generalize effectively, tolerate noise, and support parallelism. While neural networks can readily be applied to acoustic modeling, it is not yet clear how they can be used for temporal modeling. Therefore, we explore a class of systems called NN-HMM hybrids, in which neural networks perform acoustic modeling, and HMMs perform temporal modeling. We argue that an NN-HMM hybrid has several theoretical advantages over a pure HMM system, including better acoustic ...
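The sketch below shows one common way NN-HMM hybrids pass acoustic scores to the temporal model. The network's per-frame state posteriors P(q|x) are divided by the state priors P(q) to give likelihoods scaled by 1/P(x), and these are then decoded with Viterbi. This is an assumed, generic formulation; the thesis's exact scheme may differ. The functions `scaled_log_likelihoods` and `viterbi` are hypothetical helpers.

```python
import numpy as np

def scaled_log_likelihoods(posteriors, priors):
    # Hybrid assumption: P(x|q) / P(x) = P(q|x) / P(q), computed in the log domain.
    return np.log(posteriors) - np.log(priors)

def viterbi(log_emis, log_trans, log_init):
    # log_emis: (T, N) frame scores; log_trans[i, j]: log P(state j | state i).
    T, N = log_emis.shape
    delta = log_init + log_emis[0]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans          # scores[i, j]: come from i, go to j
        back[t] = np.argmax(scores, axis=0)          # best predecessor for each state j
        delta = scores[back[t], np.arange(N)] + log_emis[t]
    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(np.max(delta))
```

In this scheme the network never models time directly. All sequencing constraints live in `log_trans`, and that split is the division of labor the abstract describes.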
Word spotting for historical documents
- International Journal on Document Analysis and Recognition
, 2007
Cited by 20 (1 self)
Searching and indexing historical handwritten collections is a very challenging problem. We describe an approach called word spotting which involves grouping word images into clusters of similar words by using image matching to find similarity. By annotating “interesting” clusters, an index that links words to the locations where they occur can be built automatically. Image similarities computed using a number of different techniques including dynamic time warping are compared. The word similarities are then used for clustering
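Dynamic time warping (DTW) compares word images of different widths by treating each image as a sequence of column features. The sketch below assumes binary images with ink = 1 and uses a single hypothetical feature, the ink count per column. It computes DTW with a squared-Euclidean local cost and an optional band. `projection_profile` and `dtw_distance` are illustrative names, not the paper's code, and real systems typically use several column features.

```python
import numpy as np

def projection_profile(img):
    # Hypothetical feature: number of ink pixels in each column of a binary word image.
    return np.asarray(img).sum(axis=0)

def dtw_distance(a, b, window=None):
    # a: (n,) or (n, d) sequence, b: (m,) or (m, d) sequence of column features.
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    n, m = len(a), len(b)
    # Optional Sakoe-Chiba band, widened so the end cell (n, m) stays reachable.
    w = max(window, abs(n - m)) if window is not None else max(n, m)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = np.sum((a[i - 1] - b[j - 1]) ** 2)
            # Extend the cheapest of the three allowed predecessor cells.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

For example, a profile and a horizontally stretched copy of it (each column repeated twice) have a DTW distance of 0. A plain column-by-column Euclidean comparison could not even be computed, because the lengths differ.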
Off-Line Signature Verification By the Tracking of Feature and Stroke Positions
- Pattern Recognition
, 2003
Cited by 19 (0 self)
There are inevitable variations in the signature patterns written by the same person. The variations can occur in the shape or in the relative positions of the characteristic features. In this paper, two methods are proposed to track these variations. Given a set of training signature samples, the first method measures the positional variations of the one-dimensional projection profiles of the signature patterns, and the second method determines the variations in relative stroke positions in the two-dimensional signature patterns. The statistics of these variations are determined from the training set. Given a signature to be verified, the positional displacements are determined and its authenticity is decided based on the statistics of the training samples. For comparison, two existing methods proposed by other researchers were implemented and tested on the same database. In addition, two volunteers were recruited to perform the same verification task. Results show that the proposed system compares favorably with the other methods and outperforms the volunteers.
Chroma binary similarity and local alignment applied to cover song identification
- IEEE Trans. on Audio, Speech, and Language Processing
, 2008
Cited by 16 (6 self)
Abstract—We present a new technique for audio signal comparison based on tonal subsequence alignment and its application to detecting cover versions (i.e., different performances of the same underlying musical piece). Cover song identification has become increasingly popular in the Music Information Retrieval (MIR) community, as it provides a direct and objective way to evaluate music similarity algorithms. This article first presents a series of experiments carried out with two state-of-the-art methods for cover song identification. We have studied several of their components (such as chroma resolution and similarity, transposition, beat tracking, or Dynamic Time Warping constraints) in order to discover which characteristics would be desirable for a competitive cover song identifier. After analyzing many cross-validated results, we discuss the importance of these characteristics and apply the best-performing ones to the newly proposed method. Multiple evaluations of the new method confirm a large increase in identification accuracy compared with alternative state-of-the-art approaches.
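The sketch below combines a binary chroma similarity with local alignment, which is one plausible reading of the title; it is not the article's exact method. It makes these assumptions:

- Song B is first transposed to song A's key using the circular shift that best matches their average chroma vectors (an "optimal transposition index").
- A pair of frames counts as a match when their own best shift is zero.
- A Smith-Waterman-style recursion then scores the best local alignment.

The +1 / -0.9 scores, the gap penalty of 0.5, and all function names are hypothetical.

```python
import numpy as np

def oti(x, y):
    # Circular shift k of chroma vector y that maximizes its dot product with x.
    return int(np.argmax([np.dot(x, np.roll(y, k)) for k in range(len(y))]))

def binary_similarity(A, B, match=1.0, mismatch=-0.9):
    # A: (n, 12), B: (m, 12) chroma sequences.
    # Transpose B globally to A's key, then score each frame pair as match/mismatch.
    g = oti(A.mean(axis=0), B.mean(axis=0))
    B = np.roll(B, g, axis=1)
    S = np.empty((len(A), len(B)))
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            S[i, j] = match if oti(a, b) == 0 else mismatch
    return S

def local_alignment_score(S, gap=0.5):
    # Smith-Waterman-style recursion: clipping at 0 lets an alignment start anywhere.
    n, m = S.shape
    H = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i, j] = max(0.0,
                          H[i - 1, j - 1] + S[i - 1, j - 1],
                          H[i - 1, j] - gap,
                          H[i, j - 1] - gap)
    return float(H.max())
```

Local rather than global alignment fits cover detection because a cover may share only some sections with the original, for example a changed intro or an added bridge. The best-matching subsequence then dominates the score.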
A Probabilistic Model of Melodic Similarity
- in International Computer Music Conference (ICMC). 2002. Goteborg, Sweden: The International Computer Music Association
, 2002
Cited by 15 (0 self)
Melodic similarity is an important concept for music databases, musicological studies, and interactive music systems. Dynamic programming is commonly used to compare melodies, often with a distance function based on pitch differences measured in semitones. This approach computes an "edit distance" as a measure of melodic dissimilarity. The problem can also be viewed in probabilistic terms: What is the probability that a melody is a "mutation" of another melody, given a table of mutation probabilities? We explain this approach and demonstrate how it can be used to search a database of melodies. Our experiments show that the probabilistic model performs better than a typical "edit distance" comparison.
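The sketch below implements the conventional edit-distance comparison described above, assuming melodies are lists of MIDI pitch numbers. The substitution cost is the pitch difference in semitones. `melodic_edit_distance` and the insertion/deletion cost of 2 are hypothetical choices, not values from the paper.

```python
def melodic_edit_distance(m1, m2, indel=2.0, sub=lambda a, b: abs(a - b)):
    # m1, m2: pitch sequences (MIDI note numbers); sub: substitution cost function.
    n, m = len(m1), len(m2)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * indel          # delete the first i notes of m1
    for j in range(1, m + 1):
        D[0][j] = j * indel          # insert the first j notes of m2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + indel,                       # deletion
                          D[i][j - 1] + indel,                       # insertion
                          D[i - 1][j - 1] + sub(m1[i - 1], m2[j - 1]))  # substitution
    return D[n][m]
```

If `sub` and `indel` are replaced by negative log mutation probabilities, the minimum-cost alignment becomes the single most probable sequence of mutations. This only gestures at the probabilistic view; the paper's model may instead combine probabilities over all alignments.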
Landmark-Based Speech Recognition: Report of the 2004 Johns Hopkins Summer Workshop
, 2005
Lower-Bounding of Dynamic Time Warping Distances for Multivariate Time Series
Cited by 11 (2 self)
A tight lower-bounding measure for dynamic time warping (DTW) distances for univariate time series was introduced in [Keogh 2002] and a proof for its lower-bounding property was presented. Here we extend these findings to allow lower-bounding of DTW distances for multivariate time series.
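The sketch below extends the univariate envelope bound (LB_Keogh) to several dimensions. It makes these assumptions:

- Both series have equal length.
- DTW uses squared Euclidean cost over all dimensions, within a Sakoe-Chiba band of radius r.
- An upper/lower envelope of the query is built per dimension.
- Whatever the candidate pokes outside that envelope is summed over time and dimensions.

Every candidate frame must be matched to some query frame within the band, so this sum cannot exceed the banded DTW cost. All names are hypothetical, and both quantities are left squared.

```python
import numpy as np

def envelope(q, r):
    # Per-dimension upper/lower envelope of query q (n, d) within band radius r.
    n = len(q)
    U = np.array([q[max(0, i - r):min(n, i + r + 1)].max(axis=0) for i in range(n)])
    L = np.array([q[max(0, i - r):min(n, i + r + 1)].min(axis=0) for i in range(n)])
    return U, L

def lb_keogh_mv(q, c, r):
    # Squared distance from candidate c (n, d) to the query envelope, summed over all dims.
    assert len(q) == len(c), "sketch assumes equal-length series"
    U, L = envelope(q, r)
    above = np.where(c > U, c - U, 0.0)
    below = np.where(c < L, L - c, 0.0)
    return float(np.sum(above ** 2 + below ** 2))

def dtw_banded(q, c, r):
    # Banded DTW with squared Euclidean local cost, for comparison with the bound.
    n = len(q)
    D = np.full((n + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            cost = np.sum((q[i - 1] - c[j - 1]) ** 2)
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return float(D[n, n])
```

The bound costs O(n·r·d) to compute, versus O(n·r·d) cell updates plus Python-level looping for DTW. In a search setting the bound is used to discard candidates whose bound already exceeds the best DTW distance found so far.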
Speech enhancement based on perceptually motivated bayesian estimators of the magnitude spectrum
- IEEE Trans. Speech Audio Proc
, 2005
Cited by 7 (5 self)
Abstract—The traditional minimum mean-square error (MMSE) estimator of the short-time spectral amplitude is based on the minimization of the Bayesian squared-error cost function. The squared-error cost function, however, is not subjectively meaningful in that it does not necessarily produce estimators that emphasize spectral peak (formant) information or estimators that take into account auditory masking effects. To overcome the shortcomings of the MMSE estimator, we propose in this paper Bayesian estimators of the short-time spectral magnitude of speech based on perceptually motivated cost functions. In particular, we use variants of speech distortion measures, such as the Itakura–Saito and weighted likelihood-ratio distortion measures, which have been used successfully in speech recognition. Three classes of Bayesian estimators of the speech magnitude spectrum are derived. The first class of estimators emphasizes spectral peak information, the second class uses a weighted-Euclidean cost function that implicitly takes into account auditory masking effects, and the third class of estimators is designed to penalize spectral attenuation. Of the three classes of Bayesian estimators, the estimators that implicitly take into account auditory masking effects performed best in terms of having less residual noise and better speech quality. Index Terms—Minimum mean-square error (MMSE) estimators, perceptually-motivated speech enhancement, speech distortion measures, speech enhancement.
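The sketch below gives two of the cost functions named above, to show how they differ from plain squared error. The first is the standard Itakura–Saito distortion between a power spectrum P and its estimate. The second is a generic weighted squared error on magnitudes with weight X^p. The exponent p, the small `eps` guard, and the function names are illustrative assumptions; the paper's exact cost definitions and estimators are not reproduced here.

```python
import numpy as np

def itakura_saito(P, P_hat, eps=1e-12):
    # Standard Itakura-Saito distortion: sum(P/P_hat - log(P/P_hat) - 1); asymmetric.
    r = (np.asarray(P, dtype=float) + eps) / (np.asarray(P_hat, dtype=float) + eps)
    return float(np.sum(r - np.log(r) - 1.0))

def weighted_squared_error(X, X_hat, p):
    # Hypothetical weighted-Euclidean cost: sum X**p * (X - X_hat)**2, requires X > 0.
    # p > 0 weights errors at high-energy (peak) bins more; p < 0 weights low-energy bins more.
    X = np.asarray(X, dtype=float)
    X_hat = np.asarray(X_hat, dtype=float)
    return float(np.sum(X ** p * (X - X_hat) ** 2))
```

Because Itakura–Saito depends only on the ratio P/P_hat, a 10% error costs the same in a formant peak as in a spectral valley. The weighted cost makes the emphasis explicit through the sign of p.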
AR-Vector Models For Free-Text Speaker Recognition
- In Proc. Eurospeech-93, pp
, 1992
Cited by 6 (0 self)
In this paper, a new text-independent speaker recognition method is proposed. The method models the spectral evolution of the speech signal with AR-Vector models, which can capture some aspects of inter-speaker variability. Several inter-speaker measures are presented and their advantages and drawbacks are discussed. A training technique for learning discriminant AR-Vector models is proposed. The method is evaluated on the TIMIT database, recorded by cooperative speakers without any impostors. A series of text-independent speaker identification experiments is described. There is no specific corpus for the training sentences, and the training corpus is different from the test corpus. Two speech qualities are tested (i.e., good quality and telephone quality). The experiments with good speech quality give first-rate results (i.e., an identification rate of 100% for 420 speakers) using no more than two sentences per test.
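The sketch below illustrates AR-Vector modeling in general terms; it is not the paper's discriminant training or its inter-speaker measures. Each speaker's sequence of spectral feature vectors is modeled as x_t ≈ A_1 x_{t-1} + ... + A_p x_{t-p} + b, fitted by ordinary least squares. A test sequence is assigned to the speaker whose model predicts it with the smallest mean squared residual. The least-squares fit, the residual-energy decision rule, and all function names are assumptions.

```python
import numpy as np

def _design(X, p):
    # Rows t = p..T-1 hold [x_{t-1}, ..., x_{t-p}, 1] for an order-p vector AR model.
    T = len(X)
    past = np.hstack([X[p - k:T - k] for k in range(1, p + 1)])
    return np.hstack([past, np.ones((T - p, 1))])

def fit_ar_vector(X, p):
    # X: (T, d) feature vectors; returns stacked coefficients W of shape (p*d + 1, d).
    W, *_ = np.linalg.lstsq(_design(X, p), X[p:], rcond=None)
    return W

def residual_energy(X, W, p):
    # Mean squared one-step prediction error of model W on sequence X.
    return float(np.mean((X[p:] - _design(X, p) @ W) ** 2))

def identify(X, models, p):
    # models: dict speaker name -> coefficients; pick the best-predicting speaker.
    return min(models, key=lambda name: residual_energy(X, models[name], p))
```

A prediction-error score like this uses the temporal dynamics of the spectrum, not just its average shape. That is the property the abstract attributes to AR-Vector models.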
Extracting change-patterns from CVS repositories
- In 13th Working Conference on Reverse Engineering (WCRE 2006)
, 2006
Cited by 6 (1 self)
Often, the only sources of information about the evolution of software systems are the systems themselves and their histories. Version control repositories contain information on several thousand files and millions of changes. We propose an approach based on dynamic time warping to discover change-patterns, which, for example, describe files that change together almost all the time. We define the Synchrony change-pattern to answer the question: given a software system and one file under modification, which other files must be changed? We have applied our approach to PADL, a software system developed in Java, and to Mozilla. Interesting results are achieved even when the discovered groups of co-changing files are compared with those provided by experts.

