Markovian Models for Sequential Data, 1996
Cited by 69 (2 self)
Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many machine learning applications, especially for speech recognition. Furthermore, in the last few years, many new and promising probabilistic models related to HMMs have been proposed. We first summarize the basics of HMMs, and then review several recent related learning algorithms and extensions of HMMs, including in particular hybrids of HMMs with artificial neural networks, Input-Output HMMs (which are conditional HMMs using neural networks to compute probabilities), weighted transducers, variable-length Markov models and Markov switching state-space models. Finally, we discuss some of the challenges of future research in this very active area.
1 Introduction
Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many applications in artificial intelligence, pattern recognition, speech recognition, and modeling of biological ...
Survey of the State of the Art in Human Language Technology, 1995
Cited by 47 (0 self)
Contents
1 Spoken Language Input (Ron Cole & Victor Zue, chapter editors)
1.1 Overview (Victor Zue & Ron Cole)
1.2 Speech Recognition (Victor Zue, Ron Cole, & Wayne Ward)
1.3 Signal Representation (Melvyn J. Hunt)
1.4 Robust Speech Recognition (Richard M. Stern)
1.5 HMM Methods in Speech Recognition (Renato De Mori & Fabio Brugnara)
1.6 Language Representation (Salim Roukos)
1.7 Speaker Recognition ...
Recognizing workshop activity using body worn microphones and accelerometers. In Pervasive Computing, 2004
Cited by 37 (9 self)
The paper presents a technique to automatically track the progress of maintenance or assembly tasks using body-worn sensors. The technique is based on a novel way of combining data from accelerometers with simple frequency-matching sound classification. This includes the intensity analysis of signals from microphones at different body locations to correlate environmental sounds with user activity. To evaluate our method we apply it to activities in a wood shop. On a simulated assembly task our system can successfully segment and identify most shop activities in a continuous data stream with zero false positives and 84.4% accuracy.
On-Line Cursive Handwriting Recognition Using Speech Recognition Methods, 1994
Cited by 35 (5 self)
A hidden Markov model (HMM) based continuous speech recognition system is applied to on-line cursive handwriting recognition. The base system is unmodified except for using handwriting feature vectors instead of speech. Due to inherent properties of HMMs, segmentation of the handwritten script sentences is unnecessary. A 1.1% word error rate is achieved for a 3050 word lexicon, 52 character, writer-dependent task and 3%-5% word error rates are obtained for six different writers in a 25,595 word lexicon, 86 character, writer-dependent task. Similarities and differences between the continuous speech and on-line cursive handwriting recognition tasks are explored; the handwriting database collected over the past year is described; and specific implementation details of the handwriting system are discussed.
1. INTRODUCTION
Traditionally, the first step in handwriting recognition is the segmentation of words into component characters [1]. However, in modern continuous speech recognition ef...
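The listing does not say how the word error rates quoted in this abstract were scored. A minimal sketch, assuming the conventional definition (substitutions plus deletions plus insertions, divided by the number of reference words), is given below. The function name `word_error_rate` and the example sentences are illustrative and do not come from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    Assumes the conventional WER definition; the paper's own scoring
    procedure is not given in the listing.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


# Hypothetical example: one dropped word out of six reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.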
Dynamic Programming Search for Continuous Speech Recognition, 1999
Cited by 30 (0 self)
Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with a very efficient and practical pruning strategy so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely flexible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. In this paper, we attempt to systematically review the use of dynamic programming search strategies for small-vocabulary and large-vocabulary continuous speech recognition. The following methods are described in detail: search using a linear lexicon, search using a lexical tree, language-model look-ahead and word graph generation.
1 Introduction
Search strategie...
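The combination of dynamic programming with pruning that this abstract mentions can be illustrated with a small time-synchronous Viterbi search over an HMM. This is only a sketch under stated assumptions and is not the decoder described in the paper: the function `viterbi_beam`, the two-state toy model and the beam widths are all hypothetical, and scores are kept in the log domain.

```python
import math


def viterbi_beam(obs, states, log_init, log_trans, log_emit, beam):
    """Time-synchronous Viterbi search with beam pruning (log domain).

    Returns (best_state_path, best_log_score). Before each new frame,
    hypotheses scoring more than `beam` below the current best are dropped.
    """
    # active[state] = (log score, state path) of the best hypothesis ending there
    active = {s: (log_init[s] + log_emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        # Beam pruning: keep only hypotheses close to the current best score.
        best = max(score for score, _ in active.values())
        survivors = {s: h for s, h in active.items() if h[0] >= best - beam}
        # Dynamic programming recombination: keep the best predecessor per state.
        active = {}
        for prev, (score, path) in survivors.items():
            for s in states:
                cand = score + log_trans[prev][s] + log_emit[s][o]
                if s not in active or cand > active[s][0]:
                    active[s] = (cand, path + [s])
    best_score, best_path = max(active.values(), key=lambda h: h[0])
    return best_path, best_score


def logs(table):
    """Convert a dict of probabilities into log probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Hypothetical two-state toy model, used only to exercise the search.
STATES = ["Rainy", "Sunny"]
LOG_INIT = logs({"Rainy": 0.6, "Sunny": 0.4})
LOG_TRANS = {
    "Rainy": logs({"Rainy": 0.7, "Sunny": 0.3}),
    "Sunny": logs({"Rainy": 0.4, "Sunny": 0.6}),
}
LOG_EMIT = {
    "Rainy": logs({"walk": 0.1, "shop": 0.4, "clean": 0.5}),
    "Sunny": logs({"walk": 0.6, "shop": 0.3, "clean": 0.1}),
}
OBS = ["walk", "shop", "clean"]

# A wide beam prunes nothing here, so the search is exact.
wide_path, wide_score = viterbi_beam(OBS, STATES, LOG_INIT, LOG_TRANS, LOG_EMIT, beam=50.0)
# A very narrow beam prunes the eventual winner and returns a worse path.
narrow_path, narrow_score = viterbi_beam(OBS, STATES, LOG_INIT, LOG_TRANS, LOG_EMIT, beam=0.05)
```

The narrow-beam run shows the trade-off involved: pruning reduces the number of hypotheses extended at each frame, but a beam that is too tight can discard the hypothesis that would have become the best path.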
Statistical Trajectory Models for Phonetic Recognition, 1994
Cited by 27 (3 self)
The main goal of this work is to develop an alternative methodology for acoustic-phonetic modelling of speech sounds. The approach utilizes a segment-based framework to capture the dynamical behavior and statistical dependencies of the acoustic attributes used to represent the speech waveform. Temporal behavior is modelled explicitly by creating dynamic tracks of the acoustic attributes used to represent the waveform, and by estimating the spatio-temporal correlation structure of the resulting errors. The tracks serve as templates from which synthetic segments of the acoustic attributes are generated. Scoring of a hypothesized phonetic segment is then based on the error between the measured acoustic attributes and the synthetic segments generated for each phonetic model.
Augmenting Conversations Using Dual-Purpose Speech. In Proceedings of UIST 2004, 2004
Cited by 10 (2 self)
In this paper, we explore the concept of dual-purpose speech: speech that is socially appropriate in the context of a human-to-human conversation which also provides meaningful input to a computer. We motivate the use of dual-purpose speech and explore issues of privacy and technological challenges related to mobile speech recognition. We present three applications that utilize dual-purpose speech to assist a user in conversational tasks: the Calendar Navigator Agent, DialogTabs, and Speech Courier. The Calendar Navigator Agent navigates a user's calendar based on socially appropriate speech used while scheduling appointments. DialogTabs allows a user to postpone cognitive processing of conversational material by providing short-term capture of transient information. Finally, Speech Courier allows asynchronous delivery of relevant conversational information to a third party.
Additional Keywords and Phrases: Speech user interfaces, dual-purpose speech, mobile computing
Optimal Tying of HMM Mixture Densities using Decision Trees, 1996
Cited by 1 (0 self)
Decision trees have been used in speech recognition with large numbers of context-dependent HMM models, to provide models for contexts not seen in training. Trees are usually created by successive node splitting decisions, based on how well a single Gaussian or Poisson density fits the data associated with a node. We introduce a new node splitting criterion, derived from the maximum likelihood fitting of the complex node distributions with Gaussian tied-mixture densities. We also carry the use of decision trees for tying HMM models a step further. In addition to questions about phonetic class of neighbouring phonemes, we allow questions about the HMM model state to be asked. The resulting decision tree maximizes the likelihood by adjusting the amount of parameter tying simultaneously across state and context. Accuracy improvement and model size reduction were evaluated on a gender-dependent 5K closed-vocabulary WSJ task, using the SI-84 and SI-284 training sets, for tied-mixture and continuous HMM models. The new decision trees are shown to reduce both error rate and model size, while being computationally cheap enough to allow consideration of two preceding and two following phones for the context.
AUTOMATIC SPEECH RECOGNITION AND INTRINSIC SPEECH VARIATION
This paper briefly reviews the state of the art related to the topic of speech variability sources in automatic speech recognition systems. It focuses on some variations within the speech signal that make the ASR task difficult. The variations detailed in the paper are intrinsic to the speech and affect the different levels of the ASR processing chain. For different sources of speech variation, the paper summarizes the current knowledge and highlights specific feature extraction or modeling weaknesses and current trends.
Yonghong Yan, Xintian Wu, Johan Schalkwyk, Ron Cole
This paper presents the CSLU Broadcast News transcription system used in the DARPA 1997 evaluation. The system was built using the software developed for the CSLU LVCSR project started in January 1997. This 25K-word vocabulary system used continuous HMMs for acoustic modeling and the standard backoff trigram as the language model. The search used a single-pass decoder with an MLLR-based adaptation technique. Although on the standard DARPA 20k WSJ task our system obtained 11.6% word error, the 39% error on this year's evaluation suggests there is still much for a newcomer like us to learn.
1. Introduction
This paper presents the CSLU Broadcast News transcription system used in the DARPA 1997 evaluation. The system was built using software developed for the CSLU LVCSR project, initiated in January 1997. The project proceeded through development and evaluation of systems associated with previous DARPA tasks; specifically the RM system, and the WSJ-5k and WSJ-20k system...

