Results 11 - 20 of 32
Survey of Current Speech Technology
- Communications of the ACM, 1994
Cited by 14 (0 self)
This article describes two technologies, speech recognition and speech synthesis, that manipulate speech in terms of its information content. Recognition is the transformation of human speech into text to be used literally (e.g., for dictation) or interpreted as commands to control applications. Synthesis allows the generation of spoken utterances from text. Synthesis is desirable when a large number of utterances must be available or when message content is unpredictable, requirements that make pre-recording of speech impractical. The technologies covered in this article are of particular interest because they support direct communication between humans and computers through a mode that humans commonly use for communication amongst themselves and at which they are highly skilled. Other speech technologies of note, not discussed here, include speaker recognition (automatically establishing a speaker's identity) as well as speech editing and indexing (the manipulation of speech without the extraction of linguistic information).
Category-Based Statistical Language Models
- 1997
Cited by 11 (2 self)
this document. The first section, in chapter 3, develops a model for syntactic dependencies based on word-category n-grams. The second section, in chapter 4, extends this model by allowing short-range word relations to be captured through the incorporation of selected word n-grams.
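As a rough illustration of what a word-category n-gram model can look like, the Python sketch below estimates a category bigram model and uses it to score a word given its predecessor. The toy corpus, the hand-assigned word-to-category map, and every function name are assumptions made for illustration; none of it is taken from the thesis, and plain maximum-likelihood counts are used without smoothing.

```python
from collections import Counter

# Hypothetical toy data: each word is assigned exactly one category.
CATEGORY = {"the": "DET", "a": "DET", "dog": "NOUN", "cat": "NOUN",
            "runs": "VERB", "sleeps": "VERB"}
CORPUS = [["the", "dog", "runs"], ["a", "cat", "sleeps"], ["the", "cat", "runs"]]

def train(corpus, category):
    """Count category bigrams, category contexts, and word-in-category events."""
    cat_bigrams, cat_context = Counter(), Counter()
    word_counts, cat_counts = Counter(), Counter()
    for sentence in corpus:
        cats = ["<s>"] + [category[w] for w in sentence]
        for prev, cur in zip(cats, cats[1:]):
            cat_bigrams[(prev, cur)] += 1
            cat_context[prev] += 1
        for w in sentence:
            word_counts[w] += 1
            cat_counts[category[w]] += 1
    return cat_bigrams, cat_context, word_counts, cat_counts

def prob(word, prev_word, model, category):
    """Approximate P(word | prev_word) as P(cat | prev_cat) * P(word | cat)."""
    cat_bigrams, cat_context, word_counts, cat_counts = model
    prev_cat = "<s>" if prev_word is None else category[prev_word]
    cur_cat = category[word]
    if cat_context[prev_cat] == 0 or cat_counts[cur_cat] == 0:
        return 0.0
    p_cat = cat_bigrams[(prev_cat, cur_cat)] / cat_context[prev_cat]
    p_word = word_counts[word] / cat_counts[cur_cat]
    return p_cat * p_word

model = train(CORPUS, CATEGORY)
p_seen = prob("cat", "the", model, CATEGORY)         # "the cat" occurs in CORPUS
p_unseen_pair = prob("dog", "a", model, CATEGORY)    # "a dog" never occurs in CORPUS
```

In this sketch the word pair "a dog" still receives a non-zero probability because the category pair DET NOUN was observed, which is the usual motivation for category-based models when word-level data are sparse.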
How to Wreck a Nice Beach You Sing Calm Incense
- Proceedings of the 10th international conference on Intelligent user interfaces, 2005
Cited by 10 (3 self)
A principal problem in speech recognition is distinguishing between words and phrases that sound similar but have different meanings. Speech recognition programs produce a list of weighted candidate hypotheses for a given audio segment, and choose the "best" candidate. If the choice is incorrect, the user must invoke a correction interface that displays a list of the hypotheses and choose the desired one. The correction interface is time-consuming, and accounts for much of the frustration of today's dictation systems. Conventional dictation systems prioritize hypotheses based on language models derived from statistical techniques such as n-grams and Hidden Markov Models. We propose a supplementary method for ordering hypotheses based on Commonsense Knowledge. We filter acoustical and word-frequency hypotheses by testing their plausibility with a semantic network derived from 700,000 statements about everyday life. This often filters out possibilities that "don't make sense" from the user's viewpoint, and leads to improved recognition. Reducing the hypothesis space in this way also makes possible streamlined correction interfaces that improve the overall throughput of dictation systems.
User-centered Modeling for Spoken Language and Multimodal Interfaces
Cited by 10 (0 self)
By modeling difficult sources of linguistic variability in spontaneous speech and language, interfaces can be designed that transparently guide human input to match system processing capabilities. Such work is yielding more user-centered and robust interfaces for next-generation spoken language and multimodal systems. Historically, the development of spoken language systems has been primarily a technology-driven phenomenon. However, successful processing of spontaneous speech and dialogue, especially in actual field settings, requires a considerably broader understanding of performance issues during human-computer spoken interactions. Research from this perspective currently represents a gap in our scientific knowledge, which is widely recognized as having generated a bottleneck in our ability to support robust speech for real commercial applications. The present article summarizes recent research on user-centered modeling of human language and performance during spoken and multimodal interaction, as well as interface design aimed at next-generation systems.
Statistical Syntactic Methods for High Performance OCR
- IEE Proceedings on Vision, Image and Signal Processing, 1996
Cited by 8 (4 self)
This paper describes a new method for language modelling and reports its application to handwritten OCR. Images of characters are first chain-coded to convert them to strings. A novel language modelling method is then applied to build a statistical model for strings of each class. The language modelling method is based on a probabilistic version of an n-tuple classifier which is scanned along the entire string for both training and recognition. This method is extremely fast and robust, and concentrates all the computational effort on the portion of the image where the information is, i.e., the edges left by the trace of the pen. Results on the CEDAR handwritten digit database show the new method to be almost as accurate as the best methods reported so far while offering a significant speed advantage. 1 INTRODUCTION There is currently much interest in high-performance OCR, for off-line applications such as the processing of handwritten forms, and on-line applications such as user-inter...
Natural Language Grammatical Inference
- 1995
Cited by 7 (2 self)
This project is concerned with programming a computer to make predictions about which words are most likely to follow a small segment of English text. At first this may seem a strange problem, but I intend to show that there exists a wide range of applications that would benefit from such a program. Indeed, my motivation for approaching this problem was to provide a way of improving the accuracy of speech recognition systems. Additionally, I am interested in the problem of Grammatical Inference. In fact, the word prediction problem and the Grammatical Inference problem are intertwined, and it seems that approaching either one will lead to the other. Grammatical Inference entails inferring a grammar for an arbitrary language from a finite set of sample sentences in the language. It is quite easy to measure the performance of a word prediction system, provided that its prediction is given as a probability distribution. This allows us to compare our predictor with others, such as the tr...
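One common way to score a predictor that outputs a probability distribution is perplexity; the abstract is truncated before it names its measure, so treating perplexity as the metric is an assumption. The Python sketch below compares a hypothetical add-one-smoothed bigram predictor against a uniform baseline on a toy text. The texts, the smoothing choice, and all names are illustrative only.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens, vocab):
    """Return a predictor giving an add-one-smoothed distribution over vocab."""
    counts = defaultdict(Counter)
    for prev, cur in zip(tokens, tokens[1:]):
        counts[prev][cur] += 1

    def predict(prev):
        total = sum(counts[prev].values()) + len(vocab)
        return {w: (counts[prev][w] + 1) / total for w in vocab}

    return predict

def perplexity(predict, tokens):
    """2 ** (mean negative log2 probability given to each actual next word)."""
    log_sum, n = 0.0, 0
    for prev, cur in zip(tokens, tokens[1:]):
        log_sum += -math.log2(predict(prev)[cur])
        n += 1
    return 2 ** (log_sum / n)

# Toy data; a real evaluation would use held-out text far larger than this.
train_text = "the cat sat on the mat the cat ate".split()
test_text = "the mat the cat sat".split()
vocab = sorted(set(train_text))

predictor = train_bigram(train_text, vocab)

def uniform(prev):
    """Baseline that ignores context and spreads probability evenly."""
    return {w: 1 / len(vocab) for w in vocab}

ppl_bigram = perplexity(predictor, test_text)
ppl_uniform = perplexity(uniform, test_text)
```

A lower perplexity means the predictor assigned more probability to the words that actually occurred, so two predictors can be compared on the same test text without reference to any downstream application.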
Two Questions about Data-Oriented Parsing
- In Proceedings Fourth Workshop on Very Large Corpora, 1996
Cited by 5 (1 self)
In this paper I present ongoing work on the data-oriented parsing (DOP) model. In previous work, DOP was tested on a cleaned-up set of analyzed part-of-speech strings from the Penn Treebank, achieving excellent test results. This left, however, two important questions unanswered: (1) how does DOP perform if tested on unedited data, and (2) how can DOP be used for parsing word strings that contain unknown words? This paper addresses these questions. We show that parse results on unedited data are worse than on cleaned-up data, although still very competitive if compared to other models. As to the parsing of word strings, we show that the hardness of the problem does not so much depend on unknown words, but on previously unseen lexical categories of known words. We give a novel method for parsing these words by estimating the probabilities of unknown subtrees. The method is of general interest since it shows that good performance can be obtained without the use of a part-of-speech tagger. To the best of our knowledge, our method outperforms other statistical parsers tested on Penn Treebank word strings.
A Corrective Training Algorithm for Adaptive Learning in Bag Generation
- In International Conference on New Methods in Language Processing (NeMLaP
, 1994
Cited by 5 (1 self)
The sampling problem in the training corpus is one of the major sources of errors in corpus-based applications. This paper proposes a corrective training algorithm to best fit the run-time context domain in the application of bag generation. It shows which objects are to be adjusted and how to adjust their probabilities. The resulting techniques are greatly simplified and the experimental results demonstrate the promising effects of the training algorithm when moving from a generic domain to a specific domain. In general, these techniques can be easily extended to various language models and corpus-based applications.
Statistical Language Processing based on Self-Organising Word Classification
- 1994
Cited by 4 (2 self)
An automatic word classification system has been designed which processes word unigram and bigram frequency statistics extracted from a corpus of natural language utterances. The system implements a type of simulated annealing which employs an average class mutual information metric. Resulting classifications are hierarchical, allowing variable class granularity. Words are represented as structural tags --- unique n-bit numbers the most significant bit-patterns of which incorporate class information. Therefore, access to a structural tag immediately provides access to all classification levels for the corresponding word. The classification system has successfully revealed some of the structure of two natural languages, from the phonemic to the semantic level. The system has been favourably compared --- directly and indirectly --- with other word classification systems. Class-based interpolated language models have been constructed to exploit the extra information supplied by structural...
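To make the idea of an average class mutual information score concrete, the Python sketch below computes the standard mutual information between the classes of adjacent words from bigram counts. This is an assumed textbook definition and may differ in detail from the metric used in the paper; the bigram counts, the two candidate classifications, and the function name are all hypothetical.

```python
import math
from collections import Counter

def average_class_mutual_information(bigram_counts, word_class):
    """Sum over class pairs of P(c1,c2) * log2(P(c1,c2) / (P(c1) * P(c2)))."""
    pair = Counter()
    for (w1, w2), n in bigram_counts.items():
        pair[(word_class[w1], word_class[w2])] += n
    total = sum(pair.values())
    left, right = Counter(), Counter()   # marginals for first and second position
    for (c1, c2), n in pair.items():
        left[c1] += n
        right[c2] += n
    mi = 0.0
    for (c1, c2), n in pair.items():
        p12 = n / total
        mi += p12 * math.log2(p12 / ((left[c1] / total) * (right[c2] / total)))
    return mi

# Hypothetical word bigram counts from a tiny corpus.
BIGRAMS = Counter({("the", "dog"): 4, ("the", "cat"): 4,
                   ("a", "dog"): 2, ("a", "cat"): 2,
                   ("dog", "the"): 3, ("cat", "a"): 3})

# Two candidate classifications: one groups similar words, one does not.
good = {"the": 0, "a": 0, "dog": 1, "cat": 1}
poor = {"the": 0, "dog": 0, "a": 1, "cat": 1}

mi_good = average_class_mutual_information(BIGRAMS, good)
mi_poor = average_class_mutual_information(BIGRAMS, poor)
```

A search procedure such as simulated annealing could, under these assumptions, repeatedly propose moving a word to another class and accept or reject the move according to its effect on this score; the search itself is not shown here.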
A Robust Loose Coupling for Speech Recognition and Natural Language Understanding
- IEEE, Bob O'Hara and Al, 1995
Cited by 4 (0 self)
The focus of this thesis proposal is to improve the ability of a computational system to understand spoken utterances in a dialogue with a human. Available computational methods for word recognition do not perform as well on spontaneous speech as we would hope. Even a state-of-the-art recognizer achieves slightly worse than 70% word accuracy on (nearly) spontaneous speech in a conversation about a specific problem. To address this problem, I will explore novel methods for post-processing the output of a speech recognizer in order to correct errors. I adopt statistical techniques for modeling the noisy channel from the speaker to the listener in order to correct some of the errors introduced there. The statistical model accounts for frequent errors such as simple word/word confusions and short phrasal problems (one-to-many word substitutions and many-to-one word concatenations). To use the model, a search algorithm is required to find the most likely correction of a given word sequence ...

