Accomplishments and Challenges in Literature Data Mining for Biology
, 2002
Cited by 118 (8 self)
We review recent results in literature data mining for biology and discuss the need and the steps for a challenge evaluation for this field. Literature data mining has progressed from simple recognition of terms to extraction of interaction relationships from complex sentences, and has broadened from recognition of protein interactions to a range of problems such as improving homology search and identifying cellular location. To encourage participation and accelerate progress in this expanding field, we propose creating challenge evaluations, and we describe two specific applications in this context.
Sentence Boundary Detection in Broadcast Speech Transcripts
- in Proc. of ISCA Workshop: Automatic Speech Recognition: Challenges for the new Millennium ASR-2000
, 2000
Cited by 34 (3 self)
This paper presents an approach to identifying sentence boundaries in broadcast speech transcripts. We describe finite state models that extract sentence boundary information statistically from text and audio sources. An n-gram language model is constructed from a collection of British English news broadcasts and scripts. An alternative model is estimated from pause duration information in speech recogniser outputs aligned with their programme script counterparts. Experimental results show that the pause duration model alone outperforms the language modelling approach, and that combining the two models improves performance further, attaining precision and recall scores of over 70% for the task.
1. INTRODUCTION Spoken audio data is a rich information source. Extensive research efforts during past decades have resulted in automatic speech transcription systems that can perform certain tasks (e.g., large vocabulary dictation from a cooperative speaker) with a high degree of a...
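The abstract above does not say how the language model and the pause duration model are combined. As a minimal sketch only, assuming each model yields a per-token probability that a sentence boundary follows, one common option is log-linear interpolation. The function names, the `weight` parameter, and the `threshold` below are hypothetical and are not taken from the paper.

```python
import math


def combine_boundary_scores(p_lm, p_pause, weight=0.5):
    """Log-linear interpolation of two boundary probabilities.

    p_lm    -- boundary probability from a language model (assumed input)
    p_pause -- boundary probability from a pause duration model (assumed input)
    weight  -- hypothetical interpolation weight given to the language model
    """
    eps = 1e-12  # guards against log(0)
    log_yes = weight * math.log(p_lm + eps) + (1 - weight) * math.log(p_pause + eps)
    log_no = (weight * math.log(1 - p_lm + eps)
              + (1 - weight) * math.log(1 - p_pause + eps))
    # Renormalise over the two outcomes (boundary / no boundary).
    top = max(log_yes, log_no)
    yes = math.exp(log_yes - top)
    no = math.exp(log_no - top)
    return yes / (yes + no)


def detect_boundaries(lm_probs, pause_probs, weight=0.5, threshold=0.5):
    """Return the token indices after which a boundary is hypothesised."""
    return [
        i
        for i, (p_lm, p_pause) in enumerate(zip(lm_probs, pause_probs))
        if combine_boundary_scores(p_lm, p_pause, weight) >= threshold
    ]
```

In practice the weight and threshold would be tuned on held-out data; the values here are placeholders.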
Punctuation annotation using statistical prosody models
- in Proc. ISCA Workshop on Prosody in Speech Recognition and Understanding
, 2001
Cited by 26 (2 self)
This paper describes the development of statistical models of prosodic features to generate linguistic meta-data for spoken language. In particular, we are concerned with automatically punctuating the output of a broadcast news speech recogniser. We present a statistical finite state model that combines prosodic, linguistic and punctuation class features. Experimental results are presented using the Hub–4 Broadcast News corpus, and in the light of our results we discuss the issue of a suitable method of evaluating the present task.
Automatic Recognition of Spontaneous Speech for Access to Multilingual Oral History Archives
- IEEE Transactions on Speech and Audio Processing
, 2004
Cited by 20 (6 self)
Much is known about the design of automated systems to search broadcast news, but it has only recently become possible to apply similar techniques to large collections of spontaneous speech. This paper presents initial results from experiments with speech recognition, topic segmentation, topic categorization, and named entity detection using a large collection of recorded oral histories. The work leverages a massive manual annotation effort on 10 000 h of spontaneous speech to evaluate the degree to which automatic speech recognition (ASR)-based segmentation and categorization techniques can be adapted to approximate decisions made by human annotators. ASR word error rates near 40% were achieved for both English and Czech for heavily accented, emotional, and elderly spontaneous speech based on 65–84 h of transcribed speech. Topical segmentation based on shifts in the recognized English vocabulary resulted in 80% agreement with
Transcription And Summarization Of Voicemail Speech
- Proc. ICSLP
, 2000
Cited by 16 (6 self)
This paper describes the development of a system to transcribe and summarize voicemail messages. The results of the research presented in this paper are twofold. First, a hybrid connectionist approach to the Voicemail transcription task shows that competitive performance can be achieved using a context-independent system with fewer parameters than those based on mixtures of Gaussian likelihoods. Second, an effective and robust combination of statistical and prior knowledge sources for term weighting is used to extract information from the decoder's output in order to deliver summaries to the message recipients via a GSM Short Message Service (SMS) gateway.
1. INTRODUCTION As the emphasis in cellular networks changes from voice-only communication to a rich combination of content-based applications and services, speech recognition can provide access to several types of information through a number of portable solutions, including mobile phones and personal digital assistants. This pa...
Automatic Summarization of Voicemail Messages Using Lexical and Prosodic Features
, 2005
Cited by 14 (2 self)
This paper presents trainable methods for extracting principal content words from voicemail messages. The short text summaries generated are suitable for mobile messaging applications. The system uses a set of classifiers to identify the summary words, with each word being identified by a vector of lexical and prosodic features. We use an ROC-based algorithm, Parcel, to select input features (and classifiers). We have performed a series of objective and subjective evaluations using unseen data from two different speech recognition systems, as well as human transcriptions of voicemail speech.
Information Extraction from Broadcast News
- Philosophical Transactions of the Royal Society of London, Series A
, 2000
Cited by 14 (7 self)
This paper discusses the development of trainable statistical models for extracting content from television and radio news broadcasts. In particular, we concentrate on statistical finite state models for identifying proper names and other named entities in broadcast speech. Two models are presented: the first models name class information as a word attribute; the second explicitly models both word-word and class-class transitions. A common n-gram based formulation is used for both models. The task of named entity identification is characterized by relatively sparse training data, and issues related to smoothing are discussed. Experiments are reported using the DARPA/NIST Hub-4E evaluation for North American Broadcast News.
A Fully Automatic Random Walker Segmentation for Skin Lesions in a Supervised Setting
Cited by 13 (7 self)
We present a method for automatically segmenting skin lesions by initializing the random walker algorithm with seed points whose properties, such as colour and texture, have been learnt via a training set. We leverage the speed and robustness of the random walker algorithm and augment it into a fully automatic method by using supervised statistical pattern recognition techniques. We validate our results by comparing the resulting segmentations to the manual segmentations of an expert over 120 cases, including 100 cases which are categorized as difficult (e.g., low contrast or heavy occlusion). We achieve an F-measure of 0.95 when segmenting easy cases, and an F-measure of 0.85 when segmenting difficult cases.
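For readers unfamiliar with the metric, the F-measure reported above is conventionally the harmonic mean of precision and recall. The sketch below assumes a pixel-wise comparison between a predicted mask and an expert mask, each given as a flat sequence of 0/1 values; the abstract does not state exactly how the authors computed their scores, and the function name is hypothetical.

```python
def f_measure(predicted_mask, expert_mask):
    """Pixel-wise F-measure between two flat binary masks of equal length."""
    if len(predicted_mask) != len(expert_mask):
        raise ValueError("masks must have the same number of pixels")

    pairs = list(zip(predicted_mask, expert_mask))
    true_pos = sum(1 for p, e in pairs if p and e)
    false_pos = sum(1 for p, e in pairs if p and not e)
    false_neg = sum(1 for p, e in pairs if not p and e)

    if true_pos == 0:
        return 0.0  # no overlap: precision and recall are both zero

    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction that covers the whole lesion plus an equal number of background pixels has precision 0.5 and recall 1.0, giving an F-measure of about 0.67.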
Dynamic knobs for responsive power-aware computing
- In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’11)
, 2011
Cited by 8 (4 self)
We present PowerDial, a system for dynamically adapting application behavior to execute successfully in the face of load and power fluctuations. PowerDial transforms static configuration parameters into dynamic knobs that the PowerDial control system can manipulate to dynamically trade off the accuracy of the computation in return for reductions in the computational resources that the application requires to produce its results. These reductions translate directly into performance improvements and power savings. Our experimental results show that PowerDial can enable our benchmark applications to execute responsively in the face of power caps that would otherwise significantly impair responsiveness. They also show that PowerDial can significantly reduce the number of machines required to service intermittent load spikes, enabling reductions in power and capital costs.
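The accuracy-for-resources trade-off described above can be illustrated with a toy selection rule. This is a hedged sketch, not PowerDial's actual control system: it assumes each knob setting has been profiled in advance for its speedup and accuracy loss, and it simply picks the least lossy setting that meets a required speedup. `KnobSetting`, `choose_setting`, and the numbers in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnobSetting:
    """One profiled configuration of an application (hypothetical structure)."""
    name: str
    speedup: float        # throughput relative to the baseline setting
    accuracy_loss: float  # fraction of output quality given up


def choose_setting(settings, required_speedup):
    """Pick the most accurate setting that still meets the required speedup.

    If no setting is fast enough, fall back to the fastest one available.
    """
    feasible = [s for s in settings if s.speedup >= required_speedup]
    if not feasible:
        return max(settings, key=lambda s: s.speedup)
    return min(feasible, key=lambda s: s.accuracy_loss)
```

With settings such as `baseline` (speedup 1.0, loss 0.0), `medium` (1.5, 0.02), and `fast` (2.5, 0.10), a power cap that demands a 1.4x speedup would select `medium` under this rule.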
IE evaluation: Criticisms and recommendations
, 2004
Cited by 6 (3 self)
We survey the evaluation methodology adopted in Information Extraction (IE), as defined in the MUC conferences and in later independent efforts applying machine learning to IE. We point out a number of problematic issues that may hamper the comparison between results obtained by different researchers. Some of them are common to other NLP tasks: e.g., the difficulty of exactly identifying the effects on performance of the data (sample selection and sample size), of the domain theory (features selected), and of algorithm parameter settings.

