Incorporating Second-Order Information Into Two-Step Major Phrase Break Prediction for Korean. ICSLP-06, 2006
Cited by 2 (0 self)
In this paper, we present a new phrase break prediction method that integrates second-order information into a general maximum entropy model. We map the phrase break prediction problem onto a classification problem. The features used for prediction fall into several layers: local features (part-of-speech (POS) tags, a lexicon, lengths of eojeols, and the location of the juncture in the sentence), global features (chunk labels derived from an eojeol parse tree), and second-order features (distance probability of the previous and next phrase break). These three feature types were combined in the experiments, and we achieved good performance, especially in major phrase break prediction. Index Terms: phrase break, prosodic phrasing, speech synthesis, ToBI
Speech Recognition Error Correction Using Maximum Entropy Language Model
Cited by 2 (0 self)
A speech interface is often required in many application environments, such as telephone-based information retrieval, car navigation systems, and user-friendly interfaces, but the low speech recognition rate makes it difficult to extend its application to new fields. We propose a domain adaptation technique via error correction with a maximum entropy language model, which is a general and elegant framework for combining higher-level linguistic knowledge. Our approach can correct both semantic and lexical errors in the 1-best output of a black-box speech recognizer, and can improve the performance of both speech recognition and the application system. Through extensive experiments using speech-driven in-vehicle telematics information retrieval and spoken language understanding, we demonstrate the superior performance of our approach and some advantages over previous lexical-oriented error correction approaches.
Automatic acquisition of Named Entity tagged corpus from World Wide Web. In Proceedings of the 41st Annual Meeting of the ACL (poster presentation), 2003
Cited by 1 (1 self)
In this paper, we present a method that automatically constructs a Named Entity (NE) tagged corpus from the web for training Named Entity Recognition systems. We use an NE list and a web search engine to collect web documents that contain the NE instances. The documents are refined through sentence separation and text refinement procedures, and the NE instances are finally tagged with the appropriate NE categories. Our experiments demonstrate that the suggested method can acquire, without any human intervention, a sufficiently large NE-tagged corpus that is as useful as a manually tagged one.
Morphological annotation of Korean with Directly Maintainable Resources
Cited by 1 (1 self)
This article describes an exclusively resource-based method of morphological annotation of written Korean text. Korean is an agglutinative language. Our annotator is designed to process text before the operation of a syntactic parser. In its present state, it annotates one-stem words only. The output is a graph of morphemes annotated with accurate linguistic information. The granularity of the tagset is 3 to 5 times higher than that of usual tagsets. A comparison with a reference annotated corpus showed that it achieves 89% recall without any corpus training. The language resources used by the system are lexicons of stems, transducers of suffixes, and transducers for the generation of allomorphs. All can be easily updated, which allows users to control how the performance of the system evolves. It has been claimed that morphological annotation of Korean text could only be performed by a morphological analysis module accessing a lexicon of morphemes. We show that it can also be performed directly with a lexicon of words and without applying morphological rules at annotation time, which speeds up annotation to 1,210 words/s. The lexicon of words is obtained from the maintainable language resources through a fully automated compilation process.
SEMANTIC-ORIENTED ERROR CORRECTION FOR SPOKEN QUERY PROCESSING
Cited by 1 (0 self)
Voice input is often required in many new application environments such as telephone-based information retrieval, car navigation systems, and user-friendly interfaces, but the low success rate of speech recognition makes it difficult to extend its application to new fields. Popular approaches to increasing recognition accuracy post-process the recognition results, but previous post error correction approaches were mainly lexical-oriented. We suggest a new semantic-oriented approach that can correct semantic-level errors as well as lexical errors, and is more accurate, especially for domain-specific speech error correction. Through extensive experiments using a speech-driven in-vehicle telematics information application, we demonstrate the better performance of our approach and some advantages over previous lexical-oriented approaches.
Using Higher-level Linguistic Knowledge for Speech Recognition Error
In: Proceedings of the HLT-NAACL special workshop on Higher-Level Linguistic Information for Speech Processing, 2004
A speech interface is often required in many application environments such as telephone-based information retrieval, car navigation systems, and user-friendly interfaces, but the low speech recognition rate makes it difficult to extend its application to new fields. Several approaches to increasing recognition accuracy apply error correction to the recognition results, but previous post error correction approaches were mainly lexical-oriented. We suggest an improved syllable-based model and a new semantic-oriented approach to correct both semantic and lexical errors, which is also more accurate, especially for domain-specific speech error correction. Through extensive experiments using speech-driven in-vehicle telematics information retrieval, we demonstrate the superior performance of our approach and some advantages over previous lexical-oriented approaches.

