Results 1 -
2 of
2
An Approach to Using the Web as a Live Corpus for Spoken Transliteration Name Access
- COMPUTATIONAL LINGUISTICS AND CHINESE LANGUAGE PROCESSING
, 2006
"... Recognizing transliteration names is challenging due to their flexible formulation and lexical coverage. In our approach, we employ the Web as a giant corpus. The patterns extracted from the Web are used as a live dictionary to correct speech recognition errors. The plausible character strings recog ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Recognizing transliteration names is challenging due to their flexible formulation and lexical coverage. In our approach, we employ the Web as a giant corpus. The patterns extracted from the Web are used as a live dictionary to correct speech recognition errors. The plausible character strings recognized by an Automated Speech Recognition (ASR) system are regarded as query terms and submitted to Google. The top N snippets are entered into PAT trees. The terms of the highest scores are selected. Our experiments show that the ASR model with a recovery mechanism can achieve 21.54 % performance improvement compared with the ASR only model on the character level. The recall rate is improved from 0.20 to 0.42, and the MRR from 0.07 to 0.31. For collecting transliteration names, we propose a named entity (NE) ontology generation engine, called the XNE-Tree engine, which produces relational named entities by a given seed. The engine incrementally extracts high co-occurring named entities with the seed. A total of 7,642 named entities in the ontology were initiated by 100 seeds. When the bi-character language model is combined with the NE ontology, the ASR recall rate and MRR are improved to 0.48 and 0.38, respectively.
Translating–transliterating named entities for multilingual information access
- Journal of the American Society for Information Science and Technology. Volume
, 2006
"... Named entities are major constituents of a document but are usually unknown words. This work proposes a systematic way of dealing with formulation, transformation, translation, and transliteration of multilingual-named entities. The rules and similarity matrices for translation and transliteration a ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Named entities are major constituents of a document but are usually unknown words. This work proposes a systematic way of dealing with formulation, transformation, translation, and transliteration of multilingual-named entities. The rules and similarity matrices for translation and transliteration are learned automatically from parallel-named-entity corpora. The results are applied in cross-language access to collections of images with captions. Experimental results demonstrate that the similarity-based transliteration of named entities is effective, and runs in which transliteration is considered outperform the runs in which it is neglected.

