The Use of Clustering Techniques for Language Modeling - Application to Asian Languages
by Jianfeng Gao, Joshua T. Goodman, Jiangbo Miao
Citations: 15 (11 self)
BibTeX
@MISC{Gao_theuse,
author = {Jianfeng Gao and Joshua T. Goodman and Jiangbo Miao},
title = {The Use of Clustering Techniques for Language Modeling - Application to Asian Languages},
year = {}
}
Abstract
Cluster-based n-gram modeling is a variant of standard word-based n-gram modeling that exploits similarities between words. In this paper, we present an empirical study of clustering techniques for Asian language modeling. Clustering is used both to improve language model performance (measured by perplexity) and to compress language models. We present experimental results for cluster-based trigram models on a Japanese newspaper corpus and on a heterogeneous Chinese corpus.
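The abstract does not spell out the model form, so the following is only an illustrative sketch of one common cluster-based trigram, the class model P(w_i | w_{i-2} w_{i-1}) ≈ P(c_i | c_{i-2} c_{i-1}) × P(w_i | c_i), together with the perplexity measure the abstract refers to. It is not necessarily the variant studied in the paper. The names `ClassTrigramLM` and `WORD2CLASS`, the hand-written toy clusters, the toy sentences, and the add-k smoothing on the class trigram are all hypothetical choices; real systems learn the clusters from data.

```python
import math
from collections import Counter

# Hypothetical toy word-to-cluster map (hard clustering: one class per word).
# Real systems learn clusters automatically; these are hand-written for illustration.
WORD2CLASS = {
    "<s>": "BOUNDARY", "</s>": "BOUNDARY",  # sentence boundaries share one class
    "tokyo": "CITY", "beijing": "CITY", "osaka": "CITY",
    "visited": "VERB", "left": "VERB",
    "she": "PRON", "he": "PRON",
}


class ClassTrigramLM:
    """Sketch of a class-based trigram:
    P(w3 | w1 w2) ~= P(c3 | c1 c2) * P(w3 | c3), with add-k smoothing on P(c3 | c1 c2).
    Assumes every word seen at test time was also seen (as a predicted token) in training."""

    def __init__(self, word2class, k=0.1):
        self.word2class = word2class
        self.k = k
        self.num_classes = len(set(word2class.values()))
        self.tri = Counter()          # (c1, c2, c3) -> count
        self.ctx = Counter()          # (c1, c2) -> count
        self.emit = Counter()         # predicted word -> count
        self.class_count = Counter()  # predicted class -> count

    @staticmethod
    def _pad(sentence):
        return ["<s>", "<s>"] + sentence + ["</s>"]

    def train(self, sentences):
        for sent in sentences:
            words = self._pad(sent)
            classes = [self.word2class[w] for w in words]
            for i in range(2, len(words)):
                self.tri[(classes[i - 2], classes[i - 1], classes[i])] += 1
                self.ctx[(classes[i - 2], classes[i - 1])] += 1
                self.emit[words[i]] += 1
                self.class_count[classes[i]] += 1

    def prob(self, w1, w2, w3):
        c1, c2, c3 = (self.word2class[w] for w in (w1, w2, w3))
        # Add-k smoothed class transition probability P(c3 | c1 c2).
        p_class = (self.tri[(c1, c2, c3)] + self.k) / (
            self.ctx[(c1, c2)] + self.k * self.num_classes)
        # Maximum-likelihood membership probability P(w3 | c3).
        p_word = self.emit[w3] / self.class_count[c3]
        return p_class * p_word

    def perplexity(self, sentences):
        # Perplexity = exp(-(1/N) * sum of log P over all N predicted tokens).
        log_prob, n_tokens = 0.0, 0
        for sent in sentences:
            words = self._pad(sent)
            for i in range(2, len(words)):
                log_prob += math.log(self.prob(words[i - 2], words[i - 1], words[i]))
                n_tokens += 1
        return math.exp(-log_prob / n_tokens)


if __name__ == "__main__":
    train_sents = [["she", "visited", "tokyo"], ["he", "left", "osaka"], ["she", "left", "beijing"]]
    lm = ClassTrigramLM(WORD2CLASS)
    lm.train(train_sents)
    # "he visited beijing" never occurs as a word sequence in training,
    # but its class sequence PRON VERB CITY does, so it still gets a finite perplexity.
    print(lm.perplexity([["he", "visited", "beijing"]]))
```

The sharing of statistics across words in the same cluster is what lets the unseen word sequence above receive non-zero probability. The same structure also suggests why clustering can compress a model: the class trigram table is indexed by C^3 class triples rather than V^3 word triples, with C much smaller than V.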







