Augmenting Naive Bayes Classifiers with Statistical Language Models, 2003
Cited by 38 (0 self)
Abstract
We augment naive Bayes models with statistical n-gram language models to address shortcomings of the standard naive Bayes text classifier. The result is a generalized naive Bayes classifier ...
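A minimal sketch of the general idea: instead of scoring a document with independent per-class word probabilities, each class gets its own word-bigram language model, and a document is assigned to the class that maximizes log prior plus log P(document | class). Add-one (Laplace) smoothing and whitespace tokenization are assumptions chosen for brevity, not necessarily what the authors used, and the class name `BigramNaiveBayes` is hypothetical.

```python
import math
from collections import Counter, defaultdict


class BigramNaiveBayes:
    """Naive Bayes-style classifier whose class-conditional model is a bigram LM.

    Assumption: add-one smoothing; the paper's smoothing method may differ.
    """

    def __init__(self):
        self.class_counts = Counter()
        self.histories = defaultdict(Counter)  # class -> count of each token as a history
        self.bigrams = defaultdict(Counter)    # class -> count of (prev, cur) pairs
        self.vocab = set()

    @staticmethod
    def _tokens(text):
        # "<s>" marks the document start so the first word has a history.
        return ["<s>"] + text.lower().split()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            self.class_counts[label] += 1
            toks = self._tokens(text)
            self.vocab.update(toks)
            for prev, cur in zip(toks, toks[1:]):
                self.histories[label][prev] += 1
                self.bigrams[label][(prev, cur)] += 1
        return self

    def log_score(self, text, label):
        toks = self._tokens(text)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen words
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)  # log prior
        for prev, cur in zip(toks, toks[1:]):
            num = self.bigrams[label][(prev, cur)] + 1
            den = self.histories[label][prev] + v
            score += math.log(num / den)  # smoothed log P(cur | prev, class)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda c: self.log_score(text, c))


clf = BigramNaiveBayes().fit(
    ["the match ended in a draw", "the striker scored a goal",
     "the central bank raised rates", "stocks fell as rates rose"],
    ["sport", "sport", "finance", "finance"],
)
print(clf.predict("the striker scored"))  # sport
```

Because the model conditions each word on its predecessor, word order and local context now influence the class score, which a unigram naive Bayes model ignores.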
NTCIR-3 Chinese, Cross Language Retrieval Experiments Using PIRCS
PIRCS, Proceedings of NTCIR workshop meeting, 2001
Cited by 12 (1 self)
Abstract
We participated in the monolingual Chinese, English-Chinese cross-language, and multilingual retrieval tasks using our PIRCS retrieval system. For monolingual retrieval, bigram and short-word indexing (both supplemented with single characters) were employed for representation. Two separate retrieval lists were obtained and later combined into the final result for some submissions. For cross-lingual and multilingual retrieval, only short-word indexing was used. We performed retrieval with two types of queries: queries built from all sections of a topic, and queries built from the description section only. The best monolingual mean average precision based on relaxed assessment is ~0.41 for long queries and ~0.36 for short description-only queries. These values are much lower than those for NTCIR-2 and may indicate that the NTCIR-3 environment is more difficult. For cross-lingual retrieval, we employed the query translation approach and concatenated the outputs from MT software and dictionary translation into one Chinese query. Results were also much inferior to those observed in NTCIR-2, achieving only about 56% of monolingual effectiveness for long queries and 44% for short queries under relaxed judgment.
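The bigram representation and the list combination described above can be sketched as follows. `index_terms` emits single characters plus overlapping character bigrams; the dictionary-based short-word representation is omitted. The abstract does not say how the two retrieval lists were combined, so `combine` assumes CombSUM over min-max normalized scores, a common fusion rule swapped in here for illustration. Both function names are hypothetical.

```python
from collections import Counter


def index_terms(text):
    """Index units for a Chinese string: single characters plus overlapping
    character bigrams. Whitespace is skipped; punctuation handling and the
    short-word (dictionary) representation are omitted in this sketch."""
    chars = [c for c in text if not c.isspace()]
    bigrams = [a + b for a, b in zip(chars, chars[1:])]
    return chars + bigrams


def combine(run_a, run_b):
    """Fuse two {doc_id: score} runs by summing min-max normalized scores
    (CombSUM, an assumption); a document missing from one run gets 0 there."""

    def normalize(run):
        lo, hi = min(run.values()), max(run.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
        return {doc: (score - lo) / span for doc, score in run.items()}

    fused = Counter()
    for run in (normalize(run_a), normalize(run_b)):
        fused.update(run)  # Counter.update adds the values key by key
    return [doc for doc, _ in fused.most_common()]


print(index_terms("信息检索"))
# ['信', '息', '检', '索', '信息', '息检', '检索']
print(combine({"d1": 4.0, "d2": 2.0, "d3": 0.0},
              {"d2": 10.0, "d3": 5.0, "d4": 0.0}))
# ['d2', 'd1', 'd3', 'd4']
```

Keeping single characters alongside bigrams lets a query still match documents when a bigram is split differently or a one-character word is involved, at the cost of a larger index.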
English-Chinese Cross-Language Retrieval based on a Translation Package
In Workshop of Machine Translation for Cross Language Information Retrieval, Machine Translation Summit VII, 1999
Cited by 10 (0 self)
Abstract
An inexpensive COTS translation package, augmented with a downloadable bilingual dictionary, was employed for a study of English-Chinese cross-language information retrieval (CLIR) using the query translation approach. The experimental setting involved the 170 MB TREC Chinese collections, 54 TREC queries with their relevance judgments, and our PIRCS bilingual retrieval system. With some standard retrieval techniques, such as pre-translation query expansion and combination of retrieval lists, we were able to achieve over 70% of monolingual effectiveness for both long and short queries. The limited context of short queries does not appear to be a problem for machine translation in English-Chinese CLIR.
1 Introduction
CLIR has gained importance in recent years (Oard & Dorr 1996, Grefenstette 1998, Schauble & Sheridan 1998) because accessing foreign web sites and text searching have become popular and convenient. Many language pairs need to be considered, but one can fairly say that autom...
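A minimal sketch of pre-translation query expansion followed by dictionary-based query translation. The abstract does not describe the expansion procedure, so this assumes pseudo-relevance feedback: the most frequent non-stopword terms from top-ranked English documents are appended before translation. The toy dictionary, stopword list, and function names are all hypothetical, and every dictionary sense is kept rather than disambiguated.

```python
from collections import Counter

# Hypothetical toy bilingual dictionary (English -> candidate Chinese terms).
TOY_DICT = {
    "economic": ["经济"],
    "growth": ["增长", "发展"],
    "china": ["中国"],
    "trade": ["贸易"],
}

STOPWORDS = {"the", "of", "in", "and", "a", "to"}


def expand_query(query, top_docs, n_terms=2):
    """Pre-translation expansion (assumed pseudo-relevance feedback): append the
    most frequent non-stopword, non-query terms from top-ranked English docs."""
    q = query.lower().split()
    counts = Counter(
        w for doc in top_docs for w in doc.lower().split()
        if w not in STOPWORDS and w not in q
    )
    return q + [w for w, _ in counts.most_common(n_terms)]


def translate(terms, dictionary):
    """Dictionary translation keeping every candidate; unknown terms are dropped."""
    return [zh for t in terms for zh in dictionary.get(t, [])]


expanded = expand_query(
    "economic growth",
    ["china trade and economic growth", "trade policy of china", "china exports"],
)
print(expanded)                       # ['economic', 'growth', 'china', 'trade']
print(translate(expanded, TOY_DICT))  # ['经济', '增长', '发展', '中国', '贸易']
```

Expanding before translation adds related terms while the query is still in English, which may give the translation step more context and compensate for terms the dictionary or MT package fails to cover.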
English-Chinese Cross-Lingual Retrieval Using a Translation Package
Abstract
Using a COTS English-Chinese bidirectional translation software package together with our PIRCS bilingual retrieval system, we performed English-Chinese cross-lingual retrieval experiments on the TREC Chinese collections and queries. With some simple approaches, we were able to attain about 67% of the effectiveness of monolingual Chinese retrieval.
1. Introduction
CLIR has gained importance in recent years [OaDo96, Gref98] because web browsing, accessing foreign sites, and text searching have become popular, easy and convenient. Many language pairs need to be considered, but one can fairly say that English-Chinese cross-language IR will become increasingly important because of the growing significance of China in business, politics, science and technology, etc., as well as the sheer size of the Chinese population. English, of course, is practically the de facto world language. Thus, the ability to do effective retrieval of collections in Chinese (the target language) via queries in English ...
Applying Machine Learning to Text Segmentation for Information Retrieval, 2002
Abstract
We propose a self-supervised word segmentation technique for text segmentation in Chinese information retrieval. This method combines the advantages of traditional dictionary-based, character-based and mutual-information-based approaches, while overcoming many of their shortcomings. Experiments on TREC data show that this method is promising. Our method is completely language independent and unsupervised, which provides a promising avenue for constructing accurate multilingual or cross-lingual information retrieval systems that are flexible and adaptive. We find that although the segmentation accuracy of self-supervised segmentation is not as high as that of some other segmentation methods, it is sufficient to give comparable (in some cases even better) retrieval performance. It is commonly believed that word segmentation accuracy is monotonically related to retrieval performance in Chinese information retrieval. However, we find that for Chinese the relationship between segmentation accuracy and retrieval performance is in fact nonmonotonic: at around 70% word segmentation accuracy, an over-segmentation phenomenon begins to occur, which leads to a reduction in information retrieval performance. We demonstrate this effect through an empirical investigation of information retrieval on Chinese TREC data, using a wide variety of word segmentation algorithms with word segmentation accuracies ranging from 44% to 95%, including the 70% accuracy of our self-supervised word segmentation approach.
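For reference, here is a sketch of the mutual-information-based baseline the abstract mentions, not the authors' self-supervised method. Character unigram and bigram counts are collected from unsegmented text, and a word boundary is placed between adjacent characters whose pointwise mutual information falls below a threshold. The toy corpus, the threshold of 1.5, and the function names are hypothetical choices made so the example splits cleanly.

```python
import math
from collections import Counter


def train_counts(corpus):
    """Character unigram and adjacent-bigram counts from unsegmented lines."""
    uni, bi = Counter(), Counter()
    for line in corpus:
        uni.update(line)  # iterating a str yields its characters
        bi.update(line[i:i + 2] for i in range(len(line) - 1))
    return uni, bi


def pmi(a, b, uni, bi):
    """Pointwise mutual information log(P(ab) / (P(a) P(b))) of adjacent chars."""
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    p_ab = bi[a + b] / n_bi
    if p_ab == 0:
        return float("-inf")  # never seen together: always a boundary
    # a and b both occur in the corpus whenever the bigram ab does.
    return math.log(p_ab / ((uni[a] / n_uni) * (uni[b] / n_uni)))


def segment(text, uni, bi, threshold):
    """Split between adjacent characters whose PMI falls below threshold."""
    if not text:
        return []
    words, current = [], text[0]
    for a, b in zip(text, text[1:]):
        if pmi(a, b, uni, bi) >= threshold:
            current += b  # strongly associated: keep in the same word
        else:
            words.append(current)
            current = b
    words.append(current)
    return words


corpus = ["中文", "信息", "检索", "中文信息", "信息检索"]
uni, bi = train_counts(corpus)
print(segment("中文信息检索", uni, bi, threshold=1.5))
# ['中文', '信息', '检索']
```

The threshold directly trades under-segmentation against over-segmentation: raising it produces more, shorter words, which connects to the abstract's observation that higher segmentation granularity does not automatically improve retrieval.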

