## A Probabilistic Model for Text Categorization: Based on a Single Random Variable with Multiple Values (1994)


Venue: | In Proceedings of 4th Conference on Applied Natural Language Processing |

Citations: | 19 - 6 self |

### BibTeX

@INPROCEEDINGS{Iwayama94aprobabilistic,

author = {Makoto Iwayama},

title = {A Probabilistic Model for Text Categorization: Based on a Single Random Variable with Multiple Values},

booktitle = {Proceedings of the 4th Conference on Applied Natural Language Processing},

year = {1994},

pages = {162--167}

}

### Abstract

Text categorization is the classification of documents with respect to a set of predefined categories. In this paper, we propose a new probabilistic model for text categorization that is based on a Single random Variable with Multiple Values (SVMV). Compared to previous probabilistic models, our model has the following advantages: 1) it considers within-document term frequencies, 2) it considers term weighting for target documents, and 3) it is less affected by having insufficient training cases. We verify our model's superiority over the others in the task of categorizing news articles from the "Wall Street Journal".

### Citations

3124 | Introduction to Modern Information Retrieval - Salton, McGill - 1983 |
Citation Context: ... We refer to Robertson and Sparck Jones' formulation as Probabilistic Relevance Weighting (PRW). While PRW is the first attempt to formalize well-known relevance weighting (Sparck Jones, 1972; Salton and McGill, 1983) by probability theory, there are several drawbacks in PRW. [Problem 1] no within-document term frequencies: PRW does not make use of within-document term frequencies. P(T = 1, 0 | c) in Eq. (5) takes i...

597 | Relevance weighting of search terms - Robertson, Jones - 1976 |

356 | A practical part-of-speech tagger - Cutting, Kupiec, et al. - 1992 |
Citation Context: ... articles; the smallest one is "RUBBER (RUB)", assigned to only 2 articles. On the average, one category is assigned to 443 articles. All 8,907 articles were tagged by the Xerox Part-of-Speech Tagger (Cutting et al., 1992). From the tagged articles, we extracted the root words of nouns using the "ispell" program. As a result, each article has a set of root words representing it, and each element in the set (i.e. r...

356 | A statistical interpretation of term specificity and its application in retrieval - Jones - 1972 |
Citation Context: ... We refer to Robertson and Sparck Jones' formulation as Probabilistic Relevance Weighting (PRW). While PRW is the first attempt to formalize well-known relevance weighting (Sparck Jones, 1972; Salton and McGill, 1983) by probability theory, there are several drawbacks in PRW. [Problem 1] no within-document term frequencies: PRW does not make use of within-document term frequencies. P(T = 1...

338 | Self-organized Language Modeling for Speech Recognition - Jelinek - 1990 |
Citation Context: ...s, 1976). A well-known remedy for this problem is to use "(r + 0.5)/(R + 1)" as the estimate of P(T = 1 | c) (Robertson and Sparck Jones, 1976). While various smoothing methods (Church and Gale, 1991; Jelinek, 1990) are also applicable to these situations and would be expected to work better, we used the simple "add one" remedy in the following experiments. 2.2 Component Theory (CT): To solve problems 1 and 2 of...

255 | Automated learning of decision rules for text categorization - Apté, Damerau, et al. - 1994 |

234 | The Probability Ranking Principle in IR - Robertson - 1977 |
Citation Context: ...asing order according to their probabilities. The larger P(c | d_i) a document d_i has, the more probably it will be categorized into category c. This is called the Probabilistic Ranking Principle (PRP) (Robertson, 1977). Several strategies can be used to assign categories to a document based on PRP (Lewis, 1992). There are several ways to calculate P(c | d). Three representatives are (Robertson and Sparck Jones, 1976...

208 | An evaluation of phrasal and clustered representations on a text categorization task - Lewis - 1992 |
Citation Context: ...titech.ac.jp While many text categorization models have been proposed so far, in this paper, we concentrate on the probabilistic models (Robertson and Sparck Jones, 1976; Kwok, 1990; Fuhr, 1989; Lewis, 1992; Croft, 1981; Wong and Yao, 1989; Yu et al., 1989) because these models have solid formal grounding in probability theory. Section 2 quickly reviews the probabilistic models and lists their individua...

128 | A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams. Computer Speech and Language - Church, Gale - 1991 |
Citation Context: ...bertson and Sparck Jones, 1976). A well-known remedy for this problem is to use "(r + 0.5)/(R + 1)" as the estimate of P(T = 1 | c) (Robertson and Sparck Jones, 1976). While various smoothing methods (Church and Gale, 1991; Jelinek, 1990) are also applicable to these situations and would be expected to work better, we used the simple "add one" remedy in the following experiments. 2.2 Component Theory (CT): To solve prob...

93 | On the specification of term values in automatic indexing - Salton, Yang - 1973 |
Citation Context: ...a document play an important role in information retrieval (Salton and McGill, 1983). Salton and Yang experimentally verified the importance of within-document term frequencies in their vector model (Salton and Yang, 1973). [Problem 2] no term weighting for target documents: In the PRW formulation, there is no factor of term weighting for target documents (i.e., P(· | d)). According to Eq. (5), even if a term exists ...

87 | Models for Retrieval with Probabilistic Indexing. Information Processing and Management - Fuhr - 1989 |
Citation Context: ...PAN take@cs.titech.ac.jp While many text categorization models have been proposed so far, in this paper, we concentrate on the probabilistic models (Robertson and Sparck Jones, 1976; Kwok, 1990; Fuhr, 1989; Lewis, 1992; Croft, 1981; Wong and Yao, 1989; Yu et al., 1989) because these models have solid formal grounding in probability theory. Section 2 quickly reviews the probabilistic models and lists th...

24 | Experiments with component theory of probabilistic information retrieval based on single terms as document components - Kwok - 1990 |
Citation Context: ...OKYO 152, JAPAN take@cs.titech.ac.jp While many text categorization models have been proposed so far, in this paper, we concentrate on the probabilistic models (Robertson and Sparck Jones, 1976; Kwok, 1990; Fuhr, 1989; Lewis, 1992; Croft, 1981; Wong and Yao, 1989; Yu et al., 1989) because these models have solid formal grounding in probability theory. Section 2 quickly reviews the probabilistic models ...

23 | A probability distribution model for information retrieval - Wong, Yao - 1989 |
Citation Context: ...ny text categorization models have been proposed so far, in this paper, we concentrate on the probabilistic models (Robertson and Sparck Jones, 1976; Kwok, 1990; Fuhr, 1989; Lewis, 1992; Croft, 1981; Wong and Yao, 1989; Yu et al., 1989) because these models have solid formal grounding in probability theory. Section 2 quickly reviews the probabilistic models and lists their individual problems. In section 3, we prop...

22 | Towards automatic indexing: automatic assignment of controlled-language indexing and classification from free indexing - Field - 1975 |
Citation Context: ...robabilistic models are compared in this calculation. There are several strategies for assigning categories to a document based on the probability P(c | d). The simplest one is the k-per-doc strategy (Field, 1975) that assigns the top k categories to each document. A more sophisticated one is the probability threshold strategy, in which all the categories above a user-defined threshold are assigned to a docum...

18 | Document representation in probabilistic models of information retrieval - Croft - 1981 |
Citation Context: .... jp While many text categorization models have been proposed so far, in this paper, we concentrate on the probabilistic models (Robertson and Sparck Jones, 1976; Kwok, 1990; Fuhr, 1989; Lewis, 1992; Croft, 1981; Wong and Yao, 1989; Yu et al., 1989) because these models have solid formal grounding in probability theory. Section 2 quickly reviews the probabilistic models and lists their individual problems. I...

7 | Collection properties influencing automatic term classification - Jones, K |
Citation Context: ...ple document representation in which a document is defined as a set of nouns, there could be considered several improvements, such as using phrasal information (Lewis, 1992), clustering terms (Sparck Jones, 1973), reducing the number of features by using local dictionary (Apté et al., 1994), etc. • We are incorporating our probabilistic model into cluster-based text categorization that offers an efficient an...
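
The citation contexts above quote three concrete pieces of the paper's setup: the "(r + 0.5)/(R + 1)" estimate of P(T = 1 | c), the k-per-doc strategy, and the probability threshold strategy. The following Python sketch illustrates them under stated assumptions and is not the paper's implementation. The function names are hypothetical, the example scores are made up, and the reading of r and R (documents of a category containing the term, and all documents of that category) is an assumption, since the excerpts do not define them.

```python
from typing import Dict, List


def smoothed_estimate(r: int, R: int) -> float:
    """Estimate P(T = 1 | c) as (r + 0.5) / (R + 1).

    Assumption: r is the number of training documents of category c that
    contain the term, and R is the number of training documents of c.
    """
    if R < 0 or not 0 <= r <= R:
        raise ValueError("expected 0 <= r <= R")
    return (r + 0.5) / (R + 1)


def rank_categories(scores: Dict[str, float]) -> List[str]:
    """Order categories by decreasing P(c | d), as the ranking principle suggests."""
    return sorted(scores, key=lambda c: scores[c], reverse=True)


def k_per_doc(scores: Dict[str, float], k: int) -> List[str]:
    """k-per-doc strategy: assign the top k categories to the document."""
    return rank_categories(scores)[:k]


def probability_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Threshold strategy: assign every category scoring above the threshold."""
    return [c for c in rank_categories(scores) if scores[c] > threshold]


if __name__ == "__main__":
    # Made-up P(c | d) values for one document; the labels are placeholders.
    example_scores = {"A": 0.2, "B": 0.7, "C": 0.1}
    print(smoothed_estimate(0, 2))
    print(k_per_doc(example_scores, 2))
    print(probability_threshold(example_scores, 0.15))
```

With this estimate, a term that never occurs in a category's training documents still receives a non-zero probability, which is the situation the quoted remedy addresses. The two assignment functions differ only in how they cut the ranked list: a fixed count per document versus a user-defined score threshold.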