## Document Preprocessing For Naive Bayes Classification and Clustering with Mixture of Multinomials (Industry/Government Track Poster)

### BibTeX

@MISC{Pavlov_industry/governmenttrack,
  author = {Dmitry Pavlov and Ramnath Balasubramanyan and Byron Dom and Shyam Kapur and Jignashu Parikh},
  title = {Document Preprocessing For Naive Bayes Classification and Clustering with Mixture of Multinomials},
  year = {}
}

### Abstract

The Naive Bayes classifier has long been used for text categorization tasks. Its sibling from the unsupervised world, the mixture of multinomials model, has likewise been successfully applied to text clustering problems. Despite the strong independence assumptions these models make, their attractiveness comes from low computational cost, relatively low memory consumption, and the ability to handle heterogeneous features and multiple classes. Recently, there have been several attempts to improve the accuracy of Naive Bayes by performing heuristic feature transformations, such as IDF weighting, normalization by document length, and taking logarithms of the term counts. We justify the use of these techniques and apply them to two problems: classification of products in Yahoo! Shopping and clustering of vectors of collocated terms in user queries to Yahoo! Search. The experimental evaluation allows us to draw conclusions about the promise these transformations carry with regard to alleviating the strong assumptions of the multinomial model.
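The three heuristic transformations named in the abstract — log of counts, IDF weighting, and length normalization — can be sketched as follows. This is an illustrative reconstruction under common definitions, not the authors' exact pipeline; the function name and its arguments are hypothetical:

```python
import math

def transform(doc_counts, doc_freq, n_docs):
    """Apply log-of-counts, IDF weighting, and length normalization
    to one bag-of-words document (term -> raw count)."""
    # 1. Dampen raw counts: f -> log(1 + f), so repeated terms grow sublinearly
    weights = {t: math.log(1 + f) for t, f in doc_counts.items()}
    # 2. IDF: up-weight rare terms, down-weight terms occurring in many documents
    weights = {t: w * math.log(n_docs / doc_freq[t]) for t, w in weights.items()}
    # 3. Normalize to unit L2 length so long documents do not dominate
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm > 0 else weights
```

A term appearing in every document gets IDF zero and drops out of the representation, which is one way these transformations soften the multinomial model's sensitivity to common, uninformative terms.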

### Citations

3921 | Pattern Classification and Scene Analysis
- Duda, Hart
- 1973
Citation Context: ...rms. We propose an explanation to this phenomenon and draw conclusions in Section 5. 2. MODEL DESCRIPTION We only present brief model descriptions here, for a more detailed discussion please refer to [3] for Naive Bayes classifier and to [6] for multinomial mixture models. We assume that the data consists of documents, each represented as a bag of words, i.e. as a map from the set of terms occurring ...
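The model description excerpted here — documents as bags of words scored by a multinomial Naive Bayes classifier — can be illustrated with a minimal sketch. The function and the probability tables in the usage below are hypothetical placeholders, not values from the paper:

```python
import math

def nb_log_posterior(doc, class_prior, term_prob):
    """Unnormalized log-posterior of one class for a bag-of-words document.

    doc: map from term to count; class_prior: P(c);
    term_prob: map from term to P(t | c) for this class.
    """
    score = math.log(class_prior)
    for term, count in doc.items():
        # Independence assumption: every occurrence of a term
        # contributes log P(t | c), independently of the others
        score += count * math.log(term_prob.get(term, 1e-9))  # tiny floor for unseen terms
    return score
```

Classification picks the class with the largest score; the heuristic transformations discussed in the abstract amount to replacing the raw counts in `doc` before this sum is taken.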

879 | Mixture Models
- McLachlan, Basford
- 1988
Citation Context: ... phenomenon and draw conclusions in Section 5. 2. MODEL DESCRIPTION We only present brief model descriptions here, for a more detailed discussion please refer to [3] for Naive Bayes classifier and to [6] for multinomial mixture models. We assume that the data consists of documents, each represented as a bag of words, i.e. as a map from the set of terms occurring to their count in the text. The length...

758 | A comparison of event models for naive Bayes text classification
- McCallum, Nigam
Citation Context: .... Keywords: Classification, clustering, data transformations, performance, Naive Bayes, mixture of multinomials. 1. INTRODUCTION Naive Bayes classifier has been a subject of multiple research studies [2, 5] and almost on every occasion it was noted that the Naive Bayes’ performance can be jeopardized by several independent problems. These include the imbalance of training documents in classes, mismatch ...

601 | On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
- Domingos, Pazzani
- 1997
Citation Context: .... Keywords: Classification, clustering, data transformations, performance, Naive Bayes, mixture of multinomials. 1. INTRODUCTION Naive Bayes classifier has been a subject of multiple research studies [2, 5] and almost on every occasion it was noted that the Naive Bayes’ performance can be jeopardized by several independent problems. These include the imbalance of training documents in classes, mismatch ...

108 | Tackling the poor assumptions of naive Bayes text classifiers
- Rennie, Shih, et al.
- 2003
Citation Context: ...deal-with problems, and from this perspective Naive Bayes classifier looks quite attractive, especially if some of its drawbacks can be alleviated. This was exactly the goal of several recent studies [11, 10]. The first two papers discuss approaches with introducing dependencies between features and performing clustering prior to classification. These methods are quite a bit more complex than the regular ...

56 | Error correcting output coding for text classification
- Berger
- 1999
Citation Context: ... an efficient way of handling the multiclass problems. One-against-others solution is going to “enjoy” even more severe problems with imbalanced classes than Naive Bayes. Error-correcting output coding [1] remains a viable option, but in any case the computational penalty of training a single SVM is going to be multiplied by at least the number of classes, i.e. possibly grow 1 to 2 orders of magnitude ...

38 | Using sparseness and analytic QP to speed training of support vector machines
- Platt
- 1999
Citation Context: ...de in our case. But still, by far the main problem for SVMs comes with the number of training data points, which in our case are products, constantly growing in number. The most advanced SMO algorithm [9] has at least quadratic complexity in the number of data points per iteration. Work on speeding up SVM training includes various forms of data sampling, boosting [7] and squashing [8]. ...

21 | A geometric interpretation of Darroch and Ratcliff’s generalized iterative scaling
- Csiszár
- 1989
Citation Context: ...entropy classifiers are not going to have as big a problem handling multiple classes as SVMs, but they are probably even slower than SVMs if trained with the commonly used generalized iterative scaling [4]. There has also been work on speeding up the training of maximum entropy models. However, even with these speed-ups, the algorithms are still far from the simplicity and speed of naive Bayes. Thus, t...

18 | Scaling-Up Support Vector Machines Using Boosting Algorithm, ICPR
- Pavlov, Mao, et al.
- 2000
Citation Context: ...anced SMO algorithm [9] has at least quadratic complexity in the number of data points per iteration. Work on speeding up SVM training includes various forms of data sampling, boosting [7] and squashing [8]. However, in our experience handling multiclass problems is still an issue, while the preprocessing steps above can easily result in either an inaccurate classifier (e.g., sampling) or ...

15 | Towards scalable support vector machines using squashing
- Pavlov, Chudova, et al.
- 2000
Citation Context: ...m [9] has at least quadratic complexity in the number of data points per iteration. Work on speeding up SVM training includes various forms of data sampling, boosting [7] and squashing [8]. However, in our experience handling multiclass problems is still an issue, while the preprocessing steps above can easily result in either an inaccurate classifier (e.g., sampling) or are too computatio...

11 | A decomposition of classes via clustering to explain and improve naive Bayes
- Vilalta, Rish
- 2003
Citation Context: ...deal-with problems, and from this perspective Naive Bayes classifier looks quite attractive, especially if some of its drawbacks can be alleviated. This was exactly the goal of several recent studies [11, 10]. The first two papers discuss approaches with introducing dependencies between features and performing clustering prior to classification. These methods are quite a bit more complex than the regular ...