## Noise Reduction in a Statistical Approach to Text Categorization (1995)

### Abstract

This paper studies noise reduction for computational efficiency improvements in a statistical learning method for text categorization, the Linear Least Squares Fit (LLSF) mapping. Multiple noise reduction strategies are proposed and evaluated, including: an aggressive removal of “non-informative words” from texts before training; the use of a truncated singular value decomposition to cut off noisy “latent semantic structures” during training; and the elimination of non-influential components in the LLSF solution (a word-concept association matrix) after training. Text collections in different domains were used for evaluation. Significant improvements in computational efficiency without losing categorization accuracy were evident in the testing results.
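The three strategies can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. It assumes a term-by-document matrix `A`, a category-by-document matrix `B`, and an LLSF solution `F` that minimizes the Frobenius norm of `F A - B` through a rank-`k` pseudoinverse of `A`. The function names, the stop-list approach to word removal, and the magnitude threshold used for pruning are all hypothetical choices made for illustration.

```python
import numpy as np

def remove_words(tokens, non_informative):
    """Before training: drop words judged non-informative (hypothetical stop list)."""
    return [t for t in tokens if t not in non_informative]

def llsf_fit(A, B, k):
    """During training: fit F minimizing ||F @ A - B||_F with a rank-k truncated SVD.

    A: (n_terms, n_docs) term-by-document matrix.
    B: (n_categories, n_docs) category-by-document matrix.
    Assumes the k largest singular values of A are nonzero.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # Rank-k pseudoinverse of A; the discarded singular directions are
    # treated as noise.
    A_pinv_k = Vt_k.T @ np.diag(1.0 / s_k) @ U_k.T
    return B @ A_pinv_k  # shape (n_categories, n_terms)

def prune(F, threshold):
    """After training: zero out entries of F whose magnitude is below threshold."""
    return np.where(np.abs(F) >= threshold, F, 0.0)
```

A smaller `k` and a higher `threshold` both trade fidelity for a cheaper model. Suitable values depend on the collection and are not specified here.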

