Linguistic Structured Sparsity in Text Categorization
"... We introduce three linguistically moti-vated structured regularizers based on parse trees, topics, and hierarchical word clusters for text categorization. These regularizers impose linguistic bias in fea-ture weights, enabling us to incorporate prior knowledge into conventional bag-of-words models. ..."
Abstract - Cited by 3 (0 self)
We introduce three linguistically motivated structured regularizers based on parse trees, topics, and hierarchical word clusters for text categorization. These regularizers impose linguistic bias in feature weights, enabling us to incorporate prior knowledge into conventional bag-of-words models. We show that our structured regularizers consistently improve classification accuracies compared to standard regularizers that penalize features in isolation (such as lasso, ridge, and elastic net regularizers) on a range of datasets for various text prediction problems: topic classification, sentiment analysis, and forecasting.
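To make concrete how a structured penalty differs from one that treats features in isolation, here is a minimal sketch of a group-lasso-style regularizer in Python. The group definitions and toy weights below are hypothetical stand-ins; the paper's actual groups come from parse trees, topics, and hierarchical word clusters.

import numpy as np

def lasso_penalty(w, lam):
    # Standard lasso: penalizes each feature weight in isolation.
    return lam * np.sum(np.abs(w))

def group_lasso_penalty(w, groups, lam):
    # Structured penalty: features in the same linguistic group
    # (e.g. a word cluster or a topic) are shrunk toward zero
    # together, so whole groups can drop out of the model.
    return lam * sum(np.sqrt(len(g)) * np.linalg.norm(w[g]) for g in groups)

# Toy example: 6 bag-of-words features partitioned into 2
# hypothetical word clusters.
w = np.array([0.9, 0.8, 0.7, 0.0, 0.01, 0.0])
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]
print(lasso_penalty(w, lam=0.1))
print(group_lasso_penalty(w, groups, lam=0.1))

Added to a standard loss (e.g. logistic), the group term charges little for the second cluster, whose weights are already near zero, which is the kind of linguistic bias the abstract describes.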
Locally Non-Linear Learning for Statistical Machine Translation via Discretization and Structured Regularization
"... Linear models, which support efficient learn-ing and inference, are the workhorses of statis-tical machine translation; however, linear de-cision rules are less attractive from a modeling perspective. In this work, we introduce a tech-nique for learning arbitrary, rule-local, non-linear feature tran ..."
Abstract
Linear models, which support efficient learning and inference, are the workhorses of statistical machine translation; however, linear decision rules are less attractive from a modeling perspective. In this work, we introduce a technique for learning arbitrary, rule-local, non-linear feature transforms that improve model expressivity but do not sacrifice the efficient inference and learning associated with linear models. To demonstrate the value of our technique, we discard the customary log transform of lexical probabilities and drop the phrasal translation probability in favor of raw counts. We observe that our algorithm learns a variation of a log transform that leads to better translation quality compared to the explicit log transform. We conclude that non-linear responses play an important role in SMT, an observation that we hope will inform the efforts of feature engineers.
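As a sketch of the discretization idea, the toy code below (hypothetical bin boundaries and counts, not the paper's implementation) replaces a raw count feature with per-bin indicator features, so a linear model can learn a piecewise-constant response such as the log-like transform the abstract reports.

import numpy as np

def discretize(x, boundaries):
    # Map a raw feature value to a one-hot vector over bins; the
    # model then learns one weight per bin instead of a single
    # weight for x.
    onehot = np.zeros(len(boundaries) + 1)
    onehot[np.searchsorted(boundaries, x)] = 1.0
    return onehot

boundaries = np.array([1, 2, 4, 8, 16, 32])  # hypothetical count bins
for count in [1, 3, 20]:
    print(count, discretize(count, boundaries))

With per-bin weights w, the model's response to a raw count is w[bin(count)], which is free to approximate log(count) or any other shape, while inference remains a linear dot product over indicator features.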