Scalable Discriminative Learning for Natural Language Parsing and Translation
- In Proceedings of the 2006 Neural Information Processing Systems (NIPS), 2006
Abstract - Cited by 17 (1 self)
Parsing and translating natural languages can be viewed as problems of predicting tree structures. For machine learning approaches to these predictions, the diversity and high dimensionality of the structures involved mandate very large training sets. This paper presents a purely discriminative learning method that scales up well to problems of this size. Its accuracy was at least as good as other comparable methods on a standard parsing task. To our knowledge, it is the first purely discriminative learning algorithm for translation with tree-structured models. Unlike other popular methods, this method does not require a great deal of feature engineering a priori, because it performs feature selection over a compound feature space as it learns. Experiments demonstrate the method’s versatility, accuracy, and efficiency. Relevant software is freely available at
Discovering Sociolinguistic Associations with Structured Sparsity
Abstract - Cited by 5 (2 self)
We present a method to discover robust and interpretable sociolinguistic associations from raw geotagged text data. Using aggregate demographic statistics about the authors' geographic communities, we solve a multi-output regression problem between demographics and lexical frequencies. By imposing a composite ℓ1,∞ regularizer, we obtain structured sparsity, driving entire rows of coefficients to zero. We perform two regression studies. First, we use term frequencies to predict demographic attributes; our method identifies a compact set of words that are strongly associated with author demographics. Next, we conjoin demographic attributes into features, which we use to predict term frequencies. The composite regularizer identifies a small number of features, which correspond to communities of authors united by shared demographic and linguistic properties.
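As a rough illustration of the structured sparsity mentioned in this abstract, the sketch below computes an ℓ1,∞ penalty on a small coefficient matrix, assuming the common definition (the sum over rows of each row's largest absolute coefficient). The function names `l1_inf_norm` and `active_rows` and the toy matrix are hypothetical and are not taken from the paper; the paper's actual objective and optimizer are not shown here.

```python
def l1_inf_norm(rows):
    """l1,inf penalty: sum over rows of the largest absolute value in each row.

    Assumes the standard definition; a row contributes zero only when every
    coefficient in it is zero, which is what encourages whole rows to vanish.
    """
    return sum(max(abs(v) for v in row) for row in rows)


def active_rows(rows, tol=0.0):
    """Indices of rows that still have at least one coefficient above tol."""
    return [i for i, row in enumerate(rows) if max(abs(v) for v in row) > tol]


# Toy coefficient matrix (illustrative values only): one row per input
# feature, one column per output variable.
B = [
    [0.0, 0.0, 0.0],    # feature dropped for every output
    [1.5, -2.0, 0.5],   # feature kept
    [0.0, 0.0, 0.0],    # feature dropped for every output
    [-0.25, 0.1, 0.2],  # feature kept
]

penalty = l1_inf_norm(B)   # 2.0 + 0.25
selected = active_rows(B)  # rows 1 and 3 survive
```

Because the penalty charges each row only for its largest entry, a solver minimizing a loss plus this term tends to either zero a row completely or keep it for several outputs at once, which matches the row-level selection the abstract describes.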
Computational Challenges in Parsing by Classification
Abstract
This paper presents a discriminative parser that does not use a generative model in any way, yet whose accuracy still surpasses a generative baseline. The parser performs feature selection incrementally during training, as opposed to a priori, which enables it to work well with minimal linguistic cleverness. The main challenge in building this parser was fitting the training data into memory. We introduce gradient sampling, which increased training speed 100-fold. Our implementation is freely available at

