Contrastive estimation: Training log-linear models on unlabeled data
- In Proc. of ACL, 2005
Cited by 89 (11 self)
Conditional random fields (Lafferty et al., 2001) are quite effective at sequence labeling tasks like shallow parsing (Sha and Pereira, 2003) and named-entity extraction (McCallum and Li, 2003). CRFs are log-linear, allowing the incorporation of arbitrary features into the model. To train on unlabeled data, we require unsupervised estimation methods for log-linear models; few exist. We describe a novel approach, contrastive estimation. We show that the new technique can be intuitively understood as exploiting implicit negative evidence and is computationally efficient. Applied to a sequence labeling problem (POS tagging given a tagging dictionary and unlabeled text), contrastive estimation outperforms EM (with the same feature set), is more robust to degradations of the dictionary, and can largely recover by modeling additional features.
Self-Training for Enhancement and Domain Adaptation of Statistical Parsers Trained on Small Datasets
Cited by 11 (1 self)
Creating large amounts of annotated data to train statistical PCFG parsers is expensive, and the performance of such parsers declines when training and test data are taken from different domains. In this paper we use self-training to improve the quality of a parser and to adapt it to a different domain, using only small amounts of manually annotated seed data. We report significant improvement both when the seed and test data are in the same domain and in the out-of-domain adaptation scenario. In particular, we achieve a 50% reduction in annotation cost for the in-domain case, yielding an improvement of 66% over previous work, and a 20-33% reduction for the domain adaptation case. This is the first time that self-training with small labeled datasets has been applied successfully to these tasks. We were also able to formulate a characterization of when self-training is valuable.
Co-Training for Cross-Lingual Sentiment Classification
Cited by 7 (0 self)
The lack of Chinese sentiment corpora limits research progress on Chinese sentiment classification. However, there are many freely available English sentiment corpora on the Web. This paper focuses on the problem of cross-lingual sentiment classification, which leverages an available English corpus for Chinese sentiment classification by using the English corpus as training data. Machine translation services are used to eliminate the language gap between the training set and the test set, and English features and Chinese features are considered as two independent views of the classification problem. We propose a co-training approach to make use of unlabeled Chinese data. Experimental results show the effectiveness of the proposed approach, which can outperform the standard inductive classifiers and the transductive classifiers.
Semi-supervised Learning for Automatic Prosodic Event Detection Using Co-training Algorithm
Cited by 2 (1 self)
Most previous approaches to automatic prosodic event detection are based on supervised learning, relying on the availability of a corpus that is annotated with the prosodic labels of interest in order to train the classification models. However, creating such resources is an expensive and time-consuming task. In this paper, we exploit semi-supervised learning with the co-training algorithm for automatic detection of coarse-level representations of prosodic events such as pitch accents, intonational phrase boundaries, and break indices. We propose a confidence-based method to assign labels to unlabeled data and demonstrate improved results using this method compared to the widely used agreement-based method. In addition, we examine various informative sample selection methods. In our experiments on the Boston University radio news corpus, using only a small amount of the labeled data as the initial training set, our proposed labeling method combined with most confidence sample selection can effectively use unlabeled data to improve performance and finally reach performance closer to that of the supervised method using all the training data.
Part-of-speech tagging of transcribed speech
- In Proceedings of LREC, 2006
Cited by 1 (0 self)
We used four part-of-speech taggers, which are available for research purposes and were originally trained on text, to tag a corpus of transcribed multiparty spoken dialogues. The assigned tags were then manually corrected. The corrections were first used to evaluate the four taggers, then to retrain them. Despite limited resources in time, money and annotators, we reached results comparable to those reported for the taggers on text. Based on our experience, we present guidelines for producing reliably POS-tagged corpora of new domains.
Automatic Improvement of Machine Translation Systems
- 2007
Cited by 1 (0 self)
N66001-99-2-891804. Any opinions, findings, conclusions, or recommendations expressed in this material are those of
Bilingual Co-Training for Sentiment Classification of Chinese Product Reviews
Cited by 1 (0 self)
The lack of reliable Chinese sentiment resources limits research progress on Chinese sentiment classification. However, there are many freely available English sentiment resources on the Web. This article focuses on the problem of cross-lingual sentiment classification, which leverages only available English resources for Chinese sentiment classification. We first investigate several basic methods (including lexicon-based methods and corpus-based methods) for cross-lingual sentiment classification by simply leveraging machine translation services to eliminate the language gap, and then propose a bilingual co-training approach to make use of both the English view and the Chinese view based on additional unlabeled Chinese data. Experimental results on two test sets show the effectiveness of the proposed approach, which can outperform basic methods and transductive methods.
Are Two Heads Better Than One? Experiments with Italian Part-of-Speech Labelling
We describe how to combine the output of two systems for part-of-speech tagging in Italian. Although one would expect the use of more than one source of information to be beneficial, our results show that the improvement in performance is only marginal compared with using a single tagger. Future experiments will aim to explore the use of other taggers and different combination techniques. There seems to be no obvious way to combine the output of a pair of off-the-shelf POS taggers in order to improve on the accuracy of single taggers. We combined two well-known retrainable taggers, C&C and TnT, using memory-based learning and tested the resulting tagger on Italian POS data, with respect to two different tagsets. We observed a slight increase in performance for only one tagset, but the added value is small, and one could spare the effort and simply use a single tagger.
The Role of Non-Ambiguous Words in Natural Language Disambiguation
- In Proceedings of the Fourth RANLP, 2003
This paper describes an unsupervised approach for natural language disambiguation, applicable to ambiguity problems where classes of equivalence can be defined over the set of words in a lexicon. Lexical knowledge is induced from non-ambiguous words via classes of equivalence, and enables the automatic generation of annotated corpora. The only requirements are a lexicon and a raw textual corpus. The method was tested on two natural language ambiguity tasks in several languages: part of speech tagging (English, Swedish, Chinese), and word sense disambiguation (English, Romanian). Classifiers trained on automatically constructed corpora were found to have a performance comparable with classifiers that learn from expensive manually annotated data.
Co-training and Self-training for Word Sense Disambiguation
- In CoNLL-2004, 2004
This paper investigates the application of co-training and self-training to word sense disambiguation.

