A Novel Two-Step Method for Cross Language Representation Learning
"... Cross language text classification is an important learning task in natural language processing. A critical challenge of cross language learning arises from the fact that words of different languages are in disjoint feature spaces. In this paper, we pro-pose a two-step representation learning method ..."
Abstract
Cross language text classification is an important learning task in natural language processing. A critical challenge of cross language learning arises from the fact that words of different languages are in disjoint feature spaces. In this paper, we propose a two-step representation learning method to bridge the feature spaces of different languages by exploiting a set of parallel bilingual documents. Specifically, we first formulate a matrix completion problem to produce a complete parallel document-term matrix for all documents in two languages, and then induce a low dimensional cross-lingual document representation by applying latent semantic indexing on the obtained matrix. We use a projected gradient descent algorithm to solve the formulated matrix completion problem with convergence guarantees. The proposed method is evaluated by conducting a set of experiments with cross language sentiment classification tasks on Amazon product reviews. The experimental results demonstrate that the proposed learning method outperforms a number of other cross language representation learning methods, especially when the number of parallel bilingual documents is small.
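As a rough illustration of the two-step pipeline described in this abstract (matrix completion followed by latent semantic indexing), the sketch below completes a partially observed parallel document-term matrix by projected gradient descent onto a fixed-rank set, then applies a truncated SVD to obtain low-dimensional cross-lingual document vectors. The rank, step size, iteration count, and toy data are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of the two-step idea (not the paper's exact formulation):
# 1) complete a parallel document-term matrix by projected gradient descent
#    onto a fixed-rank set (gradient step on observed entries + SVD projection),
# 2) apply LSI (truncated SVD) to the completed matrix.
import numpy as np

def complete_matrix(X, observed_mask, rank=3, step=1.0, n_iters=200):
    """Fill unobserved entries of X by projected gradient descent.

    X             -- (n_docs, n_src_terms + n_tgt_terms) matrix, zeros at
                     unobserved positions
    observed_mask -- boolean array, True where the entry is observed
    """
    Z = np.zeros_like(X, dtype=float)
    for _ in range(n_iters):
        # Gradient of 0.5 * ||P_Omega(Z - X)||_F^2 with respect to Z
        Z -= step * (observed_mask * (Z - X))
        # Project onto the set of matrices with rank <= rank via truncated SVD
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        Z = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return Z

def lsi_embed(Z, dim=3):
    """Latent semantic indexing: truncated SVD of the completed matrix."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :dim] * s[:dim]          # low-dimensional document vectors

# Toy usage: 6 documents over a bilingual vocabulary of 8 terms,
# with roughly 70% of the entries observed.
rng = np.random.default_rng(0)
X = rng.random((6, 8))
mask = rng.random((6, 8)) > 0.3
X = X * mask
docs_lowdim = lsi_embed(complete_matrix(X, mask, rank=3), dim=3)
print(docs_lowdim.shape)                 # (6, 3)
```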
Semi-Supervised Matrix Completion for Cross-Lingual Text Classification
"... Cross-lingual text classification is the task of assigning labels to observed documents in a label-scarce target language domain by using a prediction model trained with labeled documents from a label-rich source lan-guage domain. Cross-lingual text classification is popu-larly studied in natural la ..."
Abstract
Cross-lingual text classification is the task of assigning labels to observed documents in a label-scarce target language domain by using a prediction model trained with labeled documents from a label-rich source language domain. Cross-lingual text classification is widely studied in the natural language processing area to reduce the expensive manual annotation effort required in the target language domain. In this work, we propose a novel semi-supervised representation learning approach to address this challenging task by inducing interlingual features via semi-supervised matrix completion. To evaluate the proposed learning technique, we conduct extensive experiments on eighteen cross language sentiment classification tasks with four different languages. The empirical results demonstrate the efficacy of the proposed approach, and show that it outperforms a number of related cross-lingual learning methods.
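This abstract does not spell out the completion objective. One standard way to make matrix completion semi-supervised is to append a label column to the interlingual document-term matrix, mark the target-language labels as missing, and complete everything jointly, so that the recovered label entries act as predictions. The sketch below follows that generic construction under illustrative assumptions and is not necessarily the paper's exact formulation.

```python
# Hedged sketch of a generic semi-supervised matrix-completion setup:
# stack a label column next to the document features, hide target labels,
# and complete the joint matrix. Ranks, step sizes, and toy data are
# illustrative assumptions only.
import numpy as np

def svp_complete(M, mask, rank=4, step=1.0, n_iters=200):
    """Rank-constrained matrix completion by projected gradient descent."""
    Z = np.zeros_like(M, dtype=float)
    for _ in range(n_iters):
        Z -= step * (mask * (Z - M))                  # gradient step on observed entries
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        Z = (U[:, :rank] * s[:rank]) @ Vt[:rank]      # project to rank <= rank
    return Z

rng = np.random.default_rng(1)
n_src, n_tgt, n_terms = 5, 5, 12
X = rng.random((n_src + n_tgt, n_terms))              # interlingual doc-term features
y_src = rng.integers(0, 2, size=n_src) * 2 - 1        # source labels in {-1, +1}

# Append one label column; target-language labels are unobserved.
M = np.hstack([X, np.zeros((n_src + n_tgt, 1))])
M[:n_src, -1] = y_src
mask = np.ones_like(M, dtype=bool)
mask[n_src:, -1] = False                              # hide target-language labels

Z = svp_complete(M, mask, rank=4)
pred_tgt = np.sign(Z[n_src:, -1])                     # recovered label entries as predictions
print(pred_tgt)
```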