Results 1 -
3 of
3
Machine Learning Approaches for Dealing with Limited Bilingual Data in Statistical Machine Translation
"... Statistical machine translation (SMT) systems have made great strides in translation quality. However, high quality translation output is dependent on the availability of massive amounts of parallel text in the source and target language. There are a large number of languages that are considered “lo ..."
Abstract
- Add to MetaCart
Statistical machine translation (SMT) systems have made great strides in translation quality. However, high quality translation output is dependent on the availability of massive amounts of parallel text in the source and target language. There are a large number of languages that are considered “low-density”, either because the population speaking the language is not very large, or even if millions of people speak the language, insufficient online resources are available in that language. This tutorial covers machine learning approaches for dealing with such situations in statistical machine translation where the amount of available bilingual data is limited. A statistical translation system can be improved and/or adapted by incorporating new training data in the form of parallel text. The problem of learning from insufficient labeled training data has been dealt with in machine learning community under two general frameworks: (i) Semi-supervised Learning, and (ii) Active Learning. The goal of semi-supervised learning is to take advantage of abundant and cheap unlabeled data, together with labeled data, to build a high quality mapping from examples (the input space) to labels (the output space). On the other hand, the goal of active learning is to reduce the amount of labeled data required to learn a high
Frame Assignment with Active Learning
, 2009
"... Recently natural language understanding is given a special attention, since in natural language processing techniques syntactic analysis such as part-of-speech tagging and parsing had a great progress and semantic analysis did not have such a rapid progress. In information extraction and question-an ..."
Abstract
- Add to MetaCart
Recently natural language understanding is given a special attention, since in natural language processing techniques syntactic analysis such as part-of-speech tagging and parsing had a great progress and semantic analysis did not have such a rapid progress. In information extraction and question-answering systems semantic understanding techniques are required. Frame semantics structure analysis is one of the understanding techniques. In this type of analysis, the semantic roles of elements participated in the action would be identified. To determine the roles automatically, two steps are required: one is frame assignment, and the other one is role assignment. What we aim to do is assigning frames with a supervised machine learning method called ‘active learning’. Supervised learning method requires a huge amount of labeled data. The aim of active learning promises to maximize the performance by minimizing the human’s effort to label the data. To our end, we have selected pool-based active learning with uncertainty sampling method; and also we have chosen 14 frequent targets from FrameNet data set for our task. Random sampling which represents the distribution of frames in the corpus
INTERACTIVE LEARNING PROTOCOLS FOR NATURAL LANGUAGE APPLICATIONS
, 2009
"... Statistical machine learning has become an integral technology for solving many informatics applications. In particular, corpus-based statistical techniques have emerged as the dominant paradigm for core natural language processing (NLP) tasks such as parsing, machine translation, and information ex ..."
Abstract
- Add to MetaCart
Statistical machine learning has become an integral technology for solving many informatics applications. In particular, corpus-based statistical techniques have emerged as the dominant paradigm for core natural language processing (NLP) tasks such as parsing, machine translation, and information extraction, amongst others. However, while supervised machine learning is well understood, its successful application to practical scenarios is predicated on obtaining large annotated corpora and performing significant feature engineering, both notably expensive undertakings. Interactive learning protocols offer one promising solution for reducing these costs by allowing the learner and domain expert to interact during learning in an effort to both reduce sample complexity and improve system performance. By specifying a method where the learner may request targeted information, the domain expert is focused on providing the most useful information. This work formalizes a general framework for interactive learning and examines two interactive learning protocols with particular attention to natural language scenarios. We first examine active learning for structured output spaces, the scenario where there are multiple predictions which must be composed into a structurally coherent global prediction. Secondly, we examine active learning for pipeline models, where a complex prediction is decomposed into a sequence of predictions

