Results 11 - 15 of 15
Pointwise Prediction for Robust, Adaptable Japanese Morphological Analysis
Abstract
We present a pointwise approach to Japanese morphological analysis (MA) that ignores structure information during learning and tagging. Despite the lack of structure, it outperforms the current state-of-the-art structured approach for Japanese MA and achieves accuracy similar to that of structured predictors using the same feature set. We also find that the method is robust to out-of-domain data and can be easily adapted through a combination of partial annotation and active learning.
Research Statement
Abstract
I am motivated by the prospect of computers that learn, by interacting and collaborating with humans, how to solve problems. Such systems might take several forms. For example, imagine you are an entrepreneur and you want to train a computer to help you analyze what people are saying about your products. You have domain knowledge about your business and the decisions you want the system to make, such as identifying positive vs. negative product reviews across the Internet. You might want to initialize the system with your background knowledge (e.g., the words “wonderful” and “terrible” indicate high and low customer satisfaction, respectively), inspect substantial amounts of relevant text data, and then allow it to ask questions to help refine its understanding of your goals (e.g., is “predictable” a positive word for your product? The answer may depend on whether you make kitchen appliances or write novels). Alternatively, imagine you are a biologist with a high-throughput laboratory technique to test hundreds of proteins in tandem. You would like a system to analyze hundreds (even thousands) of these measurements, induce hypotheses that might explain the data and communicate them to you (which you might want to edit based on your knowledge or intuition), and propose subsequent experiments in order to refine these hypotheses, or potentially discover other proteins with the properties you study.
INTERACTIVE LEARNING PROTOCOLS FOR NATURAL LANGUAGE APPLICATIONS, 2009
Abstract
Statistical machine learning has become an integral technology for solving many informatics applications. In particular, corpus-based statistical techniques have emerged as the dominant paradigm for core natural language processing (NLP) tasks such as parsing, machine translation, and information extraction, amongst others. However, while supervised machine learning is well understood, its successful application to practical scenarios is predicated on obtaining large annotated corpora and performing significant feature engineering, both notably expensive undertakings. Interactive learning protocols offer one promising solution for reducing these costs by allowing the learner and domain expert to interact during learning in an effort to both reduce sample complexity and improve system performance. By specifying a method where the learner may request targeted information, the domain expert is focused on providing the most useful information. This work formalizes a general framework for interactive learning and examines two interactive learning protocols with particular attention to natural language scenarios. We first examine active learning for structured output spaces, the scenario where there are multiple predictions which must be composed into a structurally coherent global prediction. Secondly, we examine active learning for pipeline models, where a complex prediction is decomposed into a sequence of predictions

