Results 1 -
5 of
5
Posterior Sparsity in Unsupervised Dependency Parsing
, 2010
"... A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 different languages, we achieve significant gains in directed accuracy over the standard expectation maximization (EM) baseline for 9 of the languages, with an average accuracy improvement of 6%. Further, we show that for 8 out of 12 languages, the new method outperforms models based on standard Bayesian sparsity-inducing parameter priors, with an average improvement of 4%. On English text in particular, we show that our approach improves performance over other state of the art techniques.
Improved Fully Unsupervised Parsing with Zoomed Learning
"... We introduce a novel training algorithm for unsupervised grammar induction, called Zoomed Learning. Given a training set T and a test set S, the goal of our algorithm is to identify subset pairs Ti, Si of T and S such that when the unsupervised parser is trained on a training subset Ti its results o ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
We introduce a novel training algorithm for unsupervised grammar induction, called Zoomed Learning. Given a training set T and a test set S, the goal of our algorithm is to identify subset pairs Ti, Si of T and S such that when the unsupervised parser is trained on a training subset Ti its results on its paired test subset Si are better than when it is trained on the entire training set T. A successful application of zoomed learning improves overall performance on the full test set S. We study our algorithm’s effect on the leading algorithm for the task of fully unsupervised parsing (Seginer, 2007) in three different English domains, WSJ, BROWN and GENIA, and show that it improves the parser F-score by up to 4.47%. 1
ULISSE: an Unsupervised Algorithm for Detecting Reliable Dependency Parses
"... In this paper we present ULISSE, an unsupervised linguistically–driven algorithm to select reliable parses from the output of a dependency parser. Different experiments were devised to show that the algorithm is robust enough to deal with the output of different parsers and with different languages, ..."
Abstract
- Add to MetaCart
In this paper we present ULISSE, an unsupervised linguistically–driven algorithm to select reliable parses from the output of a dependency parser. Different experiments were devised to show that the algorithm is robust enough to deal with the output of different parsers and with different languages, as well as to be used across different domains. In all cases, ULISSE appears to outperform the baseline algorithms. 1
Roi Reichart 1/4 RESEARCH STATEMENT
"... Natural Language processing (NLP) is a field that combines linguistics, cognitive science, statistical machine learning and other computer science areas in order to compile intelligent computer systems that can understand human languages. NLP has various applications, among which are machine transla ..."
Abstract
- Add to MetaCart
Natural Language processing (NLP) is a field that combines linguistics, cognitive science, statistical machine learning and other computer science areas in order to compile intelligent computer systems that can understand human languages. NLP has various applications, among which are machine translation, question answering and search engines. The field of NLP has, in the past two decades, come to simultaneously rely on and challenge the field of machine learning. Statistical methods now dominate NLP, and have moved the field forward substantially, opening up new possibilities for the exploitation of data in developing NLP components and applications. Many state of the art natural language algorithms are based on supervised learning techniques. In this type of learning, a corpus consisting of texts annotated by human experts is compiled and used to train a learning algorithm. While supervised learning has made substantial contribution to NLP, it faces some significant challenges. Many fundamental NLP tasks, such as syntactic parsing, part-of-speech (POS) tagging and machine translation, involve structured prediction and sequential labeling. For such kind of tasks, compiling annotated corpora is costly and error prone due to the complex nature of annotation. I

