
Unsupervised Induction of Labeled Parse Trees by Clustering with Syntactic Features. COLING ’08 (2008)

by Roi Reichart, Ari Rappoport

Results 1 - 8 of 8

Unsupervised Methods for Head Assignments

by Federico Sangati, Willem Zuidema
Abstract - Cited by 5 (3 self)
We present several algorithms for assigning heads in phrase structure trees, based on different linguistic intuitions on the role of heads in natural language syntax. The starting point of our approach is the observation that a head-annotated treebank defines a unique lexicalized tree substitution grammar. This allows us to go back and forth between the two representations, and define objective functions for the unsupervised learning of head assignments in terms of features of the implicit lexicalized tree grammars. We evaluate algorithms based on the match with gold standard head-annotations, and the comparative parsing accuracy of the lexicalized grammars they give rise to. On the first task, we approach the accuracy of hand-designed heuristics for English and inter-annotation-standard agreement for German. On the second task, the implied lexicalized grammars score 4 percentage points higher on parsing accuracy than lexicalized grammars derived by commonly used heuristics.
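
The observation that a head-annotated tree determines a set of lexicalized elementary trees can be illustrated with a minimal sketch. It is not the authors' implementation: the tuple encoding of trees, the function name `elementary_trees`, and the bracketed output format are all assumptions made for this example. The idea shown is that each fragment follows head children down to a single word, while every non-head child becomes a substitution site and roots a fragment of its own.

```python
def elementary_trees(tree):
    """Split a head-annotated tree into lexicalized fragments, one per word.

    Hypothetical encoding (an assumption of this sketch):
      internal node: (label, head_index, [children])
      preterminal:   (tag, word)
    """
    fragments = []

    def spine(node):
        # A preterminal carries its word in position 1.
        if isinstance(node[1], str):
            return f"({node[0]} {node[1]})"
        label, head, children = node
        parts = []
        for i, child in enumerate(children):
            if i == head:
                # The head child stays inside the current fragment.
                parts.append(spine(child))
            else:
                # A non-head child is cut off: it leaves a substitution
                # site here and roots a separate fragment.
                parts.append(child[0])
                fragments.append(spine(child))
        return f"({label} {' '.join(parts)})"

    root_fragment = spine(tree)
    return [root_fragment] + fragments


tree = ("S", 1, [("NP", 1, [("DT", "the"), ("NN", "dog")]),
                 ("VP", 0, [("VBD", "barked")])])
for fragment in elementary_trees(tree):
    print(fragment)
```

For the three-word example, the sketch yields three fragments, each anchored in exactly one word. Changing a head index changes where the cuts fall, which is why a head assignment and its fragment set can be converted into one another.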

Automatic Selection of High Quality Parses Created By a Fully Unsupervised Parser

by Roi Reichart
Abstract - Cited by 5 (2 self)
The average results obtained by unsupervised statistical parsers have greatly improved in the last few years, but on many specific sentences they are of rather low quality. The output of such parsers is becoming valuable for various applications, and it is radically less expensive to create than manually annotated training data. Hence, automatic selection of high quality parses created by unsupervised parsers is an important problem. In this paper we present PUPA, a POS-based Unsupervised Parse Assessment algorithm. The algorithm assesses the quality of a parse tree using POS sequence statistics collected from a batch of parsed sentences. We evaluate the algorithm by using an unsupervised POS tagger and an unsupervised parser, selecting high quality parsed sentences from English (WSJ) and German (NEGRA) corpora. We show that PUPA outperforms the leading previous parse assessment algorithm for supervised parsers, as well as a strong unsupervised baseline. Consequently, PUPA makes it possible to obtain high quality parses without any human involvement.

The NVI Clustering Evaluation Measure

by Roi Reichart
Abstract - Cited by 4 (0 self)
Clustering is crucial for many NLP tasks and applications. However, evaluating the results of a clustering algorithm is hard. In this paper we focus on the evaluation setting in which a gold standard solution is available. We discuss two existing information theory based measures, V and VI, and show that they are both hard to use when comparing the performance of different algorithms and different datasets. The V measure favors solutions having a large number of clusters, while the range of scores given by VI depends on the size of the dataset. We present a new measure, NVI, which normalizes VI to address the latter problem. We demonstrate the superiority of NVI in a large experiment involving an important NLP application, grammar induction, using real corpus data in English, German and Chinese.
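
The abstract does not state how VI is normalized. As a hedged sketch, the code below computes VI as H(C|K) + H(K|C) and then, as an assumption of this example rather than something given in the listing, divides by the entropy of the gold clustering H(C), falling back to H(K) when H(C) is zero. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(x, y):
    """H(X | Y) computed as H(X, Y) - H(Y)."""
    return entropy(list(zip(x, y))) - entropy(y)


def vi(gold, induced):
    """Variation of information: H(C | K) + H(K | C)."""
    return conditional_entropy(gold, induced) + conditional_entropy(induced, gold)


def nvi(gold, induced):
    """VI normalized by the gold entropy (assumed normalizer, see lead-in)."""
    h_gold = entropy(gold)
    if h_gold == 0:
        return entropy(induced)
    return vi(gold, induced) / h_gold


# Same kind of error (everything merged into one cluster) on two dataset sizes.
for n in (4, 16):
    gold, induced = list(range(n)), [0] * n
    print(n, vi(gold, induced), nvi(gold, induced))
```

In this toy case the raw VI score grows with the number of items (it equals log n), while the normalized score stays the same, which is the kind of dataset-size dependence the abstract describes.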

Improved Unsupervised POS Induction through Prototype Discovery

by Omri Abend, Roi Reichart
Abstract - Cited by 3 (0 self)
We present a novel fully unsupervised algorithm for POS induction from plain text, motivated by the cognitive notion of prototypes. The algorithm first identifies landmark clusters of words, serving as the cores of the induced POS categories. The rest of the words are subsequently mapped to these clusters. We utilize morphological and distributional representations computed in a fully unsupervised manner. We evaluate our algorithm on English and German, achieving the best reported results for this task.

Type Level Clustering Evaluation: New Measures and a POS Induction Case Study

by Roi Reichart, Omri Abend
Abstract - Cited by 3 (0 self)
Clustering is a central technique in NLP. Consequently, clustering evaluation is of great importance. Many clustering algorithms are evaluated by their success in tagging corpus tokens. In this paper we discuss type level evaluation, which reflects class membership only and is independent of the token statistics of a particular reference corpus. Type level evaluation casts light on the merits of algorithms, and for some applications is a more natural measure of the algorithm’s quality. We propose new type level evaluation measures that, contrary to existing measures, are applicable when items are polysemous, the common case in NLP. We demonstrate the benefits of our measures using a detailed case study, POS induction. We experiment with seven leading algorithms, obtaining useful insights and showing that token and type level measures can weakly or even negatively correlate, which underscores the fact that these two approaches reveal different aspects of clustering quality.

Simple Unsupervised Identification of Low-level Constituents

by Elias Ponvert, Jason Baldridge, Katrin Erk
Abstract - Cited by 2 (0 self)
We present an approach to unsupervised partial parsing: the identification of low-level constituents (which we dub clumps) in unannotated text. We begin by showing that CCLParser [1], an unsupervised parsing model, is particularly adept at identifying clumps, and that, surprisingly, building a simple right-branching structure above its clumps actually outperforms the full parser itself. This indicates that much of the CCLParser’s performance comes from good local predictions. Based on this observation, we define a simple bigram model that is competitive with CCLParser for clumping, which further illustrates how important this level of representation is for unsupervised parsing.
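
The "simple right-branching structure above its clumps" can be sketched as follows. This is an illustration only, not the authors' code: the span format (sorted, non-overlapping, end-exclusive token indices), the nested-tuple tree encoding, and both function names are assumptions of this example, and the clump spans are supplied by hand rather than predicted by any model.

```python
def right_branching(units):
    """Fold a non-empty list of units into a right-branching binary tree."""
    if len(units) == 1:
        return units[0]
    return (units[0], right_branching(units[1:]))


def tree_over_clumps(tokens, clump_spans):
    """Build a right-branching tree whose leaves are clumps or single words.

    clump_spans: sorted, non-overlapping (start, end) pairs, end exclusive.
    """
    units, i = [], 0
    for start, end in clump_spans:
        units.extend(tokens[i:start])            # words outside any clump
        units.append(tuple(tokens[start:end]))   # a clump stays one flat unit
        i = end
    units.extend(tokens[i:])                     # trailing words
    return right_branching(units)


tokens = "the dog chased a cat yesterday".split()
print(tree_over_clumps(tokens, [(0, 2), (3, 5)]))
# (('the', 'dog'), ('chased', (('a', 'cat'), 'yesterday')))
```

All higher-level structure here is fixed by the right-branching rule, so the only information the tree takes from the parser is the clump boundaries themselves.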

A U-DOP approach to modeling language acquisition

by Margaux Smets, 2010
Abstract not found

Roi Reichart: Research Statement

by Roi Reichart (www.cs.huji.ac.il/∼roiri)
Abstract
Natural language processing (NLP) is a field that combines linguistics, cognitive science, statistical machine learning and other computer science areas in order to build intelligent computer systems that can understand human languages. NLP has various applications, among which are machine translation, question answering and search engines. The field of NLP has, in the past two decades, come to simultaneously rely on and challenge the field of machine learning. Statistical methods now dominate NLP, and have moved the field forward substantially, opening up new possibilities for the exploitation of data in developing NLP components and applications. Many state of the art natural language algorithms are based on supervised learning techniques. In this type of learning, a corpus consisting of texts annotated by human experts is compiled and used to train a learning algorithm. While supervised learning has made substantial contributions to NLP, it faces some significant challenges. Many fundamental NLP tasks, such as syntactic parsing, part-of-speech (POS) tagging and machine translation, involve structured prediction and sequential labeling. For such tasks, compiling annotated corpora is costly and error prone due to the complex nature of annotation.