Retrieving Collocations from Text: Xtract (1993)

by Frank Smadja
Venue: Computational Linguistics
Citations: 353 - 1 self

BibTeX

@ARTICLE{Smadja93retrievingcollocations,
    author = {Frank Smadja},
    title = {Retrieving Collocations from Text: Xtract},
    journal = {Computational Linguistics},
    year = {1993},
    volume = {19},
    pages = {143--177}
}

Abstract

Natural languages are full of collocations, recurrent combinations of words that co-occur more often than expected by chance and that correspond to arbitrary word usages. Recent work in lexicography indicates that collocations are pervasive in English; apparently, they are common in all types of writing, including both technical and nontechnical genres. Several approaches have been proposed to retrieve various types of collocations from the analysis of large samples of textual data. These techniques automatically produce large numbers of collocations along with statistical figures intended to reflect the relevance of the associations. However, none of these techniques provides functional information along with the collocation. Also, the results produced often contained improper word associations reflecting some spurious aspect of the training corpus that did not stand for true collocations. In this paper, we describe a set of techniques based on statistical methods for retrieving and identifying collocations from large textual corpora. These techniques produce a wide range of collocations and are based on some original filtering methods that allow the production of richer and higher-precision output. These techniques have been implemented and resulted in a lexicographic tool, Xtract. The techniques are described and some results are presented on a 10 million-word corpus of stock market news reports. A lexicographic evaluation of Xtract as a collocation retrieval tool has been made, and the estimated precision of Xtract is 80%.
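
The abstract does not spell out the statistics involved, so the following is only a minimal sketch of the general idea of finding words that co-occur with a target word more often than its other neighbours do. It is not Xtract's actual procedure. The function name `collocate_strengths`, the window size, and the use of a z-score over co-occurrence counts are all illustrative assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def collocate_strengths(tokens, target, window=5):
    """Score each neighbour of `target` by the z-score of its co-occurrence count.

    Hypothetical helper for illustration only; the window size and the
    z-score measure are assumptions, not details taken from the abstract.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        # Count every token within `window` positions of this occurrence.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1

    if not counts:
        return {}

    freqs = list(counts.values())
    mu = mean(freqs)
    sigma = pstdev(freqs)
    if sigma == 0:
        # All neighbours are equally frequent, so none stands out.
        return {word: 0.0 for word in counts}
    return {word: (c - mu) / sigma for word, c in counts.items()}


if __name__ == "__main__":
    text = "the stock market fell . the stock market rose . the bond market fell"
    scores = collocate_strengths(text.split(), "market", window=1)
    for word, z in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{word}\t{z:+.2f}")
```

Neighbours with a high positive score are candidate collocates. A real system would still need the kind of filtering the abstract mentions to discard spurious associations.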

Keyphrases

improper word association    higher-precision output    true collocation    large number    lexicographic tool    original filtering method    large sample    recent work    nontechnical genre    functional information    several approach    collocation retrieval tool    various type    natural language    statistical figure    textual data    statistical method    million-word corpus    arbitrary word usage    lexicographic evaluation    recurrent combination    stock market news report    wide range    large textual corpus    training corpus    spurious aspect   
