Retrieving Collocations from Text: Xtract
Computational Linguistics, 1993
Cited by 229 (1 self)
Abstract
Natural languages are full of collocations, recurrent combinations of words that co-occur more often than expected by chance and that correspond to arbitrary word usages. Recent work in lexicography indicates that collocations are pervasive in English; apparently, they are common in all types of writing, including both technical and nontechnical genres. Several approaches have been proposed to retrieve various types of collocations from the analysis of large samples of textual data. These techniques automatically produce large numbers of collocations along with statistical figures intended to reflect the relevance of the associations. However, none of these techniques provides functional information along with the collocation. Also, the results produced often contained improper word associations reflecting some spurious aspect of the training corpus that did not stand for true collocations. In this paper, we describe a set of techniques based on statistical methods for retrieving and identifying collocations from large textual corpora. These techniques produce a wide range of collocations and are based on some original filtering methods that allow the production of richer and higher-precision output. These techniques have been implemented and resulted in a lexicographic tool, Xtract. The techniques are described and some results are presented on a 10-million-word corpus of stock market news reports. A lexicographic evaluation of Xtract as a collocation retrieval tool has been made, and the estimated precision of Xtract is 80%.
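As a rough illustration of what "co-occur more often than expected by chance" can mean in practice, the sketch below counts ordered word pairs within a small window and scores them with pointwise mutual information (PMI). This is not Xtract's method: the abstract does not specify its statistics or filters, so PMI is swapped in here as a common association measure, and the function name `pmi_scores` and its `window` and `min_count` parameters are hypothetical.

```python
import math
from collections import Counter

def pmi_scores(tokens, window=5, min_count=2):
    """Score ordered pairs (w1, w2) where w2 follows w1 within `window` tokens.

    Illustrative assumption: PMI stands in for whatever association
    statistic a real collocation extractor would use.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, w1 in enumerate(tokens):
        # Look ahead at most `window` tokens after position i.
        for w2 in tokens[i + 1 : i + 1 + window]:
            pair_counts[(w1, w2)] += 1

    n_tokens = len(tokens)
    n_pairs = sum(pair_counts.values())
    scores = {}
    for (w1, w2), count in pair_counts.items():
        if count < min_count:
            continue  # drop rare pairs, whose PMI estimates are unreliable
        p_pair = count / n_pairs
        p_w1 = word_counts[w1] / n_tokens
        p_w2 = word_counts[w2] / n_tokens
        # PMI > 0 means the pair occurs more often than independence predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

tokens = "stock market rose . stock market fell . the stock market closed".split()
scores = pmi_scores(tokens, window=1, min_count=2)
```

On this toy input only the pair `("stock", "market")` survives the frequency cutoff, and its score is positive. A real system would need far more text and additional filtering before such scores were meaningful.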
Thai Co-Occurrence Dictionary: Technical Report
1995
Abstract
This paper presents a co-occurrence dictionary based on Thai phenomena. The theoretical background, the data structure, the dictionary development, and the word collocation information are described in detail. At present, 75,000 word collocations have been added to the co-occurrence dictionary with the help of linguists, who put considerable effort into encoding the linguistic information. We hope that the word collocation information presented in this paper will be a useful resource for natural language processing studies and second language acquisition.

