Extracting semantic representations from word co-occurrence statistics: A computational study (2007)
Venue: Behavior Research Methods
Citations: 25 (2 self)
BibTeX
@ARTICLE{Bullinaria07extractingsemantic,
author = {John A. Bullinaria and Joseph P. Levy},
title = {Extracting semantic representations from word co-occurrence statistics: A computational study},
journal = {Behavior Research Methods},
year = {2007},
pages = {510--526}
}
Abstract
In a previous paper we presented a systematic computational study of the extraction of semantic representations from the word-word co-occurrence statistics of large text corpora. The conclusion was that semantic vectors of Pointwise Mutual Information (PMI) values from very small co-occurrence windows, together with a cosine distance measure, consistently resulted in the best representations across a range of psychologically relevant semantic tasks. This paper extends that study by investigating three further factors that have been used to improve performance elsewhere: the application of stop-lists, word stemming, and dimensionality reduction using Singular Value Decomposition (SVD). It also introduces an additional semantic task and explores the advantages of using a much larger corpus. This leads to the discovery and analysis of improved SVD-based methods for generating semantic representations (which provide new state-of-the-art performance on a standard TOEFL task), and to the identification and discussion of problems and misleading results that can arise without a full systematic study.
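The pipeline the abstract describes (PMI-weighted co-occurrence vectors, cosine comparison, optional SVD reduction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy count matrix, the function names, and the choice to clip negative PMI values to zero are all assumptions made for illustration, and the abstract does not specify how the SVD output is weighted.

```python
import numpy as np


def pmi_vectors(counts, positive=True):
    """Convert a word-by-context co-occurrence count matrix to PMI values.

    Assumption: unseen pairs get 0, and with positive=True negative PMI
    values are clipped to 0. The abstract does not state either choice.
    """
    counts = np.asarray(counts, dtype=float)
    p_joint = counts / counts.sum()
    p_word = p_joint.sum(axis=1, keepdims=True)     # marginal over contexts
    p_context = p_joint.sum(axis=0, keepdims=True)  # marginal over words
    expected = p_word * p_context
    pmi = np.zeros_like(p_joint)
    seen = (p_joint > 0) & (expected > 0)
    pmi[seen] = np.log2(p_joint[seen] / expected[seen])
    return np.maximum(pmi, 0.0) if positive else pmi


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    norm_u, norm_v = np.linalg.norm(u), np.linalg.norm(v)
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return float(np.dot(u, v) / (norm_u * norm_v))


def svd_reduce(matrix, k):
    """Keep the top-k SVD dimensions, scaling each by its singular value."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


# Hypothetical toy counts: rows are target words, columns are context words.
words = ["cat", "dog", "car"]
counts = [
    [10, 8, 0, 1],
    [9, 10, 1, 0],
    [0, 1, 12, 9],
]

vectors = pmi_vectors(counts)
reduced = svd_reduce(vectors, k=2)

print("cat~dog:", cosine_similarity(vectors[0], vectors[1]))
print("cat~car:", cosine_similarity(vectors[0], vectors[2]))
```

In this toy example "cat" and "dog" share contexts and so end up with a higher cosine similarity than "cat" and "car", both before and after reduction. A real study would build the count matrix from a large corpus using a small co-occurrence window.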







