Topics in semantic representation
- Psychological Review
, 2007
Cited by 48 (8 self)
Processing language requires the retrieval of concepts from memory in response to an ongoing stream of information. This retrieval is facilitated if one can infer the gist of a sentence, conversation, or document and use that gist to predict related concepts and disambiguate words. This article analyzes the abstract computational problem underlying the extraction and use of gist, formulating this problem as a rational statistical inference. This leads to a novel approach to semantic representation in which word meanings are represented in terms of a set of probabilistic topics. The topic model performs well in predicting word association and the effects of semantic association and ambiguity on a variety of language-processing and memory tasks. It also provides a foundation for developing more richly structured statistical models of language, as the generative process assumed in the topic model can easily be extended to incorporate other kinds of semantic and syntactic structure.
A structured vector space model for word meaning in context
, 2008
Cited by 30 (5 self)
We address the task of computing vector space representations for the meaning of word occurrences, which can vary widely according to context. This task is a crucial step towards a robust, vector-based compositional account of sentence meaning. We argue that existing models for this task do not take syntactic structure sufficiently into account. We present a novel structured vector space model that addresses these issues by incorporating the selectional preferences for words’ argument positions. This makes it possible to integrate syntax into the computation of word meaning in context. In addition, the model performs at and above the state of the art for modeling the contextual adequacy of paraphrases.
Extracting semantic representations from word co-occurrence statistics: A computational study
- Behavior Research Methods
, 2007
Cited by 25 (2 self)
In a previous paper we presented a systematic computational study of the extraction of semantic representations from the word-word co-occurrence statistics of large text corpora. The conclusion was that semantic vectors of Pointwise Mutual Information (PMI) values from very small co-occurrence windows, together with a cosine distance measure, consistently resulted in the best representations across a range of psychologically relevant semantic tasks. This paper extends that study by investigating three further factors that have been used to provide improved performance elsewhere: the application of stop-lists, word stemming, and dimensionality reduction using Singular Value Decomposition (SVD). It also introduces an additional semantic task and explores the advantages of using a much larger corpus. This leads to the discovery and analysis of improved SVD-based methods for generating semantic representations (that provide new state-of-the-art performance on a standard TOEFL task) and the identification and discussion of problems and misleading results that can arise without a full systematic study.
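As a rough illustration of the kind of representation this abstract describes (PMI vectors from a small co-occurrence window, compared by cosine), here is a minimal sketch. The function names, the toy corpus, and the choice to treat unseen word pairs as zero components are assumptions made for the example; the paper's actual corpus, window settings, and preprocessing are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(tokens, window=1):
    """Count (target, context) pairs within +/- `window` positions."""
    counts = defaultdict(Counter)
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[target][tokens[j]] += 1
    return counts


def pmi_vectors(counts):
    """Turn raw counts into sparse PMI vectors (dict: context -> PMI).

    Assumption: contexts never seen with a target are simply left out,
    i.e. treated as 0 rather than as negative infinity.
    """
    total = sum(sum(ctx.values()) for ctx in counts.values())
    target_total = {t: sum(ctx.values()) for t, ctx in counts.items()}
    context_total = Counter()
    for ctx in counts.values():
        context_total.update(ctx)
    return {
        t: {
            c: math.log((n * total) / (target_total[t] * context_total[c]))
            for c, n in ctx.items()
        }
        for t, ctx in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, purely illustrative.
tokens = "the cat sat on the mat the dog sat on the rug".split()
vectors = pmi_vectors(cooccurrence_counts(tokens, window=1))
similarity = cosine(vectors["cat"], vectors["dog"])
```

On a corpus this small the similarities are not meaningful; the sketch only shows how the pieces fit together.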
Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space
Cited by 9 (2 self)
We propose an approach to adjective-noun composition (AN) for corpus-based distributional semantics that, building on insights from theoretical linguistics, represents nouns as vectors and adjectives as data-induced (linear) functions (encoded as matrices) over nominal vectors. Our model significantly outperforms the rivals on the task of reconstructing AN vectors not seen in training. A small post-hoc analysis further suggests that, when the model-generated AN vector is not similar to the corpus-observed AN vector, this is due to anomalies in the latter. We show moreover that our approach provides two novel ways to represent adjective meanings, alternative to its representation via corpus-based co-occurrence vectors, both outperforming the latter in an adjective clustering task.
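The idea of an adjective as a linear map over noun vectors can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the matrix is estimated here with ordinary least squares via NumPy, the data are synthetic and noise-free, and the names `fit_adjective_matrix` and `compose` are hypothetical.

```python
import numpy as np


def fit_adjective_matrix(noun_vecs, an_vecs):
    """Estimate a matrix A such that A @ noun approximates the AN vector.

    noun_vecs, an_vecs: arrays of shape (num_pairs, dim), one row per
    observed (noun, adjective-noun) pair for a single adjective.
    """
    # Solve noun_vecs @ X = an_vecs in the least-squares sense; A is X transposed.
    X, *_ = np.linalg.lstsq(noun_vecs, an_vecs, rcond=None)
    return X.T


def compose(adj_matrix, noun_vec):
    """Apply the adjective (a matrix) to a noun (a vector)."""
    return adj_matrix @ noun_vec


# Synthetic data: a hidden "true" adjective matrix generates the AN vectors.
rng = np.random.default_rng(0)
dim, num_pairs = 4, 20
true_A = rng.normal(size=(dim, dim))
nouns = rng.normal(size=(num_pairs, dim))
observed_an = nouns @ true_A.T

A_hat = fit_adjective_matrix(nouns, observed_an)

# Predict the AN vector for a noun not seen during fitting.
unseen_noun = rng.normal(size=dim)
predicted_an = compose(A_hat, unseen_noun)
```

With real corpus vectors the fit would be approximate, and some form of regularized or reduced-rank regression would likely be preferable to plain least squares.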
A graph-theoretic model of lexical syntactic acquisition
- In M. Lapata & H. Tou Ng (Eds.), Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (pp. 917–926)
, 2008
Cited by 2 (1 self)
This paper presents a graph-theoretic model of the acquisition of lexical syntactic representations. The representations the model learns are non-categorical or graded. We propose a new evaluation methodology of syntactic acquisition in the framework of exemplar theory. When applied to the CHILDES corpus, the evaluation shows that the model’s graded syntactic representations perform better than previously proposed categorical representations.
Modeling a three term fan effect
- Drexel University
, 2010
Cited by 2 (1 self)
A fan effect experiment was conducted in which participants performed recall and recognition tasks on a study set of sentences with three content words. The aggregate results confirm a fan effect (Anderson, 1974). A model of the recall and recognition tasks was created using Dynamically Structured Holographic Memory (DSHM), and a comparison to the human data is presented. The current resonance-based mechanisms in DSHM for generating recognition accuracy and reaction time data are discussed and contrasted with a previously employed retrieval-based mechanism.
Buzz Monitoring in Word Space
Cited by 1 (0 self)
This paper discusses the task of tracking mentions of some topically interesting textual entity in a continuously and dynamically changing flow of text, such as a news feed, the output from an Internet crawler, or a similar text source, a task sometimes referred to as buzz monitoring. Standard approaches from the field of information access for identifying salient textual entities are reviewed, and it is argued that the dynamics of buzz monitoring call for more accomplished analysis mechanisms than typical text analysis tools provide today. The notion of word space is introduced, and it is argued that word spaces can be used to select the most salient markers for topicality and to find the associations those observations engender, and that they constitute an attractive foundation for building a representation well suited for the tracking and monitoring of mentions of the entity under consideration.
A Holographic Associative Memory Recommender System
Cited by 1 (1 self)
We describe a recommender system based on Dynamically Structured Holographic Memory (DSHM), a cognitive model of associative memory that uses holographic reduced representations as the basis for its encoding of object associations. We compare this recommender to a conventional user-based collaborative filtering algorithm on three datasets: MovieLens and two bibliographic datasets such as those typically found in a digital library. Off-line experiments show that the holographic recommender is competitive in accuracy for predicting movie preferences and more accurate than collaborative filtering on very sparse data sets. However, DSHM requires significant amounts of computational resources, which may require a distributed implementation for it to be practical as a recommender for large data sets.
Logical Leaps and Quantum Connectives: Forging Paths through Predication Space
Cited by 1 (0 self)
The Predication-based Semantic Indexing (PSI) approach encodes both symbolic and distributional information into a semantic space using a permutation-based variant of Random Indexing. In this paper, we develop and evaluate a computational model of abductive reasoning based on PSI. Using distributional information, we identify pairs of concepts that are likely to be predicated about a common third concept, or middle term. As this occurs without the explicit identification of the middle term concerned, we refer to this process as a “logical leap”. Subsequently, we use further operations in the PSI space to retrieve this middle term and identify the predicate types involved. On evaluation using a set of 1000 randomly selected cue concepts, the model is shown to accurately retrieve concepts that can be connected to a cue concept by a middle term, as well as the middle term concerned, using nearest-neighbor search in the PSI space. The utility of quantum logical operators as a means to identify alternative paths through this space is also explored.
Combining Background Knowledge and Learned Topics
Cited by 1 (0 self)
Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. Although topic models can potentially discover a broad range of themes in a data set, the interpretability of the learned topics is not always ideal. Human-defined concepts, however, tend to be semantically richer due to careful selection of words that define the concepts, but they may not span the themes in a data set exhaustively. In this study, we review a new probabilistic framework for combining a hierarchy of human-defined semantic concepts with a statistical topic model to seek the best of both worlds. Results indicate that this combination leads to systematic improvements in generalization performance as well as enabling new techniques for inferring and visualizing the content of a document.

