Results 1 - 10 of 236
Dependency-based construction of semantic space models
- Computational Linguistics, 2007
"... Traditionally, vector-based semantic space models use word co-occurrence counts from large corpora to represent lexical meaning. In this article we present a novel framework for constructing semantic spaces that takes syntactic relations into account. We introduce a formalization for this class of m ..."
Abstract - Cited by 236 (14 self)
Traditionally, vector-based semantic space models use word co-occurrence counts from large corpora to represent lexical meaning. In this article we present a novel framework for constructing semantic spaces that takes syntactic relations into account. We introduce a formalization for this class of models, which allows linguistic knowledge to guide the construction process. We evaluate our framework on a range of tasks relevant for cognitive science and natural language processing: semantic priming, synonymy detection, and word sense disambiguation. In all cases, our framework obtains results that are comparable or superior to the state of the art.
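A minimal sketch of the kind of model this abstract describes: word vectors are built from dependency contexts rather than flat co-occurrence windows. The toy triples and helper names below are invented for illustration and are not the authors' implementation.

# Sketch: dependency-based semantic space (illustrative, not the paper's code).
# Each word is represented by counts over (relation, co-occurring word) contexts
# drawn from dependency parses, instead of a flat co-occurrence window.
from collections import Counter, defaultdict

# Toy dependency triples: (dependent, relation, head). Assumed pre-parsed input.
triples = [
    ("dog", "subj", "bark"), ("dog", "subj", "run"),
    ("cat", "subj", "run"),  ("loud", "mod", "bark"),
]

space = defaultdict(Counter)  # word -> Counter over dependency contexts
for dep, rel, head in triples:
    space[dep][(rel, head)] += 1           # dependent sees the (rel, head) context
    space[head][("inv_" + rel, dep)] += 1  # head sees the inverse relation

def cosine(u, v):
    shared = set(u) & set(v)
    num = sum(u[c] * v[c] for c in shared)
    den = (sum(x * x for x in u.values()) ** 0.5) * (sum(x * x for x in v.values()) ** 0.5)
    return num / den if den else 0.0

print(cosine(space["dog"], space["cat"]))  # similarity from shared syntactic contexts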
Vector-based models of semantic composition
- In Proceedings of ACL-08: HLT, 2008
"... This paper proposes a framework for representing the meaning of phrases and sentences in vector space. Central to our approach is vector composition which we operationalize in terms of additive and multiplicative functions. Under this framework, we introduce a wide range of composition models which ..."
Abstract - Cited by 220 (5 self)
This paper proposes a framework for representing the meaning of phrases and sentences in vector space. Central to our approach is vector composition which we operationalize in terms of additive and multiplicative functions. Under this framework, we introduce a wide range of composition models which we evaluate empirically on a sentence similarity task. Experimental results demonstrate that the multiplicative models are superior to the additive alternatives when compared against human judgments.
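The additive and multiplicative composition functions the abstract contrasts come down to elementwise operations on word vectors. A small illustrative sketch; the toy vectors are invented:

import numpy as np

def additive(u, v):        # p = u + v
    return u + v

def multiplicative(u, v):  # p = u * v (elementwise); shared features are amplified
    return u * v

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy word vectors (invented): compose "horse ran" and "horse galloped"
horse, ran, galloped = (np.array(x, float) for x in
                        ([4, 1, 0, 2], [0, 3, 1, 1], [0, 4, 1, 2]))
for compose in (additive, multiplicative):
    print(compose.__name__, cosine(compose(horse, ran), compose(horse, galloped)))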
Composition in distributional models of semantics, 2010
"... Distributional models of semantics have proven themselves invaluable both in cog-nitive modelling of semantic phenomena and also in practical applications. For ex-ample, they have been used to model judgments of semantic similarity (McDonald, 2000) and association (Denhire and Lemaire, 2004; Griffit ..."
Abstract - Cited by 148 (3 self)
Distributional models of semantics have proven themselves invaluable both in cognitive modelling of semantic phenomena and also in practical applications. For example, they have been used to model judgments of semantic similarity (McDonald, 2000) and association (Denhière and Lemaire, 2004; Griffiths et al., 2007) and have been shown to achieve human-level performance on synonymy tests (Landauer and Dumais, 1997; Griffiths et al., 2007) such as those included in the Test of English as a Foreign Language (TOEFL). This ability has been put to practical use in automatic thesaurus extraction (Grefenstette, 1994). However, while there has been a considerable amount of research directed at the most effective ways of constructing representations for individual words, the representation of larger constructions, e.g., phrases and sentences, has received relatively little attention. In this thesis we examine this issue of how to compose meanings within distributional models of semantics to form representations of multi-word structures. Natural language data typically consists of such complex structures, rather than …
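On the synonymy tests mentioned above: a TOEFL item is typically answered by choosing the option closest to the target word in the space. A hedged sketch with invented vectors; a trained distributional space would stand in for the random one:

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def toefl_answer(target, options, vectors):
    """Pick the option whose vector is closest to the target's."""
    return max(options, key=lambda w: cosine(vectors[target], vectors[w]))

# Invented vectors standing in for a trained distributional space.
rng = np.random.default_rng(0)
vocab = ["enormously", "appropriately", "uniquely", "tremendously", "decidedly"]
vectors = {w: rng.normal(size=50) for w in vocab}
# Make the true synonym's vector near-identical to the target's for the demo.
vectors["tremendously"] = vectors["enormously"] + 0.1 * rng.normal(size=50)

print(toefl_answer("enormously", vocab[1:], vectors))  # -> "tremendously"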
Sentence Similarity Based on Semantic Nets and Corpus Statistics
- James O’Shea, Zuhair Bandar and Keeley Crockett
"... 1 Abstract: Sentence similarity measures play an increasingly important role in textrelated research and applications in areas such as text mining, web page retrieval and dialogue systems. Existing methods for computing sentence similarity have been adopted from approaches used for long text documen ..."
Abstract - Cited by 89 (3 self)
Sentence similarity measures play an increasingly important role in text-related research and applications in areas such as text mining, web page retrieval and dialogue systems. Existing methods for computing sentence similarity have been adopted from approaches used for long text documents. These methods process sentences in a very high-dimensional space and are consequently inefficient, require human input and are not adaptable to some application domains. This paper focuses directly on computing the similarity between very short texts of sentence length. It presents an algorithm that takes account of semantic information and word order information implied in the sentences. The semantic similarity of two sentences is calculated using information from a structured lexical database and from corpus statistics. The use of a lexical database enables our method to model human common sense knowledge and the incorporation of corpus statistics allows our method to be adaptable to different domains. The proposed method can be used in a variety of applications that involve text knowledge representation and discovery. Experiments on two sets of selected sentence pairs demonstrate that the proposed method provides a similarity measure that shows a significant correlation to human intuition.
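A sketch of the combination the paper describes, weighting a semantic score against a word-order score. The stand-in word_sim function replaces the paper's lexical-database and corpus-statistics similarity, and the delta value is only an illustrative choice (the paper argues word order should carry the smaller weight):

import numpy as np

def word_sim(w1, w2):
    # Stand-in for the paper's lexical-database + corpus-statistics word similarity.
    return 1.0 if w1 == w2 else 0.0

def semantic_vector(sentence, joint_words):
    # Each joint-word slot holds its best similarity to any word in the sentence.
    return np.array([max(word_sim(jw, w) for w in sentence) for jw in joint_words])

def order_vector(sentence, joint_words):
    # Each slot holds the 1-based position of the best-matching word, 0 if none.
    vec = []
    for jw in joint_words:
        sims = [word_sim(jw, w) for w in sentence]
        best = int(np.argmax(sims))
        vec.append(best + 1 if sims[best] > 0 else 0)
    return np.array(vec, float)

def sentence_similarity(s1, s2, delta=0.85):  # delta > 0.5: semantics dominates
    joint = sorted(set(s1) | set(s2))
    v1, v2 = semantic_vector(s1, joint), semantic_vector(s2, joint)
    ss = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
    r1, r2 = order_vector(s1, joint), order_vector(s2, joint)
    sr = 1.0 - np.linalg.norm(r1 - r2) / np.linalg.norm(r1 + r2)
    return delta * ss + (1 - delta) * sr

print(sentence_similarity("the dog bit the man".split(),
                          "the man bit the dog".split()))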
Mathematical foundations for a compositional distributional model of meaning
- LINGUISTIC ANALYSIS (LAMBEK FESTSCHRIFT)
"... We propose a mathematical framework for a unification of the distributional theory of meaning in terms of vector space models, and a compositional theory for grammatical types, for which we rely on the algebra of Pregroups, introduced by Lambek. This mathematical framework enables us to compute the ..."
Abstract - Cited by 86 (18 self)
We propose a mathematical framework for a unification of the distributional theory of meaning in terms of vector space models, and a compositional theory for grammatical types, for which we rely on the algebra of Pregroups, introduced by Lambek. This mathematical framework enables us to compute the meaning of a well-typed sentence from the meanings of its constituents. Concretely, the type reductions of Pregroups are ‘lifted’ to morphisms in a category, a procedure that transforms meanings of constituents into a meaning of the (well-typed) whole. Importantly, meanings of whole sentences live in a single space, independent of the grammatical structure of the sentence. Hence the inner product can be used to compare meanings of arbitrary sentences, as it is for comparing the meanings of words in the distributional model. The mathematical structure we employ admits a purely diagrammatic calculus which exposes how the information flows between the words in a sentence in order to make up the meaning of the whole sentence. A variation of our ‘categorical model’ which involves constraining the scalars of the vector spaces to the semiring of Booleans results in a Montague-style Boolean-valued semantics.
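Concretely, a transitive verb in this model lives in the tensor space N ⊗ S ⊗ N, and the pregroup reduction becomes a tensor contraction that consumes both noun indices and leaves a vector in the sentence space S. A toy sketch with invented low-dimensional spaces and random vectors:

import numpy as np

N, S = 4, 3  # toy dimensions for the noun space N and the sentence space S
rng = np.random.default_rng(1)

subj = rng.normal(size=N)          # noun vector for the subject
obj = rng.normal(size=N)           # noun vector for the object
verb = rng.normal(size=(N, S, N))  # transitive verb in N (x) S (x) N

# Pregroup reduction as contraction: both noun indices are consumed,
# leaving a vector in the shared sentence space S.
sentence = np.einsum("i,isj,j->s", subj, verb, obj)

# Because every sentence lands in the same space S, arbitrary sentences
# can be compared with the ordinary inner product.
other = np.einsum("i,isj,j->s", obj, verb, subj)  # the reversed sentence
cos = sentence @ other / (np.linalg.norm(sentence) * np.linalg.norm(other))
print(cos)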
Tricks of memory
- Current Directions in Psychological Science, 2000
"... Publication details, including instructions for authors and subscription information: ..."
Cited by 71 (2 self)
Analyzing Collaborative Learning Processes Automatically: Exploiting the Advances of Computational Linguistics in Computer-Supported . . .
- INTERNATIONAL JOURNAL OF COMPUTER-SUPPORTED COLLABORATIVE LEARNING, 2008
"... In this article we describe the emerging area of text classification research focused on the problem of collaborative learning process analysis both from a broad perspective and more specifically in terms of a new publicly available tool set called TagHelper tools. Analyzing the variety of different ..."
Abstract - Cited by 63 (17 self)
In this article we describe the emerging area of text classification research focused on the problem of collaborative learning process analysis, both from a broad perspective and more specifically in terms of a new publicly available tool set called TagHelper tools. Analyzing the variety of different facets of learners’ interaction that are important for their learning is a time-consuming and effortful process. Improving automated analyses of such highly valued processes of collaborative learning by adapting and applying recent text classification technologies would make it a less arduous task to obtain insights from corpus data. It also holds the potential for enabling substantially improved on-line instruction, both by providing teachers and facilitators with reports about the groups they are moderating and by scaffolding technology as in the emerging area of context-sensitive collaborative learning support triggered dynamically on an as-needed basis. In this article, we report on an interdisciplinary research project, which has been investigating the effectiveness of applying text classification technology to a large CSCL discourse corpus that had been analyzed by human coders using a theory-based multi-dimensional coding scheme. We report promising results and include an in-depth discussion of important issues such as reliability, validity, and efficiency that should be considered when deciding on the appropriateness of adopting a new technology such as TagHelper tools.
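An illustrative stand-in for the kind of coding-scheme classifier discussed, using a generic scikit-learn pipeline; TagHelper itself is a separate toolset, and the labeled discourse segments and codes below are invented:

# Sketch: training a text classifier on human-coded discourse segments.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

segments = [
    "I think the force acts downward because of gravity",
    "no you are wrong, read the problem again",
    "what did you get for question two?",
    "maybe we should combine both equations first",
]
codes = ["claim", "challenge", "coordination", "proposal"]  # invented coding scheme

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(segments, codes)
print(clf.predict(["I think energy is conserved here"]))  # -> likely "claim"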
Metaphor comprehension: A computational theory, 2000
"... Metaphor comprehension involves an interaction between the meaning of the topic and vehicle terms of the metaphor. Meaning is represented by vectors in a high-dimensional semantic space. Predication modifies the topic vector by merging it with selected features of the vehicle vector. The resulting m ..."
Abstract - Cited by 61 (3 self)
Metaphor comprehension involves an interaction between the meaning of the topic and vehicle terms of the metaphor. Meaning is represented by vectors in a high-dimensional semantic space. Predication modifies the topic vector by merging it with selected features of the vehicle vector. The resulting metaphor vector can be evaluated by comparing it with known landmarks in the semantic space. Thus, metaphorical predication is treated in the present model in exactly the same way as literal predication. Some experimental results concerning metaphor comprehension are simulated within this framework, such as the non-reversibility of metaphors, priming of metaphors with literal statements, and priming of literal statements with metaphors.
Semantic text similarity using corpus-based word similarity and string similarity
- ACM Transactions on Knowledge Discovery from Data (TKDD), 2008
"... We present a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence (LCS) string matching algorithm. Existing methods for computing text similarity have focused mainly on e ..."
Abstract - Cited by 59 (5 self)
We present a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence (LCS) string matching algorithm. Existing methods for computing text similarity have focused mainly on either large documents or individual words. We focus on computing the similarity between two sentences or two short paragraphs. The proposed method can be exploited in a variety of applications involving textual knowledge representation and knowledge discovery. Evaluation results on two different data sets show that our method outperforms several competing methods.
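The string-matching half of the method rests on the longest common subsequence. Below is the standard dynamic program with one common length normalization; the paper's own normalized and modified LCS differs in its details:

def lcs_length(a, b):
    # Standard O(len(a)*len(b)) dynamic program for the LCS length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ca == cb else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def normalized_lcs(a, b):
    # One common normalization: squared LCS length over the product of lengths.
    return lcs_length(a, b) ** 2 / (len(a) * len(b)) if a and b else 0.0

print(normalized_lcs("albastru", "alabaster"))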
Predication
- COGNITIVE SCIENCE, 2001
"... In Latent Semantic Analysis (LSA) the meaning of a word is represented as a vector in a high-dimensional semantic space. Different meanings of a word or different senses of a word are not distinguished. Instead, word senses are appropriately modified as the word is used in different contexts. In N-V ..."
Abstract - Cited by 49 (4 self)
In Latent Semantic Analysis (LSA) the meaning of a word is represented as a vector in a high-dimensional semantic space. Different meanings of a word or different senses of a word are not distinguished. Instead, word senses are appropriately modified as the word is used in different contexts. In N-VP sentences, the precise meaning of the verb phrase depends on the noun it is combined with. An algorithm is described to adjust the meaning of a predicate as it is applied to different arguments. In forming a sentence meaning, not all features of a predicate are combined with the features of the argument, but only those that are appropriate to the argument. Hence, a different "sense" of a predicate emerges every time it is used in a different context. This predication algorithm is explored in the context of four different semantic problems: metaphor interpretation, causal inferences, similarity judgments, and homonym disambiguation.
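A sketch of the predication idea as described: take the predicate's nearest neighbors in the space, keep those that are also close to the argument, and fold them into the combined vector. The random lexicon and the parameters m and k below are invented for illustration:

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predication(pred, arg, lexicon, m=20, k=3):
    """Adjust a predicate's meaning toward features relevant to its argument."""
    # 1. The m nearest neighbors of the predicate in the semantic space.
    neighbors = sorted(lexicon.values(), key=lambda v: -cosine(v, pred))[:m]
    # 2. Of those, the k most related to the argument.
    relevant = sorted(neighbors, key=lambda v: -cosine(v, arg))[:k]
    # 3. Contextualized meaning: centroid of predicate, argument, and the
    #    argument-relevant neighbors of the predicate.
    return np.mean([pred, arg] + relevant, axis=0)

rng = np.random.default_rng(2)
lexicon = {w: rng.normal(size=30) for w in range(100)}  # invented LSA-like space
run_vec, horse_vec = rng.normal(size=30), rng.normal(size=30)
print(predication(run_vec, horse_vec, lexicon).shape)  # (30,)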