Representing word meaning and order information in a composite holographic lexicon
- Psychological Review, 2007
- Cited by 31 (2 self)
The authors present a computational model that builds a holographic lexicon representing both word meaning and word order from unsupervised experience with natural language. The model uses simple convolution and superposition mechanisms (cf. B. B. Murdock, 1982) to learn distributed holographic representations for words. The structure of the resulting lexicon can account for empirical data from classic experiments studying semantic typicality, categorization, priming, and semantic constraint in sentence completions. Furthermore, order information can be retrieved from the holographic representations, allowing the model to account for limited word transitions without the need for built-in transition rules. The model demonstrates that a broad range of psychological data can be accounted for directly from the structure of lexical representations learned in this way, without the need for complexity to be built into either the processing mechanisms or the representations. The holographic representations are an appropriate knowledge representation to be used by higher order models of language comprehension, relieving the complexity required at the higher level.
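The convolution and superposition mechanisms named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes circular convolution as the binding operation, circular correlation as the approximate inverse, and random Gaussian vectors as word codes. The vector size, the seed, and all function and variable names are hypothetical.

```python
import random

def random_vector(n, rng):
    """Random word code with elements drawn from N(0, 1/n) (an assumption)."""
    return [rng.gauss(0.0, (1.0 / n) ** 0.5) for _ in range(n)]

def circular_convolve(a, b):
    """Bind two vectors: c[k] = sum_j a[j] * b[(k - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]

def circular_correlate(a, c):
    """Approximately unbind a from c: out[k] = sum_j a[j] * c[(k + j) mod n]."""
    n = len(a)
    return [sum(a[j] * c[(k + j) % n] for j in range(n)) for k in range(n)]

def superpose(*vectors):
    """Store several traces in one vector by element-wise addition."""
    return [sum(xs) for xs in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

rng = random.Random(0)   # fixed seed so the demo is repeatable
n = 256                  # hypothetical dimensionality
dog, bites, cat, sleeps = (random_vector(n, rng) for _ in range(4))
unrelated = random_vector(n, rng)

# One memory vector holding two bound word pairs.
memory = superpose(circular_convolve(dog, bites),
                   circular_convolve(cat, sleeps))

# Probing with "dog" should yield a noisy copy of "bites".
decoded = circular_correlate(dog, memory)
sim_target = cosine(decoded, bites)
sim_other = cosine(decoded, unrelated)
print(f"similarity to bound word: {sim_target:.2f}")
print(f"similarity to unrelated word: {sim_other:.2f}")
```

The decoded vector is only an approximation of the bound word, so retrieval in a scheme like this relies on comparing it against known word vectors rather than on exact recovery.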
Intentional context in situated natural language learning
- In Proc. of 9th Conf. on Computational Natural Language Learning (CoNLL-2005), 2005
- Cited by 5 (0 self)
Natural language interfaces designed for situationally embedded domains (e.g. cars, videogames) must incorporate knowledge about the users' context to address the many ambiguities of situated language use. We introduce a model of situated language acquisition that operates in two phases. First, intentional context is represented and inferred from user actions using probabilistic context-free grammars. Then, utterances are mapped onto this representation in a noisy channel framework. The acquisition model is trained on unconstrained speech collected
Unsupervised content-based indexing for sports video retrieval
- In Ninth ACM Workshop on Multimedia Information Retrieval (MIR 2007), 2007
- Cited by 5 (2 self)
This demonstration presents an interface to a corpus of broadcast baseball games that have been indexed using an unsupervised content-based method introduced here. The method uses the concept of a grounded language model to motivate a framework in which video is searched using natural language with no reliance on predetermined concepts or hand labeled events. The interface demonstrates the effectiveness of the technique and the ease of use it affords the user.
Acquiring linguistic argument structure from multimodal input using attentive focus
- In 7th IEEE International Conference on Development and Learning (ICDL 2008), 2008
- Cited by 5 (1 self)
This work is premised on three assumptions: that the semantics of certain actions may be learned prior to language, that objects in attentive focus are likely to indicate the arguments participating in that action, and that knowing such arguments helps align linguistic attention on the relevant predicate (verb). Using a computational model of dynamic attention, we present an algorithm that clusters visual events into action classes in an unsupervised manner using the Merge Neural Gas algorithm. With few clusters, the model correlates to coarse concepts such as come-closer, but with a finer granularity, it reveals hierarchical substructure such as come-closer-one-object-static and come-closer-both-moving. That the argument ordering is non-commutative is discovered for actions such as chase or come-closer-one-object-static. Knowing the arguments, and given that noun-referent mappings are easily learned, language learning can now be constrained by considering only linguistic expressions and actions that refer to the objects in perceptual focus. We learn action schemas for linguistic units like “moving towards” or “chase”, and validate our results by producing output commentaries for 3D video.
Taking advantage of the situation: Non-linguistic context for natural language interfaces to interactive virtual environments
- In Proceedings of International Conference on Intelligent User Interfaces (IUI), 2006
- Cited by 4 (0 self)
We introduce a framework for learning situated Natural Language Interfaces (NLIs) to interactive virtual environments. The framework exploits the non-linguistic context, or situation, explicitly modeled in such interactive applications. This situation model is integrated with a model of word meaning in a principled manner using a noisy channel approach to language understanding. Preliminary experimentation in an independently designed interactive application, i.e. the Mission Rehearsal Exercise (MRE), shows that this situated NLI outperforms a state-of-the-art NLI on both whole frame accuracy and F-Score metrics. Further, use of the situation model in the situated NLI is shown to increase robustness to the noise introduced by the use of automatic speech recognition.
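As a rough illustration of the noisy channel idea in this abstract, the sketch below picks the meaning m that maximizes P(m | situation) * P(utterance | m). It is not the paper's model: the bag-of-words channel, the smoothing floor, the meaning labels, and every probability are made-up assumptions chosen only to show how a situation prior and a word-meaning model combine.

```python
import math

def decode(utterance, prior, channel, floor=1e-6):
    """Return the meaning m maximizing log P(m) + sum_w log P(w | m).

    prior:   dict meaning -> P(meaning | situation), all values > 0
    channel: dict meaning -> dict word -> P(word | meaning)
    floor:   hypothetical smoothing value for words unseen with a meaning
    """
    def score(meaning):
        word_probs = channel.get(meaning, {})
        return math.log(prior[meaning]) + sum(
            math.log(word_probs.get(word, floor)) for word in utterance
        )
    return max(prior, key=score)

# Toy, invented numbers: the situation makes "open_door" more likely a priori.
situation_prior = {"open_door": 0.7, "close_door": 0.3}
word_given_meaning = {
    "open_door": {"open": 0.5, "the": 0.3, "door": 0.2},
    "close_door": {"close": 0.5, "the": 0.3, "door": 0.2},
}

# An ambiguous utterance is resolved by the situation prior.
ambiguous = decode(["the", "door"], situation_prior, word_given_meaning)
# An informative word can override the prior.
explicit = decode(["close", "the", "door"], situation_prior, word_given_meaning)
print(ambiguous, explicit)
```

Working in log space keeps the product of many small probabilities numerically stable, which matters once utterances get longer than this toy case.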
Grounded Language Modeling for Automatic Speech Recognition of Sports Video
- Cited by 3 (0 self)
Grounded language models represent the relationship between words and the non-linguistic context in which they are said. This paper describes how they are learned from large corpora of unlabeled video, and are applied to the task of automatic speech recognition of sports video. Results show that grounded language models improve perplexity and word error rate over text-based language models, and further, support video information retrieval better than human-generated speech transcriptions.
Mining Temporal Patterns of Movement for Video Event Recognition
- Michael Fleischman, Cognitive Machines Group
Scalable approaches to video event recognition are limited by an inability to automatically generate representations of events that encode abstract temporal structure. This paper presents a method in which temporal information is captured by representing events using a lexicon of hierarchical patterns of movement that are mined from large corpora of unannotated video data. These patterns are then used as features for a discriminative model of event recognition that exploits tree kernels in a Support Vector Machine. Evaluations show the method learns informative patterns on a 1450-hour video corpus of natural human activities recorded in the home.
Video Content Classification
Scalable approaches to video content classification are limited by an inability to automatically generate representations of events that encode abstract temporal structure. This paper presents a method in which temporal information is captured by representing events using a lexicon of hierarchical patterns of movement that are mined from large corpora of unannotated video data. These patterns are then used as features for a discriminative model of event classification that exploits tree kernels in a Support Vector Machine. Evaluations show the method learns informative patterns on a 1450-hour video corpus of natural human activities recorded in the home.
Extracting aspects of determiner meaning from dialogue in a virtual world environment
We use data from a virtual world game for automated learning of words and grammatical constructions and their meanings. The language data are an integral part of the social interaction in the game and consist of chat dialogue, which is only constrained by the cultural context, as set by the nature of the provided virtual environment. Building on previous work, where we extracted a vocabulary for concrete objects in the game by making use of the non-linguistic context, we now target NP/DP grammar, in particular determiners. We assume that we have captured the meanings of a set of determiners if we can predict which determiner will be used in a particular context. To this end we train a classifier that predicts the choice of a determiner on the basis of features from the linguistic and non-linguistic context.
Grounded Language Acquisition: A Minimal Commitment Approach
We take up the challenge of learning a grounded model of language when our agent has a body of machine learning algorithms and no prior knowledge of either the physical domain or language, in the sense of "least commitment". Based on a 2D video and co-occurring raw text, we demonstrate how this cognitively inspired model segments the world to obtain a meaning space, and combines words into hierarchical patterns for a linguistic pattern space. By associating these two spaces under temporal co-occurrence constraints, we demonstrate the acquisition of term-meaning pairs for names, actions and relations. We next map physical arguments for actions and relations to syntactical constructions resembling a cognitive grammar framework. Thus the system is able to bootstrap a rudimentary lexicon and syntax. While experiments are primarily in English, we present partial results for Hindi obtained without any change in the methods, to indicate its potential application to other languages.

