Evolvable Biologically Plausible Visual Architectures
- Proceedings of British Machine Vision Conference, 2001
Cited by 25 (15 self)
Much work in AI is fragmented, partly because the subject is so huge that it is difficult for anyone to think about all of it. Even within sub-fields, such as language, reasoning, and vision, there is fragmentation, as the sub-sub-fields are rich enough to keep people busy all their lives. However, there is a risk that results of isolated research will be unsuitable for future integration, e.g. in models of complete organisms, or human-like robots. This paper offers a framework for thinking about the many components of visual systems and how they relate to the whole organism or machine. The viewpoint is biologically inspired, using conjectured evolutionary history as a guide to some of the features of the architecture. It may also be useful both for modelling animal vision and designing robots with similar capabilities. An online slide presentation based on this paper is available as talk 8 here: http://www.cs.bham.ac.uk/~axs/misc/talks/ Talk 7, on visual reasoning, is also relevant.
The cognitive and neural architecture of sequence representation
- Psychological Review, 1998
Cited by 24 (0 self)
The authors theorize that 2 neurocognitive sequence-learning systems can be distinguished in serial reaction time experiments, one dorsal (parietal and supplementary motor cortex) and the other ventral (temporal and lateral prefrontal cortex). Dorsal system learning is implicit and associates noncategorized stimuli within dimensional modules. Ventral system learning can be implicit or explicit. It also allows associating events across dimensions and therefore is the basis of cross-task integration or interference, depending on the degree of cross-task correlation of signals. Accordingly, lack of correlation rather than limited capacity is responsible for dual-task effects on learning. The theory is relevant to issues of attentional effects on learning; the representational basis of complex, sequential skills; hippocampal- versus basal ganglia-based learning; procedural versus declarative memory; and implicit versus explicit memory. The ability to produce and learn sequential actions is one of the hallmarks of human cognition. Indeed, this ability has been hypothesized to constitute a fundamental adaptation that characterizes...
Visual Speech Synthesis Based On Parameter Generation From HMM: Speech-Driven And Text-And-Speech-Driven Approaches
- In ICASSP, 1998
Cited by 22 (1 self)
This paper describes a technique for synthesizing synchronized lip movements from an auditory input speech signal. The technique is based on an algorithm for parameter generation from HMMs with dynamic features, which has been successfully applied to text-to-speech synthesis. Audio-visual speech unit HMMs, namely syllable HMMs, are trained with parameter vector sequences that represent both auditory and visual speech features. Input speech is recognized using the syllable HMMs and converted into a transcription and a state sequence. A sentence HMM is constructed by concatenating the syllable HMMs corresponding to the transcription of the input speech. Then an optimum visual speech parameter sequence is generated from the sentence HMM in the maximum-likelihood (ML) sense. Since the generated parameter sequence reflects statistical information of both static and dynamic features of several phonemes before and after the current phonemes, synthetic lip motion becomes smooth and realistic. We show experimental results...
Multimedia content processing through cross-modal association
- In MULTIMEDIA ’03: Proceedings of the eleventh ACM international conference on Multimedia, 2003
Cited by 22 (1 self)
Multimodal information processing has received considerable attention in recent years. The focus of existing research in this area has been predominantly on the use of fusion technology. In this paper, we suggest that cross-modal association can provide a new set of powerful solutions in this area. We investigate different cross-modal association methods using the linear correlation model. We also introduce a novel method for cross-modal association called Cross-modal Factor Analysis (CFA). Our earlier work on Latent Semantic Indexing (LSI) is extended for applications that use offline supervised training. As a promising research direction and practical application of cross-modal association, cross-modal information retrieval, where queries from one modality are used to search for content in another modality using low-level features, is then discussed in detail. Different association methods are tested and compared using the proposed cross-modal retrieval system. All these methods achieve significant dimensionality reduction. Among them, CFA gives the best retrieval performance. Finally, this paper addresses the use of cross-modal association to detect talking heads. The CFA method achieves 91.1% detection accuracy, while LSI and Canonical Correlation Analysis (CCA) achieve 66.1% and 73.9% accuracy, respectively. As shown by experiments, cross-modal association provides many useful benefits, such as robust noise resistance and effective feature selection. Compared to CCA and LSI, the proposed CFA shows several advantages in analysis performance and feature usage. Its capability in feature selection and noise resistance also makes CFA a promising tool for many multimedia analysis applications.
Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition
- EURASIP J. Appl. Signal Processing, 2002
Cited by 21 (0 self)
When trying to overcome the significant performance drops of ASR systems in the presence of noise, one road to follow is the integration of the information present in the lip movements of the speaker. Comparisons showed that integration of audio and video data at the decision level yields the best recognition results. This raises the question of how to weight the two modalities in different noise conditions. Throughout this article we develop a weighting process adaptive to various background noise situations. Firstly...
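The idea of decision-level integration with noise-dependent weights can be illustrated with a minimal sketch. It assumes each modality produces per-class log-likelihoods and combines them as a convex sum controlled by an audio weight. The function names, the sigmoid mapping from an SNR estimate to the weight, and its `midpoint_db` and `slope` parameters are hypothetical choices made for illustration; the truncated abstract does not describe the article's actual weighting process.

```python
import math


def weight_from_snr(snr_db, midpoint_db=10.0, slope=0.3):
    """Map an SNR estimate (dB) to an audio weight in (0, 1).

    Hypothetical sigmoid mapping: clean audio pushes the weight
    towards 1, noisy audio pushes it towards 0.
    """
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def fuse_log_likelihoods(audio_ll, video_ll, audio_weight):
    """Combine per-class log-likelihoods of the two streams.

    fused[c] = w * audio_ll[c] + (1 - w) * video_ll[c]
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return {
        label: audio_weight * audio_ll[label]
        + (1.0 - audio_weight) * video_ll[label]
        for label in audio_ll
    }


def recognize(audio_ll, video_ll, snr_db):
    """Return the class with the highest fused score."""
    fused = fuse_log_likelihoods(audio_ll, video_ll, weight_from_snr(snr_db))
    return max(fused, key=fused.get)


# Toy scores: the audio stream prefers "ba", the video stream prefers "da".
audio_scores = {"ba": -2.0, "da": -5.0}
video_scores = {"ba": -6.0, "da": -1.0}
clean_choice = recognize(audio_scores, video_scores, snr_db=30.0)
noisy_choice = recognize(audio_scores, video_scores, snr_db=-10.0)
```

With these toy scores the fused decision follows the audio stream when the estimated SNR is high and the video stream when it is low, which is the behaviour an adaptive weighting scheme aims for.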
Large-Vocabulary Audio-Visual Speech Recognition by Machines and Humans
- of the Johns Hopkins Summer 2000 Workshop,” in Proc. Works. Signal Processing, 2001
Cited by 20 (2 self)
We compare automatic recognition with human perception of audio-visual speech, in the large-vocabulary, continuous speech recognition (LVCSR) domain. Specifically, we study the benefit of the visual modality for both machines and humans, when combined with audio degraded by speech-babble noise at various signal-to-noise ratios (SNRs). We first consider an automatic speechreading system with a pixel-based visual front end that uses feature fusion for bimodal integration, and we compare its performance with an audio-only LVCSR system. We then describe results of human speech perception experiments, where subjects are asked to transcribe audio-only and audio-visual utterances at various SNRs. For both machines and humans, we observe approximately a 6 dB effective SNR gain compared to the audio-only performance at 10 dB; however, such gains diverge significantly at other SNRs. Furthermore, automatic audio-visual recognition outperforms human audio-only speech perception at low SNRs.
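Degrading audio with noise at a chosen SNR follows directly from the definition SNR(dB) = 10 log10(P_speech / P_noise). The sketch below shows one common way to scale a noise signal so that the mixture reaches a target SNR. It is an illustration of the definition only, assuming equal-length sample lists; the function names are hypothetical and nothing here is taken from the workshop's actual data preparation.

```python
import math


def mean_power(samples):
    """Average power of a signal given as a list of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def noise_gain_for_snr(speech, noise, target_snr_db):
    """Gain to apply to the noise so the mixture has the target SNR."""
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        raise ValueError("noise signal has zero power")
    ratio = 10.0 ** (target_snr_db / 10.0)  # desired P_speech / P_noise
    return math.sqrt(mean_power(speech) / (noise_power * ratio))


def mix_at_snr(speech, noise, target_snr_db):
    """Add scaled noise to speech, sample by sample."""
    gain = noise_gain_for_snr(speech, noise, target_snr_db)
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy signals standing in for speech and babble noise.
speech = [1.0, -1.0, 1.0, -1.0]
babble = [0.5, 0.5, -0.5, -0.5]
gain = noise_gain_for_snr(speech, babble, target_snr_db=10.0)
scaled_babble = [gain * n for n in babble]
mixture = mix_at_snr(speech, babble, target_snr_db=10.0)
```

Under this definition, an "effective SNR gain" of about 6 dB means the audio-visual system at a given SNR performs roughly like the audio-only system would with noise power reduced by a factor of about four.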
Of packets and people: A User-centered Approach to Quality of Service
, 2001
Cited by 19 (0 self)
Multimedia communication has gained increasing attention, both from the application side and the network provider side. While resource provisioning for QoS support in packet-switched networks has led to the design and development of sophisticated QoS architectures, notably ATM, IntServ or DiffServ, research has not exactly been user or application-context centered. In the course of the evolution of QoS architectures, the integrated service network approach has lost momentum, and with it, the notion of QoS guarantees. Differentiation of QoS classes within the DiffServ framework is based on the definition of various per-hop behaviors. What is currently missing is a technique for specification and mapping of application and user QoS preferences onto evolving service profiles. In addition, adaptation of applications (and users) is becoming increasingly important in the face of dominating weak QoS-assurance paradigms, both in wireline and wireless environments. As a prerequisite, this paper...
Long-Term Working Memory and Interrupting Messages in Human-Computer Interaction
, 2004
Cited by 18 (4 self)
The extent to which memory for information content is reliable, trustworthy, and accurate is crucial in the information age. Being forced to divert attention to interrupting messages is common, however, and can cause memory loss. The memory effects of interrupting messages were investigated in three experiments. In Experiment 1, attending to an interrupting message decreased memory accuracy. Experiment 2, where four interrupting messages were used, replicated this result. In Experiment 3, an interrupting message was shown to be most disturbing when it was semantically very close to the main message. Drawing from a theory of long-term working memory, it is argued that interrupting messages can both disrupt the active semantic elaboration of content during encoding and cause semantic interference upon retrieval. Properties of the interrupting message affect the extent and type of errors in remembering. Design implications are discussed.
Utility Curves: Mean Opinion Scores Considered Biased
- Proceedings of 7th International Workshop on Quality of Service, 1st - 4th, 1999
Cited by 18 (2 self)
... this paper, we provide evidence for such a conclusion and outline an approach, based ... This research was supported in part by Deutsche Forschungsgemeinschaft (DFG) under award ME 1703/1-1.
Unsupervised Classification Learning from Cross-Modal Environmental Structure
, 1994
Cited by 17 (2 self)
This dissertation addresses the problem of unsupervised learning for pattern classification or category learning. A model that is based on gross cortical anatomy and implements biologically plausible computations is developed and shown to have classification power approaching that of a supervised discriminant algorithm. The advantage of supervised learning is that the final error metric is available during training. Unfortunately, when modeling human category learning, or in constructing classifiers for autonomous robots, one must deal with not having an omniscient entity labeling all incoming sensory patterns. We show that we can substitute for the labels by making use of structure between the pattern distributions to different sensory modalities. For example, the co-occurrence of a visual image of a cow with a "moo" sound can be used to simultaneously develop appropriate visual features for distinguishing the cow image and appropriate auditory features for recognizing the moo. We mode...

