Results 1 - 10
of
17
Multimodal fusion for multimedia analysis: a survey
, 2010
"... This survey aims at providing multimedia researchers with a state-of-the-art overview of fusion strategies, which are used for combining multiple modalities in order to accomplish various multimedia analysis tasks. The existing literature on multimodal fusion research is presented through several c ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
This survey aims at providing multimedia researchers with a state-of-the-art overview of fusion strategies, which are used for combining multiple modalities in order to accomplish various multimedia analysis tasks. The existing literature on multimodal fusion research is presented through several classifications based on the fusion methodology and the level of fusion (feature, decision, and hybrid). The fusion methods are described from the perspective of the basic concept, advantages, weaknesses, and their usage in various analysis tasks as reported in the literature. Moreover, several distinctive issues that influence a multimodal fusion process such as, the use of correlation and independence, confidence level, contextual information, synchronization between different modalities, and the optimal modality selection are also highlighted. Finally, we present the open issues for further research in the area of multimodal fusion.
A semantic vector space for query by image example
- in ACM SIGIR Conf. on research and development in information retrieval, Multimedia Information Retrieval Workshop
, 2007
"... Content-based image retrieval enables the user to search a database for visually similar images. In these scenarios, the user submits an example that is compared to the images in the database by their low-level characteristics such as colour, texture and shape. While visual similarity is essential f ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
Content-based image retrieval enables the user to search a database for visually similar images. In these scenarios, the user submits an example that is compared to the images in the database by their low-level characteristics such as colour, texture and shape. While visual similarity is essential for a vast number of applications, there are cases where a user needs to search for semantically similar images. For example, the user might want to find all images depicting bears on a river. This might be quite difficult using only low-level features, but using concept detectors for “bear ” and “river ” will produce results that are semantically closer to what the user requested. Following this idea, this paper studies a novel paradigm: query by semantic multimedia example. In this setting the user’s query is processed at a semantic level: a vector of concept probabilities is inferred for each image and a similarity metric computes the distance between the concept vector of the query and of the concept vectors of the images in database. The system is evaluated with a COREL Stock Photo collection.
Exploring Multimedia in a Keyword Space
"... We address the problem of searching multimedia by semantic similarity in a keyword space. In contrast to previous research we represent multimedia content by a vector of keywords instead of a vector of low-level features. This vector of keywords can be obtained through user manual annotations or com ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
We address the problem of searching multimedia by semantic similarity in a keyword space. In contrast to previous research we represent multimedia content by a vector of keywords instead of a vector of low-level features. This vector of keywords can be obtained through user manual annotations or computed by an automatic annotation algorithm. In this setting, we studied the influence of two aspects of the search by semantic similarity process: (1) accuracy of user keywords versus automatic keywords and (2) functions to compute semantic similarity between keyword vectors of two multimedia documents. We
Compound document analysis by fusing evidence across media
- In Proceedings of the 7th International Workshop on Content-Based Multimedia Indexing (CBMI 2009
, 2009
"... In this paper a cross media analysis scheme for the semantic interpretation of compound documents is presented. It is essentially a late-fusion mechanism that operates on top of single-media extractors output and it’s main novelty relies on using the evidence extracted from heterogeneous media sourc ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
In this paper a cross media analysis scheme for the semantic interpretation of compound documents is presented. It is essentially a late-fusion mechanism that operates on top of single-media extractors output and it’s main novelty relies on using the evidence extracted from heterogeneous media sources to perform probabilistic inference on a bayesian network that incorporates knowledge about the domain. Experiments performed on a set of 54 compound documents showed that the proposed scheme is able to exploit the existing cross media relations and achieve performance improvements. 1
Algorithms, Design, Human-Factors
"... In large organizations the resources needed to solve challenging problems are typically dispersed over systems within and beyond the organization, and also in different media. However, there is still the need, in knowledge environments, for extraction methods able to combine evidence for a fact from ..."
Abstract
- Add to MetaCart
In large organizations the resources needed to solve challenging problems are typically dispersed over systems within and beyond the organization, and also in different media. However, there is still the need, in knowledge environments, for extraction methods able to combine evidence for a fact from across different media. In many cases the whole is more than the sum of its parts: only when considering the different media simultaneously can enough evidence be obtained to derive facts otherwise inaccessible to the knowledge worker via traditional methods that work on each single medium separately. In this paper, we present a cross-media knowledge extraction framework specifically designed to handle large volumes of documents composed of three types of media – text, images and raw data – and to exploit the evidence across the media. Our goal is to improve the quality and depth of automatically extracted knowledge.
A cross media approach for compound document analysis
"... A cross media analysis scheme for the semantic interpretation of compound documents is presented. The proposed scheme is essentially a late-fusion mechanism that operates on top of single-media extractors output. Evidence extracted from heterogeneous sources are used to trigger probabilistic inferen ..."
Abstract
- Add to MetaCart
A cross media analysis scheme for the semantic interpretation of compound documents is presented. The proposed scheme is essentially a late-fusion mechanism that operates on top of single-media extractors output. Evidence extracted from heterogeneous sources are used to trigger probabilistic inference on a bayesian network that encodes domain knowledge and quantifies causality. Experiments performed on a set of 54 compound documents showed that the proposed scheme is able to exploit the existing cross media relations and achieve performance improvements. 1
Web News Categorization using a Cross-Media Document Graph
"... In this paper we propose a multimedia categorization framework that is able to exploit information across different parts of a multimedia document (e.g., a Web page, a PDF, a Microsoft Office document). For example, a Web news page is composed by text describing some event (e.g., a car accident) and ..."
Abstract
- Add to MetaCart
In this paper we propose a multimedia categorization framework that is able to exploit information across different parts of a multimedia document (e.g., a Web page, a PDF, a Microsoft Office document). For example, a Web news page is composed by text describing some event (e.g., a car accident) and a picture containing additional information regarding the real extent of the event (e.g., how damaged the car is) or providing evidence corroborating the text part. The framework handles multimedia information by considering not only the document’s text and images data but also the layout structure which determines how a given text block is related to a particular image. The novelties and contributions of the proposed framework are: (1) support of heterogeneous types of multimedia documents; (2) a documentgraph representation method; and (3) the computation of crossmedia
Cross-Media Knowledge Extraction in the Car Manufacturing Industry
"... In this paper, we present a novel framework for machine learning-based cross-media knowledge extraction. The framework is specifically designed to handle documents composed of three types of media – text, images and raw data – and to exploit the evidence for an extracted fact from across the media. ..."
Abstract
- Add to MetaCart
In this paper, we present a novel framework for machine learning-based cross-media knowledge extraction. The framework is specifically designed to handle documents composed of three types of media – text, images and raw data – and to exploit the evidence for an extracted fact from across the media. We validate the framework by applying it in the design and development of cross-media extraction systems in the context of two real-world use cases in the car manufacturing industry. Moreover, we show that in these use cases the cross-media approach effectively improves system extraction accuracy. 1

