Multimodal Video Indexing: A Review of the State-of-the-art
Multimedia Tools and Applications, 2003
Cited by 103 (18 self)
Efficient and effective handling of video documents depends on the availability of indexes. Manual indexing is unfeasible for large video collections. In this paper we survey several methods aiming at automating this time and resource consuming process. Good reviews on single modality based video indexing have appeared in literature. Effective indexing, however, requires a multimodal approach in which either the most appropriate modality is selected or the different modalities are used in collaborative fashion. Therefore, instead of separately treating the different information sources involved, and their specific algorithms, we focus on the similarities and differences between the modalities. To that end we put forward a unifying and multimodal framework, which views a video document from the perspective of its author. This framework forms the guiding principle for identifying index types, for which automatic methods are found in literature. It furthermore forms the basis for categorizing these different methods.
On Image Auto-Annotation with Latent Space Models
MM'03, 2003
Cited by 61 (9 self)
Image auto-annotation, i.e., the association of words to whole images, has attracted considerable attention. In particular, unsupervised, probabilistic latent variable models of text and image features have shown encouraging results, but their performance with respect to other approaches remains unknown. In this paper, we apply and compare two simple latent space models commonly used in text analysis, namely Latent Semantic Analysis (LSA) and Probabilistic LSA (PLSA). Annotation strategies for each model are discussed. Remarkably, we found that, on a 8000-image dataset, a classic LSA model defined on keywords and a very basic image representation performed as well as much more complex, state-of-the-art methods. Furthermore, nonprobabilistic methods (LSA and direct image matching) outperformed PLSA on the same dataset.
A Probabilistic Multimedia Retrieval Model and Its Evaluation
EURASIP Journal on Applied Signal Processing, 2003
Cited by 18 (11 self)
In this paper we present a probabilistic model for the retrieval of multimodal documents. The model is based on Bayesian decision theory and combines models for text based search with models for visual search. The textual model is based on the language modelling approach to text retrieval and the visual information is modelled as a mixture of Gaussian densities. Both models have been proved successful on various standard retrieval tasks. We evaluate the multimodal model on the search task of TREC's video track. We found that the disclosure of video material based on visual information only is still too difficult. Even with purely visual information needs, text based retrieval still outperforms visual approaches. The probabilistic model is useful for text, visual and multimedia retrieval. Unfortunately, simplifying assumptions that reduce its computational complexity degrade retrieval effectiveness. Regarding the question whether the model can effectively combine information from different modalities, we conclude that whenever both modalities yield reasonable scores, a combined run outperforms the individual runs.
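The abstract above names two ingredients, a language model for text and a Gaussian mixture for visual data, without giving formulas. The following is a minimal sketch of how such scores might be combined, not the paper's model. It assumes that the two modalities are conditionally independent given the document (so log-likelihoods add), that text is smoothed by Jelinek-Mercer interpolation, and that visual features are one-dimensional. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def text_log_likelihood(query_terms, doc_terms, collection_terms, lam=0.5):
    """Log P(query terms | document) under a smoothed unigram language model."""
    doc_counts = Counter(doc_terms)
    coll_counts = Counter(collection_terms)
    score = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / len(doc_terms)
        p_coll = coll_counts[term] / len(collection_terms)
        # Jelinek-Mercer interpolation (an assumed smoothing choice).
        mixed = lam * p_doc + (1.0 - lam) * p_coll
        if mixed == 0.0:
            continue  # term unseen in the whole collection: skipped in this sketch
        score += math.log(mixed)
    return score

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def visual_log_likelihood(query_samples, mixture):
    """Log P(query samples | document); mixture is a list of (weight, mean, variance)."""
    score = 0.0
    for x in query_samples:
        density = sum(w * gaussian_pdf(x, m, v) for w, m, v in mixture)
        score += math.log(max(density, 1e-300))  # floor avoids log(0) on underflow
    return score

def combined_score(text_score, visual_score):
    """Sum of log-likelihoods, valid only under the assumed independence."""
    return text_score + visual_score

# Toy collection (hypothetical data).
docs = {
    "d1": {"terms": ["boat", "sea", "sky"], "mixture": [(1.0, 0.2, 0.01)]},
    "d2": {"terms": ["car", "road", "sky"], "mixture": [(1.0, 0.8, 0.01)]},
}
collection_terms = [t for d in docs.values() for t in d["terms"]]
query_terms, query_samples = ["boat"], [0.25]

def score(doc_id):
    doc = docs[doc_id]
    return combined_score(
        text_log_likelihood(query_terms, doc["terms"], collection_terms),
        visual_log_likelihood(query_samples, doc["mixture"]),
    )

ranking = sorted(docs, key=score, reverse=True)
```

Here both modalities favour `d1`, so it ranks first. This matches the situation the abstract describes, in which a combined run helps when both modalities yield reasonable scores.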
RETIN: A content-based image indexing and retrieval system
2001
Cited by 11 (5 self)
This paper presents RETIN, a new system for automatic image indexing and interactive content-based image retrieval. The most original aspect of our work rests on the distance computation and its adjustment by relevance feedback. First of all, during an off-line stage, the indexes are computed from attribute vectors associated to image pixels. The feature spaces are partitioned through an unsupervised classification and then, thanks to these partitions, statistical distributions are processed for each image. During the on-line use of the system, the user makes an iconic request, i.e. he brings an example of the type of image he is looking for. The query may be global or partial, since the user can reduce his/her request to a region of interest. The comparison between the query distribution and that of every image in the collection, is carried out by using a weighted dissimilarity function which manages the use of several attributes. The results of the search are then refined by means of relevance feedback which tunes the weights of the dissimilarity metric via user interaction. Experiments are then performed on large databases and statistical quality assessment shows the good properties of RETIN for digital image retrieval. The evaluation also shows that relevance feedback brings flexibility and robustness to the search.
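The idea of a weighted dissimilarity over several attributes, with weights tuned by relevance feedback, can be sketched as follows. This is an illustration under stated assumptions, not RETIN's actual method: the per-attribute distance is taken to be L1 between normalized histograms, and the weight update (inverse mean distance to the images the user marked relevant) is a hypothetical rule chosen for simplicity. All names are illustrative.

```python
def l1_distance(p, q):
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))

def weighted_dissimilarity(query, image, weights):
    """Weighted sum of per-attribute distances.

    query and image map an attribute name to a normalized histogram.
    """
    return sum(weights[a] * l1_distance(query[a], image[a]) for a in weights)

def update_weights(query, relevant_images, weights, eps=1e-6):
    """Hypothetical feedback rule: favour attributes on which the images
    marked relevant lie close to the query, then renormalize to sum to 1."""
    raw = {}
    for a in weights:
        mean_dist = sum(l1_distance(query[a], img[a]) for img in relevant_images)
        mean_dist /= len(relevant_images)
        raw[a] = 1.0 / (mean_dist + eps)
    total = sum(raw.values())
    return {a: raw[a] / total for a in raw}

# Toy data: the relevant image matches the query in colour but not in texture.
query = {"color": [0.5, 0.5], "texture": [1.0, 0.0]}
relevant = [{"color": [0.5, 0.5], "texture": [0.0, 1.0]}]
weights = {"color": 0.5, "texture": 0.5}

before = weighted_dissimilarity(query, relevant[0], weights)
new_weights = update_weights(query, relevant, weights)
after = weighted_dissimilarity(query, relevant[0], new_weights)
```

After one round of feedback the colour attribute dominates, so the image the user marked relevant moves closer to the query (`after < before`).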
Probabilistic Multimedia Retrieval
2002
Cited by 5 (3 self)
We present a framework in which probabilistic models for textual and visual information retrieval can be integrated seamlessly. The framework facilitates searching for imagery using textual descriptions and visual examples simultaneously. The underlying Language Models for text and Gaussian Mixture Models for images have proven successful in various retrieval tasks.
Cross-Language Image Retrieval via Spoken Query
Proceedings of RIAO 2004: Coupling Approaches, Coupling Media and Coupling Languages for Information Retrieval, 2004
Cited by 4 (3 self)
This paper studies cross-language cross-medium information retrieval. We introduce several approaches to unify the languages and media of queries and documents. We experiment on cross-language image retrieval via spoken query. Two approaches are proposed to recognize and translate spoken queries. We also propose a similarity-based approach to identify and backward transliterate named entities in a spoken query.
Multimodal fusion for multimedia analysis: a survey
2010
Cited by 4 (1 self)
This survey aims at providing multimedia researchers with a state-of-the-art overview of fusion strategies, which are used for combining multiple modalities in order to accomplish various multimedia analysis tasks. The existing literature on multimodal fusion research is presented through several classifications based on the fusion methodology and the level of fusion (feature, decision, and hybrid). The fusion methods are described from the perspective of the basic concept, advantages, weaknesses, and their usage in various analysis tasks as reported in the literature. Moreover, several distinctive issues that influence a multimodal fusion process such as, the use of correlation and independence, confidence level, contextual information, synchronization between different modalities, and the optimal modality selection are also highlighted. Finally, we present the open issues for further research in the area of multimodal fusion.
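To make the distinction between fusion levels concrete, here is a small sketch assuming two modalities. Feature-level fusion is shown as plain concatenation of feature vectors, and decision-level fusion as a confidence-weighted average of per-modality scores. Both are common textbook instances rather than methods taken from the survey, and the function names and numbers are hypothetical.

```python
def feature_level_fusion(visual_features, audio_features):
    """Early fusion: join the modality feature vectors before any classifier sees them."""
    return list(visual_features) + list(audio_features)

def decision_level_fusion(scores, confidences):
    """Late fusion: confidence-weighted average of per-modality decision scores."""
    total = sum(confidences.values())
    return sum(confidences[m] * scores[m] for m in scores) / total

# Early fusion yields one longer vector for a single downstream model.
fused_vector = feature_level_fusion([0.1, 0.2], [0.7])

# Late fusion combines the outputs of separately trained per-modality models.
fused_score = decision_level_fusion(
    scores={"visual": 0.9, "audio": 0.3},
    confidences={"visual": 3.0, "audio": 1.0},
)
```

A hybrid scheme, the third level the abstract mentions, would apply both: some modalities are joined at the feature level and the resulting decisions are then merged with others at the decision level.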
Feature selection for automatic image annotation
In Proceedings of the 28th Pattern Recognition Symposium of the German Association for Pattern Recognition (DAGM 2006), 2006
Cited by 2 (1 self)
Automatic image annotation empowers the user to search an image database using keywords, which is often a more practical option than a query-by-example approach. In this work, we present a novel image annotation scheme which is fast and effective and scales well to a large number of keywords. We first provide a feature weighting scheme suitable for image annotation, and then an annotation model based on the one-class support vector machine. We show that the system works well even with a small number of visual features. We perform experiments using the Corel Image Collection and compare the results with a well-established image annotation system.
Retrieving Images as Text
2003
Cited by 1 (0 self)
Both interpretations of the title Retrieving Images as Text are considered in this thesis. We use text retrieval methods for content based image retrieval and we introduce a method for retrieving and combining images and text. A new method for image retrieval is introduced using Gaussian color scale space. It uses local features to describe small pieces of images where each piece is labeled. Such a label is denoted a 'word' of the image. LSI is applied on the words and yields good general results on a dataset for content based image retrieval and particularly on occluded images and camera rotated images, demonstrating the locality of the features. Images and text are combined by concatenating the words in an image to the textual words describing the image. When LSI is applied on this combined semantic space, relations are learned between text and text, image and image and text and image. Some examples demonstrate the usefulness of the method. Keywords: Information Retrieval, Content Based Image Retrieval, Digital Libraries, Multimodal Retrieval, Combining Images and Text, Image Semantics, Local Image Color Structure.
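The concatenation step described above, in which image 'words' are appended to the textual words of a document, can be sketched as building a single term-document count matrix over a shared vocabulary. This is an illustration only. The `img:` prefix that keeps visual labels from colliding with text terms, the integer labels, and the function names are all assumptions, and the LSI step itself (a truncated SVD of this matrix) is not shown.

```python
from collections import Counter

def combine_document(text_words, visual_words):
    """Append visual 'words' to the text words of one document.

    Visual labels get an assumed 'img:' prefix so they cannot collide with text terms.
    """
    return list(text_words) + ["img:" + str(label) for label in visual_words]

def term_document_matrix(documents):
    """Return (vocabulary, rows), where rows[i][j] counts term i in document j."""
    counts = [Counter(doc) for doc in documents]
    vocabulary = sorted({term for doc in documents for term in doc})
    rows = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, rows

# Two toy documents: caption words plus quantized local-feature labels.
documents = [
    combine_document(["sunset", "beach"], [17, 17, 4]),
    combine_document(["beach"], [4]),
]
vocabulary, matrix = term_document_matrix(documents)
```

Because text and image terms share one matrix, a subsequent low-rank decomposition can relate a caption word such as `beach` to a visual label such as `img:4` when they co-occur across documents.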

