## Information-theoretic semantic multimedia indexing (2007)

Venue: ACM Conference on Image and Video Retrieval

Citations: 22 (10 self)

### BibTeX

@INPROCEEDINGS{Magalhaes07information-theoreticsemantic,
  author    = {João Magalhães and Stefan Rüger},
  title     = {Information-theoretic semantic multimedia indexing},
  booktitle = {ACM Conference on Image and Video Retrieval},
  year      = {2007}
}

### Abstract

To index collections with diverse text documents, image documents, or documents containing both text and images, one needs a model that supports heterogeneous types of documents. In this paper, we show how information theory supplies the tools necessary to develop a single model for text, image, and combined text/image retrieval. In our approach, for each possible query keyword we estimate a maximum entropy model based exclusively on preprocessed continuous features. The unified continuous feature space of text and visual data is constructed by using a minimum description length criterion to find the optimal feature-space representation (optimal from an information-theoretic point of view). We evaluate our approach in three experiments: text-only retrieval, image-only retrieval, and combined text and image retrieval.

### Citations

9087 | Elements of Information Theory
- Cover, Thomas
- 1991
Citation Context: ...sparse and high-dimensional, visual data is usually dense and low-dimensional (note that adjectives high and low are used to contrast the different data that we are dealing with). Information theory [9] provides us with a set of information measures that not only assess the amount of information that one single source of data contains, but also the amount of information that two sources of data have...

1936 | An Algorithm for Suffix Stripping
- Porter
- 1980
Citation Context: ...information retrieval text processing techniques [39] we remove stop words and, following Joachims [19], remove rare words from the text corpus (to avoid overfitting). After this, the Porter stemmer [31] reduces words to their morphological root. The terms obtained by this process are weighted by their inverse document frequency [33], IDF(t_i) = log(d / DF(t_i)) (7), where d is the number of docu...
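The inverse document frequency weighting in equation (7) of the snippet above can be sketched directly; the function and parameter names (`idf`, `num_docs`, `doc_freq`) are illustrative, not from the paper.

```python
import math

def idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency, IDF(t_i) = log(d / DF(t_i)),
    where d is the collection size and DF(t_i) is the number of
    documents containing term t_i (equation (7) in the snippet)."""
    return math.log(num_docs / doc_freq)

# A term appearing in 10 of 1000 documents gets a higher weight
# than a term appearing in 500 of them.
rare_weight = idf(1000, 10)
common_weight = idf(1000, 500)
```

A term present in every document gets weight log(1) = 0, i.e. it carries no discriminative information.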

1853 | Text categorization with support vector machines: Learning with many relevant features
- Joachims
- 1998
Citation Context: ...ly term-weighting. Due to the high-dimensional feature space of text data most text categorization algorithms are linear models such as naïve Bayes [26], maximum entropy [28], Support Vector Machines [19], regularized linear models [44], and Linear Least Squares Fit [40]. Joachims [19] applies SVMs directly to the text terms. Text is ideal for applying SVMs without the need of a kernel function becaus...

1666 | Term weighting approaches in automatic text retrieval
- Salton, Buckley
- 1987
Citation Context: ...text corpus (to avoid overfitting). After this, the Porter stemmer [31] reduces words to their morphological root. The terms obtained by this process are weighted by their inverse document frequency [33], IDF(t_i) = log(d / DF(t_i)) (7), where d is the number of documents in the collection and DF(t_i) is the number of documents containing the term t_i. Text features are high-dimensional sparse...

1234 | Modelling by shortest data description
- Rissanen
- 1978
Citation Context: ...n this section we answer questions like “how many text features?” and “how many visual clusters?” that are usually addressed by some heuristic method. We employ a minimum description length criterion [32] to infer the optimal representation of each feature space as follows. When changing the representation of the data we compute a candidate transformation F* that carries an expected error of the dat...
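A two-part MDL selection in the spirit of Rissanen's criterion can be sketched as follows; the candidate feature-space sizes and negative log-likelihood values here are hypothetical, purely for illustration.

```python
import math

def mdl_cost(neg_log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Two-part MDL code length: cost of the data under the model plus
    (k/2) log n for encoding the k model parameters. Smaller is better."""
    return neg_log_likelihood + 0.5 * n_params * math.log(n_samples)

# Hypothetical candidates: feature-space size -> negative log-likelihood.
# A richer representation fits better but pays a larger parameter cost.
candidates = {8: 510.0, 16: 470.0, 32: 455.0}
best_size = min(candidates, key=lambda k: mdl_cost(candidates[k], k, 1000))
```

With these numbers the 32-dimensional candidate fits best but its parameter cost outweighs the gain, so the criterion settles on the 16-dimensional representation.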

1139 | A maximum entropy approach to natural language processing
- Berger, Pietra, et al.
- 1996
Citation Context: ...tion of hierarchical models. 4. MAXIMUM ENTROPY MODEL Maximum entropy modelling is a statistical learning technique that has been applied to a great variety of fields, e.g. natural language processing [5], text classification [28], image annotation [18]. Maximum entropy is used in this paper to model query keywords in the optimal feature space that was discussed in the previous section. As is shown in...

1037 | A comparative study on feature selection in text categorization
- YANG, PEDERSEN
- 1997
Citation Context: ...For each term t_j, the criterion measures the common entropy between a given query keyword entropy H(w_i) and the query keyword entropy given a term t_j, H(w_i | t_j). Yang and Pedersen [42] and Forman [13] have shown experimentally that this is one of the best criteria for feature selection. 3.2.2 Feature Space Selection With the terms ranked by their amount of entropy shared with the q...
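The H(w_i) − H(w_i | t_j) criterion in the snippet above (information gain) can be computed for a binary keyword/term pair as follows; function and parameter names are illustrative, not the paper's code.

```python
import math

def entropy(p: float) -> float:
    """Binary entropy H(p) in bits; H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(p_w: float, p_t: float,
                     p_w_given_t: float, p_w_given_not_t: float) -> float:
    """IG(w; t) = H(w) - H(w|t): the entropy the keyword w shares
    with the term t. Terms are ranked by this value."""
    h_cond = p_t * entropy(p_w_given_t) + (1 - p_t) * entropy(p_w_given_not_t)
    return entropy(p_w) - h_cond
```

A term that fully determines the keyword yields IG = H(w); a term independent of the keyword yields IG = 0.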

863 | Probabilistic latent semantic indexing
- Hofmann
- 1999
Citation Context: ...h-dimensional sparse feature spaces. The proposed maximum entropy framework tackles this problem by expanding the feature space in a similar spirit to Hofmann’s probabilistic Latent Semantic Indexing [15]. These single-modality based approaches are far from our initial goal but by analysing them we can see which family of models can be used to simultaneously model text, image, and multi-modal content...

822 | A comparison of event models for naive bayes text classification
- McCallum, Nigam
- 1998
Citation Context: ...moving stopwords and rare words, stemming, and finally term-weighting. Due to the high-dimensional feature space of text data most text categorization algorithms are linear models such as naïve Bayes [26], maximum entropy [28], Support Vector Machines [19], regularized linear models [44], and Linear Least Squares Fit [40]. Joachims [19] applies SVMs directly to the text terms. Text is ideal for applyi...

768 | The elements of statistical learning: data mining, inference, and prediction
- Hastie, Tibshirani, et al.
- 2001
Citation Context: ...ances on different text collections. Their results indicate that k-Nearest Neighbour, SVMs, and LLSF are the best classifiers. Note that nearest neighbour approaches have certain characteristics (see [14]) that make them computationally too complex to handle large-scale indexing. The simplest image annotation models deploy a traditional multiclass supervised learning model and learn the class-conditio...

698 | A re-examination of text categorization methods
- Yang, Liu
- 1999
Citation Context: ...l advantage that while these approaches have no automatic mechanism to select a vocabulary size we use the minimum description length principle to select its optimal size. Yang [39], and Yang and Liu [41] have compared a number of text classification algorithms and reported their performances on different text collections. Their results indicate that k-Nearest Neighbour, SVMs, and LLSF are the best cl...

573 | Inducing features of random fields
- Pietra, Stephen, et al.
- 1997
Citation Context: ...text classification [28], image annotation [18]. Maximum entropy is used in this paper to model query keywords in the optimal feature space that was discussed in the previous section. As is shown in [30] maximum entropy models have an exponential (or log-linear) form P(w_t | T, V) = (1 / Z(T, V)) e^{β_{w_t} · F(T, V)} (14), where F(T, V) is the feature vector and β_{w_t} is the weight vector for keyword w...
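The log-linear form in equation (14) can be sketched as below. This is a minimal illustration, not the paper's implementation: it normalises exp(β_w · F) over a small keyword vocabulary, and all names and values are assumptions.

```python
import math

def maxent_prob(features, weights_per_keyword):
    """Log-linear (maximum entropy) model: P(w | T,V) is proportional to
    exp(beta_w · F(T,V)), normalised by the partition function Z(T,V)."""
    scores = {w: math.exp(sum(b * f for b, f in zip(beta, features)))
              for w, beta in weights_per_keyword.items()}
    z = sum(scores.values())          # Z(T, V)
    return {w: s / z for w, s in scores.items()}

# Hypothetical 2-D feature vector and per-keyword weight vectors.
probs = maxent_prob([2.0, 0.0], {"cat": [1.0, 0.0], "dog": [0.0, 1.0]})
```

Because the first feature is active and aligned with "cat"'s weights, "cat" receives the larger probability; the outputs always sum to one by construction.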

537 | An evaluation of statistical approaches to text categorization
- Yang
- 1999
Citation Context: ...quares) with the crucial advantage that while these approaches have no automatic mechanism to select a vocabulary size we use the minimum description length principle to select its optimal size. Yang [39], and Yang and Liu [41] have compared a number of text classification algorithms and reported their performances on different text collections. Their results indicate that k-Nearest Neighbour, SVMs, a...

478 | Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. European Conf. for Computer Vision
- Duygulu, Barnard, et al.
- 2002
Citation Context: ...om such sparse data. Other types of approaches are based on a translation model between keywords and images (global, tiles or regions). Inspired by automatic text translation research, Duygulu et al. [10] developed a method of annotating images with words. First, regions are created using a segmentation algorithm like normalised cuts. For each region, features are computed and then blobs are generated...

343 | Modeling annotated data
- Blei, Jordan
- 2003
Citation Context: ...inspired by a hierarchical clustering/aspect model. The data are assumed to be generated by a fixed hierarchy of nodes with the leaves of the hierarchy corresponding to soft clusters. Blei and Jordan [6] propose the correspondence latent Dirichlet allocation model; a Bayesian model for capturing the relations between regions, words and latent variables. The exploitation of hierarchical structures (ei...

340 | Automatic Image Annotation and Retrieval using Cross-Media Relevance Models
- Jeon, Lavrenko, et al.
- 2003
Citation Context: ...ions across an image collection. The problem is then formulated as learning the correspondence between the discrete vocabulary of blobs and the image keywords. Following the same translation approach [11, 17, 20] have developed a series of translation models that use different models for keywords (multinomial/binomial) and images representations (hard clustered regions, soft clustered regions, tiles). Hierarc...

308 | An extensive empirical study of feature selection metrics for text classification
- Forman
Citation Context: ...term t_j, the criterion measures the common entropy between a given query keyword entropy H(w_i) and the query keyword entropy given a term t_j, H(w_i | t_j). Yang and Pedersen [42] and Forman [13] have shown experimentally that this is one of the best criteria for feature selection. 3.2.2 Feature Space Selection With the terms ranked by their amount of entropy shared with the query keywords, w...

298 | Unsupervised Learning of Finite Mixture Models
- Figueiredo, Jain
- 2002
Citation Context: ...rent model complexities) we estimate a hierarchical set of density models (GMMs). We developed a C++ implementation of the modified expectation-maximization algorithm proposed by Figueiredo and Jain in [12]. With minor modifications this algorithm responds to our needs, see [23]. It starts with a number of clusters much larger than the true number of clusters and deletes clusters as they get little supp...
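The start-large-then-prune idea in this snippet can be illustrated with a toy 1-D GMM EM loop. This is a deliberately simplified sketch in the spirit of Figueiredo and Jain's algorithm, not their method (which uses an MML criterion): here a component is simply deleted when its mixing weight falls below a support threshold. All names and thresholds are assumptions.

```python
import math
import random

def em_with_pruning(data, k_init=10, min_weight=0.05, iters=50):
    """Toy 1-D GMM EM: start with k_init components and delete any whose
    mixing weight drops below min_weight (too little support data)."""
    random.seed(0)
    means = random.sample(list(data), k_init)
    sigmas = [1.0] * k_init
    weights = [1.0 / k_init] * k_init
    for _ in range(iters):
        # E-step: responsibilities of each component for each point.
        resp = []
        for x in data:
            ps = [w * math.exp(-(x - m) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
                  for w, m, s in zip(weights, means, sigmas)]
            tot = sum(ps) or 1e-300
            resp.append([p / tot for p in ps])
        # M-step: re-estimate weights, means, variances.
        n = len(data)
        nk = [sum(r[j] for r in resp) for j in range(len(means))]
        weights = [c / n for c in nk]
        means = [sum(r[j] * x for r, x in zip(resp, data)) / max(c, 1e-12)
                 for j, c in enumerate(nk)]
        sigmas = [max(math.sqrt(sum(r[j] * (x - means[j]) ** 2
                                    for r, x in zip(resp, data)) / max(c, 1e-12)), 1e-3)
                  for j, c in enumerate(nk)]
        # Pruning: drop weakly supported components and renormalise.
        keep = [j for j, w in enumerate(weights) if w >= min_weight]
        means = [means[j] for j in keep]
        sigmas = [sigmas[j] for j in keep]
        weights = [weights[j] for j in keep]
        tot = sum(weights)
        weights = [w / tot for w in weights]
    return weights, means, sigmas
```

On data drawn from a few well-separated clusters, the surviving component count ends up far below `k_init` while the mixing weights remain a valid distribution.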

275 | Using maximum entropy for text classification
- Nigam, Lafferty, et al.
- 1999
Citation Context: ...are words, stemming, and finally term-weighting. Due to the high-dimensional feature space of text data most text categorization algorithms are linear models such as naïve Bayes [26], maximum entropy [28], Support Vector Machines [19], regularized linear models [44], and Linear Least Squares Fit [40]. Joachims [19] applies SVMs directly to the text terms. Text is ideal for applying SVMs without the ne...

237 | A comparison of algorithms for maximum entropy parameter estimation
- Malouf
- 2002
Citation Context: ...ature information from earlier iterations, which is less likely to be relevant to the actual behaviour of the Hessian at the current iteration, is discarded in the interest of saving storage”. Malouf [25] has compared several optimisation algorithms for maximum entropy and found the limited-memory BFGS algorithm to be the best one. We use the implementation provided by Liu and Nocedal [22]. 5. EVALUAT...
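The "only the most recent iterations" idea quoted above is the L-BFGS two-loop recursion: the search direction is built from the last m (s, y) displacement/gradient-change pairs, so no full Hessian is ever stored. The sketch below is a minimal pure-Python illustration with a fixed unit step; real implementations such as Liu and Nocedal's add a line search and further safeguards.

```python
def lbfgs(grad, x0, m=5, iters=100, tol=1e-16):
    """Minimal limited-memory BFGS sketch: the two-loop recursion forms
    the quasi-Newton direction from only the m most recent (s, y) pairs."""
    x = list(x0)
    g = grad(x)
    pairs = []  # the m most recent (s_k, y_k) curvature pairs
    for _ in range(iters):
        # Two-loop recursion: q ≈ H_k^{-1} g using stored pairs only.
        q = list(g)
        alphas = []
        for s, y in reversed(pairs):
            rho = 1.0 / sum(si * yi for si, yi in zip(s, y))
            a = rho * sum(si * qi for si, qi in zip(s, q))
            alphas.append((a, rho, s, y))
            q = [qi - a * yi for qi, yi in zip(q, y)]
        if pairs:  # scale by gamma = (s·y)/(y·y) as the initial Hessian guess
            s, y = pairs[-1]
            gamma = sum(si * yi for si, yi in zip(s, y)) / sum(yi * yi for yi in y)
            q = [gamma * qi for qi in q]
        for a, rho, s, y in reversed(alphas):
            b = rho * sum(yi * qi for yi, qi in zip(y, q))
            q = [qi + (a - b) * si for qi, si in zip(q, s)]
        x_new = [xi - qi for xi, qi in zip(x, q)]  # unit step, no line search
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        if sum(si * yi for si, yi in zip(s, y)) > 1e-12:  # curvature condition
            pairs = (pairs + [(s, y)])[-m:]  # discard pairs older than m
        x, g = x_new, g_new
        if sum(gi * gi for gi in g) < tol:
            break
    return x
```

On a simple convex quadratic the stored pair captures the curvature exactly, so the method reaches the minimiser in a couple of iterations.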

234 | Learning the semantics of words and pictures
- Barnard, Forsyth
- 2001
Citation Context: ...ds (multinomial/binomial) and images representations (hard clustered regions, soft clustered regions, tiles). Hierarchical models have also been used in image annotation such as Barnard and Forsyth’s [3] generative hierarchical aspect model inspired by a hierarchical clustering/aspect model. The data are assumed to be generated by a fixed hierarchy of nodes with the leaves of the hierarchy correspond...

232 | A gaussian prior for smoothing maximum entropy models
- Chen, Rosenfeld
- 1999
Citation Context: ...visual features, text features from different sources (ASR, OCR), and audio features. Three types of classifiers are available: logistic regression (which without regularization is known to over-fit [8]), Fisher linear discriminant, and SVMs (offering the best accuracy). The fusion of the different modalities can be done at different levels and is chosen by cross-validation for each co...

212 | Minimum complexity density estimation
- Barron, Cover
- 1991
Citation Context: ...the likelihood of the data D for the model M_θ and the model complexity. The MDL criterion is designed “to achieve the best compromise between likelihood and … complexity relative to the sample size” [4]: it selects automatically the optimal feature space representation that can be obtained with an average mutual information measure. 3.3 Dense Feature Spaces: Visual Data We now describe the visual fe...

198 | Multiple Bernoulli Relevance Models for Image and Video Annotation
- Feng, Manmatha, et al.
- 2004
Citation Context: ...ions across an image collection. The problem is then formulated as learning the correspondence between the discrete vocabulary of blobs and the image keywords. Following the same translation approach [11, 17, 20] have developed a series of translation models that use different models for keywords (multinomial/binomial) and images representations (hard clustered regions, soft clustered regions, tiles). Hierarc...

190 | A model for learning the semantics of pictures
- Lavrenko, Manmatha, et al.
- 2003
Citation Context: ...ions across an image collection. The problem is then formulated as learning the correspondence between the discrete vocabulary of blobs and the image keywords. Following the same translation approach [11, 17, 20] have developed a series of translation models that use different models for keywords (multinomial/binomial) and images representations (hard clustered regions, soft clustered regions, tiles). Hierarc...

166 | Image classification for content-based indexing
- Vailaya, Figueiredo, et al.
- 2001
Citation Context: ...et al. [43] deployed a nonparametric distribution; Carneiro and Vasconcelos [7] a semi-parametric density estimation; Westerveld and de Vries [37] a finite-mixture of Gaussians; while Vailaya et al. [36] apply a vector quantization technique. Density based approaches are among the most successful ones. However, density distributions are not adequate for text because the density models do not get enou...

124 | An example-based mapping method for text categorization and retrieval
- Yang, Chute
- 1994
Citation Context: ...t data most text categorization algorithms are linear models such as naïve Bayes [26], maximum entropy [28], Support Vector Machines [19], regularized linear models [44], and Linear Least Squares Fit [40]. Joachims [19] applies SVMs directly to the text terms. Text is ideal for applying SVMs without the need of a kernel function because data is already sparse and high-dimensional. Linear models fitted...

100 | A probabilistic framework for semantic video indexing, filtering, and retrieval
- Naphade, Huang
- 2001
Citation Context: ...these classifiers. Westerveld et al. [38] combine the visual model and the text model under the assumption that they are independent, thus the probabilities are simply multiplied. Naphade and Huang [27] model visual features with Gaussian Mixture Models (GMM), audio features with Hidden Markov Models (HMM) and combine them in a Bayesian network. In multimedia documents the different modalities cont...

86 | Text categorization based on regularized linear classification methods
- Zhang, Oles
- 2001
Citation Context: ...gh-dimensional feature space of text data most text categorization algorithms are linear models such as naïve Bayes [26], maximum entropy [28], Support Vector Machines [19], regularized linear models [44], and Linear Least Squares Fit [40]. Joachims [19] applies SVMs directly to the text terms. Text is ideal for applying SVMs without the need of a kernel function because data is already sparse and hig...

81 | On the limited memory method for large scale optimization
- Liu, Nocedal
- 1989
Citation Context: ...a 10,000×10,000 matrix on each iteration. Thus, algorithms that compute approximations to the Hessian matrix are ideal for the problem at hand. The limited-memory BFGS algorithm proposed by Liu and Nocedal [22] is one of such algorithms that “use curvature information from only the most recent iterations to construct the Hessian approximation. Curvature information from earlier iterations, which is less lik...

68 | Formulating Semantic Image Annotation as a Supervised Learning Problem
- Carneiro, Vasconcelos
- 2005
Citation Context: ...veral techniques to model p(x | w) with different types of probability density distributions have been proposed: Yavlinsky et al. [43] deployed a nonparametric distribution; Carneiro and Vasconcelos [7] a semi-parametric density estimation; Westerveld and de Vries [37] a finite-mixture of Gaussians; while Vailaya et al. [36] apply a vector quantization technique. Density based approaches are among t...

64 | Automated image annotation using global features and robust nonparametric density estimation
- Yavlinsky, Schofield, et al.
- 2005
Citation Context: ...p(x | w), the features data density distribution of a given keyword. Several techniques to model p(x | w) with different types of probability density distributions have been proposed: Yavlinsky et al. [43] deployed a nonparametric distribution; Carneiro and Vasconcelos [7] a semi-parametric density estimation; Westerveld and de Vries [37] a finite-mixture of Gaussians; while Vailaya et al. [36] apply a...

63 | The semantic pathfinder: using an authoring metaphor for generic multimedia indexing
- Snoek, Worring, Smeulders, et al.
- 2006
Citation Context: ...en way because they represent the same reality. Synchronization/relation and the strategy to combine the multi-modal patterns is a key point of the Semantic Pathfinder system proposed by Snoek et al. [34, 35]. Their system uses a unique feature vector that concatenates a rich set of visual features, text features from different sources (ASR, OCR), and audio features. Three types of classifiers are availab...

57 | A maximum entropy framework for part-based texture and object recognition
- Lazebnik, Schmid, et al.
- 2005
Citation Context: ...ses the number of parameters (model complexity) to be estimated with the same amount of training data. Maximum entropy models have also been applied to image annotation [2, 18] and object recognition [21]. All these three approaches have specific features for each class (keywords in our case) which increases the complexity of the system. It is curious to note the large difference in precision results...

54 | Using maximum entropy for automatic image annotation
- Jeon, Manmatha
- 2004
Citation Context: ...ta or of the parameters) increases the number of parameters (model complexity) to be estimated with the same amount of training data. Maximum entropy models have also been applied to image annotation [2, 18] and object recognition [21]. All these three approaches have specific features for each class (keywords in our case) which increases the complexity of the system. It is curious to note the large diff...

48 | Evaluation of Texture Features for Content-Based Image Retrieval
- Howarth, Rueger
- 2004
Citation Context: ...ent the transformation to obtain the optimal feature space. The low-level features that we use in our implementation are a Marginal HSV colour feature [29] with 12 dimensions, a Gabor texture feature [16] with 16 dimensions, and a Tamura texture feature [16] with 3 dimensions. Images are segmented into 3 by 3 parts (9 tiles) before extracting the low-level features. Our visual feature spaces are dense...
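The 3-by-3 segmentation described in this snippet can be sketched as below; this is a toy illustration where the function name and the list-of-rows image representation are assumptions, not the paper's code. Per-tile features (HSV colour, Gabor, Tamura) would then be extracted from each of the 9 tiles.

```python
def tile_3x3(image):
    """Split an image (a list of pixel rows) into a flat list of
    9 tiles arranged in a 3x3 grid, row-major order."""
    h, w = len(image), len(image[0])
    hs, ws = h // 3, w // 3  # tile height and width (edges truncated)
    return [[row[c * ws:(c + 1) * ws] for row in image[r * hs:(r + 1) * hs]]
            for r in range(3) for c in range(3)]

# A 9x9 "image" of pixel indices splits into nine 3x3 tiles.
image = [[i * 9 + j for j in range(9)] for i in range(9)]
tiles = tile_3x3(image)
```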

20 | The MediaMill TRECVID 2006 Semantic Video Search Engine
- Snoek, Gemert, et al.
- 2006
Citation Context: ...en way because they represent the same reality. Synchronization/relation and the strategy to combine the multi-modal patterns is a key point of the Semantic Pathfinder system proposed by Snoek et al. [34, 35]. Their system uses a unique feature vector that concatenates a rich set of visual features, text features from different sources (ASR, OCR), and audio features. Three types of classifiers are availab...

19 | IBM research TRECVID-2005 video retrieval system
- Amir, et al.
- 2005
Citation Context: ...ed to compute the visual features and to iteratively select the best classifier, the best type of fusion, and the SVM parameter optimization are serious drawbacks of this system. IBM’s Marvel system [1] has a similar architecture with different learning algorithms to analyse the semantics of multimedia content. These two approaches offer the best performance at the TRECVID 2005 evaluation. Both appro...

11 | Video retrieval using global features in keyframes
- Pickering, Heesch, et al.
- 2002
Citation Context: ...ization parameter was chosen by cross-validation. The graphs also compare the maximum entropy framework to a baseline naïve Bayes model. The low-level visual features are: Marginal HSV colour feature [29] with 12 dimensions; Gabor texture feature [16] with 16 dimensions; Tamura texture feature [16] with 3 dimensions. Images are segmented into 3 by 3 parts (9 tiles) before extracting the low-level feat...

9 | Logistic regression of generic codebooks for semantic image retrieval
- Magalhães, J, et al.
- 2006
Citation Context: ...GMMs). We developed a C++ implementation of the modified expectation-maximization algorithm proposed by Figueiredo and Jain in [12]. With minor modifications this algorithm responds to our needs, see [23]. It starts with a number of clusters much larger than the true number of clusters and deletes clusters as they get little support data or when they become singularities. Once a model is fitted, the s...

8 | Semantic annotation of multimedia using maximum entropy models
- Argillander, Iyengar, et al.
- 2005
Citation Context: ...ta or of the parameters) increases the number of parameters (model complexity) to be estimated with the same amount of training data. Maximum entropy models have also been applied to image annotation [2, 18] and object recognition [21]. All these three approaches have specific features for each class (keywords in our case) which increases the complexity of the system. It is curious to note the large diff...

6 | High-dimensional visual vocabularies for image retrieval
- Magalhães, Rüger
- 2007
Citation Context: ...e multinomial distribution, as the binomial distribution is too limiting given the probabilistic nature of our problem. The description of the naïve Bayes implementation used in our experiments is in [24]. 5.3 Experiments and Results We run retrieval experiments by ranking documents for each keyword and computing the corresponding average precision. The mean of the results for all keywords, the mean a...

2 | Experimental result analysis for a generative probabilistic image retrieval model
- Westerveld, de Vries
- 2003
Citation Context: ...ility density distributions have been proposed: Yavlinsky et al. [43] deployed a nonparametric distribution; Carneiro and Vasconcelos [7] a semi-parametric density estimation; Westerveld and de Vries [37] a finite-mixture of Gaussians; while Vailaya et al. [36] apply a vector quantization technique. Density based approaches are among the most successful ones. However, density distributions are not ade...

1 | Combining information sources for video retrieval (TREC Video Retrieval Evaluation Workshop)
- Westerveld, de Vries, et al.
- 2003
Citation Context: ...information about each keyword of the vocabulary. The simplest approach to multi-modal analysis is to design a classifier per modality and combine the output of these classifiers. Westerveld et al. [38] combine the visual model and the text model under the assumption that they are independent, thus the probabilities are simply multiplied. Naphade and Huang [27] model visual features with Gaussian Mi...