Supervised learning of semantic classes for image annotation and retrieval
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2007
Cited by 74 (10 self)
A probabilistic formulation for semantic image annotation and retrieval is proposed. Annotation and retrieval are posed as classification problems where each class is defined as the group of database images labeled with a common semantic label. It is shown that, by establishing this one-to-one correspondence between semantic labels and semantic classes, minimum probability of error annotation and retrieval are feasible with algorithms that 1) are conceptually simple, 2) are computationally efficient, and 3) do not require prior semantic segmentation of training images. In particular, images are represented as bags of localized feature vectors, a mixture density is estimated for each image, and the mixtures associated with all images annotated with a common semantic label are pooled into a density estimate for the corresponding semantic class. This pooling is justified by a multiple instance learning argument and performed efficiently with a hierarchical extension of expectation-maximization. The benefits of the supervised formulation over the more complex, and currently popular, joint modeling of semantic label and visual feature distributions are illustrated through theoretical arguments and extensive experiments. The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost. Finally, the proposed method is shown to be fairly robust to parameter tuning. Index Terms: Content-based image retrieval, semantic image annotation and retrieval, weakly supervised learning, multiple instance learning, Gaussian mixtures, expectation-maximization, image segmentation, object recognition.
Formulating semantic image annotation as a supervised learning problem
- IEEE CVPR
, 2005
Cited by 42 (5 self)
We introduce a new method to automatically annotate and retrieve images using a vocabulary of image semantics. The novel contributions include a discriminant formulation of the problem, a multiple instance learning solution that enables the estimation of concept probability distributions without prior image segmentation, and a hierarchical description of the density of each image class that enables very efficient training. Compared to current methods of image annotation and retrieval, the proposed method has significantly smaller time complexity and better recognition performance. Specifically, its recognition complexity is O(C×R), where C is the number of classes (or image annotations) and R is the number of image regions, while the best results in the literature have complexity O(T×R), where T is the number of training images. Since the number of classes grows substantially more slowly than the number of training images, the proposed method scales better during training and processes test images faster. This is illustrated through comparisons in terms of complexity, time, and recognition performance with current state-of-the-art methods.
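The O(C×R) recognition cost described in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract describes hierarchical class densities, whereas this example assumes a single one-dimensional Gaussian per class, and the class names, parameters, and the `annotate` helper are all hypothetical.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian (a stand-in for a richer class density)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def annotate(regions, class_models, top_k=2):
    """Score every class by the summed log-likelihood of all regions.

    This performs C x R density evaluations: one per (class, region) pair,
    independent of the number of training images.
    """
    scores = {
        label: sum(gaussian_logpdf(x, mean, var) for x in regions)
        for label, (mean, var) in class_models.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical class models: label -> (mean, variance) of a scalar feature.
class_models = {"sky": (0.9, 0.05), "grass": (0.4, 0.05), "water": (0.7, 0.05)}
regions = [0.85, 0.8, 0.95]  # made-up features for R = 3 image regions
labels = annotate(regions, class_models)
```

Because the loop runs over classes rather than training images, adding more training data for an existing label would not change the cost of annotating a new image in this sketch.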
Bridging the gap: Query by semantic example
- IEEE TRANS. MULTIMEDIA
, 2007
Cited by 23 (4 self)
A combination of query-by-visual-example (QBVE) and semantic retrieval (SR), denoted as query-by-semantic-example (QBSE), is proposed. Images are labeled with respect to a vocabulary of visual concepts, as is usual in SR. Each image is then represented by a vector, referred to as a semantic multinomial, of posterior concept probabilities. Retrieval is based on the query-by-example paradigm: the user provides a query image, for which 1) a semantic multinomial is computed and 2) matched to those in the database. QBSE is shown to have two main properties of interest, one mostly practical and the other philosophical. From a practical standpoint, because it inherits the generalization ability of SR inside the space of known visual concepts (referred to as the semantic space) but performs much better outside of it, QBSE produces retrieval systems that are more accurate than what was previously possible. Philosophically, because it allows a direct comparison of visual and semantic representations under a common query paradigm, QBSE enables the design of experiments that explicitly test the value of semantic representations for image retrieval. An implementation of QBSE under the minimum probability of error (MPE) retrieval framework, previously applied with success to both QBVE and SR, is proposed, and used to demonstrate the two properties. In particular, an extensive objective comparison of QBSE with QBVE is presented, showing that the former significantly outperforms the latter both inside and outside the semantic space. By carefully controlling the structure of the semantic space, it is also shown that this improvement can only be attributed to the semantic nature of the representation on which QBSE is based.
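As a rough illustration of the semantic multinomial mentioned in this abstract, the sketch below turns per-concept log-likelihoods into a vector of posterior concept probabilities with Bayes' rule. The function name, the uniform default prior, and the example numbers are assumptions made for illustration; the abstract does not specify these details.

```python
import math

def semantic_multinomial(log_likelihoods, priors=None):
    """Posterior concept probabilities from per-concept log-likelihoods.

    Applies Bayes' rule in the log domain and normalizes with the
    log-sum-exp trick for numerical stability. A uniform prior over
    concepts is assumed when none is given.
    """
    n = len(log_likelihoods)
    if priors is None:
        priors = [1.0 / n] * n
    scores = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Made-up log-likelihoods of one image under three concept models.
sm = semantic_multinomial([-10.0, -12.0, -11.0])
```

The resulting vector sums to one, so each image becomes a point on the probability simplex over the concept vocabulary, which is the space in which query and database images are compared.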
A game-based approach for collecting semantic annotations of music
- In 8th International Conference on Music Information Retrieval (ISMIR)
, 2007
Cited by 21 (1 self)
Games based on human computation are a valuable tool for collecting semantic information about images. We show how to transfer this idea into the music domain in order to collect high-quality semantic information about songs. We present Listen Game, an online, multiplayer game that measures the semantic relationship between music and words. In the normal mode, a player sees a list of semantically related words (e.g., instruments, emotions, usages, genres) and is asked to pick the best and worst word to describe a song. In the freestyle mode, a user is asked to suggest a new word that describes the music. Each player receives real-time feedback about the agreement amongst all players. We show that we can use the data collected during a two-week pilot study of Listen Game to learn a supervised multiclass labeling (SML) model. We show that this SML model can annotate a novel song with meaningful words and retrieve relevant songs from a database of audio content.
Minimum Probability of Error Image Retrieval
- IEEE Trans. Signal Processing
Cited by 19 (13 self)
We address the design of optimal architectures for image retrieval from large databases. Minimum probability of error (MPE) is adopted as the optimality criterion and retrieval formulated as a problem of statistical classification. The probability of retrieval error is lower- and upper-bounded by functions of the Bayes and density estimation errors, and the impact of the components of the retrieval architecture (namely, the feature transformation and density estimation) on these bounds is characterized. This characterization suggests interpreting the search for the MPE feature set as the search for the minimum of the convex hull of a collection of curves of probability of error versus feature space dimension. A new algorithm for MPE feature design, based on a dictionary of empirical feature sets and the wrapper model for feature selection, is proposed. It is shown that, unlike traditional feature selection techniques, this algorithm scales to problems containing large numbers of classes. Experimental evaluation reveals that the MPE architecture is at least as good as popular empirical solutions on the narrow domains where these perform best but significantly outperforms them outside these domains. Index Terms: Bayesian methods, color and texture, expectation–maximization, feature selection, image retrieval, image similarity, minimum probability of error, mixture models, multiresolution, optimal retrieval systems, wrapper methods.
A database centric view of semantic image annotation and retrieval
- In Proceedings of the 28th Annual international ACM SIGIR Conference on Research and Development in information Retrieval
, 2005
Cited by 15 (1 self)
We introduce a new model for semantic annotation and retrieval from image databases. The new model is based on a probabilistic formulation that poses annotation and retrieval as classification problems, and produces solutions that are optimal in the minimum probability of error sense. It is also database centric, by establishing a one-to-one mapping between semantic classes and the groups of database images that share the associated semantic labels. In this work we show that, under the database centric probabilistic model, optimal annotation and retrieval can be implemented with algorithms that are conceptually simple, computationally efficient, and do not require prior semantic segmentation of training images. Due to its simplicity, the annotation and retrieval architecture is also amenable to sophisticated parameter tuning, a property that is exploited to investigate the role of feature selection in the design of optimal annotation and retrieval systems. Finally, we demonstrate the benefits of simply establishing a one-to-one mapping between keywords and the states of the semantic classification problem over the more complex, and currently popular, joint modeling of keyword and visual feature distributions. The database centric probabilistic retrieval model is compared to existing semantic labeling and retrieval methods, and shown to achieve higher accuracy than the previously best published results, at a fraction of their computational cost.
Audio information retrieval using semantic similarity
- In IEEE ICASSP
, 2007
Cited by 13 (9 self)
We improve upon query-by-example for content-based audio information retrieval by ranking items in a database based on semantic similarity, rather than acoustic similarity, to a query example. The retrieval system is based on semantic concept models that are learned from a training data set containing both audio examples and their text captions. Using the concept models, the audio tracks are mapped into a semantic feature space, where each dimension indicates the strength of the semantic concept. Audio retrieval is then based on ranking the database tracks by their similarity to the query in the semantic space. We experiment with both semantic- and acoustic-based retrieval systems on a sound effects database and show that the semantic-based system improves retrieval both quantitatively and qualitatively. Index Terms: computer audition, audio retrieval, semantic similarity.
Towards musical query-by-semantic-description using the CAL500 data set
- Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
, 2007
Cited by 12 (1 self)
Query-by-semantic-description (QBSD) is a natural paradigm for retrieving content from large databases of music. A major impediment to the development of good QBSD systems for music information retrieval has been the lack of a cleanly labeled, publicly available, heterogeneous data set of songs and associated annotations. We have collected the Computer Audition Lab 500-song (CAL500) data set by having humans listen to and annotate songs using a survey designed to capture ‘semantic associations’ between music and words. We adapt the supervised multi-class labeling (SML) model, which has shown good performance on the task of image retrieval, and use the CAL500 data to learn a model for music retrieval. The model parameters are estimated using the weighted mixture hierarchies expectation-maximization algorithm, which has been specifically designed to handle real-valued semantic association between words and songs, rather than binary class labels. The output of the SML model, a vector of class-conditional probabilities, can be interpreted as a semantic multinomial distribution over a vocabulary. By also representing a semantic query as a query multinomial distribution, we can quickly rank-order the songs in a database based on the Kullback-Leibler divergence between the query multinomial and each song’s semantic multinomial. Qualitative and quantitative results demonstrate that our SML model can both annotate a novel song with meaningful words and retrieve relevant songs given a multi-word, text-based query.
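The ranking step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the direction KL(query || song), the uniform query multinomial over the query words, the small `eps` floor, and the three-word vocabulary with its song vectors are all hypothetical choices.

```python
import math

def kl_divergence(q, p, eps=1e-12):
    """KL(q || p) for two discrete distributions given as equal-length lists.

    Terms with q_i == 0 contribute nothing; p_i is floored at eps to avoid
    division by zero.
    """
    return sum(qi * math.log(qi / max(pi, eps)) for qi, pi in zip(q, p) if qi > 0)

def rank_by_kl(query, database):
    """Return item names sorted by increasing KL(query || item multinomial)."""
    scores = {name: kl_divergence(query, p) for name, p in database.items()}
    return sorted(scores, key=scores.get)

# Hypothetical vocabulary: ["jazz", "calm", "guitar"]
songs = {
    "song_a": [0.7, 0.2, 0.1],
    "song_b": [0.1, 0.3, 0.6],
    "song_c": [0.3, 0.4, 0.3],
}
query = [0.5, 0.5, 0.0]  # text query "jazz calm", assumed uniform over its words
ranking = rank_by_kl(query, songs)
```

Since each song's semantic multinomial can be computed once and stored, answering a query in this sketch needs only one pass over the database with a vocabulary-sized sum per song.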
Scene Classification with Low-dimensional Semantic Spaces and Weak Supervision
Cited by 8 (4 self)
A novel approach to scene categorization is proposed. Similar to previous works of [11, 15, 3, 12], we introduce an intermediate space, based on a low dimensional semantic “theme” image representation. However, instead of learning the themes in an unsupervised manner, they are learned with weak supervision, from casual image annotations. Each theme induces a probability density on the space of low-level features, and images are represented as vectors of posterior theme probabilities. This enables an image to be associated with multiple themes, even when there are no multiple associations in the training labels. An implementation is presented and compared to various existing algorithms, on benchmark datasets. It is shown that the proposed low dimensional representation correlates well with human scene understanding, and is able to learn theme co-occurrences without explicit training. It is also shown to outperform unsupervised latent-space methods, with much smaller training complexity, and to achieve performance close to the state-of-the-art methods, which rely on much higher-dimensional image representations. Finally, a study of the effect of dimensionality on the classification performance is presented, indicating that the dimensionality of theme space grows sub-linearly with the number of scene categories.
Combining audio content and social context for semantic music discovery
- Proc. 32nd ACM SIGIR
, 2009
Cited by 7 (1 self)
When attempting to annotate music, it is important to consider both acoustic content and social context. This paper explores techniques for collecting and combining multiple sources of such information for the purpose of building a query-by-text music retrieval system. We consider two representations of the acoustic content (related to timbre and harmony) and two social sources (social tags and web documents). We then compare three algorithms that combine these information sources: calibrated score averaging (CSA), RankBoost, and kernel combination support vector machines (KC-SVM). We demonstrate empirically that each of these algorithms is superior to algorithms that use individual information sources.

