The semantic pathfinder: Using an authoring metaphor for generic multimedia indexing.
- University of Washington, 2006
Learning query-class dependent weights in automatic video retrieval
- In Proceedings of the 12th annual ACM international conference on Multimedia, 2004. Cited by 70 (15 self).
Combining retrieval results from multiple modalities plays a crucial role in video retrieval systems, especially automatic systems that operate without user feedback or query expansion. However, most current systems either use query-independent combination or rely on explicit user weighting. In this work, we propose query-class dependent weights within a hierarchical mixture-of-experts framework to combine multiple retrieval results. We first classify each user query into one of four predefined categories and then aggregate the retrieval results with the weights associated with that query class, which can be learned efficiently from development data and generalized easily to unseen queries. Our experiments demonstrate that query-class dependent weights considerably outperform query-independent weights.
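As a concrete (and purely illustrative) reading of this combination scheme, the sketch below picks one of four hypothetical query classes and mixes per-modality scores with class-specific weights. The class names, the toy rule-based classifier, and the weight values are all assumptions made for the sketch, not the authors' implementation.

```python
# Sketch of query-class dependent score fusion (illustrative, not the paper's code).
# Four hypothetical query classes with weights for (text, image, concept) modalities.
CLASS_WEIGHTS = {
    "named_person": (0.8, 0.1, 0.1),
    "named_object": (0.6, 0.2, 0.2),
    "general_object": (0.3, 0.4, 0.3),
    "scene": (0.2, 0.3, 0.5),
}

def classify_query(query: str) -> str:
    """Toy rule-based classifier standing in for the learned one."""
    if any(tok.istitle() for tok in query.split()):
        return "named_person"
    return "general_object"

def fuse(query: str, modality_scores: dict) -> list:
    """Rank shots by a weighted sum of per-modality scores.

    modality_scores maps shot id -> (text, image, concept) scores.
    """
    w = CLASS_WEIGHTS[classify_query(query)]
    scored = {
        shot: sum(wi * si for wi, si in zip(w, scores))
        for shot, scores in modality_scores.items()
    }
    return sorted(scored, key=scored.get, reverse=True)
```

For a name query, the text-heavy weights dominate, so a shot with a strong text score outranks one with strong visual scores.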
Finding person x: Correlating names with visual appearances
- Dublin City University, Ireland, 2004. Cited by 44 (7 self).
People as news subjects carry rich semantics in broadcast news video, so finding a named person in the video is a major challenge for video retrieval. The task can be tackled by exploiting the multi-modal information in videos, including the transcript, video structure, and visual features. We propose a comprehensive approach for finding specific persons in broadcast news videos that explores various clues, such as names occurring in the transcript, face information, anchor scenes, and, most importantly, the timing pattern between names and people. Experiments on the TRECVID 2003 dataset show that our approach achieves high performance.
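One way to read the "timing pattern" clue is as a temporal-proximity score between a name mention in the transcript and a candidate shot, combined with face evidence. The Gaussian kernel, the parameter values, and the linear combination below are illustrative assumptions, not the paper's learned model.

```python
import math

def person_score(shot_time: float, name_times: list,
                 face_score: float, sigma: float = 5.0,
                 alpha: float = 0.5) -> float:
    """Combine face evidence with proximity of the nearest name mention.

    A Gaussian kernel on the time gap (in seconds) stands in for the
    learned timing model; alpha balances the two clues. Both the kernel
    and the weighting are assumptions made for this sketch.
    """
    if not name_times:
        timing = 0.0
    else:
        gap = min(abs(shot_time - t) for t in name_times)
        timing = math.exp(-gap ** 2 / (2 * sigma ** 2))
    return alpha * timing + (1 - alpha) * face_score
```

A shot whose timestamp coincides with a name mention scores strictly higher than an otherwise identical shot far from any mention.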
Successful approaches in the trec video retrieval evaluations
- In Proc. ACM Multimedia, 2004. Cited by 43 (8 self).
This paper reviews successful approaches in evaluations of video retrieval over the last three years. The task involves the search and retrieval of shots from MPEG-digitized video recordings using a combination of automatic speech, image, and video analysis together with information retrieval technologies. The search evaluations are grouped into interactive (with a human in the loop) and non-interactive (where the human merely enters the query into the system) submissions. Most non-interactive search approaches have relied extensively on text retrieval, and only recently have image-based features contributed reliably to improved search performance. Interactive approaches have substantially outperformed all non-interactive approaches, with most systems relying heavily on the user's ability to refine queries and reject spurious answers. We examine both the successful automatic search approaches and the user-interface techniques that have enabled high-performance video retrieval.
Mining associated text and images with dual-wing Harmoniums
- In Conference on Uncertainty in Artificial Intelligence, 2005. Cited by 41 (8 self).
We propose a multi-wing harmonium model for mining multimedia data that extends and improves on earlier models based on two-layer random fields, which capture bidirectional dependencies between hidden topic aspects and observed inputs. The model can be viewed as an undirected counterpart of two-layer directed models such as LDA for similar tasks, but it differs significantly in inference/learning cost trade-offs, latent topic representations, and topic-mixing mechanisms. In particular, our model facilitates efficient inference and robust topic mixing, and potentially provides great flexibility in modeling the latent topic space. A contrastive divergence algorithm and a variational algorithm are derived for learning. We specialize the model to a dual-wing harmonium for captioned images, incorporating a multivariate Poisson distribution for word counts and a multivariate Gaussian for color histograms. We present empirical results on classification, retrieval, and image annotation over news video collections, and we report an extensive comparison with various extant models.
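The efficiency claim rests on a property of two-layer undirected models: given both observed wings, the hidden topic units are conditionally independent, so posterior inference is a single affine map plus a sigmoid rather than an iterative procedure. The sketch below shows that step for binary hidden units; the shapes and the exact sufficient statistics are simplifications assumed for illustration.

```python
import numpy as np

def hidden_posterior(x_counts, y_color, Wx, Wy, b):
    """Posterior mean of binary hidden topic units in a dual-wing harmonium.

    x_counts: word-count vector (Poisson wing); y_color: color-histogram
    vector (Gaussian wing). Because the model is an undirected two-layer
    field, hidden units are conditionally independent given both wings,
    so inference is one matrix pass. Shapes here are illustrative:
    Wx is (topics x vocab), Wy is (topics x bins), b is (topics,).
    """
    activation = Wx @ x_counts + Wy @ y_color + b
    return 1.0 / (1.0 + np.exp(-activation))
```

With zero weights and biases the posterior is uniform at 0.5, which makes the sketch easy to sanity-check.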
A learned lexicon-driven paradigm for interactive video retrieval
- IEEE Trans. Multimedia, 2007. Cited by 36 (13 self).
Effective video retrieval is the result of an interplay between interactive query selection, advanced visualization of results, and a goal-oriented human user. Traditional interactive video retrieval approaches emphasize paradigms such as query-by-keyword and query-by-example to aid the user in the search for relevant footage. However, recent results in automatic indexing indicate that query-by-concept is also becoming a viable resource for interactive retrieval. In this paper we propose a new video retrieval paradigm. Its core is formed by first detecting a large lexicon of semantic concepts. From there, we combine query-by-concept, query-by-example, query-by-keyword, and user interaction into the MediaMill semantic video search engine. To measure the impact of increasing lexicon size on interactive video retrieval performance, we performed two experiments against the 2004 and 2005 NIST TRECVID benchmarks, using lexicons containing 32 and 101 concepts, respectively. The results suggest that, of all the factors that play a role in interactive retrieval, a large lexicon of semantic concepts matters most. Indeed, by exploiting large lexicons, many video search questions are solvable without using query-by-keyword or query-by-example. In addition, we show that the lexicon-driven search engine outperforms all state-of-the-art video retrieval systems on both TRECVID 2004 and 2005. Index Terms: benchmarking, concept learning, content analysis and indexing, interactive systems, multimedia information systems, video retrieval.
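At its simplest, query-by-concept maps query terms onto concepts in the lexicon and ranks shots by the matched concepts' detector scores. The trigger-term mapping and the summed-score ranking below are assumptions made for this sketch, not the MediaMill engine's actual matching logic.

```python
def query_by_concept(query_terms, lexicon, shot_concept_scores):
    """Rank shots by summed detector scores of lexicon concepts matching the query.

    lexicon: concept name -> set of trigger terms (an assumed mapping);
    shot_concept_scores: shot id -> {concept name: detector score}.
    Returns the matched concepts and the ranked shot ids.
    """
    terms = {t.lower() for t in query_terms}
    active = [c for c, triggers in lexicon.items() if terms & triggers]
    ranked = sorted(
        shot_concept_scores,
        key=lambda s: sum(shot_concept_scores[s].get(c, 0.0) for c in active),
        reverse=True,
    )
    return active, ranked
```

This is how a query can succeed without keyword or example search: only detector scores for the activated concepts are consulted.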
How many high-level concepts will fill the semantic gap in video retrieval?
- In International Conference on Image and Video Retrieval (CIVR), 2007
A bootstrapping framework for annotating and retrieving www images
- In Proceedings of the 12th annual ACM International Conference on Multimedia (MM'04), 2004. Cited by 34 (3 self).
Most current image retrieval systems and commercial search engines rely mainly on text annotations to index and retrieve WWW images. This research explores machine learning approaches that automatically annotate WWW images against a predefined list of concepts by fusing evidence from image content and the associated HTML text. One major practical limitation of supervised machine learning is that effective learning needs a large set of labeled training samples, which is tedious to obtain and severely impedes the development of effective search techniques for WWW images, which are dynamic and fast-changing. Because web images possess both intrinsic visual content and text annotations, they provide a strong basis for bootstrapping the learning process with a co-training approach involving classifiers based on two orthogonal sets of features, visual and textual. The idea of co-training is to start from a small set of labeled training samples and successively annotate a larger set of unlabeled samples using the two orthogonal classifiers. We carry out experiments on over 5,000 images acquired from the Web, exploring different combinations of HTML text and visual representations. We find that our bootstrapping approach achieves performance comparable to the supervised learning approach, with an F1 measure of over 54%, while offering the added advantage of requiring only a small initial set of training samples.
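The co-training loop the abstract describes can be sketched generically: each view's classifier labels the unlabeled pool and promotes its most confident picks into the labeled set for the next round. The classifier interface (a `predict(x) -> (label, confidence)` pair) and the per-round pick count are assumptions of this sketch, not the paper's actual setup.

```python
def co_train(labeled, unlabeled, train_text, train_visual, rounds=5, k=2):
    """Minimal co-training loop over two orthogonal views (a sketch).

    labeled: list of ((text_feats, visual_feats), label) pairs;
    unlabeled: list of (text_feats, visual_feats) pairs. train_text and
    train_visual build a classifier from (features, label) pairs; each
    classifier exposes predict(x) -> (label, confidence). These
    interfaces are illustrative assumptions.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        text_clf = train_text([(x[0], y) for x, y in labeled])
        visual_clf = train_visual([(x[1], y) for x, y in labeled])
        # Each view's classifier labels the pool; keep its k most confident picks.
        for clf, view in ((text_clf, 0), (visual_clf, 1)):
            ranked = sorted(pool, key=lambda x: clf.predict(x[view])[1],
                            reverse=True)
            for x in ranked[:k]:
                label, _conf = clf.predict(x[view])
                labeled.append((x, label))
                pool.remove(x)
    return labeled
```

Because the two feature sets are (ideally) conditionally independent, each classifier's confident labels act as fresh training signal for the other, which is what lets a small seed set grow.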
Probabilistic Models for Combining Diverse Knowledge Sources in Multimedia Retrieval
- Ph.D. Thesis, 2006. Cited by 28 (3 self).
In recent years, the multimedia retrieval community has gradually shifted its emphasis from analyzing one media source at a time to exploring the opportunities for combining diverse knowledge sources from correlated media types and context. This thesis presents a conditional probabilistic retrieval model as a principled framework for combining diverse knowledge sources. An efficient rank-based learning approach is developed to explicitly model ranking relations in the learning process. Under this retrieval framework, we review and develop a number of state-of-the-art approaches for extracting ranking features from multimedia knowledge sources. To incorporate query information in the combination model, the thesis develops several query analysis models that automatically discover the mixing structure of the query space from previous retrieval results. To adapt the combination function on a per-query basis, the thesis also presents a probabilistic local context analysis (pLCA) model that automatically leverages additional retrieval sources to improve initial retrieval outputs. All the proposed approaches are evaluated on multimedia retrieval tasks with large-scale video collections, as well as on meta-search tasks with large-scale text collections.
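A minimal instance of combining ranked evidence from diverse sources is weighted reciprocal-rank fusion; the reciprocal rank here is a simple stand-in for the learned ranking features in the thesis, and the weights would be what the rank-based learning approach estimates. Both choices are assumptions of this sketch.

```python
def rank_fusion(ranked_lists, weights):
    """Combine ranked result lists by weighted reciprocal rank.

    ranked_lists: one ranked list of document ids per knowledge source;
    weights: one non-negative weight per source. Each document's fused
    score is the weighted sum of 1/rank over the sources that return it.
    """
    scores = {}
    for ranking, w in zip(ranked_lists, weights):
        for r, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + w / r
    return sorted(scores, key=scores.get, reverse=True)
```

Giving one source a larger weight lets its ordering dominate ties, which is exactly the degree of freedom a learned combination model would tune.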
Mining relationship between video concepts using probabilistic graphical model
- In IEEE Int. Conf. Multimedia & Expo, 2006. Cited by 27 (5 self).
For large-scale automatic semantic video characterization, it is necessary to learn and model a large number of semantic concepts. These concepts do not exist in isolation from one another, and exploiting the relationships among multiple video concepts is a useful way to improve concept detection accuracy. In this paper, we describe various multi-concept relational learning approaches via a unified probabilistic graphical model representation and propose several graphical models, not previously applied to this task, for mining the relationships between video concepts. Their performance in video semantic concept detection is evaluated and compared on two TRECVID'05 video collections.
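The smallest possible instance of this idea is a two-node model: refine one concept's detector score using a correlated concept through pairwise co-occurrence statistics. The conditional probabilities and the equal mixing of prior and raw score below are assumptions for illustration, a tiny stand-in for the full graphical models the paper surveys.

```python
def refine_score(p_target, p_context, p_t_given_c, p_t_given_not_c):
    """Refine a concept detector's score using one related concept.

    p_target: raw detector probability for the target concept (e.g. "road");
    p_context: detector probability for a correlated concept (e.g. "car");
    p_t_given_c / p_t_given_not_c: co-occurrence statistics assumed to be
    estimated on development data. Marginalizing over the context concept
    gives a relational prior, mixed here equally with the raw score.
    """
    prior = p_context * p_t_given_c + (1 - p_context) * p_t_given_not_c
    return 0.5 * p_target + 0.5 * prior
```

When the context concept fires strongly and the two concepts co-occur often, a weak raw detection is pulled upward, which is the effect the multi-concept models generalize to many concepts at once.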