Neural Network-Based Face Detection
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998
Cited by 764 (23 self)
Abstract—We present a neural network-based upright frontal face detection system. A retinally connected neural network examines small windows of an image and decides whether each window contains a face. The system arbitrates between multiple networks to improve performance over a single network. We present a straightforward procedure for aligning positive face examples for training. To collect negative examples, we use a bootstrap algorithm, which adds false detections into the training set as training progresses. This eliminates the difficult task of manually selecting nonface training examples, which must be chosen to span the entire space of nonface images. Simple heuristics, such as using the fact that faces rarely overlap in images, can further improve the accuracy. Comparisons with several other state-of-the-art face detection systems are presented, showing that our system has comparable performance in terms of detection and false-positive rates. Index Terms—Face detection, pattern recognition, computer vision, artificial neural networks, machine learning.
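The bootstrap idea in this abstract (retrain, then add the current false detections to the negative set) can be sketched as below. This is only an illustration under loud assumptions: a nearest-centroid classifier stands in for the paper's neural network, windows are tiny feature tuples rather than image patches, and the names train_centroids, is_face, and bootstrap_negatives are hypothetical.

```python
def train_centroids(pos, neg):
    """Fit a nearest-centroid model (a stand-in for the paper's network)."""
    dim = len(pos[0])

    def mean(rows):
        return tuple(sum(r[i] for r in rows) / len(rows) for i in range(dim))

    return mean(pos), mean(neg)


def is_face(model, x):
    """Classify x as a face if it lies closer to the face centroid."""
    face_c, nonface_c = model
    d_face = sum((a - b) ** 2 for a, b in zip(x, face_c))
    d_nonface = sum((a - b) ** 2 for a, b in zip(x, nonface_c))
    return d_face < d_nonface


def bootstrap_negatives(pos, nonface_pool, initial=2, rounds=5):
    """Grow the negative set with false detections found in a nonface pool."""
    neg = list(nonface_pool[:initial])  # small starting set (assumption)
    for _ in range(rounds):
        model = train_centroids(pos, neg)
        # Every pool item is known to be a nonface, so any "face" is a false detection.
        false_pos = [x for x in nonface_pool if x not in neg and is_face(model, x)]
        if not false_pos:
            break
        neg.extend(false_pos)
    return train_centroids(pos, neg), neg


faces = [(1.0,), (1.2,), (0.8,)]
nonfaces = [(5.0,), (4.0,), (2.0,), (1.6,), (3.0,)]
model, negatives = bootstrap_negatives(faces, nonfaces)
```

In this toy run the negative set grows from two to four examples after one round, without anyone hand-picking the hard nonface examples.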
Name-It: Naming and Detecting Faces in News Videos
, 1999
Cited by 54 (1 self)
ions. (In the near future, the worldwide trend will be for broadcasts to feature closed captions.) Thus we use closed-caption texts as transcripts for news videos. In addition, we employ video-caption detection and recognition. We used "CNN Headline News" as our primary source of news for our experiments. Given image sequences, transcripts, and video captions as information sources, Name-It associates extracted faces with extracted name candidates using the correlation of their timing information and face similarity information. Video captions are also taken into account as supplementary information. To associate faces and names, Name-It integrates several advanced image processing and natural-language processing techniques: face sequence extraction and similarity evaluation from videos, name extraction from transcripts, and video-caption recognition. Although these technologies aren't always highly accurate, integrating these results will help the system achieve more accurate output.
ClassView: Hierarchical Video Shot Classification, Indexing, and Accessing
- IEEE Trans. on Multimedia, 2004
Cited by 21 (4 self)
Recent advances in digital video compression and networks have made video more accessible than ever. However, the existing content-based video retrieval systems still suffer from the following problems. 1) Semantics-sensitive video classification problem because of the semantic gap between low-level visual features and high-level semantic visual concepts; 2) Integrated video access problem because of the lack of efficient video database indexing, automatic video annotation, and concept-oriented summary organization techniques. In this paper, we have proposed a novel framework, called ClassView, to make some advances toward more efficient video database indexing and access. 1) A hierarchical semantics-sensitive video classifier is proposed to shorten the semantic gap. The hierarchical tree structure of the semantics-sensitive video classifier is derived from the domain-dependent concept hierarchy of video contents in a database. Relevance analysis is used for selecting the discriminating visual features with suitable importances. The Expectation-Maximization (EM) algorithm is also used to determine the classification rule for each visual concept node in the classifier. 2) A hierarchical video database indexing and summary presentation technique is proposed to support more effective video access over a large-scale database. The hierarchical tree structure of our video database indexing scheme is determined by the domain-dependent concept hierarchy which is also used for video classification. The presentation of visual summary is also integrated with the inherent hierarchical video database indexing tree structure. Integrating video access with efficient database indexing tree structure has provided great opportunity for supporting more powerful video search engines.
Concept-Oriented Indexing of Video Databases: Toward Semantic Sensitive Retrieval and Browsing
- IEEE Trans. on Image Processing, 2004
Cited by 20 (5 self)
Digital video now plays an important role in medical education, health care, telemedicine and other medical applications. Several content-based video retrieval (CBVR) systems have been proposed in the past, but they still suffer from the following challenging problems: semantic gap, semantic video concept modeling, semantic video classification, and concept-oriented video database indexing and access. In this paper, we propose a novel framework to make some advances toward the final goal of solving these problems. Specifically, the framework includes: 1) a semantic-sensitive video content representation framework that uses principal video shots to enhance the quality of features; 2) semantic video concept interpretation that uses a flexible mixture model to bridge the semantic gap; 3) a novel semantic video-classifier training framework that integrates feature selection, parameter estimation, and model selection seamlessly in a single algorithm; and 4) a concept-oriented video database organization technique that uses a domain-dependent concept hierarchy to enable semantic-sensitive video retrieval and browsing.
MultiView: Multilevel video content representation and retrieval
, 2001
Cited by 15 (12 self)
In this article, several practical algorithms are proposed to support content-based video analysis, modeling, representation, summarization, indexing, and access. First, a multilevel video database model is given. One advantage of this model is that it provides a reasonable approach to bridging the gap between low-level representative features and high-level semantic concepts from a human point of view. Second, several model-based video analysis techniques are proposed. In order to detect the video shots, we present a novel technique, which can adapt the threshold for scene cut detection to the activities of variant videos or even different video shots. A seeded region aggregation and temporal tracking technique is proposed for generating the semantic video objects. The semantic video scenes can then be generated from these extracted video access units (e.g., shots and objects) according to some domain knowledge. Third, in order to categorize video contents into a set of semantic clusters, an integrated video classification technique is developed to support more efficient multilevel video representation, summarization, indexing, and access techniques. © 2001 SPIE and IS&T. [DOI: 10.1117/1.1406944]
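One way to picture a scene-cut threshold that adapts to video activity is sketched below. The abstract does not give the actual rule, so everything here is an assumption for illustration: the threshold is taken as the mean plus k standard deviations of recent frame differences, and detect_cuts, window, and k are hypothetical names and parameters.

```python
from statistics import mean, pstdev


def detect_cuts(frame_diffs, window=5, k=3.0):
    """Flag frame indices whose difference exceeds a locally adapted threshold.

    frame_diffs[i] is some dissimilarity between frame i and frame i - 1
    (for example a histogram distance); how it is computed is out of scope.
    """
    cuts = []
    for i, diff in enumerate(frame_diffs):
        history = frame_diffs[max(0, i - window):i]
        if len(history) < 2:
            continue  # not enough context to estimate local activity
        # Assumed rule: threshold follows the recent activity level.
        threshold = mean(history) + k * pstdev(history)
        if diff > threshold:
            cuts.append(i)
    return cuts


calm_video = [1.0, 1.2, 0.9, 1.1, 9.0, 1.0, 1.1]
busy_video = [5.0, 9.0, 4.0, 8.0, 9.0]
calm_cuts = detect_cuts(calm_video)
busy_cuts = detect_cuts(busy_video)
```

With these toy inputs, a difference of 9.0 is flagged as a cut in the calm sequence but not in the busy one, which is the behavior a fixed global threshold cannot give.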
Informedia - Search and Summarization in the Video Medium
, 2000
Cited by 14 (0 self)
The Informedia system provides "full-content" search and retrieval of current and past TV and radio news and documentary broadcasts. The system implements a fully automatic intelligent process to enable daily content capture, analysis and storage in on-line archives. The current library consists of approximately 2,000 hours (1.5 terabytes) of daily CNN News captured over the last 3 years and documentaries from public television and government agencies. This database allows for rapid retrieval of individual "video paragraphs" which satisfy an arbitrary spoken or typed subject area query based on a combination of the words in the soundtrack, images recognized in the video, plus closed-captioning when available and informational text overlaid on the screen images. There are also capabilities for matching of similar faces and images, and for generation of related map-based displays. The latest work attempts to produce a visualization and summarization of the content across all the stories ...
A Wearable Digital Library of Personal Conversations
- Proceedings of the Joint Conference on Digital Libraries, 2002
Cited by 9 (3 self)
We have developed a wearable, personalized digital library system, which unobtrusively records the wearer's part of a conversation, recognizes the face of the current dialog partner and remembers his/her voice. The next time the system sees the same person's face and hears the same voice, it can replay the audio from the last conversation in compressed form summarizing the names and major issues mentioned. Experiments with a prototype system show that a combination of face recognition and speaker identification can be effective for retrieving conversations.
Video retrieval using speech and image information
- In Storage and Retrieval for Multimedia Databases 2003, EI’03 Electronic Imaging, 2003
Cited by 9 (0 self)
Video contains multiple types of audio and visual information, which are difficult to extract, combine or trade off in general video information retrieval. This paper provides an evaluation of the effects of different types of information used for video retrieval from a video collection. A number of different sources of information are present in most typical broadcast video collections and can be exploited for information retrieval. We will discuss the contributions of automatically recognized speech transcripts, image similarity matching, face detection and video OCR in the context of experiments performed as part of the 2001 TREC Video Retrieval Track evaluation performed by the National Institute of Standards and Technology. For the queries used in this evaluation, image matching and video OCR proved to be the deciding aspects of video information retrieval.
MPEG-7 based description schemes for multi-level video content classification
, 2004
Cited by 6 (0 self)
MPEG-7 has emerged as the standard for efficiently describing multimedia content. In this context, its primary goal is to provide flexible and effective searching and retrieval of multimedia resources. Most of the earlier work on MPEG-7 description schemes (DSs) and descriptors (Ds) focuses on the description of a single multimedia document, whereas MPEG-7 can be further exploited to support more advanced implementations under multimedia database systems. Therefore, it is important to reconsider issues related to high level multimedia modeling and representation, in the light of the MPEG-7 perspective. In this paper, we propose a high level multimedia representation and description scheme based on multi-level video modeling and semantic video classification. The proposed multi-level multimedia representation and DSs are expected to support more effective video content indexing and accessing operations. The presented DSs and Ds are further described by using the XML Schema language, which has been adopted as the basis of the Description Definition Language (DDL) of the MPEG-7 standard.
Video Retrieval with the Informedia Digital Video Library System
- In: Proceedings of the Tenth Text Retrieval Conference (TREC'01), 2001
Cited by 2 (0 self)
information extraction from video and audio content. Over a terabyte of online data was collected, with automatically generated metadata and indices for retrieving videos from this library. The architecture for the project was based on the premise that real-time constraints on library and associated metadata creation could be relaxed in order to realize increased automation and deeper parsing and indexing for identifying the library contents and breaking it into segments. Library creation was an offline activity, with library exploration by users occurring online and making use of the generated metadata and segmentation. The goal of the Informedia interface was to enable quick access to relevant information in a digital video library, leveraging from derived metadata and the partitioning of the video into small segments. Figure 1 shows the IDVLS interface following a query. In this figure, a set of results is displayed at the bottom. The display includes a window containing a headline, and a pictorial menu of video segments each represented with a thumbnail image at approximately resolution of the video in the horizontal and vertical dimensions. The headline window automatically pops up whenever the mouse is positioned over a result item; the headline window for the first result is shown. IDVLS also supports other ways of navigating and browsing the digital video library. These interface features were essential to deal with the ambiguity of the derived data generated by speech recognition,

