Neural Network-Based Face Detection
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1998
Cited by 764 (23 self)
Abstract—We present a neural network-based upright frontal face detection system. A retinally connected neural network examines small windows of an image and decides whether each window contains a face. The system arbitrates between multiple networks to improve performance over a single network. We present a straightforward procedure for aligning positive face examples for training. To collect negative examples, we use a bootstrap algorithm, which adds false detections into the training set as training progresses. This eliminates the difficult task of manually selecting nonface training examples, which must be chosen to span the entire space of nonface images. Simple heuristics, such as using the fact that faces rarely overlap in images, can further improve the accuracy. Comparisons with several other state-of-the-art face detection systems are presented, showing that our system has comparable performance in terms of detection and false-positive rates. Index Terms—Face detection, pattern recognition, computer vision, artificial neural networks, machine learning.
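The bootstrap collection of negative examples described above can be sketched as a retraining loop. This is a minimal illustration, not the authors' implementation: the names `bootstrap_negatives` and `train_threshold` are hypothetical, and a one-dimensional threshold classifier on a made-up scalar feature stands in for the neural network.

```python
from typing import Callable, List, Tuple

Classifier = Callable[[float], bool]  # True means "window contains a face"


def train_threshold(positives: List[float], negatives: List[float]) -> Classifier:
    """Toy trainer (assumption): threshold halfway between the two classes."""
    threshold = (min(positives) + max(negatives)) / 2.0
    return lambda x: x > threshold


def bootstrap_negatives(
    train: Callable[[List[float], List[float]], Classifier],
    positives: List[float],
    negatives: List[float],
    nonface_windows: List[float],
    max_rounds: int = 10,
) -> Tuple[Classifier, List[float]]:
    """Grow the negative set with false detections found on face-free images."""
    negatives = list(negatives)
    clf = train(positives, negatives)
    for _ in range(max_rounds):
        # Every detection on a face-free image is, by construction, a false positive.
        false_pos = [w for w in nonface_windows if clf(w) and w not in negatives]
        if not false_pos:
            break  # no new mistakes to learn from
        negatives.extend(false_pos)
        clf = train(positives, negatives)  # retrain with the harder negatives
    return clf, negatives


clf, negs = bootstrap_negatives(
    train_threshold,
    positives=[0.8, 0.9, 1.0],
    negatives=[0.0, 0.1],
    nonface_windows=[0.05, 0.3, 0.5, 0.6, 0.7],
)
```

Only the windows the current classifier gets wrong are added, so the negative set concentrates on hard cases rather than trying to cover the whole space of nonface images by hand.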
The SR-tree: An Index Structure for High-Dimensional Nearest Neighbor Queries
, 1997
Cited by 342 (3 self)
Recently, similarity queries on feature vectors have been widely used to perform content-based retrieval of images. To apply this technique to large databases, it is required to develop multidimensional index structures supporting nearest neighbor queries efficiently. The SS-tree had been proposed for this purpose and is known to outperform other index structures such as the R*-tree and the K-D-B-tree. One of its most important features is that it employs bounding spheres rather than bounding rectangles for the shape of regions. However, we demonstrate in this paper that bounding spheres occupy much larger volume than bounding rectangles with high-dimensional data and that this reduces search efficiency. To overcome this drawback, we propose a new index structure called the SR-tree (Sphere/Rectangle-tree) which integrates bounding spheres and bounding rectangles. A region of the SR-tree is specified by the intersection of a bounding sphere and a bounding rectangle. Incorporating bounding rectangles permits neighborhoods to be partitioned into smaller regions than the SS-tree and improves the disjointness among regions. This enhances the performance on nearest neighbor queries especially for high-dimensional and non-uniform data which can be practical in actual image/video similarity indexing. We include the performance test results that verify this advantage of the SR-tree and show that the SR-tree outperforms both the SS-tree and the R*-tree.
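Because a region is the intersection of a sphere and a rectangle, any point in the region lies in both shapes, so the larger of the two minimum distances is a valid lower bound on the distance from a query point to the region. The sketch below illustrates that bound; the function names are hypothetical and the code is an illustration under that reasoning, not the paper's implementation.

```python
import math
from typing import Sequence


def mindist_sphere(q: Sequence[float], center: Sequence[float], radius: float) -> float:
    """Smallest distance from q to any point of the bounding sphere."""
    return max(0.0, math.dist(q, center) - radius)


def mindist_rect(q: Sequence[float], low: Sequence[float], high: Sequence[float]) -> float:
    """Smallest distance from q to any point of the axis-aligned rectangle."""
    return math.sqrt(
        sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, low, high))
    )


def mindist_region(q, center, radius, low, high) -> float:
    """Lower bound on the distance from q to the sphere/rectangle intersection."""
    return max(mindist_sphere(q, center, radius), mindist_rect(q, low, high))


# Query point outside both shapes: the rectangle gives the tighter bound here.
d = mindist_region((3.0, 0.0), (0.0, 0.0), 2.0, (-1.0, -1.0), (1.0, 1.0))
```

A tighter lower bound lets a nearest-neighbor search prune more subtrees, which is one way to read the claim that combining the two shapes improves query performance.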
The A-tree: An Index Structure for High-Dimensional Spaces Using Relative Approximation
, 2000
Cited by 85 (0 self)
We propose a novel index structure, the A-tree (Approximation tree), for similarity search of high-dimensional data. The basic idea of the A-tree is the introduction of Virtual Bounding Rectangles (VBRs), which contain and approximate MBRs and data objects. VBRs can be represented rather compactly, and thus affect the tree configuration both quantitatively and qualitatively. First, since tree nodes can hold a large number of VBR entries, the fanout of nodes becomes large, which leads to fast search. More importantly, we have a free hand in arranging MBRs and VBRs in tree nodes. In the A-tree, nodes contain entries of an MBR and its children's VBRs. Therefore, by fetching a node of an A-tree, we can obtain the exact position of a parent MBR and the approximate positions of its children. We have performed experiments using both synthetic and real data sets. For the real data sets, the A-tree outperforms the SR-tree and the VA-File across the whole range of dimensionality up to 64 dimensions, which is the highest dimensionality in our experiments. The A-tree achieves 77.3% (77.7%, resp.) savings in page accesses compared to the SR-tree (the VA-File, resp.) for 64-dimensional real data.
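One way to picture a relative approximation is to quantize a child MBR onto a coarse grid laid over its parent MBR, rounding outward so the approximation always contains the child. The sketch below assumes that scheme; `encode_vbr`, `decode_vbr`, and the `bits` parameter are hypothetical names, and the actual A-tree encoding may differ in its details.

```python
import math
from typing import List, Sequence, Tuple


def encode_vbr(parent_low: Sequence[float], parent_high: Sequence[float],
               child_low: Sequence[float], child_high: Sequence[float],
               bits: int = 4) -> List[Tuple[int, int]]:
    """Per dimension, return (first_cell, last_cell) grid indices covering the child."""
    cells = 1 << bits  # grid cells per dimension; each index fits in `bits` bits
    codes = []
    for pl, ph, cl, ch in zip(parent_low, parent_high, child_low, child_high):
        width = ph - pl
        if width == 0:
            codes.append((0, 0))  # degenerate parent extent
            continue
        lo = min(cells - 1, max(0, math.floor((cl - pl) / width * cells)))
        hi = min(cells - 1, max(lo, math.ceil((ch - pl) / width * cells) - 1))
        codes.append((lo, hi))
    return codes


def decode_vbr(parent_low: Sequence[float], parent_high: Sequence[float],
               codes: Sequence[Tuple[int, int]], bits: int = 4):
    """Rebuild the approximate rectangle from the parent MBR and the cell indices."""
    cells = 1 << bits
    low, high = [], []
    for pl, ph, (lo, hi) in zip(parent_low, parent_high, codes):
        width = ph - pl
        low.append(pl + lo / cells * width)
        high.append(pl + (hi + 1) / cells * width)
    return low, high


codes = encode_vbr((0.0, 0.0), (16.0, 16.0), (3.2, 4.0), (9.5, 4.0))
vbr_low, vbr_high = decode_vbr((0.0, 0.0), (16.0, 16.0), codes)
```

With 4 bits per bound, each dimension costs one byte instead of two floating-point coordinates, which is the kind of saving that lets a node hold more entries and raises fanout.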
Evolving Video Skims into Useful Multimedia Abstractions
, 1998
Cited by 80 (13 self)
This paper reports two studies that measured the effects of different “video skim” techniques on comprehension, navigation, and user satisfaction. Video skims are compact, content-rich abstractions of longer videos, condensations that preserve frame rate while greatly reducing viewing time. Their characteristics depend on the image- and audio-processing techniques used to create them. Results from the initial study helped refine video skims, which were then reassessed in the second experiment. Significant benefits were found for skims built from audio sequences meeting certain criteria.
Informedia: News-on-Demand Multimedia Information Acquisition and Retrieval
- INTELLIGENT MULTIMEDIA INFORMATION RETRIEVAL
, 1997
Cited by 75 (6 self)
In theory, speech recognition technology can make any spoken words in video or audio media subject to text indexing, search and retrieval. This article describes the News-on-Demand application created within the Informedia Digital Video Library project and discusses how speech recognition is used for transcript creation from video, time alignment of closed-captioned transcripts, a speech query interface, and audio paragraph segmentation. Our results show that speech recognition accuracy varies dramatically depending on the quality and type of data used, but the system is quite useable with only moderate speech recognition accuracy.
Video OCR for Digital News Archives
- In Proc. Workshop on Content-Based Access of Image and Video Databases. (Los Alamitos, CA
, 1998
Cited by 73 (0 self)
Video OCR is a technique that can greatly help to locate topics of interest in a large digital news video archive via the automatic extraction and reading of captions and annotations. News captions generally provide vital search information about the video being presented -- the names of people and places or descriptions of objects. In this paper, two difficult problems of character recognition for videos are addressed: low resolution characters and extremely complex backgrounds. We apply an interpolation filter, multi-frame integration and a combination of four filters to solve these problems. Segmenting characters is done by a recognition-based segmentation method and intermediate character recognition results are used to improve the segmentation. The overall recognition results are good enough for use in news indexing. Performing Video OCR on news video and combining its results with other video understanding techniques will improve the overall understanding of the news video conten...
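The abstract does not spell out how multi-frame integration works. A common reading, assumed here, is that a caption stays fixed and bright over several frames while the background changes, so taking the per-pixel minimum across those frames keeps the caption and darkens the clutter. The function name `integrate_frames` and the bright-caption assumption are illustrative only.

```python
from typing import List

Frame = List[List[int]]  # grayscale image as rows of pixel intensities (0-255)


def integrate_frames(frames: List[Frame]) -> Frame:
    """Per-pixel minimum over equally sized frames (assumes bright, static captions)."""
    if not frames:
        raise ValueError("need at least one frame")
    # zip(*frames) groups the rows at the same height; zip(*rows) groups pixels
    # at the same column, so min() sees one pixel position across all frames.
    return [[min(pixels) for pixels in zip(*rows)] for rows in zip(*frames)]


frame_a = [[255, 200], [10, 255]]
frame_b = [[250, 30], [90, 255]]
merged = integrate_frames([frame_a, frame_b])
```

Pixels that are bright in every frame (likely caption strokes) survive, while pixels that are bright only occasionally (moving background) are suppressed.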
Multimedia Content Analysis Using Both Audio and Visual Cues
, 2000
Cited by 70 (0 self)
: Including all the scenes/shots that contain special events may generate too long an abstract. Also, simply staggering them together may not be visually or aurally appealing. In the MoCA project, it was determined that only 50% of the abstract should contain special events. The remaining part should be left for filler clips. The special event clips to be included are chosen uniformly and randomly from different types of events. The selection of a short clip from a scene is subject to some additional criteria, such as the amount of action and the similarity to the overall color composition of the movie. Closeness to the desired AV characteristics of certain scene types is also considered. The filler clips are chosen so that they do not overlap with the content of chosen special event clips, to ensure a good coverage of all parts of a movie. MPEG-7 Standard for Multimedia Content Description Interface: MPEG-7 is an ongoing standardization effort for content description of AV documen...
Name-It: Naming and Detecting Faces in News Videos
, 1999
Cited by 54 (1 self)
ions. (In the near future, the worldwide trend will be for broadcasts to feature closed captions.) Thus we use closed-caption texts as transcripts for news videos. In addition, we employ video-caption detection and recognition. We used "CNN Headline News" as our primary source of news for our experiments. Given image sequences, transcripts, and video captions as information sources, Name-It associates extracted faces with extracted name candidates using the correlation of their timing information and face similarity information. Video captions are also taken into account as supplementary information. To associate faces and names, Name-It integrates several advanced image processing and natural-language processing techniques: face sequence extraction and similarity evaluation from videos, name extraction from transcripts, and video-caption recognition. Although these technologies aren't always highly accurate, integrating their results will help the system achieve more accurate output.
Slim-trees: High performance metric trees minimizing overlap between nodes
- In 7th International Conference on Extending Database Technology (EDBT
, 2000
Cited by 37 (5 self)
Abstract. In this paper we present the Slim-tree, a dynamic tree for organizing metric datasets in pages of fixed size. The Slim-tree uses the “fat-factor”, which provides a simple way to quantify the degree of overlap between the nodes in a metric tree. It is well known that the degree of overlap directly affects the query performance of index structures. There are many suggestions to reduce overlap in multidimensional index structures, but the Slim-tree is the first metric structure explicitly designed to reduce the degree of overlap. Moreover, we present new algorithms for inserting objects and splitting nodes. The new insertion algorithm leads to a tree with high storage utilization and improved query performance, whereas the new split algorithm runs considerably faster than previous ones, generally without sacrificing search performance. Results obtained from experiments with real-world data sets show that the new algorithms of the Slim-tree consistently lead to performance improvements. After
Temporal Video Segmentation Using Unsupervised Clustering and Semantic Object Tracking
- Journal of Electronic Imaging
, 1998
Cited by 36 (0 self)
This paper proposes a content-based temporal video segmentation system that integrates syntactic (domain-independent) and semantic (domain-dependent) features for automatic management of video data. Temporal video segmentation includes scene change detection and shot classification. The proposed scene change detection method consists of two steps: detection and tracking of semantic objects of interest specified by the user, and an unsupervised method for detection of cuts and edit effects. Object detection and tracking are achieved using a region matching scheme, where the region of interest is defined by the boundary of the object. A new unsupervised scene change detection method based on 2-class clustering is introduced to eliminate the data dependency of threshold selection. The proposed shot classification approach relies on semantic image features and exploits domain-dependent visual properties such as shape, color, and spatial configuration of tracked semantic objects. The syste...
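The idea of replacing a hand-tuned threshold with 2-class clustering can be sketched with one-dimensional 2-means over frame-to-frame difference scores. This is an assumption-laden illustration: the abstract does not say which features or clustering algorithm the paper uses, and `two_class_cuts` and the sample scores are hypothetical.

```python
from typing import List


def two_class_cuts(diffs: List[float], iterations: int = 20) -> List[int]:
    """Split difference scores into "no change" and "change" clusters (1-D 2-means).

    Returns the indices assigned to the high-difference cluster.
    """
    lo, hi = min(diffs), max(diffs)  # initial cluster centers
    if lo == hi:
        return []  # all scores identical: nothing stands out as a scene change
    for _ in range(iterations):
        small = [d for d in diffs if abs(d - lo) <= abs(d - hi)]
        large = [d for d in diffs if abs(d - lo) > abs(d - hi)]
        new_lo = sum(small) / len(small)
        new_hi = sum(large) / len(large)
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return [i for i, d in enumerate(diffs) if abs(d - lo) > abs(d - hi)]


# Hypothetical difference scores between consecutive frames.
scores = [0.1, 0.2, 0.15, 5.0, 0.1, 0.12, 4.8, 0.2]
cuts = two_class_cuts(scores)
```

The decision boundary falls out of the data for each video, which is the sense in which clustering removes the data dependency of threshold selection.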

