Results 1 - 10 of 153
Multimodal Video Indexing: A Review of the State-of-the-art
- Multimedia Tools and Applications, 2003
Cited by 173 (19 self)
Efficient and effective handling of video documents depends on the availability of indexes. Manual indexing is infeasible for large video collections. In this paper we survey several methods aimed at automating this time- and resource-consuming process. Good reviews of single-modality video indexing have appeared in the literature. Effective indexing, however, requires a multimodal approach in which either the most appropriate modality is selected or the different modalities are used in collaborative fashion. Therefore, instead of treating the different information sources and their specific algorithms separately, we focus on the similarities and differences between the modalities. To that end we put forward a unifying multimodal framework, which views a video document from the perspective of its author. This framework forms the guiding principle for identifying index types for which automatic methods are found in the literature, and it furthermore forms the basis for categorizing these methods.
Detecting Text in Natural Scenes with Stroke Width Transform
Cited by 137 (0 self)
We present a novel image operator that seeks to find the value of stroke width for each image pixel, and demonstrate its use on the task of text detection in natural images. The suggested operator is local and data dependent, which makes it fast and robust enough to eliminate the need for multi-scale computation or scanning windows. Extensive testing shows that the suggested scheme outperforms the latest published algorithms. Its simplicity allows the algorithm to detect text in many fonts and languages.
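The stroke-width idea can be pictured with a toy one-dimensional analogue (not the authors' gradient-based operator; the function names and the run-length simplification are ours): each "ink" pixel in a binary row is labelled with the length of the run it belongs to, and text candidates are regions whose stroke widths vary little.

```python
def stroke_widths_1d(row):
    """Simplified 1-D analogue of the Stroke Width Transform:
    every ink pixel (1) is labelled with the length of the
    contiguous run it belongs to, i.e. its local stroke width."""
    widths = [0] * len(row)
    i = 0
    while i < len(row):
        if row[i] == 1:
            j = i
            while j < len(row) and row[j] == 1:
                j += 1
            for k in range(i, j):
                widths[k] = j - i  # whole run shares one width
            i = j
        else:
            i += 1
    return widths

def stroke_width_variance(widths):
    """Low variance of the nonzero widths suggests text-like strokes."""
    vals = [w for w in widths if w > 0]
    if not vals:
        return 0.0
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)
```

For example, `stroke_widths_1d([0, 1, 1, 0, 1, 1, 1, 0])` yields `[0, 2, 2, 0, 3, 3, 3, 0]`; a region with uniform widths has zero variance, matching the paper's intuition that text strokes have near-constant width.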
Localizing and Segmenting Text in Images and Videos
- IEEE Transactions on Circuits and Systems for Video Technology, 2002
ICDAR 2003 Robust Reading Competitions
- in Proceedings of the Seventh International Conference on Document Analysis and Recognition
Cited by 78 (1 self)
This paper describes the robust reading competitions for ICDAR 2003. With the rapid growth in research over the last few years on recognizing text in natural scenes, there is an urgent need to establish common benchmark datasets and gain a clear understanding of the current state of the art. We use the term robust reading to refer to text images that are beyond the capabilities of current commercial OCR packages. We chose to break the robust reading problem down into three sub-problems and to run a competition for each stage, as well as a competition for the best overall system. The sub-problems we chose were text locating, character recognition and word recognition. By breaking down the problem in this way, we hope to gain a better understanding of the state of the art in each of the sub-problems. Furthermore, our methodology involves storing detailed results of applying each algorithm to each image in the datasets, allowing researchers to study in depth the strengths and weaknesses of each algorithm. The text locating contest was the only one to receive any entries. We report the results of this contest and show cases where the leading algorithms succeed and fail.
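Storing per-image results makes it possible to rescore the text locating entries offline. As a hedged sketch of how such scoring is typically done (a generic intersection-over-union matching criterion; this is not necessarily the exact matching function used by the ICDAR 2003 protocol):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def precision_recall(detections, truths, thresh=0.5):
    """Precision: fraction of detections matching some ground-truth box.
    Recall: fraction of ground-truth boxes matched by some detection."""
    matched = sum(1 for d in detections if any(iou(d, t) >= thresh for t in truths))
    covered = sum(1 for t in truths if any(iou(d, t) >= thresh for d in detections))
    p = matched / len(detections) if detections else 0.0
    r = covered / len(truths) if truths else 0.0
    return p, r
```

A detector that reports one correct box and one spurious box against a single ground-truth box scores precision 0.5 and recall 1.0 under this criterion.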
Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003
Cited by 68 (0 self)
This paper presents a novel texture-based method for detecting text in images. A support vector machine (SVM) is used to analyze the textural properties of text. No external texture feature extraction module is used; rather, the intensities of the raw pixels that make up the textural pattern are fed directly to the SVM, which works well even in high-dimensional spaces. Next, text regions are identified by applying a continuously adaptive mean shift algorithm (CAMSHIFT) to the results of the texture analysis. The combination of CAMSHIFT and SVMs produces text detection that is both robust and efficient, since time-consuming texture analysis is restricted to the most relevant pixels, leaving only a small part of the input image to be texture-analyzed. Index Terms—Text detection, image indexing, texture analysis, support vector machine, CAMSHIFT.
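The region-refinement stage can be pictured with a minimal one-dimensional mean-shift sketch: a window hill-climbs toward the local mode of a precomputed text-probability profile (which in the paper would come from the SVM). This is a toy stand-in for CAMSHIFT; the 1-D simplification and names are ours.

```python
def mean_shift_1d(probs, start, radius=2, iters=20):
    """Move a window of half-width `radius` toward the local mode of a
    1-D text-probability profile by repeatedly jumping to the windowed
    probability-weighted mean position."""
    pos = start
    for _ in range(iters):
        lo, hi = max(0, pos - radius), min(len(probs) - 1, pos + radius)
        den = sum(probs[i] for i in range(lo, hi + 1))
        if den == 0:
            break  # no mass in the window: nothing to climb
        num = sum(i * probs[i] for i in range(lo, hi + 1))
        new = round(num / den)
        if new == pos:
            break  # converged on a local mode
        pos = new
    return pos
```

Starting from either side of the peaked profile `[0, 0, 1, 3, 5, 3, 1, 0, 0]`, the window converges on the mode at index 4.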
Automatic detection and recognition of signs from natural scenes
- IEEE Transactions on Image Processing, 2004
Cited by 63 (4 self)
In this paper, we present an approach to the automatic detection and recognition of signs from natural scenes, and its application to a sign translation task. The proposed approach embeds multiresolution and multiscale edge detection, adaptive searching, color analysis, and affine rectification in a hierarchical framework for sign detection, with different emphases at each phase to handle text of different sizes, orientations, color distributions and backgrounds. We use affine rectification to recover deformation of the text regions caused by an inappropriate camera view angle. This procedure can significantly improve the text detection rate and optical character recognition (OCR) accuracy. Instead of using binary information for OCR, we extract features directly from an intensity image. We propose a local intensity normalization method to handle lighting variations effectively, followed by a Gabor transform to obtain local features, and finally a linear discriminant analysis (LDA) method for feature selection. We have applied the approach in developing a Chinese sign translation system, which can automatically detect and recognize Chinese signs captured by a camera and translate the recognized text into English. Index Terms—Affine rectification, optical character recognition (OCR), sign detection, sign recognition, text detection.
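The local intensity normalization step can be sketched in a few lines: each pixel is standardized by the mean and standard deviation of its neighbourhood, which suppresses slowly varying lighting. This is a minimal zero-mean, unit-variance windowed normalization under our own assumptions; the paper's exact normalization and the subsequent Gabor/LDA stages are not reproduced here.

```python
def local_normalize(img, win=1):
    """Normalize each pixel by the mean and standard deviation of its
    (2*win+1) x (2*win+1) neighbourhood (clipped at the image border).
    `img` is a list of rows of numbers."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - win), min(h, y + win + 1))
                    for xx in range(max(0, x - win), min(w, x + win + 1))]
            m = sum(vals) / len(vals)
            sd = (sum((v - m) ** 2 for v in vals) / len(vals)) ** 0.5
            # Flat neighbourhoods (sd == 0) map to 0 rather than dividing by zero.
            out[y][x] = (img[y][x] - m) / sd if sd > 1e-9 else 0.0
    return out
```

A uniformly lit patch normalizes to all zeros, while a pixel brighter than its surroundings gets a positive response regardless of the absolute illumination level.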
A unified framework for semantic shot classification in sports video
- Transactions on Multimedia, 2002
Cited by 44 (8 self)
In this demonstration, we present a unified framework for semantic shot classification in sports videos. Unlike previous approaches, which focus on clustering by aggregating shots with similar low-level features, the proposed scheme makes use of domain knowledge of a specific sport to perform top-down video shot classification. That is, drawing on the inherent rules of the game and television production conventions, for each sport we predefine, through careful observation, a set of semantic shot classes that covers 90 to 95% of sports broadcast video. Guided by this predefined shot set, we map low-level features to high-level semantic shot attributes such as dominant object motion (a player), persistent camera panning, and court shape. By appropriately fusing these high-level shot attributes, we classify video shots into several predefined categories, each of which has a clear semantic meaning. Experiments show that, compared to traditional clustering methods and key-frame based analysis, the proposed framework has a much greater capability for mining semantics. Owing to the strong structural constraints and limited camera work of sports broadcasts, the framework provides a generic solution for sports video shot classification that can be adapted to a new sport type without major modification. With correctly classified shots, further structural and temporal analysis is greatly facilitated.
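The attribute-fusion step can be sketched as a small rule-based classifier that maps high-level shot attributes to a predefined category. The attribute names, rules, and category labels below are hypothetical illustrations of the top-down idea, not those from the paper.

```python
def classify_shot(attrs):
    """Map a dict of boolean high-level shot attributes to a semantic
    shot category via simple domain rules (all names hypothetical)."""
    if attrs.get("court_visible") and attrs.get("camera_panning"):
        return "court-view"          # wide shot sweeping across the court
    if attrs.get("dominant_player") and not attrs.get("court_visible"):
        return "player-close-up"     # player fills the frame, no court shape
    if attrs.get("crowd_texture"):
        return "audience"            # dense crowd texture dominates
    return "other"
```

Because the rules encode production conventions rather than clustered low-level features, adapting to a new sport would mean editing the rule table, mirroring the paper's claim of portability across sports.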
Progress in camera-based document image analysis
- Proc. ICDAR 2003
Cited by 43 (0 self)
The increasing availability of high-performance, low-priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. Digital cameras attached to cellular phones or PDAs, or used as standalone still or video devices, are highly mobile and easy to use; they can capture images of any kind of document, including very thick books, historical pages too fragile to touch, and text in scenes; and they are much more versatile than desktop scanners. Should robust solutions to the analysis of documents captured with such devices become available, there would clearly be demand from many domains. Traditional scanner-based document analysis techniques provide a good reference and starting point, but they cannot be used directly on camera-captured images, which can suffer from low resolution, blur, and perspective distortion, as well as complex layout and interaction between content and background. In this paper we present a survey of application domains, technical challenges and solutions for recognizing documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We then discuss document analysis from a single camera-captured image as well as from multiple frames, and highlight sample applications under development and feasible ideas for future development.
Texture for Script Identification
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005
Cited by 36 (0 self)
The problem of determining the script and language of a document image has a number of important applications in the field of document analysis, such as indexing and sorting large collections of such images, or as a precursor to optical character recognition (OCR). In this paper, we investigate the use of texture as a tool for determining the script of a document image, based on the observation that text has a distinct visual texture. An experimental evaluation of a number of commonly used texture features is conducted on a newly created script database, providing a qualitative measure of which features are most appropriate for this task. Strategies for improving classification results in situations with limited training data and multiple font types are also proposed. Index Terms—Script identification, wavelets and fractals, texture, document analysis, clustering, classification and association rules.
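As a minimal illustration of the kind of texture feature such evaluations compare, here is a horizontal grey-level co-occurrence contrast measure. It is a generic classical feature used only to show the idea that different scripts yield different texture statistics; it is a stand-in, not one of the paper's wavelet or fractal features.

```python
def glcm_contrast(img):
    """Mean squared intensity difference between horizontally adjacent
    pixels -- the 'contrast' statistic of a grey-level co-occurrence
    matrix for the (1, 0) offset. `img` is a list of rows of numbers."""
    total, count = 0.0, 0
    for row in img:
        for a, b in zip(row, row[1:]):
            total += (a - b) ** 2
            count += 1
    return total / count if count else 0.0
```

A finely alternating pattern (dense strokes) scores high contrast, while a flat region scores zero; feeding such statistics from several offsets and scales into a classifier is the general recipe behind texture-based script identification.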
An Automatic Performance Evaluation Protocol for Video Text Detection Algorithms
- IEEE Transactions on Circuits and Systems for Video Technology, 2004