Results 1 - 10 of 52
Scene text recognition using higher order language priors. In BMVC, 2012.
"... HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte p ..."
Abstract
-
Cited by 23 (6 self)
- Add to MetaCart
(Show Context)
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et a ̀ la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
End-to-End Text Recognition with Convolutional Neural Networks
"... Full end-to-end text recognition in natural images is a challenging problem that has received much attention recently. Traditional systems in this area have relied on elaborate models incorporating carefully handengineered features or large amounts of prior knowledge. In this paper, we take a differ ..."
Abstract
-
Cited by 20 (0 self)
- Add to MetaCart
(Show Context)
Full end-to-end text recognition in natural images is a challenging problem that has received much attention recently. Traditional systems in this area have relied on elaborate models incorporating carefully handengineered features or large amounts of prior knowledge. In this paper, we take a different route and combine the representational power of large, multilayer neural networks together with recent developments in unsupervised feature learning, which allows us to use a common framework to train highly-accurate text detector and character recognizer modules. Then, using only simple off-the-shelf methods, we integrate these two modules into a full end-to-end, lexicon-driven, scene text recognition system that achieves state-of-the-art performance on standard benchmarks, namely Street View Text and ICDAR 2003. 1
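The lexicon-driven integration described above can be pictured with a small Python sketch: given per-slot character probabilities from a recognizer (random placeholders here), each lexicon word is scored and the best match is returned. This is a toy stand-in under stated assumptions, not the authors' pipeline; the alphabet, the scoring rule, and all names are illustrative.

    import numpy as np

    ALPHABET = "abcdefghijklmnopqrstuvwxyz"

    def score_word(char_probs, word):
        """Average log-probability of `word` under per-slot character
        distributions. char_probs has shape (num_slots, len(ALPHABET))."""
        if len(word) != char_probs.shape[0]:
            return -np.inf  # toy version only compares words of matching length
        idx = [ALPHABET.index(c) for c in word]
        return float(np.mean(np.log(char_probs[np.arange(len(word)), idx] + 1e-9)))

    def recognize(char_probs, lexicon):
        """Pick the lexicon word that best explains the recognizer outputs."""
        return max(lexicon, key=lambda w: score_word(char_probs, w))

    # Toy usage: 4 character slots filled with fake recognizer output.
    rng = np.random.default_rng(0)
    probs = rng.dirichlet(np.ones(len(ALPHABET)), size=4)
    print(recognize(probs, ["text", "taxi", "door"]))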
PhotoOCR: Reading Text in Uncontrolled Conditions. In ICCV.
"... Abstract We describe PhotoOCR, a system for text extraction from images. Our particular focus is reliable text extraction from smartphone imagery, with the goal of text recognition as a user input modality similar to speech recognition. Commercially available OCR performs poorly on this task. Recen ..."
Abstract
-
Cited by 17 (0 self)
- Add to MetaCart
(Show Context)
Abstract We describe PhotoOCR, a system for text extraction from images. Our particular focus is reliable text extraction from smartphone imagery, with the goal of text recognition as a user input modality similar to speech recognition. Commercially available OCR performs poorly on this task. Recent progress in machine learning has substantially improved isolated character classification; we build on this progress by demonstrating a complete OCR system using these techniques. We also incorporate modern datacenter-scale distributed language modelling. Our approach is capable of recognizing text in a variety of challenging imaging conditions where traditional OCR systems fail, notably in the presence of substantial blur, low resolution, low contrast, high image noise and other distortions. It also operates with low latency; mean processing time is 600 ms per image. We evaluate our system on public benchmark datasets for text extraction and outperform all previously reported results, more than halving the error rate on multiple benchmarks. The system is currently in use in many applications at Google, and is available as a user input modality in Google Translate for Android.
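A minimal sketch of how classifier scores and a language model can be combined, in the spirit of the description above: a beam search over character sequences adds per-slot classifier log-scores to character-bigram log-probabilities. The real system uses a deep network classifier and a datacenter-scale distributed n-gram model; the tiny bigram table, beam width, and function names below are assumptions made for illustration.

    import math

    def beam_search(char_scores, bigram_logp, beam_width=5):
        """Combine per-slot character log-scores with a character bigram LM.
        char_scores: list of dicts {char: log_score}, one per character slot.
        bigram_logp: dict {(prev_char, char): log_prob}; missing pairs get a floor."""
        floor = math.log(1e-4)
        beam = [("", 0.0)]  # (partial string, accumulated log-score)
        for scores in char_scores:
            candidates = []
            for prefix, acc in beam:
                for ch, s in scores.items():
                    lm = bigram_logp.get((prefix[-1], ch), floor) if prefix else 0.0
                    candidates.append((prefix + ch, acc + s + lm))
            beam = sorted(candidates, key=lambda x: x[1], reverse=True)[:beam_width]
        return beam[0][0]

    # Toy usage: three slots; the LM corrects the weak second-slot classifier.
    scores = [{"c": -0.4, "o": -1.2}, {"a": -0.9, "o": -0.8}, {"t": -0.5, "r": -0.7}]
    lm = {("c", "a"): math.log(0.6), ("a", "t"): math.log(0.7), ("c", "o"): math.log(0.2)}
    print(beam_search(scores, lm))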
Scene text recognition using part-based tree-structured character detection. In CVPR.
"... demonstrate that the proposed method outperforms stateof-the-art methods significantly both for character detection and word recognition. ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
(Show Context)
demonstrate that the proposed method outperforms stateof-the-art methods significantly both for character detection and word recognition.
Strokelets: A learned multi-scale representation for scene text recognition. In Proc. CVPR, 2014.
"... Driven by the wide range of applications, scene text de-tection and recognition have become active research topics in computer vision. Though extensively studied, localizing and reading text in uncontrolled environments remain ex-tremely challenging, due to various interference factors. In this pape ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
(Show Context)
Driven by the wide range of applications, scene text de-tection and recognition have become active research topics in computer vision. Though extensively studied, localizing and reading text in uncontrolled environments remain ex-tremely challenging, due to various interference factors. In this paper, we propose a novel multi-scale representation for scene text recognition. This representation consists of a set of detectable primitives, termed as strokelets, which capture the essential substructures of characters at differ-ent granularities. Strokelets possess four distinctive advan-tages: (1) Usability: automatically learned from bounding box labels; (2) Robustness: insensitive to interference fac-tors; (3) Generality: applicable to variant languages; and (4) Expressivity: effective at describing characters in natu-ral scenes. Extensive experiments on standard benchmarks verify the advantages of strokelets and demonstrate that the proposed algorithm outperforms the state-of-the-art meth-ods in the literature. 1.
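One way to picture how detected strokelets could describe a character, as a rough sketch only: pool per-strokelet detector response maps over a spatial grid into a fixed-length descriptor that a downstream classifier can consume. The detectors themselves are assumed to already exist (random maps stand in for them), and the pooling scheme is an illustrative choice, not the paper's exact descriptor.

    import numpy as np

    def strokelet_descriptor(response_maps, grid=(2, 2)):
        """Pool per-strokelet detector response maps into a fixed-length descriptor.
        response_maps: array of shape (num_strokelets, H, W) with detector scores.
        Returns the strongest response of each strokelet in each spatial cell."""
        k, h, w = response_maps.shape
        gy, gx = grid
        feats = []
        for i in range(gy):
            for j in range(gx):
                cell = response_maps[:, i * h // gy:(i + 1) * h // gy,
                                        j * w // gx:(j + 1) * w // gx]
                feats.append(cell.max(axis=(1, 2)))  # max response per strokelet
        return np.concatenate(feats)

    # Toy usage: 8 hypothetical strokelet detectors on a 32x32 character patch.
    rng = np.random.default_rng(1)
    maps = rng.random((8, 32, 32))
    print(strokelet_descriptor(maps).shape)  # (8 strokelets * 4 cells,) = (32,)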
Whole is Greater than Sum of Parts: Recognizing Scene Text Words
"... Abstract—Recognizing text in images taken in the wild is a challenging problem that has received great attention in recent years. Previous methods addressed this problem by first detecting individual characters, and then forming them into words. Such approaches often suffer from weak character detec ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
(Show Context)
Abstract—Recognizing text in images taken in the wild is a challenging problem that has received great attention in recent years. Previous methods addressed this problem by first detecting individual characters, and then forming them into words. Such approaches often suffer from weak character detections, due to large intra-class variations, even more so than characters from scanned documents. We take a different view of the problem and present a holistic word recognition framework. In this, we first represent the scene text image and synthetic images generated from lexicon words using gradient-based features. We then recognize the text in the image by matching the scene and synthetic image features with our novel weighted Dynamic Time Warping (wDTW) approach. We perform experimental analysis on challenging public
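The matching step lends itself to a short sketch: align column-wise feature sequences of the scene word image and a synthetic lexicon rendering with dynamic time warping, and pick the lexicon word with the lowest alignment cost. The per-frame weighting of the paper's wDTW is reduced here to a plain squared-distance local cost, and the gradient-based features are stubbed with random arrays, so everything beyond the DP recurrence is an assumption.

    import numpy as np

    def dtw_distance(a, b):
        """Dynamic time warping distance between two feature sequences.
        a: (n, d) and b: (m, d) arrays of per-column descriptors."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.sum((a[i - 1] - b[j - 1]) ** 2)  # local (unweighted) cost
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    def recognize(scene_feats, lexicon_feats):
        """Return the lexicon word whose synthetic rendering aligns best."""
        return min(lexicon_feats, key=lambda w: dtw_distance(scene_feats, lexicon_feats[w]))

    # Toy usage with random stand-ins for gradient-based column features.
    rng = np.random.default_rng(2)
    scene = rng.random((40, 16))
    lexicon = {"open": rng.random((38, 16)), "exit": rng.random((45, 16))}
    print(recognize(scene, lexicon))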
Scene Text Segmentation via Inverse Rendering
"... Abstract—Recognizing text in natural photographs that contain specular highlights and focal blur is a challenging problem. In this paper we describe a new text segmentation method based on inverse rendering, i.e. decomposing an input image into basic rendering elements. Our technique uses iterative ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
(Show Context)
Abstract—Recognizing text in natural photographs that contain specular highlights and focal blur is a challenging problem. In this paper we describe a new text segmentation method based on inverse rendering, i.e. decomposing an input image into basic rendering elements. Our technique uses iterative optimization to solve the rendering parameters, including light source, material properties (e.g. diffuse/specular reflectance and shininess) as well as blur kernel size. We combine our segmentation method with a recognition component and show that by accounting for the rendering parameters, our approach achieves higher text recognition accuracy than previous work, particularly in the presence of color changes and image blur. In addition, the derived rendering parameters can be used to synthesize new text images that imitate the appearance of an existing image. I.
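As a rough illustration of the inverse-rendering idea under heavy simplification: fit a toy forward model (one text gray level, one background gray level, and a Gaussian blur sigma applied to a given binary mask) by minimizing reconstruction error. The paper's model with light source, specular reflectance, shininess, and joint segmentation is far richer; the mask here is assumed known, and all names are illustrative.

    import numpy as np
    from scipy.ndimage import gaussian_filter
    from scipy.optimize import minimize

    def render(mask, params):
        """Toy forward model: text gray level, background gray level, blur sigma."""
        text, bg, sigma = params
        sharp = text * mask + bg * (1.0 - mask)
        return gaussian_filter(sharp, sigma=max(sigma, 1e-3))

    def fit_rendering_params(image, mask):
        """Estimate (text, bg, sigma) by minimizing reconstruction error."""
        def loss(p):
            return float(np.mean((render(mask, p) - image) ** 2))
        x0 = np.array([image.min(), image.max(), 1.0])
        return minimize(loss, x0, method="Nelder-Mead").x

    # Toy usage: synthesize an observation, then recover its parameters.
    rng = np.random.default_rng(3)
    mask = (rng.random((24, 64)) > 0.7).astype(float)
    obs = render(mask, (0.1, 0.9, 1.5)) + 0.01 * rng.standard_normal((24, 64))
    print(fit_rendering_params(obs, mask))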
Word Spotting and Recognition with Embedded Attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014.
"... This article addresses the problems of word spotting and word recognition on images. In word spotting, the goal is to find all instances of a query word in a dataset of images. In recognition, the goal is to recognize the content of the word image, usually aided by a dictionary or lexicon. We descri ..."
Abstract
-
Cited by 6 (2 self)
- Add to MetaCart
This article addresses the problems of word spotting and word recognition on images. In word spotting, the goal is to find all instances of a query word in a dataset of images. In recognition, the goal is to recognize the content of the word image, usually aided by a dictionary or lexicon. We describe an approach in which both word images and text strings are embedded in a common vectorial subspace. This is achieved by a combination of label embedding and attributes learning, and a common subspace regression. In this subspace, images and strings that represent the same word are close together, allowing one to cast recognition and retrieval tasks as a nearest neighbor problem. Contrary to most other existing methods, our representation has a fixed length, is low dimensional, and is very fast to compute and, especially, to compare. We test our approach on four public datasets of both handwritten documents and natural images showing results comparable or better than the state-of-the-art on spotting and recognition tasks.
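The nearest-neighbor formulation can be sketched directly: once word images and strings are projected into a shared space (the learned projections are replaced by random stand-ins below), recognition and spotting both reduce to cosine-similarity ranking. The embedding dimensionality and every function name are assumptions; only the retrieval logic follows the description above.

    import numpy as np

    def cosine_sim(A, B):
        """Row-wise cosine similarity between two sets of embedded vectors."""
        A = A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-9)
        B = B / (np.linalg.norm(B, axis=1, keepdims=True) + 1e-9)
        return A @ B.T

    def recognize(image_emb, lexicon, lexicon_emb):
        """Recognition: nearest lexicon string to an embedded word image."""
        sims = cosine_sim(image_emb[None, :], lexicon_emb)[0]
        return lexicon[int(np.argmax(sims))]

    def spot(query_emb, image_embs):
        """Spotting: rank dataset images by similarity to an embedded query word."""
        return np.argsort(-cosine_sim(query_emb[None, :], image_embs)[0])

    # Toy usage with random stand-ins for the learned embeddings.
    rng = np.random.default_rng(4)
    lex = ["hotel", "motel", "hostel"]
    lex_emb = rng.random((3, 64))
    img_emb = lex_emb[1] + 0.05 * rng.standard_normal(64)  # image resembling "motel"
    print(recognize(img_emb, lex, lex_emb))
    print(spot(lex_emb[0], rng.random((10, 64)))[:3])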
Scene Text Recognition using Co-occurrence of Histogram of Oriented Gradients
"... Abstract—Scene text recognition is a fundamental step in Endto-End applications where traditional optical character recognition (OCR) systems often fail to produce satisfactory results. This paper proposes a technique that uses co-occurrence histogram of oriented gradients (Co-HOG) to recognize the ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
(Show Context)
Abstract—Scene text recognition is a fundamental step in Endto-End applications where traditional optical character recognition (OCR) systems often fail to produce satisfactory results. This paper proposes a technique that uses co-occurrence histogram of oriented gradients (Co-HOG) to recognize the text in scenes. Compared with histogram of oriented gradients (HOG), Co-HOG is a more powerful tool that captures spatial distribution of neighboring orientation pairs instead of just a single gradient orientation. At the same time, it is more efficient compared with HOG and therefore more suitable for real-time applications. The proposed scene text recognition technique is evaluated on ICDAR2003 character dataset and Street View Text (SVT) dataset. Experiments show that the Co-HOG based technique clearly outperforms state-of-the-art techniques that use HOG,
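Because Co-HOG has a compact definition, a direct sketch helps: quantize gradient orientations into bins, then, for each spatial offset of interest, count how often each (center-bin, neighbor-bin) pair co-occurs. Cell/block layout, normalization, and the exact offset set used in the paper are omitted; the bin count and offsets below are illustrative assumptions.

    import numpy as np

    def cohog(gray, bins=8, offsets=((0, 1), (1, 0), (1, 1))):
        """Co-occurrence histogram of oriented gradients for one image patch.
        Offsets are assumed non-negative. Returns a flattened vector with one
        bins x bins pair-count matrix per offset."""
        gy, gx = np.gradient(gray.astype(float))
        ang = np.mod(np.arctan2(gy, gx), np.pi)                # unsigned orientation
        q = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
        h, w = q.shape
        feats = []
        for dy, dx in offsets:
            center = q[:h - dy, :w - dx]
            neighbor = q[dy:, dx:]
            pair = center * bins + neighbor                    # joint bin index
            counts = np.bincount(pair.ravel(), minlength=bins * bins)
            feats.append(counts.astype(float))
        return np.concatenate(feats)

    # Toy usage on a random 32x32 "character" patch.
    rng = np.random.default_rng(5)
    print(cohog(rng.random((32, 32))).shape)  # (3 offsets * 64 pairs,) = (192,)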
Robust scene text detection with convolution neural network induced MSER trees. In Computer Vision - ECCV 2014, 2014.
"... Abstract. Maximally Stable Extremal Regions (MSERs) have achieved great success in scene text detection. However, this low-level pixel opera-tion inherently limits its capability for handling complex text information efficiently (e. g. connections between text or background components), leading to t ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
(Show Context)
Abstract. Maximally Stable Extremal Regions (MSERs) have achieved great success in scene text detection. However, this low-level pixel opera-tion inherently limits its capability for handling complex text information efficiently (e. g. connections between text or background components), leading to the difficulty in distinguishing texts from background compo-nents. In this paper, we propose a novel framework to tackle this problem by leveraging the high capability of convolutional neural network (CNN). In contrast to recent methods using a set of low-level heuristic features, the CNN network is capable of learning high-level features to robustly identify text components from text-like outliers (e.g. bikes, windows, or leaves). Our approach takes advantages of both MSERs and sliding-window based methods. The MSERs operator dramatically reduces the number of windows scanned and enhances detection of the low-quality texts. While the sliding-window with CNN is applied to correctly sepa-rate the connections of multiple characters in components. The proposed system achieved strong robustness against a number of extreme text vari-ations and serious real-world problems. It was evaluated on the ICDAR 2011 benchmark dataset, and achieved over 78 % in F-measure, which is significantly higher than previous methods.
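A minimal sketch of the candidate-then-classify pattern described above: OpenCV's MSER detector proposes regions, and a classifier score filters out text-like outliers. The trained CNN is replaced by a trivial contrast heuristic so the sketch runs stand-alone, and the MSER tree pruning, region grouping, and sliding-window character separation are all omitted.

    import cv2
    import numpy as np

    def text_score(patch):
        """Placeholder for the CNN text/non-text classifier; a simple contrast
        heuristic stands in so the sketch runs without a trained model."""
        return float(np.std(patch) / 255.0)

    def detect_text_candidates(gray, threshold=0.15):
        """MSER proposals filtered by a classifier score."""
        mser = cv2.MSER_create()
        regions, bboxes = mser.detectRegions(gray)
        keep = []
        for (x, y, w, h) in bboxes:
            patch = gray[y:y + h, x:x + w]
            if patch.size and text_score(patch) > threshold:
                keep.append((x, y, w, h))
        return keep

    # Toy usage on a synthetic image with a bright rectangle as a stand-in glyph.
    img = np.zeros((100, 200), dtype=np.uint8)
    cv2.rectangle(img, (40, 30), (80, 70), 255, -1)
    print(detect_text_candidates(img))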