Results 1 - 10 of 21
3D hand pose reconstruction using specialized mappings
In Proc. International Conf. on Computer Vision (ICCV), Vol. 1, 2001
"... A system for recovering 3D hand pose from monocu-lar color sequences is proposed. The system employs a non-linear supervised learning framework, the specialized mappings architecture (SMA), to map image features to likely 3D hand poses. The SMA’s fundamental components are a set of specialized forwa ..."
Cited by 90 (10 self)
A system for recovering 3D hand pose from monocular color sequences is proposed. The system employs a non-linear supervised learning framework, the specialized mappings architecture (SMA), to map image features to likely 3D hand poses. The SMA’s fundamental components are a set of specialized forward mapping functions, and a single feedback matching function. The forward functions are estimated directly from training data, which in our case are examples of hand joint configurations and their corresponding visual features. The joint angle data in the training set is obtained via a CyberGlove, a glove with 22 sensors that monitor the angular motions of the palm and fingers. In training, the visual features are generated using a computer graphics module that renders the hand from arbitrary viewpoints given the 22 joint angles. The viewpoint is encoded by two real values, therefore 24 real values represent a hand pose. We test our system both on synthetic sequences and on sequences taken with a color camera. The system automatically detects and tracks both hands of the user, calculates the appropriate features, and estimates the 3D hand joint angles and viewpoint from those features. Results are encouraging given the complexity of the task.
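A minimal sketch of this inference loop, assuming the specialized forward functions and the rendering-based feedback function are given (forward_fns and feedback_fn below are hypothetical placeholders, not the authors' code):

import numpy as np

def sma_estimate(x, forward_fns, feedback_fn):
    # Specialized Mappings Architecture inference (illustrative sketch).
    # x           : observed image feature vector
    # forward_fns : learned specialized functions, each mapping features to
    #               a candidate pose (22 joint angles + 2 viewpoint values)
    # feedback_fn : maps a pose back to feature space, e.g. by rendering
    #               the hand model and recomputing the features
    best_pose, best_err = None, np.inf
    for f in forward_fns:
        pose = f(x)                      # candidate 24-value pose hypothesis
        x_hat = feedback_fn(pose)        # regenerate features from the pose
        err = np.linalg.norm(x - x_hat)  # matching error in feature space
        if err < best_err:
            best_pose, best_err = pose, err
    return best_pose

The feedback function resolves the ambiguity among the specialized mappings: each forward function proposes a pose, and the pose whose regenerated features best match the observation is selected.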
An Appearance-Based Framework for 3D Hand Shape Classification and Camera Viewpoint Estimation
2002
"... An appearance-based framework for 3D hand shape classification and simultaneous camera viewpoint estimation is presented. Given an input image of a segmented hand, the most similar matches from a large database of synthetic hand images are retrieved. The ground truth labels of those matches, contain ..."
Cited by 50 (4 self)
An appearance-based framework for 3D hand shape classification and simultaneous camera viewpoint estimation is presented. Given an input image of a segmented hand, the most similar matches from a large database of synthetic hand images are retrieved. The ground truth labels of those matches, containing hand shape and camera viewpoint information, are returned by the system as estimates for the input image. Database retrieval is done hierarchically, by first quickly rejecting the vast majority of all database views, and then ranking the remaining candidates in order of similarity to the input. Four different similarity measures are employed, based on edge location, edge orientation, finger location and geometric moments.
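The two-stage retrieval can be sketched as follows; the shortlist size and the rank-sum combination of the four measures are assumptions for illustration, and coarse_dist / fine_dists stand in for the paper's actual measures:

import numpy as np

def retrieve(query, database, coarse_dist, fine_dists, keep=1000, k=10):
    # database : list of (features, label) pairs, where each label holds
    #            the hand shape and camera viewpoint of a synthetic view.
    # Stage 1: quickly reject the vast majority of database views
    # with one cheap distance measure.
    coarse = np.array([coarse_dist(query, feats) for feats, _ in database])
    survivors = np.argsort(coarse)[:keep]
    # Stage 2: rank the survivors under the fine similarity measures,
    # combined here by summing per-measure ranks (an assumption; the
    # paper's combination rule may differ).
    scores = np.zeros(len(survivors))
    for d in fine_dists:
        vals = np.array([d(query, database[i][0]) for i in survivors])
        scores += vals.argsort().argsort()   # distances -> ranks
    best = survivors[np.argsort(scores)[:k]]
    # The ground-truth labels of the top matches serve as estimates.
    return [database[i][1] for i in best]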
The American Sign Language lexicon video dataset
In IEEE Workshop on Computer Vision and Pattern Recognition for Human Communicative Behavior Analysis (CVPR4HB), 2008
"... The lack of a written representation for American Sign Language (ASL) makes it difficult to do something as commonplace as looking up an unknown word in a dictionary. The majority of printed dictionaries organize ASL signs (represented in drawings or pictures) based on their nearest English translat ..."
Cited by 16 (9 self)
The lack of a written representation for American Sign Language (ASL) makes it difficult to do something as commonplace as looking up an unknown word in a dictionary. The majority of printed dictionaries organize ASL signs (represented in drawings or pictures) based on their nearest English translation; so unless one already knows the meaning of a sign, dictionary look-up is not a simple proposition. In this paper we introduce the ASL Lexicon Video Dataset, a large and expanding public dataset containing video sequences of thousands of distinct ASL signs, as well as annotations of those sequences, including start/end frames and class label of every sign. This dataset is being created as part of a project to develop a computer vision system that allows users to look up the meaning of an ASL sign. At the same time, the dataset can be useful for benchmarking a variety of computer vision and machine learning methods designed for learning and/or indexing a large number of visual classes, and especially approaches for analyzing gestures and human communication.
3D Hand Pose Estimation by Finding Appearance-Based Matches in a Large Database of Training Views
In IEEE Workshop on Cues in Communication, 2001
"... Ongoing work towards appearance-based 3D hand pose estimation from a single image is presented. A large database of synthetic hand views is generated using a 3D hand model and computer graphics. The views display different hand shapes as seen from arbitrary viewpoints. Each synthetic view is automat ..."
Cited by 12 (3 self)
Ongoing work towards appearance-based 3D hand pose estimation from a single image is presented. A large database of synthetic hand views is generated using a 3D hand model and computer graphics. The views display different hand shapes as seen from arbitrary viewpoints. Each synthetic view is automatically labeled with parameters describing its hand shape and viewing parameters. Given an input image, the system retrieves the most similar database views, and uses the shape and viewing parameters of those views as candidate estimates for the parameters of the input image.
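In its simplest form, this retrieval step is a nearest-neighbor lookup; the sketch below uses Euclidean distance as a stand-in for the paper's view similarity measure:

import numpy as np

def candidate_estimates(query_feats, db_feats, db_labels, k=5):
    # db_feats  : (N, D) array, one feature row per synthetic view
    # db_labels : N (hand shape, viewing parameters) tuples
    # Euclidean distance is an illustrative stand-in, not the
    # similarity measure used in the paper.
    dists = np.linalg.norm(db_feats - query_feats, axis=1)
    return [db_labels[i] for i in np.argsort(dists)[:k]]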
A Database-Based Framework for Gesture Recognition
Personal and Ubiquitous Computing
"... Gestures are an important modality for human-machine communication. Computer vision modules performing gesture recognition can be important components of intelligent homes, assistive environments, and human-computer interfaces. A key problem in recognizing gestures is that the appearance of a gestu ..."
Cited by 10 (0 self)
Gestures are an important modality for human-machine communication. Computer vision modules performing gesture recognition can be important components of intelligent homes, assistive environments, and human-computer interfaces. A key problem in recognizing gestures is that the appearance of a gesture can vary widely depending on variables such as the person performing the gesture, or the position and orientation of the camera. This paper presents a database-based approach for addressing this problem. The large variability in appearance among different examples of the same gesture is addressed by creating large gesture databases that store enough exemplars from each gesture to capture the variability within that gesture. This database-based approach is applied to two gesture recognition problems: handshape categorization and motion-based recognition of American Sign Language (ASL) signs. A key aspect of our approach is the use of database indexing methods to address the challenge of searching large databases without violating the time constraints of an online interactive system, where response times of over a few seconds are often considered unacceptable. Our experiments demonstrate the benefits of the proposed database-based framework, and the feasibility of integrating large gesture databases into online interactive systems.
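One standard way to meet such response-time constraints is filter-and-refine search, sketched below under the assumption that a cheap vector embedding roughly preserves the ordering of the expensive similarity measure; the paper's actual indexing methods may differ:

import numpy as np

def filter_and_refine(query, db, embed, exact_dist, shortlist=100):
    # db         : list of (gesture, label) pairs
    # embed      : cheap vector embedding (an assumption for illustration)
    # exact_dist : the expensive gesture similarity measure
    q = embed(query)
    emb = np.array([embed(g) for g, _ in db])  # in practice precomputed offline
    approx = np.abs(emb - q).sum(axis=1)       # cheap L1 filter pass
    cand = np.argsort(approx)[:shortlist]      # small candidate shortlist
    # Refine: apply the exact measure only to the shortlist.
    best = min(cand, key=lambda i: exact_dist(query, db[i][0]))
    return db[best][1]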
Unsupervised Modeling of Signs Embedded in Continuous Sentences
In Proc. IEEE Workshop on Vision for Human-Computer Interaction, 2005
"... Abstract ..."
(Show Context)
ThreadMill: a Highly Configurable Architecture for Human Communication Analysis Applications
2003
"... This dissertation would not exist if it were not for her ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
(Show Context)
This dissertation would not exist if it were not for her
Towards Automated Large Vocabulary Gesture Search
2009
"... This paper describes work towards designing a computer vision system for helping users look up the meaning of a sign. Sign lookup is treated as a video database retrieval problem. A video database is utilized that contains one or more video examples for each sign, for a large number of signs (close ..."
Cited by 2 (2 self)
This paper describes work towards designing a computer vision system for helping users look up the meaning of a sign. Sign lookup is treated as a video database retrieval problem. A video database is utilized that contains one or more video examples for each sign, for a large number of signs (close to 1000 in our current experiments). The emphasis of this paper is on evaluating the tradeoffs between a non-automated approach, where the user manually specifies hand locations in the input video, and a fully automated approach, where hand locations are determined using a computer vision module, thus introducing inaccuracies into the sign retrieval process. We experimentally evaluate both approaches and present their respective advantages and disadvantages.
Finding Recurrent Patterns from Continuous Sign Language Sentences for Automated Extraction of Signs
"... We present a probabilistic framework to automatically learn models of recurring signs from multiple sign language video sequences containing the vocabulary of interest. We extract the parts of the signs that are present in most occurrences of the sign in context and are robust to the variations prod ..."
Cited by 1 (0 self)
We present a probabilistic framework to automatically learn models of recurring signs from multiple sign language video sequences containing the vocabulary of interest. We extract the parts of the signs that are present in most occurrences of the sign in context and are robust to the variations produced by adjacent signs. Each sentence video is first transformed into a multidimensional time series representation, capturing the motion and shape aspects of the sign. Skin color blobs are extracted from frames of color video sequences, and a probabilistic relational distribution is formed for each frame using the contour and edge pixels from the skin blobs. Each sentence is represented as a trajectory in a low dimensional space called the space of relational distributions. Given these time series trajectories, we extract signemes from multiple sentences concurrently using iterated conditional modes (ICM). We show results by learning single signs from a collection of sentences with one common pervading sign, multiple signs from a collection of sentences with more than one common sign, and single signs from a mixed collection of sentences. The extracted signemes demonstrate that our approach is robust to some extent to the variations produced within a sign due to different contexts. We also show results whereby these learned sign models are used for spotting signs in test sequences.
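The ICM step can be sketched as follows for a fixed signeme width; the fixed width, the zero initialization, and the generic window distance are simplifying assumptions (the paper operates in the space of relational distributions):

import numpy as np

def extract_signeme(series, width, dist, iters=20):
    # series : list of (T_i, D) time-series arrays, one per sentence
    # width  : assumed fixed signeme length in frames (a simplification)
    # dist   : distance between two (width, D) windows
    starts = [0] * len(series)
    for _ in range(iters):
        changed = False
        for i, s in enumerate(series):
            others = [series[j][starts[j]:starts[j] + width]
                      for j in range(len(series)) if j != i]
            # Conditional mode: best start in sentence i, others fixed.
            costs = [sum(dist(s[t:t + width], w) for w in others)
                     for t in range(len(s) - width + 1)]
            best = int(np.argmin(costs))
            if best != starts[i]:
                starts[i], changed = best, True
        if not changed:   # ICM has converged to a local optimum
            break
    return starts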
A Similarity Measure for Vision-Based Sign Recognition
2009
"... When we encounter an English word that we do not understand, we can look it up in a dictionary. However, when an American Sign Language (ASL) user encounters an unknown sign, looking up the meaning of that sign is not a straightforward process. It has been recently proposed that this problem can be ..."
Cited by 1 (0 self)
When we encounter an English word that we do not understand, we can look it up in a dictionary. However, when an American Sign Language (ASL) user encounters an unknown sign, looking up the meaning of that sign is not a straightforward process. It has been recently proposed that this problem can be addressed using a computer vision system that helps users look up the meaning of a sign. In that approach, sign lookup can be treated as a video database retrieval problem. When the user encounters an unknown sign, the user provides a video example of that sign as a query, so as to retrieve the most similar signs in the database. A necessary component of such a sign lookup system is a similarity measure for comparing sign videos. Given a query video of a specific sign, the similarity measure should assign high similarity values to videos from the same sign, and low similarity values to videos from other signs. This paper evaluates a state-of-the-art video-based similarity measure called Dynamic Space-Time Warping (DSTW) for the purposes of sign retrieval. The paper also discusses how to specifically adapt DSTW so as to tolerate differences in translation and scale.
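A simplified sketch of the DSTW recurrence: unlike plain dynamic time warping, each query frame offers several candidate hand locations, and the warping path additionally selects one candidate per frame (a fixed number of candidates per frame is assumed here for brevity):

import numpy as np

def dstw(model, query_candidates, d):
    # model            : list of M model-frame feature vectors
    # query_candidates : list of Q lists, each holding the K candidate
    #                    hand feature vectors for one query frame
    # d                : local distance between two feature vectors
    M, Q = len(model), len(query_candidates)
    K = len(query_candidates[0])   # assumes K is fixed across frames
    D = np.full((M + 1, Q + 1, K), np.inf)
    D[0, 0, :] = 0.0
    for i in range(1, M + 1):
        for j in range(1, Q + 1):
            for k in range(K):
                cost = d(model[i - 1], query_candidates[j - 1][k])
                D[i, j, k] = cost + min(
                    D[i - 1, j, k],            # stay in query frame j
                    D[i, j - 1, :].min(),      # stay in model frame i
                    D[i - 1, j - 1, :].min())  # advance both
    return D[M, Q, :].min()   # lower value = more similar videos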