Real-time American Sign Language recognition using desk and wearable computer based video
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998
Cited by 367 (20 self)
We present two real-time hidden Markov model-based systems for recognizing sentence-level continuous American Sign Language (ASL) using a single camera to track the user’s unadorned hands. The first system observes the user from a desk-mounted camera and achieves 92 percent word accuracy. The second system mounts the camera in a cap worn by the user and achieves 98 percent accuracy (97 percent with an unrestricted grammar). Both experiments use a 40-word lexicon.
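Word accuracy in continuous recognition is usually scored by aligning the recognized sentence against the reference and counting substitutions (S), deletions (D), and insertions (I); HTK-style scoring reports accuracy as (N - D - S - I)/N for N reference words, so many insertions can push it below zero. The sketch below assumes a unit cost for every error type when aligning (real scoring tools may weight error types differently), and the names `align_counts` and `word_accuracy` are hypothetical; it is not the evaluation code used in the paper.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for a minimum-edit alignment.

    Each cell holds (total_errors, subs, dels, ins); every error costs 1.
    """
    n, m = len(ref), len(hyp)
    cell = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cell[i][0] = (i, 0, i, 0)  # delete every reference word
    for j in range(1, m + 1):
        cell[0][j] = (j, 0, 0, j)  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, ins = cell[i - 1][j - 1]
            match = ref[i - 1] == hyp[j - 1]
            diag = (e, s, d, ins) if match else (e + 1, s + 1, d, ins)
            e, s, d, ins = cell[i - 1][j]
            up = (e + 1, s, d + 1, ins)  # reference word deleted
            e, s, d, ins = cell[i][j - 1]
            left = (e + 1, s, d, ins + 1)  # extra word inserted
            cell[i][j] = min(diag, up, left)  # fewest total errors wins
    _, s, d, ins = cell[n][m]
    return s, d, ins


def word_accuracy(ref, hyp):
    """Accuracy = (N - D - S - I) / N for N reference words."""
    s, d, i = align_counts(ref, hyp)
    n = len(ref)
    return (n - d - s - i) / n
```

For example, recognizing "i need red car" for the reference "i need green car yesterday" gives one substitution and one deletion, so accuracy is (5 - 1 - 1 - 0)/5 = 0.6.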
Visual Recognition of American Sign Language Using Hidden Markov Models
1995
Cited by 240 (14 self)
Using hidden Markov models (HMMs), an unobtrusive single-view camera system is developed that can recognize hand gestures, namely, a subset of American Sign Language (ASL). Previous systems have concentrated on finger spelling or isolated word recognition, often using tethered electronic gloves for input. We achieve high recognition rates for full-sentence ASL using only visual cues. A forty-word lexicon consisting of personal pronouns, verbs, nouns, and adjectives is used to create 494 randomly constructed five-word sentences that are signed by the subject to the computer. The data is separated into a 395-sentence training set and an independent 99-sentence test set. While signing, the 2D position, orientation, and eccentricity of bounding ellipses of the hands are tracked in real time with the assistance of solidly colored gloves. Simultaneous recognition and segmentation of the resultant stream of feature vectors occurs five times faster than real time on an HP 735. With a strong ...
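The bounding-ellipse features mentioned above can be derived from the image moments of a segmented hand blob: the centroid gives the 2D position, and the eigen-decomposition of the second-order central moments gives the orientation and the relative axis lengths. The sketch below assumes segmentation (here, from the colored gloves) has already produced a list of (x, y) pixel coordinates for one hand, and it uses the standard ellipse eccentricity sqrt(1 - λmin/λmax); the paper's exact feature definitions may differ, and `ellipse_features` is a hypothetical name.

```python
import math


def ellipse_features(pixels):
    """Return (cx, cy, theta, eccentricity) for a blob given as (x, y) pixel coordinates.

    theta is the major-axis angle in radians from the +x axis; in image
    coordinates (y pointing down) the visual rotation direction is mirrored.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second-order central moments, normalized by blob area.
    mu20 = sum((x - cx) ** 2 for x, _ in pixels) / n
    mu02 = sum((y - cy) ** 2 for _, y in pixels) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels) / n
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    # Eigenvalues of the covariance matrix [[mu20, mu11], [mu11, mu02]].
    half_trace = (mu20 + mu02) / 2
    spread = math.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    lam_max = half_trace + spread
    lam_min = max(half_trace - spread, 0.0)  # clamp tiny negative rounding error
    ecc = 0.0 if lam_max == 0 else math.sqrt(1 - lam_min / lam_max)
    return cx, cy, theta, ecc
```

A wide horizontal rectangle of pixels yields theta near 0 and an eccentricity close to 1, while a square blob yields eccentricity 0.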
Motion-Based Recognition: A Survey
Image and Vision Computing, 1995
Cited by 85 (4 self)
Motion perception and interpretation play an important role in the human visual system. They help us recognize different objects and their motion in a scene, and infer their relative depth, rigidity, etc. In psychology, this process has been studied extensively by Johansson using moving light displays (MLDs). MLDs consist of bright spots attached to the joints of an actor dressed in black and moving in front of a dark background. The collection of spots carries only 2D information and no structural information, since the spots are not connected. A set of static spots remained meaningless to observers, while their relative movement created a vivid impression of a person walking, running, dancing, etc. The gender of a person, and even the gait of a friend, can be recognized based solely on the motion of those spots. From a psychological point of view, there are two theories about the interpretation of MLD-type stimuli. In the first, people use motion information in the MLD to recover t...
A state-based approach to the representation and recognition of gesture
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997
Cited by 69 (6 self)
A state-based technique for the representation and recognition of gesture is presented. We define a gesture to be a sequence of states in a measurement or configuration space. For a given gesture, these states are used to capture both the repeatability and variability evidenced in a training set of example trajectories. Using techniques for computing a prototype trajectory of an ensemble of trajectories, we develop methods for defining configuration states along the prototype and for recognizing gestures from an unsegmented, continuous stream of sensor data. The approach is illustrated by application to a range of gesture-related sensory data: the two-dimensional movements of a mouse input device, the movement of the hand measured by a magnetic spatial position and orientation sensor, and, lastly, the changing eigenvector projection coefficients computed from an image sequence.
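A heavily simplified illustration of recognizing a gesture as an ordered sequence of configuration states: each state is reduced here to a center and a radius in the measurement space, and a gesture is reported when an unsegmented, continuous stream passes through all states in order. The paper's states and matching are richer than this (they are defined along a prototype trajectory and capture the variability of the training set); the fixed circular states, the greedy matching, and the name `recognize` are assumptions made only for this sketch.

```python
import math


def recognize(stream, states):
    """Return the stream indices at which the ordered state sequence is completed.

    stream: iterable of points (tuples) in the measurement space.
    states: list of (center, radius) pairs, in the order the gesture visits them.
    """
    completions = []
    next_state = 0
    for t, sample in enumerate(stream):
        center, radius = states[next_state]
        # Advance only when the sample enters the next expected state.
        if math.dist(sample, center) <= radius:
            next_state += 1
            if next_state == len(states):
                completions.append(t)
                next_state = 0  # start looking for the next occurrence
    return completions


states = [((0, 0), 1), ((5, 0), 1), ((5, 5), 1)]
stream = [(10, 10), (0, 0.5), (2, 0), (5, 0.2), (5, 3), (5, 4.5), (0, 0)]
print(recognize(stream, states))  # [5]
```

Because this matcher never gives up on a partially matched gesture, a practical version would at least add a time limit for moving from one state to the next.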
A Wearable Computer Based American Sign Language Recognizer
1997
Cited by 38 (0 self)
Modern wearable computer designs package workstation-level performance in systems small enough to be worn as clothing. These machines enable technology to be brought where it is needed most for the handicapped: everyday mobile environments. This paper describes a research effort to make a wearable computer that can recognize (with the possible goal of translating) sentence-level American Sign Language (ASL) using only a camera mounted on a baseball cap for input. Current accuracy exceeds 97% per word on a 40-word lexicon.
Recognition of Space-Time Gestures using a Distributed Representation
Cited by 22 (0 self)
This paper presents a method for learning, tracking, and recognizing human gestures using a view-based approach to model both object and behavior. Object views are represented using sets of view models rather than single templates. Stereotypical space-time patterns, i.e., gestures, are then matched to stored gesture patterns using dynamic time warping. Real-time performance is achieved by using special-purpose correlation hardware and view prediction to prune as much of the search space as possible. Both view models and view predictions are learned from examples. We present results showing tracking and recognition of human hand gestures at over 10 Hz.
1 Introduction
The location and orientation of the head, hands, and eyes are a critical element of all human dialog. The ability to follow objects moving through space and recognize particular motions as meaningful gestures is therefore essential if computer systems are to interact naturally with human users [2, 6]. Currently, however, this inf...
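Dynamic time warping, which the abstract names as the matching step, compares two sequences that may differ in speed by letting each element align with one or more elements of the other. The sketch below computes the classic DTW distance between sequences of feature vectors and picks the nearest stored template. Euclidean frame distance, the unconstrained warping path, and the names `dtw_distance` and `classify` are assumptions here; the paper's feature vectors come from its view models rather than from raw coordinates like these.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two sequences of equal-length tuples."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: repeat b[j-1], repeat a[i-1], or step both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def classify(sequence, templates):
    """Return the name of the template with the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(sequence, templates[name]))


templates = {
    "wave": [(0,), (1,), (0,), (1,)],
    "push": [(0,), (1,), (2,), (3,)],
}
# A slower performance of "push": repeated frames cost nothing under DTW.
print(classify([(0,), (0,), (1,), (2,), (3,), (3,)], templates))  # push
```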
Configuration States for the Representation and Recognition of Gesture
Cited by 12 (0 self)
A state-based technique for the representation and recognition of gesture is presented. We define a gesture to be a sequence of states in a measurement or configuration space. For a given gesture, these states are used to capture both the repeatability and variability evidenced in a training set of example trajectories. Using techniques for computing a prototype trajectory of an ensemble of trajectories, we develop methods for defining configuration states along the prototype, and for recognizing gestures from an unsegmented, continuous stream of sensor data. The approach is illustrated by application to a range of gesture-related sensory data: the two-dimensional movements of a mouse input device, the movement of the hand measured by a magnetic spatial position and orientation sensor, and, lastly, the changing eigenvector projection coefficients computed from an image sequence.
Visual interpretation for hand gestures as a practical interface modality
Columbia University, 1997
Cited by 9 (0 self)
This dissertation describes a user interface in which many tasks traditionally performed with a mouse are instead performed using visual recognition of hand gestures. The goals are to explore both how a vision system should be designed to recognize hand gestures and how gestures are best used in a general-purpose interface. Observed by a camera below the screen, the user manipulates objects directly with gestures incorporating both motion and pose. Task and domain knowledge provide context, allowing real-time recognition on standard PC hardware. A color-based algorithm is trained to segment the user's hands from complex backgrounds without visual aids. Training uses a novel combination of positive and negative data to improve segmentation quality. The apparent path of the hand is smoothed with an algorithm that reduces the types of noise inherent in the domain while leaving cursor motion on the screen that feels natural to the user. Salient features of the motion are extracted, including a newly discovered natural gesture (a “Comma”), which helps provide punctuation for each gestural sentence. Neural networks are trained to classify the pose of the user's hand from cropped and preprocessed images. The nets correctly classify 90-95% of the hand images in real time. A transition network encodes the interaction language. It controls the application of feature extraction operators and interprets their results to determine when to perform actions on the user's behalf. The style of interaction is based on studies of natural gesticulation and incorporates various features designed to make it natural and easy for the user to remember. The system demonstrates an 80-90% success rate on most tasks. Object selection time for large objects is shown to be equal or superior to that of a mouse. Object selection performance is modeled accurately by augmenting Fitts' law with terms for lag and random cursor noise.
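For reference, the Shannon formulation of Fitts' law predicts movement time as MT = a + b * log2(D/W + 1), where D is the distance to the target, W is its width, and a and b are constants fitted to measured data. The abstract does not give the form of the lag and cursor-noise terms the dissertation adds, so the sketch below implements only the standard law; the function names and the example coefficients are hypothetical.

```python
import math


def index_of_difficulty(distance, width):
    """Shannon formulation of Fitts' index of difficulty, in bits."""
    return math.log2(distance / width + 1)


def fitts_movement_time(a, b, distance, width):
    """Predicted movement time MT = a + b * ID (units follow a and b, e.g. seconds)."""
    return a + b * index_of_difficulty(distance, width)


# Hypothetical coefficients: a = 0.1 s, b = 0.2 s/bit.
# A target 7 widths away has ID = log2(8) = 3 bits, so MT is about 0.7 s.
print(fitts_movement_time(0.1, 0.2, distance=7, width=1))
```

Larger targets lower the index of difficulty, which is why object size matters when comparing selection times across input devices.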
Finally, the suitability of gesture for this type of task is considered. Various interaction styles are examined, and problems specific to hand gesture are discussed.
Variable Frame Rate for Low Power Mobile Sign Language Communication
Cited by 8 (7 self)
The MobileASL project aims to increase accessibility by enabling Deaf people to communicate over video cell phones in their native language, American Sign Language (ASL). Real-time video over cell phones can be computationally intensive and quickly drains the battery, rendering the phone useless. Properties of conversational sign language allow us to save power and bits: because of turn-taking, lower frame rates are possible when a person is not signing, and signing can potentially employ a lower frame rate than fingerspelling. We conduct a user study with native signers to examine the intelligibility of varying the frame rate based on activity in the video. We then describe several methods for automatically determining, in real time, whether the video stream shows signing or not signing. Our results show that varying the frame rate during turn-taking is a good way to save power without sacrificing intelligibility, and that automatic activity analysis is feasible.
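One simple way to decide, frame by frame, whether the video shows active signing is to threshold the mean absolute difference between consecutive frames and switch between a high and a low frame rate accordingly. This is an illustrative baseline only, not necessarily one of the methods evaluated in the paper; frames are modeled as flat lists of grayscale values, and the threshold, the default rates of 10 and 1 frames per second, and the function names are all hypothetical.

```python
def frame_activity(prev, curr):
    """Mean absolute pixel difference between two equal-length grayscale frames."""
    return sum(abs(p - c) for p, c in zip(prev, curr)) / len(curr)


def choose_frame_rates(frames, threshold, high_fps=10, low_fps=1):
    """Pick a frame rate for each frame after the first: high if it looks like signing."""
    rates = []
    for prev, curr in zip(frames, frames[1:]):
        signing = frame_activity(prev, curr) > threshold
        rates.append(high_fps if signing else low_fps)
    return rates


frames = [[0, 0, 0, 0], [0, 0, 0, 0], [50, 50, 0, 0], [0, 50, 50, 0]]
print(choose_frame_rates(frames, threshold=10))  # [1, 10, 10]
```

A real detector would likely also smooth these decisions over several frames so the rate does not flicker when a signer pauses briefly.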
‘Can you see me now?’ An objective metric for predicting intelligibility of compressed American Sign Language video
in Proc. SPIE Vol. 6492, Human Vision and Electronic Imaging ’07
Cited by 7 (3 self)
For members of the Deaf Community in the United States, current communication tools include TTY/TDD services, video relay services, and text-based communication. With the growth of cellular technology, mobile sign language conversations are becoming a possibility. Proper coding techniques must be employed to compress American Sign Language (ASL) video for low-rate transmission while maintaining the quality of the conversation. To evaluate these techniques, an appropriate quality metric is needed. This paper demonstrates that traditional video quality metrics, such as PSNR, fail to predict subjective intelligibility scores. By considering the unique structure of ASL video, an appropriate objective metric is developed. Face and hand segmentation is performed using skin-color detection techniques. The distortions in the face and hand regions are optimally weighted to create an objective intelligibility score for a distorted sequence. The objective intelligibility metric performs significantly better than PSNR in terms of correlation with subjective responses.
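The idea of weighting distortion by region can be sketched as follows: compute the mean squared error separately over face, hand, and background pixels, combine the per-region values with weights, and compare against ordinary PSNR. Region labels are assumed to come from a prior skin-color segmentation, and the weights below are placeholders; the paper fits its weights to subjective intelligibility scores, which this sketch does not reproduce. The function names are hypothetical, and higher values of `weighted_region_distortion` mean worse quality.

```python
import math


def psnr(ref, dist, peak=255.0):
    """Peak signal-to-noise ratio in dB over all pixels (flat lists)."""
    mse = sum((r - d) ** 2 for r, d in zip(ref, dist)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def weighted_region_distortion(ref, dist, labels, weights):
    """Weighted sum of per-region MSE; labels[i] names the region of pixel i."""
    total = 0.0
    for region, weight in weights.items():
        errors = [(r - d) ** 2 for r, d, lab in zip(ref, dist, labels) if lab == region]
        if errors:  # skip regions absent from this frame
            total += weight * sum(errors) / len(errors)
    return total


ref = [100, 100, 100, 100]
labels = ["face", "hand", "background", "background"]
weights = {"face": 0.5, "hand": 0.4, "background": 0.1}  # placeholder weights
face_err = [110, 100, 100, 100]
bg_err = [100, 100, 110, 100]
# Same PSNR, but the error on the face is penalized far more heavily.
print(psnr(ref, face_err) == psnr(ref, bg_err))  # True
print(weighted_region_distortion(ref, face_err, labels, weights))  # 50.0
print(weighted_region_distortion(ref, bg_err, labels, weights))  # 5.0
```

The example shows the failure mode the abstract attributes to PSNR: two distortions of equal energy score identically under PSNR even though only one of them damages the regions that carry the signing.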

