Exploiting Multi-level Parallelism for Low-latency Activity Recognition in Streaming Video
- Proc. ACM Multimedia Systems (MMSys) Conference, 2010
Cited by 5 (2 self)
Video understanding is a computationally challenging task that is critical not only for traditionally throughput-oriented applications such as search but also latency-sensitive interactive applications such as surveillance, gaming, videoconferencing, and vision-based user interfaces. Enabling these types of video processing applications will require not only new algorithms and techniques, but new runtime systems that optimize latency as well as throughput. In this paper, we present a runtime system called Sprout that achieves low latency by exploiting the parallelism inherent in video understanding applications. We demonstrate the utility of our system on an activity recognition application that employs a robust new descriptor called MoSIFT, which explicitly augments appearance features with motion information. MoSIFT outperforms previous recognition techniques, but like other state-of-the-art techniques, it is computationally expensive — a sequential implementation runs 100 times slower than real time. We describe the implementation of the activity recognition application on Sprout, and show that it can accurately recognize activities at full frame rate (25 fps) and low latency on a challenging airport surveillance video corpus.
Odessa: Enabling Interactive Perception Applications on Mobile Devices
Cited by 4 (1 self)
Resource-constrained mobile devices need to leverage computation on nearby servers to run responsive applications that recognize objects, people, or gestures from real-time video. The two key questions that impact performance are what computation to offload, and how to structure the parallelism across the mobile device and server. To answer these questions, we develop and evaluate three interactive perceptual applications. We find that offloading and parallelism choices should be dynamic, even for a given application, as performance depends on scene complexity as well as environmental factors such as the network and device capabilities. To this end, we develop Odessa, a novel, lightweight runtime that automatically and adaptively makes offloading and parallelism decisions for mobile interactive perception applications. Our evaluation shows that the incremental greedy strategy of Odessa converges to an operating point that is close to an ideal offline partitioning. It provides more than a 3x improvement in application performance over the partitioning suggested by domain experts. Odessa works well across a variety of execution environments, and is agile to changes in the network, device, and application inputs.
Target Container: A Target-Centric Parallel Programming Abstraction for Video-based
Cited by 1 (1 self)
We introduce a novel abstraction, the target container (TC), which serves as a parallel programming model and execution framework for developing complex applications for tracking multiple targets in a large-scale camera network. The key insight is to allow the domain expert (e.g., a vision researcher) to focus on the algorithmic details of target tracking and let the system deal with providing the computational resources (cameras, networking, and processing) to enable target tracking. Each TC has a one-to-one correspondence with a target, possibly tracked from multiple cameras. The domain expert provides the code modules for target tracking (such as detectors and trackers) as handlers to the TC system. The handlers are invoked dynamically by the TC system to discover new targets (detector) and to follow existing targets (tracker). The TC system also provides an interface for merging TCs whenever they are determined to correspond to the same target. This paper presents the design of the TC system, details of an experimental prototype, and an example application to demonstrate the simplicity of using the TC programming model.
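The handler model described in this abstract can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the abstract does not give the actual TC interface, so every name here (TargetContainer, TCSystem, process_frame, merge, and the toy handlers that treat a frame as a dict of 1-D positions) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical names throughout; this is not the real TC system API.


@dataclass
class TargetContainer:
    tc_id: int
    cameras: Set[str] = field(default_factory=set)
    track: List[Tuple[str, int]] = field(default_factory=list)  # (camera, position)


Detector = Callable[[str, dict], List[int]]
Tracker = Callable[[TargetContainer, str, dict], Optional[int]]


class TCSystem:
    """Invokes expert-supplied handlers; one container per target."""

    def __init__(self, detector: Detector, tracker: Tracker):
        self.detector = detector
        self.tracker = tracker
        self.containers: Dict[int, TargetContainer] = {}
        self._next_id = 0

    def process_frame(self, camera: str, frame: dict) -> None:
        # Tracker handler: follow targets that already have a container.
        claimed = []
        for tc in self.containers.values():
            pos = self.tracker(tc, camera, frame)
            if pos is not None:
                tc.cameras.add(camera)
                tc.track.append((camera, pos))
                claimed.append(pos)
        # Detector handler: create a container for each unclaimed detection.
        for pos in self.detector(camera, frame):
            if pos not in claimed:
                tc = TargetContainer(self._next_id, {camera}, [(camera, pos)])
                self.containers[tc.tc_id] = tc
                self._next_id += 1

    def merge(self, keep_id: int, drop_id: int) -> None:
        # Fold one container into another once both are judged the same target.
        keep = self.containers[keep_id]
        drop = self.containers.pop(drop_id)
        keep.cameras |= drop.cameras
        keep.track.extend(drop.track)


def toy_detector(camera: str, frame: dict) -> List[int]:
    return list(frame["objects"])


def toy_tracker(tc: TargetContainer, camera: str, frame: dict) -> Optional[int]:
    # Match an object within distance 1 of the last position seen on this camera.
    seen = [p for cam, p in tc.track if cam == camera]
    if not seen:
        return None
    for p in frame["objects"]:
        if abs(p - seen[-1]) <= 1:
            return p
    return None


system = TCSystem(toy_detector, toy_tracker)
system.process_frame("camA", {"objects": [10]})  # new target on camera A
system.process_frame("camB", {"objects": [50]})  # new target on camera B
system.process_frame("camA", {"objects": [11]})  # tracker follows the first target
system.merge(0, 1)  # suppose both containers turn out to be one target
```

In this sketch the domain expert writes only toy_detector and toy_tracker; when to call them and how to bookkeep containers is left to the system class, which is the division of labor the abstract describes.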
Controlling Your TV with Gestures
Cited by 1 (1 self)
Vision-based user interfaces enable natural interaction modalities such as gestures. Such interfaces require computationally intensive video processing at low latency. We demonstrate an application that recognizes gestures to control TV operations. Accurate recognition is achieved by using a new descriptor called MoSIFT, which explicitly encodes optical flow with appearance features. MoSIFT is computationally expensive: a sequential implementation runs 100 times slower than real time. To reduce latency sufficiently for interaction, the application is implemented on a runtime system that exploits the parallelism inherent in video understanding applications.
Categories and Subject Descriptors: C.3 [Computer Systems Organization]: Special-Purpose and Application-Based Systems; D.2 [Software]: Software Engineering.
Long Term Activity Analysis in Surveillance Video Archives
, 2010
Surveillance video recording is becoming ubiquitous in daily life for public areas such as supermarkets, banks, and airports. The rate at which surveillance video is being generated has accelerated demand for machine understanding to enable better content-based search capabilities. Analyzing human activity is one of the key tasks in understanding and searching surveillance videos. In this thesis, we perform a comprehensive study on analyzing human activities from short term to long term and from simple to complicated activities in surveillance video archives. A general, efficient and robust human activity recognition framework is proposed. We extract local descriptors at salient points from videos to represent human activities. The local descriptor is called Motion SIFT (MoSIFT), which explicitly augments appearance features with motion information. A quantization and classification framework then applies the descriptors to recognize activities of interest in surveillance
A Contact Sheet Approach to Searching Untagged Images on Smartphones
, 2011
We describe a cloud-based approach to opportunistic, crowd-sourced, near real-time search of untagged images on smartphones that is sensitive to bandwidth and energy constraints. Our approach is inspired by the long-established practice of photographers using contact sheets to rapidly visualize a new collection of photographs, and then selecting a subset on which to focus attention. On behalf of each smartphone, the cloud maintains a virtual contact sheet of images that have been captured but not yet uploaded. The virtual contact sheet consists of thumbnails as well as full or partial metadata associated with each image. If search processing on the cloud indicates that a particular thumbnail is relevant, then its full-fidelity image can be obtained from the corresponding smartphone for further search processing or presentation to the user. We identify refinements, design tradeoffs, and research questions pertaining to this approach.
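The virtual contact sheet workflow in this abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the class and field names, the fetch_full callback that stands in for a request to the phone, and the metadata-based relevance test are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical names throughout; the abstract does not specify an API.


@dataclass
class ContactEntry:
    image_id: str
    thumbnail: bytes
    metadata: dict  # full or partial metadata for the image


class VirtualContactSheet:
    """Cloud-side record of images captured on one phone but not yet uploaded."""

    def __init__(self, fetch_full: Callable[[str], bytes]):
        self.entries: Dict[str, ContactEntry] = {}
        self.fetch_full = fetch_full  # assumed callback that asks the phone for an image
        self.full_images: Dict[str, bytes] = {}

    def add(self, entry: ContactEntry) -> None:
        self.entries[entry.image_id] = entry

    def search(self, is_relevant: Callable[[ContactEntry], bool]) -> List[str]:
        hits = []
        for entry in self.entries.values():
            if is_relevant(entry):
                # Pull the full-fidelity image only for relevant thumbnails,
                # and only once, to limit bandwidth and energy use on the phone.
                if entry.image_id not in self.full_images:
                    self.full_images[entry.image_id] = self.fetch_full(entry.image_id)
                hits.append(entry.image_id)
        return hits


# Stand-in for the phone's local storage and a log of fetch requests.
phone_storage = {"img1": b"FULL1", "img2": b"FULL2"}
requests: List[str] = []


def fetch_from_phone(image_id: str) -> bytes:
    requests.append(image_id)
    return phone_storage[image_id]


sheet = VirtualContactSheet(fetch_from_phone)
sheet.add(ContactEntry("img1", b"thumb1", {"place": "airport"}))
sheet.add(ContactEntry("img2", b"thumb2", {"place": "beach"}))


def at_airport(entry: ContactEntry) -> bool:
    return entry.metadata.get("place") == "airport"


hits = sheet.search(at_airport)
sheet.search(at_airport)  # a repeated search reuses the already fetched image
```

The point of the sketch is the ordering: search runs over thumbnails and metadata held in the cloud, and the phone is contacted only after a thumbnail is judged relevant.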
Large-scale Situation Awareness with Camera Networks and Multimodal Sensing
- Proceedings of the IEEE
Sensors of various modalities and capabilities, especially cameras, have become ubiquitous in our environment. Their intended use is wide ranging and encompasses surveillance, transportation, entertainment, education, healthcare, emergency response, disaster recovery, and the like. Technological advances and the low cost of such sensors enable deployment of large-scale camera networks in large metropolises such as London and New York. Multimedia algorithms for analyzing and drawing inferences from video and audio have also matured tremendously in recent times. Despite all these advances, large-scale reliable systems for media-rich sensor-based applications, often classified as situation awareness applications, are yet to become commonplace. Why is that? There are

