Results 1 - 5 of 5
Multimedia Content Analysis Using Both Audio and Visual Cues, 2000
Cited by 70 (0 self)
Including all the scenes/shots that contain special events may generate too long an abstract. Also, simply staggering them together may not be visually or aurally appealing. In the MoCA project, it was determined that only 50% of the abstract should contain special events. The remaining part should be left for filler clips. The special event clips to be included are chosen uniformly and randomly from different types of events. The selection of a short clip from a scene is subject to some additional criteria, such as the amount of action and the similarity to the overall color composition of the movie. Closeness to the desired AV characteristics of certain scene types is also considered. The filler clips are chosen so that they do not overlap with the content of chosen special event clips, to ensure a good coverage of all parts of a movie. MPEG-7 Standard for Multimedia Content Description Interface MPEG-7 is an on-going standardization effort for content description of AV documen...
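The MoCA selection rule summarized in the abstract above (half special events drawn across event types, half non-overlapping filler) can be sketched roughly as follows. This is only an illustration under assumptions that are not in the source: clips are (start, end) pairs in seconds, the 50% share is applied to the clip count rather than to duration, and the function names `overlaps` and `select_clips` are hypothetical. The extra criteria mentioned in the abstract (amount of action, color composition, AV characteristics) are left out.

```python
import random

def overlaps(a, b):
    """True if two (start, end) intervals, in seconds, share any time."""
    return a[0] < b[1] and b[0] < a[1]

def select_clips(special_events, filler_candidates, total_clips, seed=0):
    """Pick clips for an abstract (hypothetical helper, not MoCA's code).

    special_events: dict mapping event type -> list of (start, end) clips.
    filler_candidates: list of (start, end) clips usable as filler.
    Assumption: the 50% share is counted in clips, not in seconds.
    """
    rng = random.Random(seed)
    n_special = total_clips // 2

    # Shuffle each type's clips, then draw round-robin so that the chosen
    # clips are spread over the different event types.
    pools = {kind: rng.sample(clips, len(clips))
             for kind, clips in special_events.items()}
    chosen_special = []
    while len(chosen_special) < n_special and any(pools.values()):
        for kind in sorted(pools):
            if pools[kind] and len(chosen_special) < n_special:
                chosen_special.append(pools[kind].pop())

    # Filler must not overlap any special event clip that was chosen.
    usable = [c for c in filler_candidates
              if not any(overlaps(c, s) for s in chosen_special)]
    n_filler = min(total_clips - len(chosen_special), len(usable))
    chosen_filler = rng.sample(usable, n_filler)
    return sorted(chosen_special), sorted(chosen_filler)
```

Drawing round-robin over the event types is one simple way to read "chosen uniformly and randomly from different types of events"; other readings are possible.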
A Utility Framework for the Automatic Generation of Audio-Visual Skims - ACM Multimedia, 2002
Cited by 29 (3 self)
In this paper, we present a novel algorithm for generating audiovisual skims from computable scenes. Skims are useful for browsing digital libraries, and for on-demand summaries in set-top boxes. A computable scene is a chunk of data that exhibits consistencies with respect to chromaticity, lighting and sound. There are three key aspects to our approach: (a) visual complexity and grammar, (b) robust audio segmentation and (c) a utility model for skim generation. We define a measure of visual complexity of a shot, and map complexity to the minimum time for comprehending the shot. Then, we analyze the underlying visual grammar, since it makes the shot sequence meaningful. We segment the audio data into four classes, and then detect significant phrases in the speech segments. The utility functions are defined in terms of complexity and duration of the segment. The target skim is created using a general constrained utility maximization procedure that maximizes the information content and the coherence of the resulting skim. The objective function is constrained by multimedia synchronization requirements, by visual syntax and by penalty functions on audio and video segments. The user study results indicate that the optimal skims show statistically significant differences from other skims with compression rates up to 90%.
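To make the idea of utility maximization under a time budget concrete, here is a minimal sketch. It is not the paper's procedure: the utility function, the greedy step-wise allocation, and the names `utility` and `allocate_skim` are all assumptions made for illustration, and the synchronization, syntax and penalty terms mentioned in the abstract are omitted. Each shot gets at least its minimum comprehension time, and the remaining budget goes, one step at a time, to the shot with the largest marginal gain.

```python
import math

def utility(complexity, duration):
    """Hypothetical concave utility: more complex shots are worth more,
    with diminishing returns as a shot is shown for longer."""
    return complexity * (1.0 - math.exp(-duration))

def allocate_skim(shots, budget, step=0.5):
    """Split a time budget (seconds) across shots.

    shots: list of (complexity, min_time, original_duration) tuples.
    Returns the seconds kept per shot. Greedy allocation by marginal
    utility, an illustrative stand-in for a constrained optimizer.
    """
    alloc = [min_time for _, min_time, _ in shots]
    if sum(alloc) > budget:
        raise ValueError("budget is below the total minimum comprehension time")
    remaining = budget - sum(alloc)
    while remaining >= step:
        best, best_gain = None, 0.0
        for i, (complexity, _, full) in enumerate(shots):
            if alloc[i] + step <= full:  # never exceed the original length
                gain = (utility(complexity, alloc[i] + step)
                        - utility(complexity, alloc[i]))
                if gain > best_gain:
                    best, best_gain = i, gain
        if best is None:  # every shot is already at full length
            break
        alloc[best] += step
        remaining -= step
    return alloc
```

For example, `allocate_skim([(1.0, 1.0, 4.0), (3.0, 1.0, 4.0)], budget=4.0)` gives the more complex second shot a larger share of the four seconds.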
Keyframe-Based User Interfaces for Digital Video, 2001
Cited by 15 (0 self)
... video from a single camera running from camera on to camera off. Using one keyframe per shot means that representing a one-hour video usually requires hundreds of keyframes. In contrast, our approach for video indexing and summarization selects fewer keyframes that represent the entire video and index the interesting parts. The user can select the number of keyframes or the application can select the optimal number of keyframes based on display size, but a one-hour video typically will have between 10 and 40 keyframes. We use several techniques to present the automatically selected keyframes. A video directory listing shows one keyframe for each video and provides a slider that lets the user change the keyframes dynamically. The visual summary of a single video presents images in a compact, visually pleasing display. To deal with the large number of keyframes that represent clips in a video editing system, we group keyframes into piles based on their visual similarity. In all three inter
The Effect of Text in Storyboards for Video Navigation, 2001
Cited by 11 (3 self)
A storyboard is a presentation scheme for abstracting information in a digital video clip based on imagery. This paper describes a series of storyboard interfaces with added transcript text features. These interfaces are used in a controlled experiment focusing on the utility of transcript text in storyboards for news video navigation. We wished to explore whether such text resulted in improvements in video navigation, and, if so, whether the amount of text and its synchronization with video imagery affected the navigation task. The text-augmented storyboards performed significantly better than storyboards with no text. Full transcript text produced benefits when presented as a block, whereas reduced contextual text descriptions produced benefits when aligned with storyboard image rows.
Condensing Computable Scenes Using Visual Complexity And Film Syntax Analysis - Proceedings of ICME 2001, 2001
Cited by 10 (0 self)
In this paper, we present a novel algorithm to condense computable scenes. A computable scene is a chunk of data that exhibits consistencies with respect to chromaticity, lighting and sound. We attempt to condense such scenes in two ways. First, we define visual complexity of a shot to be its Kolmogorov complexity. Then, we conduct experiments that help us map the complexity of a shot into the minimum time required for its comprehension. Second, we analyze the grammar of the film language, since it makes the shot sequence meaningful. These grammatical rules are used to condense scenes, in parallel to the shot level condensation. We've implemented a system that generates a skim given a time budget. Our user studies show good results on skims with compression rates between 60-80%.
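Kolmogorov complexity is not computable, so any implementation has to estimate it. A common proxy is the size of the data after lossless compression. The sketch below assumes that proxy (using zlib) and a linear mapping from complexity to minimum comprehension time; the abstract does not state which estimator or mapping the authors used, and the names `complexity_estimate` and `min_comprehension_time` as well as the bounds `t_lo` and `t_hi` are hypothetical.

```python
import zlib

def complexity_estimate(pixels: bytes) -> float:
    """Approximate the visual complexity of a frame by its compression ratio.

    Assumption: compressed size stands in for Kolmogorov complexity.
    Returns a value in (0, 1]; higher means harder to compress.
    """
    if not pixels:
        raise ValueError("frame data must not be empty")
    ratio = len(zlib.compress(pixels, 9)) / len(pixels)
    return min(1.0, ratio)

def min_comprehension_time(complexity: float,
                           t_lo: float = 0.5, t_hi: float = 3.0) -> float:
    """Map complexity to a minimum viewing time in seconds.

    The linear form and both bounds are placeholders, not measured values.
    """
    return t_lo + complexity * (t_hi - t_lo)
```

Under these assumptions a flat, single-color frame compresses to almost nothing and gets a time close to `t_lo`, while a detailed frame gets a time closer to `t_hi`.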

