Discovering Characteristic Actions from On-Body Sensor Data
In Proc. of IEEE International Symposium on Wearable Computing, 2006
Cited by 15 (2 self)
We present an approach to activity discovery, the unsupervised identification and modeling of human actions embedded in a larger sensor stream. Activity discovery can be seen as the inverse of the activity recognition problem. Rather than learn models from hand-labeled sequences, we attempt to discover motifs, sets of similar subsequences within the raw sensor stream, without the benefit of labels or manual segmentation. These motifs are statistically unlikely and thus typically correspond to important or characteristic actions within the activity. The problem
Discovering meaningful multimedia patterns with audio-visual concepts and associated text
In Int. Conf. Image Processing (ICIP), 2004
Cited by 9 (2 self)
The work presents the first effort to automatically annotate the semantic meanings of temporal video patterns obtained through unsupervised discovery processes. This problem is interesting in domains where neither perceptual patterns nor semantic concepts have simple structures. The patterns in video are modeled with hierarchical hidden Markov models (HHMM), with efficient algorithms to learn the parameters, the model complexity, and the relevant features; the meanings are contained in words of the speech transcript of the video. The pattern-word association is obtained via co-occurrence analysis and statistical machine translation models. Promising results are obtained through extensive experiments on 20+ hours of TRECVID news videos: video patterns that associate with distinct topics such as El Niño and politics are identified, and the HHMM temporal structure model compares favorably to a non-temporal clustering algorithm.
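The co-occurrence step mentioned in this abstract can be illustrated with a small counting sketch. This is not the authors' method: it assumes each video segment carries one discovered pattern label plus a list of transcript words (a hypothetical input format), and it scores pairs with a simple ratio P(word | pattern) / P(word) rather than the statistical machine translation models the abstract also mentions.

```python
from collections import Counter, defaultdict


def cooccurrence_scores(segments):
    """Score (pattern, word) pairs by P(word | pattern) / P(word).

    `segments` is a list of (pattern_label, words) tuples; this input
    format is an assumption made for illustration only.
    """
    word_counts = Counter()
    pair_counts = defaultdict(Counter)
    pattern_counts = Counter()
    for pattern, words in segments:
        pattern_counts[pattern] += 1
        # Count each word once per segment so long transcripts do not dominate.
        for word in set(words):
            word_counts[word] += 1
            pair_counts[pattern][word] += 1

    total = len(segments)
    scores = {}
    for pattern, counts in pair_counts.items():
        for word, joint in counts.items():
            p_word_given_pattern = joint / pattern_counts[pattern]
            p_word = word_counts[word] / total
            scores[(pattern, word)] = p_word_given_pattern / p_word
    return scores


# Toy data: pattern labels and transcript words are invented.
segments = [
    ("p1", ["storm", "weather", "today"]),
    ("p1", ["weather", "rain"]),
    ("p2", ["election", "vote", "today"]),
    ("p2", ["election", "senate"]),
]
scores = cooccurrence_scores(segments)
```

A score above 1 means the word appears with the pattern more often than its overall frequency would suggest; a word spread evenly across patterns, such as "today" here, scores 1.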
Pattern mining in visual concept streams
Proc. IEEE Intl. Conf. Multimedia and Expo, 2006
Cited by 9 (3 self)
Pattern mining algorithms are often much easier applied than quantitatively assessed. In this paper we address the pattern evaluation problem by looking at both the capability of models and the difficulty of target concepts. We use four different data mining models (frequent itemset mining, k-means clustering, hidden Markov model, and hierarchical hidden Markov model) to mine 39 concept streams from a 137-video broadcast news collection from TRECVID-2005. We hypothesize that the discovered patterns can reveal semantics beyond the input space, and thus evaluate the patterns against a much larger concept space containing 192 concepts defined by LSCOM. Results show that HHMM has the best average prediction among all models; however, different models seem to excel in different concepts depending on the concept prior and the ontological relationship. Results also show that the majority of the target concepts are better predicted with temporal or combination hypotheses, and there are novel concepts found that are not part of the original lexicon. This paper presents the first effort on temporal pattern mining in the large concept space. There are many promising directions to use concept mining to help construct better concept detectors or to guide the design of multimedia ontology.
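As a rough illustration of the first of the four models, frequent itemset mining, the sketch below enumerates small concept combinations and keeps those whose support meets a threshold. It assumes each shot is represented as the set of concepts detected in it; the concept names, the `max_size` limit, and the brute-force enumeration are illustrative choices and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets with support >= min_support.

    Brute-force enumeration up to `max_size` items; adequate for a toy
    example, not for mining many concept streams at scale.
    """
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    n = len(transactions)
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}


# Each transaction lists the concepts detected in one shot (invented data).
shots = [
    {"anchor", "studio"},
    {"anchor", "studio", "face"},
    {"outdoor", "crowd"},
    {"anchor", "face"},
]
result = frequent_itemsets(shots, min_support=0.5)
```

Itemsets are stored as alphabetically sorted tuples, so the pair of "studio" and "anchor" is looked up as `("anchor", "studio")`.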
Layered dynamic mixture model for pattern discovery in asynchronous multi-modal streams
Peterson and MIT X Consortium. Athena Widget Set C Language Interface X Window System. MIT X Consortium, 2005
Cited by 6 (0 self)
We propose a layered dynamic mixture model for asynchronous multi-modal fusion for unsupervised pattern discovery in video. The lower layer of the model uses generative temporal structures such as a hierarchical hidden Markov model to convert the audio-visual streams into mid-level labels; it also models the correlations in text with probabilistic latent semantic analysis. The upper layer fuses the statistical evidence across diverse modalities with a flexible meta-mixture model that assumes loose temporal correspondence. Evaluation on a large news database shows that multi-modal clusters have better correspondence to news topics than audio-visual clusters alone; novel analysis techniques suggest that meaningful clusters occur when the prediction of salient features by the model concurs with those shown in the story clusters.
Event Mining in Multimedia Streams
2008
Cited by 5 (0 self)
Events are real-world occurrences that unfold over space and time. Event mining from multimedia streams improves the access and reuse of large media collections, and it has been an active area of research with notable recent progress. This paper contains a survey on the problems and solutions in event mining, approached from three aspects: event description, event-modeling components, and current event mining systems. We present a general characterization of multimedia events, motivated by the maxim of five "W"s and one "H" for reporting real-world events in journalism: when, where, who, what, why, and how. We discuss the causes for semantic variability in real-world descriptions, including multilevel
Generation of Sports Highlights Using a Combination of Supervised & Unsupervised Learning in Audio Domain
In Proc. Pacific Rim Conference on Multimedia, 2003
Cited by 4 (2 self)
In our past work we have used supervised audio classification to develop a common audio-based platform for highlight extraction that works across three different sports. We then use a heuristic to post-process the classification results to identify interesting events and also to adjust the summary length. In this paper, we propose a combination of unsupervised and supervised learning approaches to replace the heuristic. The proposed unsupervised framework mines the semantic audio-visual labels so as to detect "interesting" events. We then use a Hidden Markov Model based approach to control the length of the summary. Our experimental results show that the proposed techniques are promising.
A content-adaptive analysis & representation framework for video summarization using audio cues
http://isis.poly.edu/ regu/ReguThesis.pdf, 2004
Cited by 4 (1 self)
We propose a content-adaptive analysis and representation framework to discover events using audio features from "unscripted" multimedia such as sports and surveillance for summarization. The proposed analysis framework performs an inlier/outlier based temporal segmentation of the content. It is motivated by the observation that "interesting" events in unscripted multimedia occur sparsely in a background of usual or "uninteresting" events. We treat the sequence of low/mid-level features extracted from the audio as a time series and identify subsequences that are outliers. The outlier detection is based on eigenvector analysis of the affinity matrix constructed from statistical models estimated from the subsequences of the time series. We define the confidence measure on each of the detected outliers as the probability that it is an outlier. Then, we establish a relationship between the parameters of the proposed framework and the confidence measure. Furthermore, we use the confidence measure to rank the detected outliers in terms of their departures from the background process. Our experimental results with sequences of low- and mid-level audio features extracted from sports video show that "highlight" events can be extracted effectively as outliers from a background process using the proposed framework. We
Unsupervised content discovery in composite audio
In Proc. ACM Multimedia, 2005
Cited by 3 (0 self)
Automatically extracting semantic content from audio streams can be helpful in many multimedia applications. Motivated by the known limitations of traditional supervised approaches to content extraction, which are hard to generalize and require suitable training data, we propose in this paper an unsupervised approach to discover and categorize semantic content in a composite audio stream. In our approach, we first employ spectral clustering to discover natural semantic sound clusters in the analyzed data stream (e.g. speech, music, noise, applause, speech mixed with music, etc.). These clusters are referred to as audio elements. Based on the obtained set of audio elements, the key audio elements, which are most prominent in characterizing the content of input audio data, are selected and used to detect potential boundaries of semantic audio segments denoted as auditory scenes. Finally, the auditory scenes are categorized in terms of the audio elements appearing therein. Categorization is inferred from the relations between audio elements and auditory scenes by using the information-theoretic co-clustering scheme. Evaluations of the proposed approach performed on 4 hours of diverse audio data indicate that promising results can be achieved, both regarding audio element discovery and auditory scene categorization.
Extracting Semantics from Multimedia Content: Challenges and Solutions
Cited by 2 (1 self)
Multimedia content accounts for over 60% of traffic in the current internet [74]. With many users willing to spend their leisure time watching videos on YouTube or browsing photos through Flickr, sifting through large multimedia collections for useful information, especially those outside of the open web, is still an open problem. The lack of effective indexes to describe the content of multimedia data is a main hurdle to multimedia search, and extracting semantics from multimedia content is the bottleneck for multimedia indexing. In this chapter, we present a review on extracting semantics from a large amount of multimedia data as a statistical learning problem. Our goal is to present the current challenges and solutions from a few different perspectives and cover a sample of related work. We start with a system overview with the five major components that extract and use semantic metadata: data annotation, multimedia ontology, feature representation, model learning and retrieval systems. We then present challenges for each of the five components along with their existing solutions: designing multimedia lexicons and using them for concept detection, handling multiple media sources and resolving correspondence across modalities, learning structured (generative) models to account for natural data dependency or model hidden topics, handling rare classes, leveraging unlabeled data, scaling to large amounts of training data, and finally leveraging media semantics in retrieval systems.
Blind summarization: Content-adaptive video summarization using time-series analysis
In Proc. of SPIE - Multimedia Content Analysis, Management, Retrieval
Cited by 1 (0 self)
Severe complexity constraints on consumer electronic devices motivate us to investigate general-purpose video summarization techniques that are able to apply a common hardware setup to multiple content genres. On the other hand, we know that high-quality summaries can only be produced with domain-specific processing. In this paper, we present a time-series analysis based video summarization technique that provides a general core to which we are able to add small content-specific extensions for each genre. The proposed time-series analysis technique consists of unsupervised clustering of samples taken through sliding windows from the time series of features obtained from the content. We classify content into two broad categories: scripted content such as news and drama, and unscripted content such as sports and surveillance. The summarization problem then reduces to either finding semantic boundaries of the scripted content or detecting highlights in the unscripted content. The proposed technique is essentially an event detection technique and is thus best suited to unscripted content; however, we also find applications to scripted content. We thoroughly examine the trade-off between content-neutral and content-specific processing for effective summarization for a number of genres, and find that our core technique enables us to minimize the complexity of the content-specific processing and to postpone it to the final stage. We achieve the best results with unscripted content such as sports and surveillance video in terms of quality of summaries and minimizing content-specific processing. For other genres such as drama, we find that more content-specific processing is required. We also find that judicious choice of key audio-visual object detectors enables us to minimize the complexity of the content-specific processing while maintaining its applicability to a broad range of genres. We will present a demonstration of our proposed technique at the conference.
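The core idea of clustering samples taken through sliding windows can be sketched in a few lines. The abstract does not name the clustering algorithm or the features, so everything specific here is an assumption for illustration: a one-dimensional feature series, the window mean as the per-window sample, a tiny two-cluster k-means, and the rule that the smaller cluster holds the candidate highlights.

```python
def sliding_windows(series, width, step):
    """Return (start_index, window) pairs taken across the series."""
    return [
        (start, series[start:start + width])
        for start in range(0, len(series) - width + 1, step)
    ]


def two_means_1d(values, iterations=20):
    """Cluster scalars into two groups with a tiny k-means (k=2).

    Centers start at the min and max so the result is deterministic.
    """
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels


# Invented 1-D feature series: a quiet background with one loud burst.
series = [0.1, 0.2, 0.1, 0.2, 0.1, 0.2, 0.9, 1.0, 0.9, 0.1, 0.2, 0.1]
windows = sliding_windows(series, width=3, step=3)
means = [sum(w) / len(w) for _, w in windows]
labels = two_means_1d(means)

# Treat the smaller cluster as candidate highlights (an assumption).
minority = 0 if labels.count(0) < labels.count(1) else 1
highlight_starts = [s for (s, _), lab in zip(windows, labels) if lab == minority]
```

On this toy series the only window assigned to the minority cluster starts at index 6, which is where the burst sits. Overlapping windows (a `step` smaller than `width`) would give finer localization at the cost of more samples to cluster.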

