An overview of audio information retrieval
, 1999
Cited by 112 (1 self)
The problem of audio information retrieval is familiar to anyone who has returned from vacation to find an answering machine full of messages. While there is not yet an “AltaVista” for the audio data type, many workers are finding ways to automatically locate, index, and browse audio using recent advances in speech recognition and machine listening. This paper reviews the state of the art in audio information retrieval, and presents recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity with a view towards making audio less “opaque”. A special section addresses intelligent interfaces for navigating and browsing audio and multimedia documents, using automatically derived information to go beyond the tape recorder metaphor.
How do people manage their digital photographs
, 2003
Cited by 102 (2 self)
In this paper we present and discuss the findings of a study that investigated how people manage their collections of digital photographs. The six-month, 13-participant study included interviews, questionnaires, and analysis of usage statistics gathered from an instrumented digital photograph management tool called Shoebox. Alongside simple browsing features such as folders, thumbnails and timelines, Shoebox has some advanced multimedia features: content-based image retrieval and speech recognition applied to voice annotations. Our results suggest that participants found their digital photos much easier to manage than their non-digital ones, but that this advantage was almost entirely due to the simple browsing features. The advanced features were not used very often and their perceived utility was low. These results should help to inform the design of improved tools for managing personal digital photographs.
A Fully Automated Content-Based Video Search Engine Supporting Spatiotemporal Queries
- IEEE Transactions on Circuits and Systems for Video Technology
, 1998
Cited by 85 (4 self)
The rapidity with which digital information, particularly video, is being generated has necessitated the development of tools for efficient search of these media. Content-based visual queries have been primarily focused on still image retrieval. In this paper, we propose a novel, interactive system on the Web, based on the visual paradigm, with spatiotemporal attributes playing a key role in video retrieval. We have developed innovative algorithms for automated video object segmentation and tracking, and use real-time video editing techniques while responding to user queries. The resulting system, called VideoQ (demo available at http://www.ctr.columbia.edu/VideoQ/), is the first on-line video search engine supporting automatic object-based indexing and spatiotemporal queries. The system performs well, with the user being able to retrieve complex video clips such as those of skiers and baseball players with ease. Index Terms: content based, information retrieval, object oriented, spat...
SpeechSkimmer: A System for Interactively Skimming Recorded Speech
- ACM Transactions on Computer Human Interaction
, 1997
Cited by 85 (1 self)
Note that the text that appeared in printed journal contains very minor typographic and grammatical corrections that do not appear in this version. SpeechSkimmer:
Piconet: Embedded Mobile Networking
- IEEE Personal Communications
, 1997
Cited by 75 (4 self)
Piconet is a general-purpose, low-power ad hoc radio network. It provides a base level of connectivity to even the simplest of sensing and computing objects. It is our intention that a full range of portable and embedded devices may make use of this connectivity. This article outlines the Piconet system, under development at the Olivetti and Oracle Research Laboratory (ORL). The authors discuss the motivation for providing this low-level "embedded networking," and describe their experiences of building such a system. The article concludes with a commentary on some of the implications that power saving, and other considerations central to Piconet, have on the design of the system.
VideoQ: An Automated Content Based Video Search System Using Visual Cues
- In Proceedings of ACM Multimedia
, 1997
Cited by 75 (1 self)
The rapidity with which digital information, particularly video, is being generated has necessitated the development of tools for efficient search of these media. Content-based visual queries have been primarily focussed on still image retrieval. In this paper, we propose a novel, real-time, interactive system on the Web, based on the visual paradigm, with spatio-temporal attributes playing a key role in video retrieval. We have developed algorithms for automated video object segmentation and tracking and use real-time video editing techniques while responding to user queries. The resulting system performs well, with the user being able to retrieve complex video clips such as those of skiers and baseball players with ease. 1. Introduction: The ease of capture and encoding of digital images has caused a massive amount of visual information to be produced and disseminated rapidly. Hence efficient tools and systems for searching and retrieving visual information are needed. While there are...
Lattice-based search for spoken utterance retrieval
- In Proceedings of HLT-NAACL 2004
, 2004
Cited by 32 (8 self)
Recent work on spoken document retrieval has suggested that it is adequate to take the single-best output of ASR, and perform text retrieval on this output. This is reasonable enough for the task of retrieving broadcast news stories, where word error rates are relatively low, and the stories are long enough to contain much redundancy. But it is patently not reasonable if one’s task is to retrieve a short snippet of speech in a domain where WERs can be as high as 50%; such would be the situation with teleconference speech, where one’s task is to find if and when a participant uttered a certain phrase. In this paper we propose an indexing procedure for spoken utterance retrieval that works on lattices rather than just single-best text. We demonstrate that this procedure can improve F scores by over five points compared to single-best retrieval on tasks with poor WER and low redundancy. The representation is flexible so that we can represent both word lattices and phone lattices, the latter being important for improving performance when searching for phrases containing OOV words.
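As a rough illustration of the idea in this abstract (indexing lattice alternatives rather than only the single-best transcript), here is a toy sketch. It is not the authors' procedure: the lattice is flattened to a list of (word, posterior) arcs, the posteriors are invented values, and `build_index` and `search` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical toy lattices: each utterance is a flat list of (word, posterior)
# arcs. The posteriors are made-up numbers for illustration, not ASR output.
LATTICES = {
    "utt1": [("call", 0.9), ("me", 0.6), ("knee", 0.4), ("tomorrow", 0.8)],
    "utt2": [("tall", 0.7), ("call", 0.3), ("tree", 0.9)],
}


def build_index(lattices):
    """Map each word to {utterance_id: expected count}, summing arc posteriors."""
    index = defaultdict(dict)
    for utt_id, arcs in lattices.items():
        for word, posterior in arcs:
            index[word][utt_id] = index[word].get(utt_id, 0.0) + posterior
    return index


def search(index, word, threshold=0.0):
    """Return ids of utterances whose expected count for `word` exceeds
    `threshold`, ordered from highest to lowest expected count."""
    hits = [(u, c) for u, c in index.get(word, {}).items() if c > threshold]
    return [u for u, _ in sorted(hits, key=lambda hit: -hit[1])]


index = build_index(LATTICES)
```

In this toy data, a single-best transcript of `utt2` would contain "tall" and never "call"; the lattice index still lets `search(index, "call")` surface `utt2` as a lower-ranked hit, and raising `threshold` trades that recall back for precision.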
A Fully Automated Content Based Video Search Engine Supporting Spatio-Temporal Queries
, 1997
Cited by 32 (2 self)
The rapidity with which digital information, particularly video, is being generated has necessitated the development of tools for efficient search of these media. Content-based visual queries have been primarily focused on still image retrieval. In this paper, we propose a novel, interactive system on the Web, based on the visual paradigm, with spatio-temporal attributes playing a key role in video retrieval. We have developed innovative algorithms for automated video object segmentation and tracking and use real-time video editing techniques while responding to user queries. The resulting system, called VideoQ, is the first on-line video search engine supporting automatic object-based indexing and spatio-temporal queries. The system performs well, with the user being able to retrieve complex video clips such as those of skiers and baseball players with ease. (Footnote 1: Demo available at http://www.ctr.columbia.edu/VideoQ/; a shorter version of this paper appeared at the ACM Conference on Mul...)
Multimedia content processing through cross-modal association
- In MULTIMEDIA ’03: Proceedings of the eleventh ACM international conference on Multimedia
, 2003
Cited by 22 (1 self)
Multimodal information processing has received considerable attention in recent years. The focus of existing research in this area has been predominantly on the use of fusion technology. In this paper, we suggest that cross-modal association can provide a new set of powerful solutions in this area. We investigate different cross-modal association methods using the linear correlation model. We also introduce a novel method for cross-modal association called Cross-modal Factor Analysis (CFA). Our earlier work on Latent Semantic Indexing (LSI) is extended for applications that use offline supervised training. As a promising research direction and practical application of cross-modal association, cross-modal information retrieval where queries from one modality are used to search for content in another modality using low-level features is then discussed in detail. Different association methods are tested and compared using the proposed cross-modal retrieval system. All these methods achieve significant dimensionality reduction. Among them CFA gives the best retrieval performance. Finally, this paper addresses the use of cross-modal association to detect talking heads. The CFA method achieves 91.1% detection accuracy, while LSI and Canonical Correlation Analysis (CCA) achieve 66.1% and 73.9% accuracy, respectively. As shown by experiments, cross-modal association provides many useful benefits, such as robust noise resistance and effective feature selection. Compared to CCA and LSI, the proposed CFA shows several advantages in analysis performance and feature usage. Its capability in feature selection and noise resistance also makes CFA a promising tool for many multimedia analysis applications.
Next-Generation Content Representation, Creation and Searching for New Media Applications in Education
, 1998
Cited by 21 (1 self)
Content creation, editing, and searching are extremely time-consuming tasks that often require substantial training and experience, especially when high-quality audio and video are involved. "New media" represents a new paradigm for multimedia information representation and processing, in which the emphasis is placed on the actual content. It thus brings the tasks of content creation and searching much closer to actual users and enables them to be active producers of audiovisual information rather than passive recipients. We discuss the state of the art and present next-generation techniques for content representation, searching, creation, and editing. We discuss our experiences in developing a Web-based distributed compressed video editing and searching system (WebClip), a media representation language (Flavor) and an object-based video authoring system (Zest) based on it, and large image/video search engines for the World-Wide Web (WebSEEk and VideoQ). We also present a case study of new media applications based on specific planned multimedia education experiments with the above systems in several K-12 schools in Manhattan.

