Speech-based annotation of heterogeneous multimedia content using automatic speech recognition (2007)

by M Huijbregts, R Ordelman, F de Jong

Results 1 - 2 of 2

NLP and the humanities: the revival of an old liaison

by Franciska De Jong
This paper presents an overview of some emerging trends in the application of NLP in the domain of the so-called Digital Humanities and discusses the role and nature of metadata, the annotation layer that is so characteristic of documents that play a role in the scholarly practices of the humanities. It is explained how metadata are the key to the added value of techniques such as text and link mining, and an outline is given of what measures could be taken to increase the chances for a bright future for the old ties between NLP and the humanities. There is no data like metadata!

SZTAKI @ TRECVID 2009

by Bálint Daróczy, Dávid Nemeskey, István Petrás, András A. Benczúr, Tamás Kiss, 2010
We summarize our fully automatic approach to the TRECVID 2009 Search task. Our submissions, summarized in Table 1, use linear combinations of the following basic techniques:
  • text: ASR text retrieved by the Dutch translation of selected topic terms.
  • image: Similarity of representative frames of shots.
  • face: Face detector output for topics involving people.
  • feature: Total weight of high-level feature classifiers considered relevant by text-based similarity to the topic. We used the publicly available feature predictions.
  • motion: Motion information extracted from videos where relevant to the topic.
  • wide: A variation of text in which a wider shot neighborhood is considered relevant.
  • lattice: Text retrieval based on ASR lattices where available.
The combination of feature and face contributed most to the performance of the system. In this experiment the use of lattices, although they were available for only part of the shots, did not improve over the most probable ASR output. The best ASR-text-based run is text + wide, a combination in which more distant shots also receive a partial score for matching speech. We notice that the plain linear combination of all scores deteriorated performance. In the paper we measure the independent performance of the methods and observe that feature alone would have outperformed all of our runs. An improved combination includes wide with lower weight.
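The linear combination of per-technique scores mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fuse_scores` and `rank_shots`, the shot identifiers, the scores, and the weights are all hypothetical, and the abstract does not state how scores were normalized or which weights were used.

```python
from typing import Dict, List


def fuse_scores(
    per_method: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Weighted sum of per-shot scores across retrieval methods.

    A shot that a method did not score contributes 0 for that method.
    """
    fused: Dict[str, float] = {}
    for method, weight in weights.items():
        for shot_id, score in per_method.get(method, {}).items():
            fused[shot_id] = fused.get(shot_id, 0.0) + weight * score
    return fused


def rank_shots(fused: Dict[str, float]) -> List[str]:
    """Order shots by descending fused score, breaking ties by shot id."""
    return sorted(fused, key=lambda shot_id: (-fused[shot_id], shot_id))


# Hypothetical scores for two of the techniques named in the abstract.
per_method = {
    "text": {"shot1": 0.5, "shot2": 0.25},
    "feature": {"shot2": 1.0, "shot3": 0.5},
}
# Hypothetical weights; the abstract reports none.
weights = {"text": 1.0, "feature": 2.0}

fused = fuse_scores(per_method, weights)
ranking = rank_shots(fused)
print(ranking)
```

Leaving a method out of `weights` (or giving it weight 0) drops it from the combination, which is one way to compare individual techniques against the full mix, as the abstract describes.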