Results 1 - 10
of
22
Joint visual-text modeling for automatic retrieval of multimedia documents
- In MULTIMEDIA ’05: Proceedings of the 13th annual ACM international conference on Multimedia
, 2005
"... In this paper we describe a novel approach for jointly modeling the text and the visual components of multimedia documents for the purpose of information retrieval(IR). We propose a novel framework where individual components are developed to model different relationships between documents and queri ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
In this paper we describe a novel approach for jointly modeling the text and the visual components of multimedia documents for the purpose of information retrieval(IR). We propose a novel framework where individual components are developed to model different relationships between documents and queries and then combined into a joint retrieval framework. In the state-of-the-art systems, a late combination between two independent systems, one analyzing just the text part of such documents, and the other analyzing the visual part without leveraging any knowledge acquired in the text processing, is the norm. Such systems rarely exceed the performance of any single modality (i.e. text or video) in information retrieval tasks. Our experiments indicate that allowing a rich interaction between the modalities results in significant improvement in performance over any single modality. We demonstrate these results using the TRECVID03 corpus, which comprises 120 hours of broadcast news videos. Our results demonstrate over 14 % improvement in IR performance over the best reported textonly baseline and ranks amongst the best results reported on this corpus.
Shrinking exponential language models
- In Proc. of HLT-NAACL
, 2009
"... In (Chen, 2009), we show that for a variety of language models belonging to the exponential family, the test set cross-entropy of a model can be accurately predicted from its training set cross-entropy and its parameter values. In this work, we show how this relationship can be used to motivate two ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
In (Chen, 2009), we show that for a variety of language models belonging to the exponential family, the test set cross-entropy of a model can be accurately predicted from its training set cross-entropy and its parameter values. In this work, we show how this relationship can be used to motivate two heuristics for “shrinking ” the size of a language model to improve its performance. We use the first heuristic to develop a novel class-based language model that outperforms a baseline word trigram model by 28 % in perplexity and 1.9% absolute in speech recognition word-error rate on Wall Street Journal data. We use the second heuristic to motivate a regularized version of minimum discrimination information models and show that this method outperforms other techniques for domain adaptation. 1
The Philips/RWTH System for Transcription of Broadcast News
, 1999
"... This paper contains a description of the Philips/RWTH 1998 HUB4 system which has been build in a joint effort of Philips Research Laboratories Aachen and Aachen University of Technology. We will focus our discussion on recent improvements compared to the original 1997 HUB4 system and evaluate them o ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
This paper contains a description of the Philips/RWTH 1998 HUB4 system which has been build in a joint effort of Philips Research Laboratories Aachen and Aachen University of Technology. We will focus our discussion on recent improvements compared to the original 1997 HUB4 system and evaluate them on the HUB4'97 evaluation data. The paper will deal with 1. a rough system overview including feature extraction, acoustic training, audio stream segmentation, and decoding 2. log-linear interpolation of distance-language models, 3. and the integration of various acoustic and language models via Discriminative Model Combination (DMC).
Robustness Issues in a Data-Driven Spoken Language Understanding System
- In HLT/NAACL04 Workshop on Spoken Language Understanding for Conversational Systems
, 2004
"... Robustness is a key requirement in spoken language understanding (SLU) systems. Human speech is often ungrammatical and ill-formed, and there will frequently be a mismatch between training and test data. This paper discusses robustness and adaptation issues in a statistically-based SLU system ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
Robustness is a key requirement in spoken language understanding (SLU) systems. Human speech is often ungrammatical and ill-formed, and there will frequently be a mismatch between training and test data. This paper discusses robustness and adaptation issues in a statistically-based SLU system which is entirely data-driven. To test robustness, the system has been tested on data from the Air Travel Information Service (ATIS) domain which has been artificially corrupted with varying levels of additive noise. Although the speech recognition performance degraded steadily, the system did not fail catastrophically. Indeed, the rate at which the end-to-end performance of the complete system degraded was significantly slower than that of the actual recognition component.
Adaptation of Statistical Language Models for Automatic Speech Recognition
, 1999
"... Statistical language models encode linguistic information in such a way as to be useful to systems which process human language. Such systems include those for optical character recognition and machine translation. Currently, however, the most common application of language modelling is in automatic ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
Statistical language models encode linguistic information in such a way as to be useful to systems which process human language. Such systems include those for optical character recognition and machine translation. Currently, however, the most common application of language modelling is in automatic speech recognition, and it is this that forms the focus of this thesis. Most current speech recognition systems are dedicated to one specific task (for example, the recognition of broadcast news), and thus use a language model which has been trained on text which is appropriate to that task. If, however, one wants to perform recognition on more general language, then creating an appropriate language model is far from straightforward. A taskspecific language model will often perform very badly on language from a different domain, whereas a model trained on text from many diverse styles of language might perform better in general, but will not be especially well suited to any particular domai...
Generalized linear interpolation of language models
- IEEE Workshop on ASRU
, 2007
"... Despite the prevalent use of model combination techniques to improve speech recognition performance on domains with limited data, little prior research has focused on the choice of the actual interpolation model. For merging language models, the most popular approach has been the simple linear inter ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
Despite the prevalent use of model combination techniques to improve speech recognition performance on domains with limited data, little prior research has focused on the choice of the actual interpolation model. For merging language models, the most popular approach has been the simple linear interpolation. In this work, we propose a generalization of linear interpolation that computes context-dependent mixture weights from arbitrary features. Results on a lecture transcription task yield up to a 1.0 % absolute improvement in recognition word error rate (WER). Index Terms — Language modeling, interpolation, adaptation, mixture models
Guessing partsof-speech of unknown words using global information
- In ACL
, 2006
"... In this paper, we present a method for guessing POS tags of unknown words using local and global information. Although many existing methods use only local information (i.e. limited window size or intra-sentential features), global information (extra-sentential features) provides valuable clues for ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
In this paper, we present a method for guessing POS tags of unknown words using local and global information. Although many existing methods use only local information (i.e. limited window size or intra-sentential features), global information (extra-sentential features) provides valuable clues for predicting POS tags of unknown words. We propose a probabilistic model for POS guessing of unknown words using global information as well as local information, and estimate its parameters using Gibbs sampling. We also attempt to apply the model to semisupervised learning, and conduct experiments on multiple corpora. 1
Extracting Protein-Protein Interactions from MEDLINE using the Hidden Vector State model
, 2000
"... Abstract: A major challenge in text mining for biomedicine is automatically extracting protein-protein interactions from the vast amount of biomedical literature. We have constructed an information extraction system based on the Hidden Vector State (HVS) model for protein-protein interactions. The H ..."
Abstract
-
Cited by 3 (3 self)
- Add to MetaCart
Abstract: A major challenge in text mining for biomedicine is automatically extracting protein-protein interactions from the vast amount of biomedical literature. We have constructed an information extraction system based on the Hidden Vector State (HVS) model for protein-protein interactions. The HVS model can be trained using only lightly annotated data whilst simultaneously retaining sufficient ability to capture the hierarchical structure. When applied in extracting protein-protein interactions, we found that it performed better than other established statistical methods and achieved 61.5 % in F-score with balanced recall and precision values. Moreover, the statistical nature of the pure data-driven HVS model makes it intrinsically robust and it can be easily adapted to other domains.
Hybrid statistical and structural semantic modeling for Thai multi-stage spoken language understanding
- Workshop of HLT/NAACL 2004
"... This article proposes a hybrid statistical and structural semantic model for multi-stage spoken language understanding (SLU). The first stage of this SLU utilizes a weighted finite-state transducer (WFST)-based parser, which encodes the regular grammar of concepts to be extracted. The proposed metho ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
This article proposes a hybrid statistical and structural semantic model for multi-stage spoken language understanding (SLU). The first stage of this SLU utilizes a weighted finite-state transducer (WFST)-based parser, which encodes the regular grammar of concepts to be extracted. The proposed method improves the regular grammar model by incorporating a well-known n-gram semantic tagger. This hybrid model thus enhances the syntax of n-gram outputs while providing robustness against speech-recognition errors. With applications to a Thai hotel reservation domain, it is shown to outperform both individual models at every stage of the SLU system. Under the probabilistic WFST framework, the use of N-best hypotheses from the speech recognizer instead of the 1best can further improve performance requiring only a small additional processing time. 1
Empirical Evaluation and Combination of Advanced Language Modeling Techniques
"... We present results obtained with several advanced language modeling techniques, including class based model, cache model, maximum entropy model, structured language model, random forest language model and several types of neural network based language models. We show results obtained after combining ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
We present results obtained with several advanced language modeling techniques, including class based model, cache model, maximum entropy model, structured language model, random forest language model and several types of neural network based language models. We show results obtained after combining all these models by using linear interpolation. We conclude that for both small and moderately sized tasks, we obtain new state of the art results with combination of models, that is significantly better than performance of any individual model. Obtained perplexity reductions against Good-Turing trigram baseline are over 50 % and against modified Kneser-Ney smoothed 5-gram over 40%. Index Terms: language modeling, neural networks, model combination, speech recognition

