LANGUAGE MODEL ADAPTATION FOR AUTOMATIC SPEECH RECOGNITION AND STATISTICAL MACHINE TRANSLATION
2004
Abstract
Cited by 1 (0 self)
Language modeling is critical and indispensable for many natural language applications, such as automatic speech recognition and machine translation. Because natural language grammars are so complex, it is almost impossible to construct language models from a set of linguistic rules; statistical techniques have therefore dominated language modeling over the last few decades. In principle, all statistical modeling techniques rely on two conditions: 1) a reasonable amount of training data is available, and 2) the training data comes from the same population as the test data to which we want to apply the model. Because statistical models are built from observations of the training data, their success depends crucially on that data. In other words, if we do not have enough training data, or the training data does not match the test data, we cannot build accurate statistical models. This thesis presents novel methods for coping with these problems in language modeling, collectively referred to as language model adaptation.
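A common baseline for the mismatch problem described above (not necessarily one of the methods this thesis proposes) is linear interpolation: a model estimated on plentiful but mismatched background data is mixed with a model estimated on scarce in-domain data, P_adapted(w) = λ P_in(w) + (1 − λ) P_bg(w). The Python sketch below illustrates the idea with add-one-smoothed unigram models. The function names, toy corpora, and the fixed λ = 0.5 are illustrative assumptions; in practice λ is usually tuned on held-out in-domain data and higher-order n-gram models are used.

```python
from __future__ import annotations

import math
from collections import Counter


def unigram_probs(tokens: list[str], vocab: set[str], alpha: float = 1.0) -> dict[str, float]:
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(background: dict[str, float], in_domain: dict[str, float],
                lam: float) -> dict[str, float]:
    """Linear interpolation: P(w) = lam * P_in(w) + (1 - lam) * P_bg(w)."""
    return {w: lam * in_domain[w] + (1.0 - lam) * background[w] for w in background}


def perplexity(probs: dict[str, float], tokens: list[str]) -> float:
    """exp of the average negative log-probability of the tokens."""
    log_prob = sum(math.log(probs[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


if __name__ == "__main__":
    # Hypothetical toy data: large-ish "background" text vs. small in-domain text.
    background_text = "the cat sat on the mat the dog sat".split()
    in_domain_text = "recognize speech recognize the speech".split()
    vocab = set(background_text) | set(in_domain_text)

    p_bg = unigram_probs(background_text, vocab)
    p_in = unigram_probs(in_domain_text, vocab)
    p_adapted = interpolate(p_bg, p_in, lam=0.5)

    test_text = "recognize speech".split()
    print("background perplexity:", perplexity(p_bg, test_text))
    print("adapted perplexity:   ", perplexity(p_adapted, test_text))
```

On in-domain test text, the adapted model assigns more probability to domain words such as "speech", so its perplexity is lower than that of the background model alone.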
A Fast Approximate Algorithm for Large-Scale Latent Semantic Indexing
Abstract
Latent Semantic Indexing (LSI) is an effective method for discovering the underlying semantic structure of data, with numerous applications in information retrieval and data mining. However, the computational complexity of LSI may be prohibitively high when it is applied to very large datasets. In this paper, we present a fast approximate algorithm for large-scale LSI that is conceptually simple and theoretically justified. Our main contribution is to show that the proposed algorithm has a provable error bound and linear computational complexity.
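The abstract does not spell out the algorithm. As a hedged illustration of how LSI can be approximated in time linear in the matrix size for a fixed target rank, the sketch below uses a randomized range finder in the style of Halko, Martinsson, and Tropp, a well-known technique that may differ from this paper's own method. LSI computes a rank-k truncated SVD A ≈ U_k Σ_k V_kᵀ of a term-document matrix A; here A is projected onto a random subspace of dimension k + oversample, and an exact SVD is taken only of that small projection. The names `randomized_lsi`, `oversample`, and `n_iter` are hypothetical.

```python
import numpy as np


def randomized_lsi(A: np.ndarray, k: int, oversample: int = 10, n_iter: int = 2,
                   seed: int = 0) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Approximate rank-k truncated SVD of a term-document matrix A (terms x docs).

    Returns (U_k, s_k, Vt_k) with A ~= U_k @ diag(s_k) @ Vt_k.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    ell = min(k + oversample, m, n)  # dimension of the random sketch

    # Sample the range of A with a Gaussian test matrix, then orthonormalize.
    omega = rng.standard_normal((n, ell))
    Q, _ = np.linalg.qr(A @ omega)

    # Optional power iterations sharpen the subspace when singular values decay slowly.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)

    # Exact SVD of the small ell x n projection B = Q^T A.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small  # lift left singular vectors back to term space
    return U[:, :k], s[:k], Vt[:k, :]
```

Document vectors in the latent space are the columns of `np.diag(s) @ Vt`. Each step costs only products of A with thin (ell-column) matrices plus factorizations of small matrices, which is where the linear dependence on the size of A comes from.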
Straightforward Feature Selection for Scalable Latent Semantic Indexing
Abstract
Latent Semantic Indexing (LSI) has been shown to be effective on many small-scale text collections. However, because of its high computational complexity, there is little evidence of its effectiveness on unsampled large-scale text corpora. In this paper, we propose a straightforward feature selection strategy, named Feature Selection for Latent Semantic Indexing (FSLSI), as a preprocessing step that allows LSI to be approximated efficiently on large-scale text corpora. We formulate LSI as a continuous optimization problem and propose to optimize its objective function as a discrete optimization problem, which leads to the FSLSI algorithm. We show that the closed-form solution of this optimization is as simple as scoring each feature by its Frobenius norm and filtering out those with small scores. Theoretical analysis guarantees that the loss caused by the features FSLSI filters out is minimized for approximating LSI. We thus offer a general way to study and apply LSI on large-scale corpora. A large-scale study on more than 1 million TREC documents demonstrates the effectiveness of FSLSI in Information Retrieval (IR) tasks.
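Read literally, the closed-form FSLSI step scores each feature (a row of the term-document matrix) by its norm, keeps the highest-scoring features, and runs LSI on the reduced matrix. The sketch below is one reading of the abstract, not the authors' code: the helper names, the fixed `n_keep` budget, and the use of a dense exact SVD are assumptions. It also reports the fraction of the squared Frobenius norm retained; because ‖A‖_F² is the sum of the squared row norms, dropping the lowest-norm rows keeps as much of it as possible for a given number of kept rows.

```python
import numpy as np


def select_features(A: np.ndarray, n_keep: int) -> np.ndarray:
    """Indices of the n_keep rows (terms) of A with the largest norms, highest first."""
    scores = np.linalg.norm(A, axis=1)  # norm of each row, i.e. each feature's score
    return np.argsort(scores)[::-1][:n_keep]


def retained_energy(A: np.ndarray, keep: np.ndarray) -> float:
    """Fraction of the squared Frobenius norm of A carried by the kept rows."""
    return float(np.linalg.norm(A[keep]) ** 2 / np.linalg.norm(A) ** 2)


def lsi_on_selected(A: np.ndarray, n_keep: int, k: int):
    """Filter low-scoring features, then compute a rank-k SVD of the reduced matrix."""
    keep = select_features(A, n_keep)
    U, s, Vt = np.linalg.svd(A[keep], full_matrices=False)
    return keep, U[:, :k], s[:k], Vt[:k, :]
```

With a toy 4-term by 3-document matrix whose row norms are 5, 0, √2, and 10, keeping two features selects rows 3 and 0 and retains 125/127 of the squared Frobenius norm.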

