Results 1  10
of
857
SRILM  An extensible language modeling toolkit
 IN PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING (ICSLP 2002
, 2002
"... SRILM is a collection of C++ libraries, executable programs, and helper scripts designed to allow both production of and experimentation with statistical language models for speech recognition and other applications. SRILM is freely available for noncommercial purposes. The toolkit supports creation ..."
Abstract

Cited by 760 (17 self)
 Add to MetaCart
SRILM is a collection of C++ libraries, executable programs, and helper scripts designed to allow both production of and experimentation with statistical language models for speech recognition and other applications. SRILM is freely available for noncommercial purposes. The toolkit supports creation and evaluation of a variety of language model types based on Ngram statistics, as well as several related tasks, such as statistical tagging and manipulation of Nbest lists and word lattices. This paper summarizes the functionality of the toolkit and discusses its design and implementation, highlighting ease of rapid prototyping, reusability, and combinability of tools.
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have bee ..."
Abstract

Cited by 564 (3 self)
 Add to MetaCart
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs
and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linearGaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing
Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T 3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of
applying RaoBlackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization
and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
A hierarchical phrasebased model for statistical machine translation
 In ACL
, 2005
"... We present a statistical phrasebased translation model that uses hierarchical phrases— phrases that contain subphrases. The model is formally a synchronous contextfree grammar but is learned from a bitext without any syntactic information. Thus it can be seen as a shift to the formal machinery of ..."
Abstract

Cited by 364 (10 self)
 Add to MetaCart
We present a statistical phrasebased translation model that uses hierarchical phrases— phrases that contain subphrases. The model is formally a synchronous contextfree grammar but is learned from a bitext without any syntactic information. Thus it can be seen as a shift to the formal machinery of syntaxbased translation systems without any linguistic commitment. In our experiments using BLEU as a metric, the hierarchical phrasebased model achieves a relative improvement of 7.5 % over Pharaoh, a stateoftheart phrasebased system. 1
TimeBased Language Models
, 2003
"... We explore the relationship between time and relevance using TREC adhoc queries. A type of query is identified that favors very recent documents. We propose a timebased language model approach to retrieval for these queries. We show how time can be incorporated into both querylikelihood models an ..."
Abstract

Cited by 321 (33 self)
 Add to MetaCart
We explore the relationship between time and relevance using TREC adhoc queries. A type of query is identified that favors very recent documents. We propose a timebased language model approach to retrieval for these queries. We show how time can be incorporated into both querylikelihood models and relevance models. We carried out experiments to compare timebased language models to heuristic techniques for incorporating document recency in the ranking. Our results show that timebased models perform as well as or better than the best of the heuristic techniques.
A Gaussian Prior for Smoothing Maximum Entropy Models
, 1999
"... In certain contexts, maximum entropy (ME) modeling can be viewed as maximum likelihood training for exponential models, and like other maximum likelihood methods is prone to overfitting of training data. Several smoothing methods for maximum entropy models have been proposed to address this problem, ..."
Abstract

Cited by 226 (2 self)
 Add to MetaCart
In certain contexts, maximum entropy (ME) modeling can be viewed as maximum likelihood training for exponential models, and like other maximum likelihood methods is prone to overfitting of training data. Several smoothing methods for maximum entropy models have been proposed to address this problem, but previous results do not make it clear how these smoothing methods compare with smoothing methods for other types of related models. In this work, we survey previous work in maximum entropy smoothing and compare the performance of several of these algorithms with conventional techniques for smoothing ngram language models. Because of the mature body of research in ngram model smoothing and the close connection between maximum entropy and conventional ngram models, this domain is wellsuited to gauge the performance of maximum entropy smoothing methods. Over a large number of data sets, we find that an ME smoothing method proposed to us by Lafferty [1] performs as well as or better tha...
Automating the Construction of Internet Portals with Machine Learning
 Information Retrieval
, 2000
"... Domainspecific internet portals are growing in popularity because they gather content from the Web and organize it for easy access, retrieval and search. For example, www.campsearch.com allows complex queries by age, location, cost and specialty over summer camps. This functionality is not possible ..."
Abstract

Cited by 169 (3 self)
 Add to MetaCart
Domainspecific internet portals are growing in popularity because they gather content from the Web and organize it for easy access, retrieval and search. For example, www.campsearch.com allows complex queries by age, location, cost and specialty over summer camps. This functionality is not possible with general, Webwide search engines. Unfortunately these portals are difficult and timeconsuming to maintain. This paper advocates the use of machine learning techniques to greatly automate the creation and maintenance of domainspecific Internet portals. We describe new research in reinforcement learning, information extraction and text classification that enables efficient spidering, the identification of informative text segments, and the population of topic hierarchies. Using these techniques, we have built a demonstration system: a portal for computer science research papers. It already contains over 50,000 papers and is publicly available at www.cora.justresearch.com. These techniques are ...
Learning to resolve natural language ambiguities: A unified approach
 In Proceedings of the National Conference on Artificial Intelligence. 806813. Segond F., Schiller A., Grefenstette & Chanod F.P
, 1998
"... distinct semanticonceptsuch as interest rate and has interest in Math are conflated in ordinary text. We analyze a few of the commonly used statistics based The surrounding context word associations and synand machine learning algorithms for natural language tactic patterns in this case are suffl ..."
Abstract

Cited by 166 (82 self)
 Add to MetaCart
distinct semanticonceptsuch as interest rate and has interest in Math are conflated in ordinary text. We analyze a few of the commonly used statistics based The surrounding context word associations and synand machine learning algorithms for natural language tactic patterns in this case are sufflcicnt to identify disambiguation tasks and observe tha they can bc recast as learning linear separators in the feature space. the correct form. Each of the methods makes a priori assumptions, which Many of these arc important standalone problems it employs, given the data, when searching for its hy but even more important is thei role in many applicapothesis. Nevertheless, as we show, it searches a space tions including speech recognition, machine translation, that is as rich as the space of all linear separators. information extraction and intelligent humanmachine We use this to build an argument for a data driven interaction. Most of the ambiguity resolution problems approach which merely searches for a good linear sepa are at the lower level of the natural language inferences rator in the feature space, without further assumptions chain; a wide range and a large number of ambigui
Two decades of statistical language modeling: Where do we go from here
 Proceedings of the IEEE
, 2000
"... Statistical Language Models estimate the distribution of various natural language phenomena for the purpose of speech recognition and other language technologies. Since the first significant model was proposed in 1980, many attempts have been made to improve the state of the art. We review them here ..."
Abstract

Cited by 149 (1 self)
 Add to MetaCart
Statistical Language Models estimate the distribution of various natural language phenomena for the purpose of speech recognition and other language technologies. Since the first significant model was proposed in 1980, many attempts have been made to improve the state of the art. We review them here, point to a few promising directions, and argue for a Bayesian approach to integration of linguistic theories with data. 1. OUTLINE Statistical language modeling (SLM) is the attempt to capture regularities of natural language for the purpose of improving the performance of various natural language applications. By and large, statistical language modeling amounts to estimating the probability distribution of various linguistic units, such as words, sentences, and whole documents. Statistical language modeling is crucial for a large variety of language technology applications. These include speech recognition (where SLM got its start), machine translation, document classification and routing, optical character recognition, information retrieval, handwriting recognition, spelling correction, and many more. In machine translation, for example, purely statistical approaches have been introduced in [1]. But even researchers using rulebased approaches have found it beneficial to introduce some elements of SLM and statistical estimation [2]. In information retrieval, a language modeling approach was recently proposed by [3], and a statistical/information theoretical approach was developed by [4]. SLM employs statistical estimation techniques using language training data, that is, text. Because of the categorical nature of language, and the large vocabularies people naturally use, statistical techniques must estimate a large number of parameters, and consequently depend critically on the availability of large amounts of training data.