2

A comparison of two smoothing methods for word bigram models
– L C Bauman Peto
 1994

1

Hierarchical Dirichlet Language Model
– R Hanson, J Stutz, P Cheeseman
 1991

1

Density networks and protein modelling
– D J C MacKay
 1995

7

Text Compression, Englewood Cliffs
– T C Bell, J G Cleary, I H Witten
 1990

13

A fast algorithm for deleted interpolation
– Lalit R Bahl, Peter F Brown, Peter V de Souza, Robert L Mercer, David Nahamoo
 1991

32

Bayesian classification with correlation and inheritance
– Robin Hanson, John Stutz, Peter Cheeseman
 1991

35

Estimation of probabilities in the language model of the IBM speech recognition system
– A Nadas
 1984

18

Hyperparameters: optimize, or integrate out?
– David J. C. MacKay
 1996

73

Developments in Maximum entropy data analysis
– S Gull
 1989

28

Bayesian Mixture Modeling
– R Neal
 1992

39

Bayesian Neural Networks and Density Networks
– David J.C. MacKay
 1994

1176

The Mathematics of Statistical Machine Translation: Parameter Estimation
– Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, Robert L. Mercer
 1993

414

A program for aligning sentences in bilingual corpora
– William A. Gale, Kenneth W. Church
 1993

567

Probabilistic Inference Using Markov Chain Monte Carlo Methods
– Radford M. Neal
 1993

663

Estimation of probabilities from sparse data for the language model component of a speech recognizer
– Slava M. Katz
 1987

619

Text Compression
– T Bell, J Cleary, I Witten
 1990

392

A maximum likelihood approach to continuous speech recognition
– L R Bahl, F Jelinek, R L Mercer

338

Interpolated estimation of Markov source parameters from sparse data, Pattern Recognit. Practice
– F Jelinek, R L Mercer
 1980

158

Probability, frequency, and reasonable expectation
– R T Cox
 1946
