Results 1 - 10
of
62,441
A comparison of event models for Naive Bayes text classification
, 1998
"... Recent work in text classification has used two different first-order probabilistic models for classification, both of which make the naive Bayes assumption. Some use a multi-variate Bernoulli model, that is, a Bayesian Network with no dependencies between words and binary word features (e.g. Larkey ..."
Abstract
-
Cited by 1025 (26 self)
- Add to MetaCart
.g. Larkey and Croft 1996; Koller and Sahami 1997). Others use a multinomial model, that is, a uni-gram language model with integer word counts (e.g. Lewis and Gale 1994; Mitchell 1997). This paper aims to clarify the confusion by describing the differences and details of these two models, and by empirically
SRILM -- An extensible language modeling toolkit
- IN PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING (ICSLP 2002
, 2002
"... SRILM is a collection of C++ libraries, executable programs, and helper scripts designed to allow both production of and experimentation with statistical language models for speech recognition and other applications. SRILM is freely available for noncommercial purposes. The toolkit supports creation ..."
Abstract
-
Cited by 1218 (21 self)
- Add to MetaCart
creation and evaluation of a variety of language model types based on N-gram statistics, as well as several related tasks, such as statistical tagging and manipulation of N-best lists and word lattices. This paper summarizes the functionality of the toolkit and discusses its design and implementation
A Language Modeling Approach to Information Retrieval
, 1998
"... Models of document indexing and document retrieval have been extensively studied. The integration of these two classes of models has been the goal of several researchers but it is a very difficult problem. We argue that much of the reason for this is the lack of an adequate indexing model. This sugg ..."
Abstract
-
Cited by 1154 (42 self)
- Add to MetaCart
an approach to retrieval based on probabilistic language modeling. We estimate models for each document individually. Our approach to modeling is non-parametric and integrates document indexing and document retrieval into a single model. One advantage of our approach is that collection statistics which
An Empirical Study of Smoothing Techniques for Language Modeling
, 1998
"... We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Br ..."
Abstract
-
Cited by 1224 (21 self)
- Add to MetaCart
We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e
Class-Based n-gram Models of Natural Language
- Computational Linguistics
, 1992
"... We address the problem of predicting a word from previous words in a sample of text. In particular we discuss n-gram models based on calsses of words. We also discuss several statistical algoirthms for assigning words to classes based on the frequency of their co-occurrence with other words. We find ..."
Abstract
-
Cited by 986 (5 self)
- Add to MetaCart
We address the problem of predicting a word from previous words in a sample of text. In particular we discuss n-gram models based on calsses of words. We also discuss several statistical algoirthms for assigning words to classes based on the frequency of their co-occurrence with other words. We
The Lorel Query Language for Semistructured Data
- International Journal on Digital Libraries
, 1997
"... We present the Lorel language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inapprop ..."
Abstract
-
Cited by 731 (29 self)
- Add to MetaCart
We present the Lorel language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages
The Semantics of Predicate Logic as a Programming Language
- JOURNAL OF THE ACM
, 1976
"... Sentences in first-order predicate logic can be usefully interpreted as programs In this paper the operational and fixpoint semantics of predicate logic programs are defined, and the connections with the proof theory and model theory of logic are investigated It is concluded that operational seman ..."
Abstract
-
Cited by 808 (18 self)
- Add to MetaCart
Sentences in first-order predicate logic can be usefully interpreted as programs In this paper the operational and fixpoint semantics of predicate logic programs are defined, and the connections with the proof theory and model theory of logic are investigated It is concluded that operational
A Maximum Entropy approach to Natural Language Processing
- COMPUTATIONAL LINGUISTICS
, 1996
"... The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the widescale application of this concept to real world problems in statistical estimation and pattern recognition. In this paper we des ..."
Abstract
-
Cited by 1366 (5 self)
- Add to MetaCart
describe a method for statistical modeling based on maximum entropy. We present a maximum-likelihood approach for automatically constructing maximum entropy models and describe how to implement this approach efficiently, using as examples several problems in natural language processing.
Logical foundations of object-oriented and frame-based languages
- JOURNAL OF THE ACM
, 1995
"... We propose a novel formalism, called Frame Logic (abbr., F-logic), that accounts in a clean and declarative fashion for most of the structural aspects of object-oriented and frame-based languages. These features include object identity, complex objects, inheritance, polymorphic types, query methods, ..."
Abstract
-
Cited by 876 (65 self)
- Add to MetaCart
that come from object-oriented programming have direct representation in F-logic; other, secondary aspects of this paradigm are easily modeled as well. The paper also discusses semantic issues pertaining to programming with a deductive object-oriented language based on a subset of F-logic.
Results 1 - 10
of
62,441