Results 1–10 of 23
Approximating discrete probability distributions with dependence trees
IEEE Transactions on Information Theory, 1968
Cited by 637 (0 self)
Abstract
A method is presented to approximate optimally an n-dimensional discrete probability distribution by a product of second-order distributions, or the distribution of the first-order tree dependence. The problem is to find an optimum set of n − 1 first-order dependence relationships among the n variables. It is shown that the procedure derived in this paper yields an approximation of a minimum difference in information. It is further shown that when this procedure is applied to empirical observations from an unknown distribution of tree dependence, the procedure is the maximum-likelihood estimate of the distribution.
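The procedure is commonly implemented as follows. Every pair of variables is weighted by its empirical mutual information, and a maximum-weight spanning tree is kept, which yields the n − 1 first-order dependencies. The sketch below is a minimal illustration under stated assumptions, not the paper's own presentation:

- samples are tuples of discrete values;
- mutual information is measured in nats;
- Kruskal's algorithm builds the tree;
- the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(samples, i, j):
    """Empirical mutual information (nats) between columns i and j of the samples."""
    n = len(samples)
    joint = Counter((s[i], s[j]) for s in samples)
    p_i = Counter(s[i] for s in samples)
    p_j = Counter(s[j] for s in samples)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


def chow_liu_tree(samples):
    """Return the n - 1 edges of a maximum-weight spanning tree (weights = mutual information)."""
    n_vars = len(samples[0])
    # Heaviest pairs first, as Kruskal's algorithm for a *maximum* spanning tree requires.
    edges = sorted(
        ((mutual_information(samples, i, j), i, j) for i, j in combinations(range(n_vars), 2)),
        reverse=True,
    )
    parent = list(range(n_vars))  # union-find forest used to reject cycle-forming edges

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for _weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            tree.append((i, j))
            if len(tree) == n_vars - 1:
                break
    return tree
```

For example, take four samples in which the first two variables always agree and the third is independent of both. The edge (0, 1) is chosen first, because it is the only pair with positive mutual information.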
Bayesian Network Classifiers
1997
Cited by 589 (22 self)
Abstract
Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with state-of-the-art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we evaluate approaches for inducing classifiers from data, based on the theory of learning Bayesian networks. These networks are factored representations of probability distributions that generalize the naive Bayesian classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes, yet at the same time maintains the computational simplicity (no search involved) and robustness that characterize naive Bayes. We experimentally tested these approaches, using problems from the University of California at Irvine repository, and compared them to C4.5, naive Bayes, and wrapper methods for feature selection.
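The "no search involved" property comes from how TAN picks its structure. As usually described, it weights each attribute pair by mutual information conditioned on the class, keeps a maximum-weight spanning tree over the attributes, directs the edges away from a chosen root, and makes the class a parent of every attribute.

The sketch below is a minimal illustration under stated assumptions:

- features are discrete;
- the class node is named "C" purely for illustration;
- the tree is built with Prim's algorithm from an arbitrary root;
- the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def conditional_mutual_information(X, y, i, j):
    """Empirical I(X_i; X_j | C) in nats, from feature rows X and class labels y."""
    n = len(y)
    n_ijc = Counter((x[i], x[j], c) for x, c in zip(X, y))
    n_ic = Counter((x[i], c) for x, c in zip(X, y))
    n_jc = Counter((x[j], c) for x, c in zip(X, y))
    n_c = Counter(y)
    cmi = 0.0
    for (a, b, c), count in n_ijc.items():
        # P(a,b,c) * log[P(a,b|c) / (P(a|c) P(b|c))], with the ratio written in raw counts.
        cmi += (count / n) * math.log(count * n_c[c] / (n_ic[(a, c)] * n_jc[(b, c)]))
    return cmi


def tan_parents(X, y, root=0):
    """Map each attribute index to its TAN parents: the class "C" plus at most one attribute."""
    n_vars = len(X[0])
    weight = {}
    for i, j in combinations(range(n_vars), 2):
        weight[(i, j)] = weight[(j, i)] = conditional_mutual_information(X, y, i, j)
    in_tree = {root}
    parents = {root: ["C"]}  # the root attribute has only the class as parent
    while len(in_tree) < n_vars:
        # Prim's step: attach the outside attribute with the heaviest edge into the tree.
        _w, u, v = max(
            (weight[(u, v)], u, v) for u in in_tree for v in range(n_vars) if v not in in_tree
        )
        in_tree.add(v)
        parents[v] = ["C", u]  # edge directed away from the root
    return parents
```

The resulting parent lists are enough to estimate the conditional probability tables P(X_v | C, X_u) by counting. That counting step is what keeps the method as cheap as naive Bayes.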
An algebra for probabilistic databases
Cited by 128 (1 self)
Abstract
An algebra is presented for a simple probabilistic data model that may be regarded as an extension of the standard relational model. The probabilistic algebra is developed in such a way that (restricted to α-acyclic database schemes) the relational algebra is a homomorphic image of it. Strictly probabilistic results are emphasized. Variations on the basic probabilistic data model are discussed. The algebra is used to explicate a commonly used statistical smoothing procedure and is shown to be potentially very useful for decision support with uncertain information.
Object Detection Using the Statistics of Parts
2004
Cited by 109 (2 self)
Abstract
In this paper we describe a trainable object detector and its instantiations for detecting faces and cars at any size, location, and pose. To cope with variation in object orientation, the detector uses multiple classifiers, each spanning a different range of orientation. Each of these classifiers determines whether the object is present at a specified size within a fixed-size image window. To find the object at any location and size, these classifiers scan the image exhaustively. Each classifier is based on the statistics of localized parts. Each part is a transform from a subset of wavelet coefficients to a discrete set of values. Such parts are designed to capture various combinations of locality in space, frequency, and orientation. In building each classifier, we gathered the class-conditional statistics of these part values from representative samples of object and non-object images. We trained each classifier to minimize classification error on the training set by using AdaBoost with Confidence-Weighted Predictions (Schapire and Singer, 1999). In detection, each classifier computes the part values within the image window and looks up their associated class-conditional probabilities. The classifier then makes a decision by applying a likelihood ratio test. For efficiency, the classifier evaluates this likelihood ratio in stages. At each stage, the classifier compares the partial likelihood ratio to a threshold and decides whether to cease evaluation (labeling the input as non-object) or to continue further evaluation. The detector orders these stages of evaluation from a low-resolution to a high-resolution search of the image. Our trainable object detector achieves reliable and efficient detection of human faces and passenger cars with out-of-plane rotation.
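The staged likelihood ratio test can be illustrated with a plain sum of per-part log-likelihood ratios, checked against a threshold after each stage. This is a simplified sketch, not the detector itself:

- It omits the AdaBoost weighting of parts.
- It omits the coarse-to-fine image search.
- The data layout (a dict of observed part values, and per-stage lists of probability tables) is hypothetical.

```python
import math


def classify_window(part_values, stages, thresholds):
    """Staged log-likelihood-ratio test with early rejection.

    part_values: dict mapping a part name to its observed discrete value in this window.
    stages: list of stages; each stage is a list of (part_name, p_object, p_nonobject),
        where p_object and p_nonobject map a part value to its class-conditional probability.
    thresholds: one threshold per stage on the cumulative log-likelihood ratio;
        the last one acts as the final decision threshold.
    Returns (is_object, number_of_stages_evaluated).
    """
    log_ratio = 0.0
    for k, (stage, threshold) in enumerate(zip(stages, thresholds), start=1):
        for part, p_obj, p_non in stage:
            value = part_values[part]
            log_ratio += math.log(p_obj[value]) - math.log(p_non[value])
        if log_ratio < threshold:
            return False, k  # stop early: label the window as non-object
    return True, len(stages)
```

Most windows in an exhaustive scan are background. When cheap early stages reject them, the remaining stages are never computed, which is where the efficiency gain comes from.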
Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals
J. Comput. Biol., 2004
Set-Based Bayesianism
1992
Cited by 26 (0 self)
Abstract
Problems for strict and convex Bayesianism are discussed. A set-based Bayesianism generalizing convex Bayesianism and intervalism is proposed. This approach abandons not only the strict Bayesian requirement of a unique real-valued probability function in any decision-making context but also the requirement of convexity for a set-based representation of uncertainty. Levi's E-admissibility decision criterion is retained and is shown to be applicable in the non-convex case.
Keywords: Uncertainty, decision-making, maximum entropy, Bayesian methods.
1. Introduction
The reigning philosophy of uncertainty representation is strict Bayesianism. One of its central principles is that an agent must adopt a single, real-valued probability function over the events recognized as relevant to a given problem. Prescriptions for defining such a function for a given agent in a given situation range from the extreme personalism of de Finetti (1964, 1974) and Savage (1972) to the objective Bayesianism of...
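Levi's E-admissibility criterion is usually stated as follows: an option is E-admissible if it maximizes expected utility under at least one probability function in the agent's set. The sketch below applies that definition to a finite set of distributions over a finite state space, and such a set is in general non-convex. The data layout, the tolerance, and the function names are assumptions made for illustration.

```python
def expected_utility(p, utilities):
    """Expected utility of one option: sum over states of P(state) * utility(state)."""
    return sum(p_s * u_s for p_s, u_s in zip(p, utilities))


def e_admissible(options, credal_set, tol=1e-12):
    """Return the options that are optimal under at least one distribution in credal_set.

    options: dict mapping an option name to a tuple of utilities, one per state.
    credal_set: iterable of probability tuples over the same states (a finite,
        not necessarily convex, set of distributions).
    """
    admissible = set()
    for p in credal_set:
        eus = {name: expected_utility(p, u) for name, u in options.items()}
        best = max(eus.values())
        # Every option tied for best under this distribution is E-admissible.
        admissible.update(name for name, eu in eus.items() if eu >= best - tol)
    return admissible
```

With two states, take options A = (1, 0), B = (0, 1) and C = (0.4, 0.4), and the set {(1, 0), (0, 1)}. The criterion returns {A, B}. C is rejected even though it is never the worst choice, because no member of the set makes it optimal.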
Using First-Order Probability Logic for the Construction of Bayesian Networks
1993
Cited by 18 (0 self)
Abstract
We present a mechanism for constructing graphical models, specifically Bayesian networks, from a knowledge base of general probabilistic information. The unique feature of our approach is that it uses a powerful first-order probabilistic logic for expressing the general knowledge base. This logic allows for the representation of a wide range of logical and probabilistic information. The model construction procedure we propose uses notions from direct inference to identify pieces of local statistical information from the knowledge base that are most appropriate to the particular event we want to reason about. These pieces are composed to generate a joint probability distribution specified as a Bayesian network. Although there are fundamental difficulties in dealing with fully general knowledge, our procedure is practical for quite rich knowledge bases, and it supports the construction of a far wider range of networks than current template technology allows.
1 Introduction
The de...
A decomposition of classes via clustering to explain and improve Naive Bayes
ECML 2003, volume 2837 of LNAI, 2003
Cited by 11 (2 self)
Abstract
We propose a method to improve the probability estimates made by Naive Bayes by avoiding the effects of poor class-conditional probabilities based on product distributions when each class spreads into multiple regions. Our approach applies a clustering algorithm to each subset of examples that belong to the same class and considers each cluster as a class of its own. Experiments on 26 real-world datasets show a significant improvement in performance when the class decomposition process is applied, particularly when the mean number of clusters per class is large.
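The decomposition step can be sketched as follows. Cluster the examples of each class separately, then relabel every example with a (class, cluster) pair, so that a downstream Naive Bayes learner treats each cluster as its own class.

The sketch rests on several assumptions, and the paper's actual clustering algorithm and choice of k may differ:

- features are numeric tuples;
- there is a fixed number of clusters k per class;
- clustering uses plain k-means, seeded with the first k points of each class;
- the function names are hypothetical.

```python
def kmeans(points, k, iters=20):
    """Plain k-means on tuples of floats; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]

    def nearest(p):
        # Index of the centroid with the smallest squared Euclidean distance to p.
        return min(
            range(len(centroids)),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )

    assign = [0] * len(points)
    for _ in range(iters):
        assign = [nearest(p) for p in points]
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def decompose_classes(X, y, k):
    """Relabel each example as (class, cluster), clustering each class's examples separately."""
    new_labels = [None] * len(y)
    for cls in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == cls]
        assign = kmeans([X[i] for i in idx], k)
        for i, a in zip(idx, assign):
            new_labels[i] = (cls, a)
    return new_labels
```

At prediction time, a predicted (class, cluster) label maps back to its original class through its first component. Summing the posteriors of all clusters that belong to a class is another reasonable option.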
Compression, Information Theory and Grammars: A Unified Approach
ACM Trans. on Information Systems, 1990
Cited by 9 (5 self)
Abstract
Text compression is of considerable theoretical and practical interest. It is, for example, becoming increasingly important for fitting a large database onto a single CD-ROM. Many of the compression techniques discussed in the literature are model-based. We propose here the notion of a formal grammar as a flexible model of text generation that encompasses most of the models offered before and, in principle, extends the possibility of compression to a much more general class of languages. Assuming a general model of text generation, a derivation is given of the well-known Shannon entropy formula, making possible a theory of information based on text representation rather than on communication. The ideas are shown to apply to a number of commonly used text models. Finally, we focus on a Markov model of text generation, suggest an information-theoretic measure of similarity between two probability distributions, and develop a clustering algorith...
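The Shannon entropy formula referred to above is H = −Σ p log₂ p. The sketch below does three things:

- computes H;
- fits the simplest (zeroth-order, memoryless) member of the Markov family to a text, with add-alpha smoothing over a fixed alphabet;
- compares two such models with the Kullback-Leibler divergence.

KL divergence is a standard information-theoretic measure chosen here only for illustration; it is not necessarily the similarity measure the paper proposes. The function names and the smoothing scheme are assumptions.

```python
import math
from collections import Counter


def entropy_bits(probs):
    """Shannon entropy H = -sum p * log2(p), in bits; zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def unigram_distribution(text, alphabet, alpha=1.0):
    """Zeroth-order text model with add-alpha smoothing; symbols outside the alphabet are ignored."""
    symbols = [ch for ch in text if ch in alphabet]
    counts = Counter(symbols)
    total = len(symbols) + alpha * len(alphabet)
    return {ch: (counts[ch] + alpha) / total for ch in alphabet}


def kl_divergence_bits(p, q):
    """D(p || q) = sum p * log2(p / q), in bits; q must be positive wherever p is."""
    return sum(pv * math.log2(pv / q[s]) for s, pv in p.items() if pv > 0)
```

Under such a model, the entropy of the fitted distribution is the average code length in bits per symbol that an ideal coder would approach. The divergence is the extra cost per symbol of coding one text with a model fitted to another. Smoothing with alpha > 0 keeps q positive, so the divergence stays finite.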
Statistical character structure modeling and its application to handwritten Chinese character recognition
IEEE Trans. Pattern Analysis and Machine Intelligence, 2003
Cited by 8 (0 self)
Abstract
This paper proposes a statistical character structure modeling method. It represents each stroke by the distribution of its feature points, and it represents the character structure by the joint distribution of the component strokes. In the proposed model, stroke relationships are reflected through statistical dependency, so all kinds of stroke relationships can be represented effectively and systematically. Based on this character representation, a stroke neighbor selection method is also proposed. It measures the importance of a stroke relationship by the mutual information among the strokes, and with this measure the important neighbor relationships are selected by the nth-order probability approximation method. The neighbor selection algorithm reduces the complexity significantly because only the important relationships, rather than all existing ones, need to be reflected. The proposed character modeling method was applied to a handwritten Chinese character recognition system. Using a model-driven stroke extraction algorithm that cooperates with a selective matching algorithm, the proposed system is better than conventional structural recognition systems at analyzing degraded images. The experiments visualize the effectiveness of the proposed methods: the proposed method successfully detected and reflected the stroke relationships that seemed intuitively important. The overall recognition rate was 98.45 percent, which confirms the effectiveness of the proposed methods.
Index Terms: Character recognition, statistical character structure modeling, model-driven stroke extraction, selective matching, heuristic search.