Results 1  10
of
15
Learning Bayesian networks: The combination of knowledge and statistical data
 Machine Learning
, 1995
"... We describe scoring metrics for learning Bayesian networks from a combination of user knowledge and statistical data. We identify two important properties of metrics, which we call event equivalence and parameter modularity. These properties have been mostly ignored, but when combined, greatly simpl ..."
Abstract

Cited by 901 (35 self)
 Add to MetaCart
We describe scoring metrics for learning Bayesian networks from a combination of user knowledge and statistical data. We identify two important properties of metrics, which we call event equivalence and parameter modularity. These properties have been mostly ignored, but when combined, greatly simplify the encoding of a user’s prior knowledge. In particular, a user can express his knowledge—for the most part—as a single prior Bayesian network for the domain. 1
A Tutorial on Learning Bayesian Networks
 Communications of the ACM
, 1995
"... We examine a graphical representation of uncertain knowledge called a Bayesian network. The representation is easy to construct and interpret, yet has formal probabilistic semantics making it suitable for statistical manipulation. We show how we can use the representation to learn new knowledge by c ..."
Abstract

Cited by 297 (12 self)
 Add to MetaCart
We examine a graphical representation of uncertain knowledge called a Bayesian network. The representation is easy to construct and interpret, yet has formal probabilistic semantics making it suitable for statistical manipulation. We show how we can use the representation to learn new knowledge by combining domain knowledge with statistical data. 1 Introduction Many techniques for learning rely heavily on data. In contrast, the knowledge encoded in expert systems usually comes solely from an expert. In this paper, we examine a knowledge representation, called a Bayesian network, that lets us have the best of both worlds. Namely, the representation allows us to learn new knowledge by combining expert domain knowledge and statistical data. A Bayesian network is a graphical representation of uncertain knowledge that most people find easy to construct and interpret. In addition, the representation has formal probabilistic semantics, making it suitable for statistical manipulation (Howard,...
A Bayesian approach to learning causal networks
 In Uncertainty in AI: Proceedings of the Eleventh Conference
, 1995
"... Whereas acausal Bayesian networks represent probabilistic independence, causal Bayesian networks represent causal relationships. In this paper, we examine Bayesian methods for learning both types of networks. Bayesian methods for learning acausal networks are fairly well developed. These methods oft ..."
Abstract

Cited by 57 (11 self)
 Add to MetaCart
Whereas acausal Bayesian networks represent probabilistic independence, causal Bayesian networks represent causal relationships. In this paper, we examine Bayesian methods for learning both types of networks. Bayesian methods for learning acausal networks are fairly well developed. These methods often employ assumptions to facilitate the construction of priors, including the assumptions of parameter independence, parameter modularity, and likelihood equivalence. We show that although these assumptions also can be appropriate for learning causal networks, we need additional assumptions in order to learn causal networks. We introduce two sufficient assumptions, called mechanism independence and component independence. We show that these new assumptions, when combined with parameter independence, parameter modularity, and likelihood equivalence, allow us to apply methods for learning acausal networks to learn causal networks. 1
Using Unlabeled Data to Improve Text Classification
, 2001
"... One key difficulty with text classification learning algorithms is that they require many handlabeled examples to learn accurately. This dissertation demonstrates that supervised learning algorithms that use a small number of labeled examples and many inexpensive unlabeled examples can create high ..."
Abstract

Cited by 49 (0 self)
 Add to MetaCart
One key difficulty with text classification learning algorithms is that they require many handlabeled examples to learn accurately. This dissertation demonstrates that supervised learning algorithms that use a small number of labeled examples and many inexpensive unlabeled examples can create highaccuracy text classifiers. By assuming that documents are created by a parametric generative model, ExpectationMaximization (EM) finds local maximum a posteriori models and classifiers from all the data  labeled and unlabeled. These generative models do not capture all the intricacies of text; however on some domains this technique substantially improves classification accuracy, especially when labeled data are sparse. Two problems arise from this basic approach. First, unlabeled data can hurt performance in domains where the generative modeling assumptions are too strongly violated. In this case the assumptions can be made more representative in two ways: by modeling subtopic class structure, and by modeling supertopic hierarchical class relationships. By doing so, model probability and classification accuracy come into correspondence, allowing unlabeled data to improve classification performance. The second problem is that even with a representative model, the improvements given by unlabeled data do not sufficiently compensate for a paucity of labeled data. Here, limited labeled data provide EM initializations that lead to lowprobability models. Performance can be significantly improved by using active learning to select highquality initializations, and by using alternatives to EM that avoid lowprobability local maxima.
Learning Probabilistic Networks
 THE KNOWLEDGE ENGINEERING REVIEW
, 1998
"... A probabilistic network is a graphical model that encodes probabilistic relationships between variables of interest. Such a model records qualitative influences between variables in addition to the numerical parameters of the probability distribution. As such it provides an ideal form for combini ..."
Abstract

Cited by 36 (1 self)
 Add to MetaCart
A probabilistic network is a graphical model that encodes probabilistic relationships between variables of interest. Such a model records qualitative influences between variables in addition to the numerical parameters of the probability distribution. As such it provides an ideal form for combining prior knowledge, which might be limited solely to experience of the influences between some of the variables of interest, and data. In this paper, we first show how data can be used to revise initial estimates of the parameters of a model. We then progress to showing how the structure of the model can be revised as data is obtained. Techniques for learning with incomplete data are also covered.
A ContiguityEnhanced KMeans Clustering Algorithm for Unsupervised Multispectral Image Segmentation
, 1997
"... The recent and continuing construction of multi and hyperspectral imagers will provide detailed data cubes with information in both the spatial and spectral domain. This data shows great promise for remote sensing applications ranging from environmental and agricultural to national security intere ..."
Abstract

Cited by 27 (3 self)
 Add to MetaCart
The recent and continuing construction of multi and hyperspectral imagers will provide detailed data cubes with information in both the spatial and spectral domain. This data shows great promise for remote sensing applications ranging from environmental and agricultural to national security interests. The reduction of this voluminous data to useful intermediate forms is necessary both for downlinking all those bits and for interpreting them. Smart onboard hardware is required, as well as sophisticated earthbound processing. A segmented image (in which the multispectral data in each pixel is classified into one of a small number of categories) is one kind of intermediate form which provides some measure of data compression. Traditional image segmentation algorithms treat pixels independently and cluster the pixels according only to their spectral information. This neglects the implicit spatial information that is available in the image. We will suggest a simple approach  a varian...
Constructing Bayesian finite mixture models by the EM algorithm
, 1997
"... In this paper we explore the use of finite mixture models for building decision support systems capable of sound probabilistic inference. Finite mixture models have many appealing properties: they are computationally efficient in the prediction (reasoning) phase, they are universal in the sense that ..."
Abstract

Cited by 23 (13 self)
 Add to MetaCart
In this paper we explore the use of finite mixture models for building decision support systems capable of sound probabilistic inference. Finite mixture models have many appealing properties: they are computationally efficient in the prediction (reasoning) phase, they are universal in the sense that they can approximate any problem domain distribution, and they can handle multimodality well. We present a formulation of the model construction problem in the Bayesian framework for finite mixture models, and describe how Bayesian inference is performed given such a model. The model construction problem can be seen as missing data estimation and we describe a realization of the ExpectationMaximization (EM) algorithm for finding good models. To prove the feasibility of our approach, we report crossvalidated empirical results on several publicly available classification problem datasets, and compare our results to corresponding results obtained by alternative techniques, such as neural netw...
Exploiting parameter domain knowledge for learning in Bayesian networks
 Carnegie Mellon University
, 2005
"... implied, of any sponsoring institution, the U.S. government or any other entity. ..."
Abstract

Cited by 9 (1 self)
 Add to MetaCart
implied, of any sponsoring institution, the U.S. government or any other entity.
Using Unlabelled Data To Update Classification Rules With Applications In Food Authenticity Studies
, 2004
"... Programme for Industrial Development and the FIRM programme. A classification method is developed to classify samples when both labelled and unlabelled samples are available. The classification rule is estimated using both the labelled and unlabelled data, in contrast to many classical methods which ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
Programme for Industrial Development and the FIRM programme. A classification method is developed to classify samples when both labelled and unlabelled samples are available. The classification rule is estimated using both the labelled and unlabelled data, in contrast to many classical methods which only use the labelled data for estimation. This methodology models the data as arising from a Gaussian mixture model with parsimonious covariance structure, as is done in modelbased clustering (Fraley and Raftery (2002)). A missingdata formulation of the mixture model is used and the models are fitted using the EM and CEM algorithms. A comparison of the performance of modelbased discriminant analysis and the proposed method of classification is given. The methods are applied to the analysis of spectra of foodstuffs recorded over the visible and nearinfrared wavelength range in food authenticity studies. The aim of this study is to classify the foodstuffs using their spectra. The proposed classification method is shown
Codesign of Software and Hardware to Implement Remote Sensing Algorithms
 Proc. SPIE
, 2001
"... Both for o#ine searches through large data archives and for onboard computation at the sensor head, there is a growing need for evermore rapid processing of remote sensing data. For many algorithms of use in remote sensing, the bulk of the processing takes place in an "inner loop" with a large numb ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
Both for o#ine searches through large data archives and for onboard computation at the sensor head, there is a growing need for evermore rapid processing of remote sensing data. For many algorithms of use in remote sensing, the bulk of the processing takes place in an "inner loop" with a large number of simple operations. For these algorithms, dramatic speedups can often be obtained with specialized hardware. The di#culty and expense of digital design continues to limit applicability of this approach, but the development of new design tools is making this approach more feasible, and some notable successes have been reported. On the other hand, it is often the case that processing can also be accelerated by adopting a more sophisticated algorithm design. Unfortunately, a more sophisticated algorithm is much harder to implement in hardware, so these approaches are often at odds with each other. With careful planning, however, it is sometimes possible to combine software and hardware design in such a way that each complements the other, and the final implementation achieves speedup that would not have been possible with a hardwareonly or a softwareonly solution. We will in particular discuss the codesign of software and hardware to achieve substantial speedup of algorithms for multispectral image segmentation and for endmember identification.