## Fusion of Domain Knowledge with Data for Structural Learning in Object Oriented Domains (2003)

### Download Links

- [www.math.ntnu.no]
- [www.idi.ntnu.no]
- [www.cs.auc.dk]
- [www.jmlr.org]
- [people.cs.aau.dk]
- [www.ai.mit.edu]
- DBLP

### Other Repositories/Bibliography

Citations: 9 (0 self)

### BibTeX

```bibtex
@MISC{Langseth03fusionof,
  author = {Helge Langseth and Thomas D. Nielsen and Richard Dybowski},
  title  = {Fusion of Domain Knowledge with Data for Structural Learning in Object Oriented Domains},
  year   = {2003}
}
```


### Abstract

When constructing a Bayesian network, it can be advantageous to employ structural learning algorithms to combine knowledge captured in databases with prior information provided by domain experts. Unfortunately, conventional learning algorithms do not easily incorporate prior information, if this information is too vague to be encoded as properties that are local to families of variables. For instance, conventional algorithms do not exploit prior information about repetitive structures, which are often found in object oriented domains such as computer networks, large pedigrees and genetic analysis.

### Citations

9231 | Elements of Information Theory - Cover, Thomas - 1991
Citation Context: ...ivergence is defined as $D(f \,\|\, \hat{f}_N) = \sum_x f(x \mid \Theta) \log \frac{f(x \mid \Theta)}{\hat{f}_N(x \mid \hat{\Theta}_N)}$. There are many arguments for using this particular measurement for calculating the quality of the approximation (see Cover and Thomas, 1991). One of them is the fact that the KL divergence bounds the maximum error in the assessed probability for a particular event A (Whittaker, 1990, Proposition 4.3.7): $\sup_A \big| \sum_{x \in A} f(x \mid \Theta) - \sum_{x \in A} \hat{f}_N(x \mid \hat{\Theta}_N) \big| \le \sqrt{\tfrac{1}{2} D(f \,\|\, \hat{f}_N)}$... |

9054 | Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird, et al. - 1977
Citation Context: ...used when approximating the integral in Equation 1. Thus, in order to apply these approximations we need to find the MAP parameters, for example by using the expectation-maximization (EM) algorithm (Dempster et al., 1977; Green, 1990), before we can calculate the score of a model. Thus, for each candidate model we may need to invest a considerable amount of time in order to evaluate the model. As an alternative, Frie... |
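The context above leans on the EM algorithm for finding MAP/ML parameters under missing data. As a rough illustration of how EM alternates expectation (responsibilities) and maximization (re-estimation) steps, here is a minimal sketch for a two-component 1-D Gaussian mixture; the data, initialisation, and iteration count are invented for the example and are not from the paper.

```python
import math
import random

def em_gaussian_mixture(data, iters=200):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    # Crude initialisation from the data range (hypothetical choice).
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibilities r[n][k] = P(component k | x_n).
        resp = []
        for x in data:
            w = [pi[k]
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k])
                 for k in range(2)]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: re-estimate mixture weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2
                         for r, x in zip(resp, data)) / nk + 1e-6
    return pi, mu, var

random.seed(0)
data = ([random.gauss(-2, 0.5) for _ in range(200)]
        + [random.gauss(3, 0.5) for _ in range(200)])
pi, mu, var = em_gaussian_mixture(data)
```

Each iteration provably does not decrease the likelihood, which is the property the SEM discussion above builds on at the structural level.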

7493 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference - Pearl - 1988
Citation Context: ...incorporating prior information provided by domain experts. Keywords: Bayesian networks, structural learning, object orientation, knowledge fusion 1. Introduction The Bayesian network (BN) framework (Pearl, 1988; Jensen, 1996, 2001) has established itself as a powerful tool in many areas of artificial intelligence. However, eliciting a BN from a domain expert can be a laborious and time consuming process. Th... |

2771 | Estimating the dimension of a model - Schwarz - 1978
Citation Context: ...ihood of the data. In such situations, a common approach is to apply asymptotic approximations such as the Laplace approximation (see, for example, Ripley, 1996), the Bayesian Information Criterion (Schwarz, 1978), the Minimum Description Length (Rissanen, 1987) or the Cheeseman-Stutz approximation (Cheeseman and Stutz, 1996); see also Chickering and Heckerman (1997) for a discussion. These approximations assu... |
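The Bayesian Information Criterion mentioned above trades likelihood against model complexity: $\mathrm{BIC} = \log L - \tfrac{d}{2}\log N$. A small sketch, using a made-up coin-tossing example (60 heads in 100 tosses) rather than anything from the paper, shows the penalty at work: the one-parameter model fits the data better, yet can still score below the zero-parameter fair-coin model.

```python
import math

def bic(log_likelihood, n_params, n_samples):
    # BIC score (to be maximised): log-likelihood minus a complexity penalty.
    return log_likelihood - 0.5 * n_params * math.log(n_samples)

# Hypothetical data: 60 heads out of 100 tosses.
heads, n = 60, 100

# Model A: fixed fair coin, no free parameters.
ll_fair = heads * math.log(0.5) + (n - heads) * math.log(0.5)

# Model B: Bernoulli with the maximum-likelihood estimate, one free parameter.
p = heads / n
ll_ml = heads * math.log(p) + (n - heads) * math.log(1 - p)

score_a = bic(ll_fair, 0, n)
score_b = bic(ll_ml, 1, n)
```

Here the extra parameter buys about 2.0 nats of likelihood but costs $\tfrac{1}{2}\log 100 \approx 2.3$, so BIC prefers the simpler model; with stronger evidence (say 70 heads) the richer model would win.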

2640 | Density Estimation for Statistics and Data Analysis. Monographs on Statistics and Applied Probability - Silverman - 1986
Citation Context: ...Structural OO Learning The goal of our learning algorithm is to find a good estimate of the unknown underlying statistical distribution function. That is, we focus on the task of density estimation (Silverman, 1986). Note that if focus had been on, for example, causal discovery (Heckerman, 1995a), classification (Fried... [footnote 9: Note that this approach can be seen as a generalization of the method for parameter learnin...] |

1609 | Statistical Analysis with Missing Data - Little, Rubin - 1987
Citation Context: ...re avoid the computationally expensive step of calculating the MAP parameters for each candidate model). The validity of the SEM algorithm is based on the assumption that the data is missing at random (Little and Rubin, 1987), which is also assumed in the remainder of this paper. Informally, this means that the pattern of missingness may only depend on the values of the observed variables. The SEM algorithm maximizes P... |

1421 | On information and sufficiency - Kullback, Leibler - 1951
Citation Context: ...measure for the difference between the gold standard model and the estimated model is required. In the tests performed, we have measured this difference by using the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951) between the gold standard model and the estimated model. The KL divergence is defined as $D(f \,\|\, \hat{f}_N) = \sum_x f(x \mid \Theta) \log \frac{f(x \mid \Theta)}{\hat{f}_N(x \mid \hat{\Theta}_N)}$. There are many arguments for using this particular... |
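The KL divergence defined above, together with the total-variation bound quoted from Whittaker (a Pinsker-style inequality), is easy to check numerically for discrete distributions. The two distributions below are made-up examples, not from the paper; `p` plays the role of the gold standard model and `q` the estimated one.

```python
import math

def kl_divergence(p, q):
    # D(p || q) = sum_x p(x) * log(p(x) / q(x)), in nats.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def total_variation(p, q):
    # sup over events A of |P(A) - Q(A)|, for discrete distributions.
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]   # "gold standard" distribution (invented numbers)
q = [0.4, 0.4, 0.2]   # estimated distribution (invented numbers)

d = kl_divergence(p, q)
tv = total_variation(p, q)
# The bound from the context: tv <= sqrt(d / 2).
```

For these numbers the maximum event-probability error is 0.1 while the bound $\sqrt{D/2}$ evaluates to roughly 0.112, so the inequality holds with little slack.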

1212 | Pattern Recognition and Neural Networks - Ripley - 1996 |

1140 | A Bayesian Method for the Induction of Probabilistic Networks from Data - Cooper, Herskovits - 1992 |

981 | An Introduction to Bayesian Networks - Jensen - 1996
Citation Context: ...prior information provided by domain experts. Keywords: Bayesian networks, structural learning, object orientation, knowledge fusion 1. Introduction The Bayesian network (BN) framework (Pearl, 1988; Jensen, 1996, 2001) has established itself as a powerful tool in many areas of artificial intelligence. However, eliciting a BN from a domain expert can be a laborious and time consuming process. Thus, methods fo... |

953 | Learning Bayesian networks: The combination of knowledge and statistical data - Heckerman, Geiger, et al. - 1995
Citation Context: ...$r(X_i, \Pi_i) = 1$ (Cooper and Herskovits, 1991), and $r(X_i, \Pi_i) = \binom{n-1}{|\Pi_i|}^{-1}$ used by Friedman and Koller (2003). Another prior which is frequently used is $r(X_i, \Pi_i) = k^{d_i}$ (Heckerman et al., 1995), where $0 < k \le 1$ and $d_i = |\{\Pi_i(B_S) \cup \Pi_i(B_p)\} \setminus \{\Pi_i(B_S) \cap \Pi_i(B_p)\}|$ denotes the number of parents for $X_i$ that differ in the prior model $B_p$ and the candidate structure $B_S$. Thus, each... |
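The prior $r(X_i, \Pi_i) = k^{d_i}$ quoted above penalises each parent that differs between a candidate structure and the expert's prior network, with $d_i$ the size of the symmetric difference of the two parent sets. A sketch under assumed conventions (parent sets as a dict mapping variable to parents; the value k = 0.9 and the three-node network are invented for illustration):

```python
def structure_prior(candidate_parents, prior_parents, k=0.9):
    """Structural prior of the k**d_i form: product over variables of
    k ** d_i, where d_i counts parents that differ between the candidate
    structure and the expert's prior network."""
    prior = 1.0
    for var in candidate_parents:
        cand = set(candidate_parents[var])
        expert = set(prior_parents.get(var, ()))
        # Symmetric difference = parents added plus parents removed.
        d_i = len(cand.symmetric_difference(expert))
        prior *= k ** d_i
    return prior

# Hypothetical three-node example: the expert believes A -> C <- B.
prior_net = {"A": (), "B": (), "C": ("A", "B")}
same      = {"A": (), "B": (), "C": ("A", "B")}   # identical to the prior
changed   = {"A": (), "B": (), "C": ("A",)}       # drops the edge B -> C
```

Structures matching the prior network get prior 1, and every edited edge multiplies the prior by k, so the expert's network is favoured but never makes alternatives impossible.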

903 | A Tutorial on Learning With Bayesian Networks - Heckerman - 1995
Citation Context: ...mate of the unknown underlying statistical distribution function. That is, we focus on the task of density estimation (Silverman, 1986). Note that if focus had been on, for example, causal discovery (Heckerman, 1995a), classification (Fried... [footnote 9: Note that this approach can be seen as a generalization of the method for parameter learning in DBNs, see West and Harrison (1997).] |

670 | Probabilistic Networks and Expert Systems - Cowell, Dawid, et al. - 1999 |

660 | Bayesian networks and Decision Graphs - Jensen - 2001 |

638 | Bayesian network classifiers - Friedman, Geiger, et al. - 1997
Citation Context: ...e possible use of encapsulating classes, we give an OOBN representation of the insurance network by Binder et al. (1997). The insurance network, depicted in Figure 5, is taken from the BN repository (Friedman et al., 1997b). The network, which consists of 27 nodes, is designed for classifying car insurance applications based on the expected claim cost. This information is captured in the nodes PropCost (Property cost)... |

533 | Learning probabilistic relational models - Friedman, Getoor, et al. - 1999 |

515 | Bayesian Classification (AutoClass): Theory and Results - Cheeseman, Stutz - 1996
Citation Context: ...s the Laplace approximation (see, for example, Ripley, 1996), the Bayesian Information Criterion (Schwarz, 1978), the Minimum Description Length (Rissanen, 1987) or the Cheeseman-Stutz approximation (Cheeseman and Stutz, 1996); see also Chickering and Heckerman (1997) for a discussion. These approximations assume that the posterior over the parameters is peaked, and the maximum a posteriori (MAP) parameters are used when ... |

467 | Graphical models in applied multivariate statistics - Whittaker - 1990
Citation Context: ...alculating the quality of the approximation (see Cover and Thomas, 1991). One of them is the fact that the KL divergence bounds the maximum error in the assessed probability for a particular event A (Whittaker, 1990, Proposition 4.3.7): $\sup_A \big| \sum_{x \in A} f(x \mid \Theta) - \sum_{x \in A} \hat{f}_N(x \mid \hat{\Theta}_N) \big| \le \sqrt{\tfrac{1}{2} D(f \,\|\, \hat{f}_N)}$. A similar result for the maximal error of the estimated conditional distribution is derived by v... |

327 | Bayesian Forecasting and Dynamic Models - West, Harrison - 1989 |

245 | Learning Bayesian networks with local structure - Friedman, Goldszmidt - 1996 |

229 | Learning the structure of dynamic probabilistic networks - Friedman, Murphy, et al. - 1998
Citation Context: ...dman (1998) proves that by increasing the expected score at each iteration we always obtai... [footnote 7: An active research area within the learning community is the discovery of hidden variables. These types of variables are never observed (Spirtes et al., 1993; Friedman et al., 1998; Elidan et al., 2000; Elidan and Friedman, 2001); however, hidden variables will not be considered further in this paper.] |

223 | The Bayesian Structural EM Algorithm - Friedman - 1998 |

218 | Being Bayesian about Network Structure - Friedman, Koller - 2003 |

199 | Learning Bayesian belief networks: An approach based on the MDL principle - Lam, Bacchus - 1994 |

192 | Object-Oriented Bayesian Networks - Koller, Pfeffer - 1997 |

179 | A guide to the literature on learning probabilistic networks from data - Buntine - 1996 |

162 | Adaptive Probabilistic Networks with Hidden Variables - Binder, Koller, et al. - 1997 |

111 | Probabilistic classification and clustering in relational data - Taskar, Segal, et al. - 2001 |

101 | Knowledge Engineering for Large Belief Networks - Pradhan, Provan, et al. - 1994 |

88 | Network fragments: Representing knowledge for constructing probabilistic models - Laskey, Mahoney - 1997 |

80 | Justifying multiply sectioned Bayesian networks - Xiang, Lesser - 2000
Citation Context: ...t; cycles of reference links are not possible (Bangsø and Wuillemin, 2000a). Finally, inference can be performed by translating the OOBN into a multiply-sectioned Bayesian network (Xiang et al., 1993; Xiang and Jensen, 1999); see Bangsø and Wuillemin (2000a) for details on this translation. Alternatively, we can construct the underlying BN of the OOBN: The underlying... [footnote 3: To avoid confusion with the normal links in the mod...] |

73 | On use of the EM algorithm for penalized likelihood estimation - Green - 1990
Citation Context: ...g the integral in Equation 1. Thus, in order to apply these approximations we need to find the MAP parameters, for example by using the expectation-maximization (EM) algorithm (Dempster et al., 1977; Green, 1990), before we can calculate the score of a model. Thus, for each candidate model we may need to invest a considerable amount of time in order to evaluate the model. As an alternative, Friedman (1998) d... |

71 | A Bayesian method for constructing Bayesian belief networks from databases - Cooper, Herskovits - 1990
Citation Context: ...all influence upon the selected model, structural priors are most often used to encode ignorance, and in some cases to restrict model complexity. Examples include the uniform prior $r(X_i, \Pi_i) = 1$ (Cooper and Herskovits, 1991), and $r(X_i, \Pi_i) = \binom{n-1}{|\Pi_i|}^{-1}$ used by Friedman and Koller (2003). Another prior which is frequently used is $r(X_i, \Pi_i) = k^{d_i}$ (Heckerman et al., 1995), where 0... |

61 | A Bayesian approach to learning causal networks - Heckerman - 1995 |

61 | Probabilistic Reasoning for Complex Systems - Pfeffer - 2000 |

50 | On the sample complexity of learning Bayesian networks - Friedman, Yakhini - 1996 |

48 | Learning Bayesian Nets that perform well - Greiner, Grove, et al. - 1996
Citation Context: ...te that this approach can be seen as a generalization of the method for parameter learning in DBNs, see West and Harrison (1997). ...according to a predefined query distribution (Greiner et al., 1997), the learning method would have been slightly different (the general approach, however, would still apply). The proposed method is tightly connected to the SEM-algorithm, described in Section 3.3. T... |

41 | Learning Probabilistic Networks - Krause - 1998 |

41 | Stochastic Complexity (with discussion) - Rissanen - 1987
Citation Context: ...approach is to apply asymptotic approximations such as the Laplace approximation (see, for example, Ripley, 1996), the Bayesian Information Criterion (Schwarz, 1978), the Minimum Description Length (Rissanen, 1987) or the Cheeseman-Stutz approximation (Cheeseman and Stutz, 1996); see also Chickering and Heckerman (1997) for a discussion. These approximations assume that the posterior over the parameters is peak... |

41 | Counting labeled acyclic digraphs - Robinson - 1973 |

40 | Discovering hidden variables: A structure-based approach - Elidan, Lotner, et al. - 2000
Citation Context: ...by increasing the expected score at each iteration we always obtain a better network in... [footnote 7: An active research area within the learning community is the discovery of hidden variables. These types of variables are never observed (Spirtes et al., 1993; Friedman et al., 1998; Elidan et al., 2000; Elidan and Friedman, 2001); however, hidden variables will not be considered further in this paper.] |

35 | Network Engineering for Complex Belief Networks - Mahoney, Laskey - 1996 |

30 | Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables - Chickering, Heckerman - 1997 |

25 | Learning the dimensionality of hidden variables - Elidan, Friedman - 2001
Citation Context: ...active research area within the learning community is the discovery of hidden variables. These types of variables are never observed (Spirtes et al., 1993; Friedman et al., 1998; Elidan et al., 2000; Elidan and Friedman, 2001); however, hidden variables will not be considered further in this paper. ...the expected score at each iteration we always obtain a better network in terms of its marginal scor... |

24 | Top-down construction and repetitive structures representation in Bayesian networks - Bangsø, Wuillemin - 2000 |

24 | A computational scheme for reasoning in dynamic probabilistic networks - Kjærulff - 1992 |

23 | The sample complexity of learning fixed-structure bayesian networks - Dasgupta - 1997 |