## An efficient data mining method for learning Bayesian networks using an evolutionary algorithm-based hybrid approach (2004)

Citations: 5 (0 self)

### BibTeX

@TECHREPORT{Wong04anefficient,

author = {Man Leung Wong and Kwong Sak Leung},

title = {An efficient data mining method for learning Bayesian networks using an evolutionary algorithm-based hybrid approach},

institution = {},

year = {2004}

}

### Abstract

Given the explosive growth of data collected from the current business environment, data mining can potentially discover new knowledge to improve managerial decision making. This paper proposes a novel data mining approach that employs an evolutionary algorithm to discover knowledge represented in Bayesian networks. The approach is successfully applied to the business problem of finding response models from direct marketing data. Learning Bayesian networks from data is a difficult problem. There are two different approaches to the network learning problem. The first uses dependency analysis, while the second searches for good network structures according to a metric. Unfortunately, both approaches have their own drawbacks. Thus, we propose a novel hybrid of the two approaches, which consists of two phases, namely, the conditional independence (CI) test phase and the search phase. In the CI test phase, dependency analysis is conducted to reduce the size of the search space. In the search phase, good Bayesian network models are generated by an evolutionary algorithm. A new operator is introduced to further enhance the search effectiveness and efficiency. In a number of experiments and comparisons, the hybrid algorithm outperforms MDLEP, our previous algorithm, which uses evolutionary programming (EP) for network learning, as well as other network learning algorithms. We then apply the approach to two direct marketing data sets and compare the performance of the evolved Bayesian networks obtained by the new algorithm with those obtained by MDLEP, logistic regression models, naïve Bayesian classifiers, and tree-augmented naïve Bayesian network classifiers (TAN). In this comparison, the new algorithm outperforms the others.

Index Terms: Bayesian networks, data mining, evolutionary computation, evolutionary programming (EP).
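
The abstract describes a CI test phase that shrinks the search space before the evolutionary search begins. The sketch below shows one plausible form of that pruning step under assumptions the abstract does not state: variables are discrete, an edge between X and Y is discarded as soon as some conditioning set of size at most one makes them independent, and the CI test is supplied as a function. The names `prune_candidate_edges` and `chain_oracle` are hypothetical, and the oracle merely stands in for a real statistical test.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Set, Tuple

# A CI test returns True when it judges x and y independent given cond.
CITest = Callable[[str, str, Tuple[str, ...]], bool]


def prune_candidate_edges(variables: List[str], ci_test: CITest,
                          max_order: int = 1) -> Set[FrozenSet[str]]:
    """Keep only the undirected edges that no low-order CI test removes.

    For each pair, conditioning sets of size 0, 1, ..., max_order drawn from
    the remaining variables are tried; the first one under which ci_test
    reports independence eliminates the edge.
    """
    candidates: Set[FrozenSet[str]] = set()
    for x, y in combinations(variables, 2):
        others = [v for v in variables if v not in (x, y)]
        removed = any(
            ci_test(x, y, cond)
            for order in range(max_order + 1)
            for cond in combinations(others, order)
        )
        if not removed:
            candidates.add(frozenset((x, y)))
    return candidates


def chain_oracle(x: str, y: str, cond: Tuple[str, ...]) -> bool:
    """Hypothetical oracle for the chain A -> B -> C: A and C are the only
    independent pair, and only once B is in the conditioning set."""
    return {x, y} == {"A", "C"} and "B" in cond


print(sorted(sorted(e) for e in prune_candidate_edges(["A", "B", "C"], chain_oracle)))
# [['A', 'B'], ['B', 'C']]
```

A search phase built on this would only consider edges in the returned set. With `max_order=0` all three edges of the example survive, because the oracle never reports marginal independence; this is why the order of the tests matters for how much the space shrinks.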

### Citations

7342 | Genetic Algorithms and
- Goldberg, Holland
- 1988
Citation Context ...re two approaches [20], [39] that apply evolutionary computation to tackle the problem of learning Bayesian networks using the search-and-scoring approach. The first one uses genetic algorithms (GAs) [44], [45], while the second one uses evolutionary programming (EP) [46], [47]. A. Learning Bayesian Network Using Genetic Algorithms (GAs) Larrañaga et al. [39] proposed to use GAs [44], [45] to search f...

7052 | Probabilistic Reasoning in Intelligent Systems
- Pearl
- 1988
Citation Context ... in artificial intelligence. A Bayesian network is a graphical representation that depicts conditional independence among random variables in the domain and encodes the joint probability distribution [7]. With a network at hand, probabilistic inference can be performed to predict the outcome of some variables based on the observations of others. In light of this, Bayesian networks are widely used in ...
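
The statement that a Bayesian network encodes the joint probability distribution rests on the factorization P(x_1, ..., x_n) = ∏_i P(x_i | parents(x_i)). The minimal sketch below evaluates that product for a hypothetical three-node network; the structure, variable names, and probabilities are invented for illustration and do not come from the paper.

```python
from itertools import product
from typing import Dict, Tuple

# Hypothetical network: Rain -> WetGrass <- Sprinkler, all variables binary.
PARENTS: Dict[str, Tuple[str, ...]] = {
    "Rain": (),
    "Sprinkler": (),
    "WetGrass": ("Rain", "Sprinkler"),
}
# P(node = True | parent values), keyed by parent values in PARENTS order.
CPT: Dict[str, Dict[Tuple[bool, ...], float]] = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.4},
    "WetGrass": {(True, True): 0.99, (True, False): 0.9,
                 (False, True): 0.8, (False, False): 0.0},
}


def joint_probability(assignment: Dict[str, bool]) -> float:
    """P(x_1, ..., x_n) as the product of each node's local conditional."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p_true = CPT[node][tuple(assignment[p] for p in parents)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


def total_probability() -> float:
    """Sum of the joint over every assignment; 1 up to rounding."""
    names = list(PARENTS)
    return sum(joint_probability(dict(zip(names, values)))
               for values in product([True, False], repeat=len(names)))


# 0.2 * (1 - 0.4) * 0.9, approximately 0.108
print(joint_probability({"Rain": True, "Sprinkler": False, "WetGrass": True}))
```

Only one local table per node is stored: 1 + 1 + 4 = 6 free parameters here, against 2^3 - 1 = 7 for an unrestricted joint table. The saving grows quickly as networks get larger and sparser.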

1203 | Categorical Data Analysis
- Agresti
- 1990
Citation Context ...therwise, is not d-separated with by [28], [7]. The validity of an independence assertion is tested by performing a CI test. Statistical hypothesis testing procedure could be used in the CI test [1], [29], [30]. To begin with, the conditional independence assertion [i.e., ] is modeled as the null hypothesis. Suppose that we use the likelihood-ratio test and the statistics is calculated by observed obs...
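
The context above sets up the CI test as a likelihood-ratio test of the null hypothesis that X is independent of Y given Z. The snippet is truncated before the paper's own formula, so the sketch below uses the standard textbook statistic for discrete data, G² = 2 Σ N_xyz ln(N_xyz N_z / (N_xz N_yz)), with (|X| - 1)(|Y| - 1)|Z| degrees of freedom, where |Z| is the number of instantiations of the conditioning set. The symbols, the row-as-dictionary representation, and the names `g2_statistic` and `degrees_of_freedom` are assumptions.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]  # one observation: variable name -> value


def g2_statistic(rows: List[Row], x: str, y: str, z: Sequence[str]) -> float:
    """Likelihood-ratio statistic G^2 for H0: X independent of Y given Z."""
    def zval(r: Row) -> tuple:
        return tuple(r[v] for v in z)

    n_xyz = Counter((r[x], r[y], zval(r)) for r in rows)
    n_xz = Counter((r[x], zval(r)) for r in rows)
    n_yz = Counter((r[y], zval(r)) for r in rows)
    n_z = Counter(zval(r) for r in rows)
    g2 = 0.0
    # Zero cells contribute 0 (0 * log 0 = 0), so only observed cells are visited.
    for (xv, yv, zv), observed in n_xyz.items():
        expected = n_xz[(xv, zv)] * n_yz[(yv, zv)] / n_z[zv]
        g2 += 2.0 * observed * math.log(observed / expected)
    return g2


def degrees_of_freedom(rows: List[Row], x: str, y: str, z: Sequence[str]) -> int:
    """(|X| - 1)(|Y| - 1)|Z|, counting only values that occur in the data."""
    def levels(v: str) -> int:
        return len({r[v] for r in rows})

    return (levels(x) - 1) * (levels(y) - 1) * math.prod(levels(v) for v in z)


copies = [{"X": 0, "Y": 0}, {"X": 1, "Y": 1}] * 2  # Y always equals X
print(g2_statistic(copies, "X", "Y", []), degrees_of_freedom(copies, "X", "Y", []))
# about 5.545 (that is, 8 ln 2) and 1
```

Counting levels from the data is a simplification: if a variable has a state that never appears, the true number of instantiations is larger and the degrees of freedom should use it.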

1160 | Modeling by shortest data description
- Rissanen
- 1978
Citation Context ...distribution [(2)], we could derive a measure for assessing the goodness of such encoding. For instance, the measure could be derived from Bayesian statistics, information theory or the MDL principle [34]. Though their theoretical foundations are different, some studies [35], [36] show that different metrics are asymptotically equivalent under certain conditions. Since we employ the MDL metric [19] in...
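
The snippet stops before the MDL metric itself. As a rough sketch, assuming the Lam and Bacchus style decomposition into a network description length plus a data description length: for each node X_i with k_i parents and s_i states, the network part charges k_i log2 n bits to name the parents (n nodes) plus d (s_i - 1) ∏ s_j bits for its conditional probability table, where d is the number of bits per stored number; the data part charges -Σ N(x_i, π_i) log2(N(x_i, π_i) / N(π_i)) bits. Lower totals are better. The default d = 32 and the name `mdl_score` are assumptions, and the paper's exact formula may differ in detail.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Tuple

Row = Dict[str, Hashable]


def mdl_score(rows: List[Row], parents: Dict[str, Tuple[str, ...]],
              states: Dict[str, int], d: int = 32) -> float:
    """Total description length in bits (network part + data part)."""
    n_nodes = len(parents)
    network_bits = 0.0
    data_bits = 0.0
    for node, pars in parents.items():
        # Network description: which nodes are parents, plus the CPT entries.
        n_params = (states[node] - 1) * math.prod(states[p] for p in pars)
        network_bits += len(pars) * math.log2(n_nodes) + d * n_params
        # Data description: empirical code length of the node given its parents.
        joint = Counter((tuple(r[p] for p in pars), r[node]) for r in rows)
        parent_counts = Counter(tuple(r[p] for p in pars) for r in rows)
        for (pa_values, _), count in joint.items():
            data_bits -= count * math.log2(count / parent_counts[pa_values])
    return network_bits + data_bits


empty = {"X": (), "Y": ()}
x_to_y = {"X": (), "Y": ("X",)}
binary = {"X": 2, "Y": 2}
few = [{"X": 0, "Y": 0}, {"X": 1, "Y": 1}] * 2  # 4 rows, Y copies X
many = few * 10                                  # 40 rows
print(mdl_score(few, empty, binary), mdl_score(few, x_to_y, binary))    # 72.0 101.0
print(mdl_score(many, empty, binary), mdl_score(many, x_to_y, binary))  # 144.0 137.0
```

With four rows the extra table for Y costs more bits than it saves, so the empty graph wins; with forty rows the edge X -> Y pays for itself. This trade-off between structural complexity and fit is the intuition behind the MDL principle.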

1075 | A Bayesian Method for the Induction
- Cooper, Herskovits
- 1992

905 | An Introduction to Bayesian Networks
- Jensen
- 1996
Citation Context ...n networks are widely used in diagnostic and classification systems. For example, MUNIN is used for diagnosing diseases in muscles and nerve, and PATHFINDER is used for diagnosing lymph node diseases [8]. Besides, they are also used in information retrieval [9] and printer troubleshooting [10]. Typically, a Bayesian network is constructed by eliciting knowledge from domain experts. To reduce imprecis...

644 | Artificial Intelligence through Simulated Evolution
- Fogel, Owens, et al.
- 1966
Citation Context ... the problem of learning Bayesian networks using the search-and-scoring approach. The first one uses genetic algorithms (GAs) [44], [45], while the second one uses evolutionary programming (EP) [46], [47]. A. Learning Bayesian Network Using Genetic Algorithms (GAs) Larrañaga et al. [39] proposed to use GAs [44], [45] to search for the optimal Bayesian network structure. In their research, the network ...

612 | Evolutionary Computation: Toward a New Philosophy of Machine Intelligence
- Fogel
- 2005
Citation Context ...tackle the problem of learning Bayesian networks using the search-and-scoring approach. The first one uses genetic algorithms (GAs) [44], [45], while the second one uses evolutionary programming (EP) [46], [47]. A. Learning Bayesian Network Using Genetic Algorithms (GAs) Larrañaga et al. [39] proposed to use GAs [44], [45] to search for the optimal Bayesian network structure. In their research, the ne...

589 | Bayesian network classifiers
- Friedman, Geiger, et al.
- 1997
Citation Context ...he approach to two data sets of direct marketing and compare the performance of the evolved Bayesian networks obtained by HEA and MDLEP, the logistic regression models, the naïve Bayesian classifiers [25], [26], and the tree-augmented naïve Bayesian network classifiers (TAN) [25]. We then conclude the paper in Section VII. II. BACKGROUNDS A. Bayesian Networks A Bayesian network, , has a directed acycl...

298 | Learning Bayesian Networks: The
- Heckerman, Geiger, et al.
- 1995
Citation Context ... used for diagnosing diseases in muscles and nerve, and PATHFINDER is used for diagnosing lymph node diseases [8]. Besides, they are also used in information retrieval [9] and printer troubleshooting [10]. Typically, a Bayesian network is constructed by eliciting knowledge from domain experts. To reduce imprecision due to subjective judgments, researchers start to be interested in 1089-778X/04$20.00 ©...

211 | Induction of selective Bayesian classifiers
- Langley, Sage
- 1994
Citation Context ...roach to two data sets of direct marketing and compare the performance of the evolved Bayesian networks obtained by HEA and MDLEP, the logistic regression models, the naïve Bayesian classifiers [25], [26], and the tree-augmented naïve Bayesian network classifiers (TAN) [25]. We then conclude the paper in Section VII. II. BACKGROUNDS A. Bayesian Networks A Bayesian network, , has a directed acyclic gra...

155 | A new evolutionary system for evolving artificial neural networks
- Yao, Liu
- 1997
Citation Context ...sover cannot achieve the purpose in this problem. Similarly, this observation was independently reported by Yao and Liu [55]. They proposed a novel evolutionary system, called EPNet, for evolving architectures and weights of artificial neural networks (ANNs) simultaneously. EPNet emphasizes on evolving ANN behaviors and use...

150 | Probabilistic Reasoning in Expert Systems
- Neapolitan
- 1990
Citation Context ... a network by testing the validity of any independence assertions . If the statement is supported by the data, it follows that should be d-separated with by in ; otherwise, is not d-separated with by [28], [7]. The validity of an independence assertion is tested by performing a CI test. Statistical hypothesis testing procedure could be used in the CI test [1], [29], [30]. To begin with, the conditiona...

129 | Learning equivalence classes of Bayesian-network structure
- Chickering
- 2002
Citation Context ...flexive, symmetric, and transitive, it defines a set of equivalence classes over network structures. The MDL scoring criterion is score equivalent that assigns the same score to equivalent structures [58]. Search strategies will be inefficient if they spend most of their time within the same equivalence class. The merge operator can avoid this problem because it generates a network structure having di...

93 | A comparison of linear genetic programming and neural networks in medical data mining
- Brameier, Banzhaf
- 2001
Citation Context ...often gives satisfactory results for various optimization problems in different areas. For example, it is applied in data mining, image processing, pattern recognition, and signal processing [3]–[6], [40]–[43]. Recently, there are two approaches [20], [39] that apply evolutionary computation to tackle the problem of learning Bayesian networks using the search-and-scoring approach. The first one uses g...

92 | Learning Bayesian networks from data: an information-theory based approach
- Tan, Cheng, et al.
Citation Context ...s in the domain. Recently, there is also increasing interest in applying Bayesian networks for data mining [11]–[15]. In the literature, there are two main approaches to this network learning problem [16]. The first one is the dependency analysis approach [1], [16]. Since a Bayesian network describes conditional independence, we could make use of dependency test results to construct a Bayesian network...

91 | A characterization of Markov equivalence classes for acyclic digraphs
- Andersson, Madigan, et al.
- 1997
Citation Context ... obtained in previous generations. Moreover, the merge operator ensures that the new network structure and its parental network structures belong to different equivalence classes of Bayesian networks [57]. Two network structures are equivalent if the set of distributions that can be represented using one of the structures is identical to the set of distributions that can be represented using the other...
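
The definition of equivalence in the context above is semantic (equal sets of representable distributions). A standard graphical test, due to Verma and Pearl, says two DAGs are Markov equivalent exactly when they share the same skeleton and the same v-structures (colliders a -> c <- b with a and b non-adjacent). The sketch below applies that test; it is not the paper's merge operator, and representing a DAG as a dictionary from each node to a tuple of its parents is an assumption.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

DAG = Dict[str, Tuple[str, ...]]  # node -> parents


def skeleton(g: DAG) -> Set[FrozenSet[str]]:
    """Undirected version of every edge."""
    return {frozenset((p, child)) for child, ps in g.items() for p in ps}


def v_structures(g: DAG) -> Set[Tuple[str, str, str]]:
    """Colliders a -> c <- b whose endpoints a and b are not adjacent."""
    skel = skeleton(g)
    found: Set[Tuple[str, str, str]] = set()
    for child, ps in g.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(g1: DAG, g2: DAG) -> bool:
    """Same skeleton and same v-structures (Verma-Pearl criterion)."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)


chain = {"A": (), "B": ("A",), "C": ("B",)}     # A -> B -> C
fork = {"A": ("B",), "B": (), "C": ("B",)}      # A <- B -> C
collider = {"A": (), "B": ("A", "C"), "C": ()}  # A -> B <- C
print(markov_equivalent(chain, fork), markov_equivalent(chain, collider))  # True False
```

Under a score-equivalent metric such as MDL, the chain and the fork receive the same score, so a search that keeps moving between them spends effort without any change in score.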

89 | Scalable techniques for mining causal structures
- Silverstein, Brin, et al.
- 1998

71 | Structure learning of Bayesian networks by genetical algorithms
- Larrañaga, Poza
- 1994
Citation Context ...nd-bound [38], to find the optimal solution. In the worst case, the time consumed would be considerable. Recently, some researchers attempt to use evolutionary computation to tackle the problem [20], [39]. III. LEARNING BAYESIAN NETWORKS USING EVOLUTIONARY COMPUTATION Evolutionary computation is a general stochastic search methodology. The principal idea is borrowed from evolution mechanisms proposed ...

57 | Evolutionary computation 1: basic algorithms and operators
- Bäck, Fogel, et al.
- 2000

54 | Learning Bayesian network structures by searching for the best ordering with genetic algorithms
- Larrañaga, Kuijpers, et al.
- 1996
Citation Context ...ions under different parameter settings. Based on the results, several recommendations regarding the choice of implementation and parameters were made [48]. In another research work, Larrañaga et al. [49] considered the problem of finding node orderings. They represented a node ordering in a chromosome and used a genetic algorithm to evolve different node orderings. For each node ordering, it is passe...

51 | Learning Bayesian belief networks based on the minimum description length principle: An efficient algorithm using the B&B technique
- Suzuki
- 1996
Citation Context ...uch encoding. For instance, the measure could be derived from Bayesian statistics, information theory or the MDL principle [34]. Though their theoretical foundations are different, some studies [35], [36] show that different metrics are asymptotically equivalent under certain conditions. Since we employ the MDL metric [19] in our work, we take it as an example for illustration. Basically, the metric i...

42 | Constructor: A system for the induction of probabilistic models
- Fung, Crawford
- 1990
Citation Context ...hey are called the dependency analysis and the search-and-scoring approaches, respectively. 1) Dependency Analysis Approach: The dependency analysis approach includes the algorithms in [1], [16], and [27]. It takes the view that Bayesian networks depict conditional independence relations among the variables. Hence, the approach tries to construct a Bayesian network using dependency information obtaine...

41 | Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study
- Cano, Herrera, et al.
- 2003
Citation Context ...e expression programming (GEP) to learn classification rules. They evaluated their approach on several benchmark databases and demonstrated that accurate and compact rules can be induced. Cano et al. [5] compared four evolutionary and some nonevolutionary instance selection algorithms. The experimental results suggested that the evolutionary instance selection algorithms outperform the nonevolutionar...

40 | A novel evolutionary data mining algorithm with applications to churn prediction
- Au, Chan, et al.
- 2003
Citation Context ...nd apply the approach to handle the business problem of finding response models from direct marketing data. Recently, some researchers have employed evolutionary algorithms for data mining. Au et al. [3] proposed an algorithm, called data mining by evolutionary learning (DMEL), that induces classification rules for predicting the likelihood of each classification made. They performed several experime...

33 | Inferring Informational Goals from Free-Text Queries: A Bayesian Approach
- Heckerman, Horvitz
- 1998
Citation Context ...n systems. For example, MUNIN is used for diagnosing diseases in muscles and nerve, and PATHFINDER is used for diagnosing lymph node diseases [8]. Besides, they are also used in information retrieval [9] and printer troubleshooting [10]. Typically, a Bayesian network is constructed by eliciting knowledge from domain experts. To reduce imprecision due to subjective judgments, researchers start to be i...

29 | A hybrid anytime algorithm for the construction of causal models from sparse data
- Dash, Druzdzel
- 1999
Citation Context ...inaccurate. Third, because a network is constructed in a step by step manner, the construction algorithm may be unstable in the sense that an earlier mistake during construction is consequential [1], [33]. Moreover, this implies that the order of testing the CI relations is important, which will be a concern when one pursues for the optimal performance. 2) Search-and-Scoring Approach: The second appro...

28 | A Simple Constraint-Based Algorithm for Efficiently Mining Observational Databases for Causal Relationships
- Cooper
- 1997
Citation Context ...constructing a Bayesian network from collected data or past observations in the domain. Recently, there is also increasing interest in applying Bayesian networks for data mining [11]–[15]. In the literature, there are two main approaches to this network learning problem [16]. The first one is the dependency analysis approach [1], [16]. Since a Bayesian network describes condition...

22 | Learning Bayesian Networks from Incomplete Data Using Evolutionary Algorithms
- Myers, Laskey, et al.
- 1999
Citation Context ...rs et al. [51] proposed a GA that learns Bayesian networks from incomplete data. This algorithm evolves the Bayesian network structures and the values of the missing data simultaneously. Myers et al. [52] introduced the evolutionary Markov chain Monte Carlo (EMCMC) algorithm to learn Bayesian networks from incomplete data. EMCMC combines the advantages of the canonical genetic algorithm and the Markov...

20 | Bayesian networks for data mining. Data Mining and Knowledge Discovery
- Heckerman
- 1997

17 | Using Evolutionary Programming and Minimum Description Length Principle for Data Mining of Bayesian Networks
- Wong, Lam, et al.
Citation Context ...ess. With such reduction, the search process would take less time for finding the optimal solution. Together with the introduction of a new operator and some modifications of our previous work, MDLEP [20], we call our new approach hybrid evolutionary algorithm (HEA). We have conducted a number of experiments and compared HEA with MDLEP and other network learning algorithms. The empirical results illus...

15 | Evolving accurate and compact classification rules with gene expression programming
- Zhou, Xiao, et al.
Citation Context ...several experiments to show that DMEL can discover interesting rules effectively. Moreover, they applied DMEL to a large database with 100 000 records to learn rules for churn prediction. Zhou et al. [4] employed gene expression programming (GEP) to learn classification rules. They evaluated their approach on several benchmark databases and demonstrated that accurate and compact rules can be induced....

12 | Population Markov chain Monte Carlo
- Myers, Laskey
Citation Context ... Carlo (EMCMC) algorithm to learn Bayesian networks from incomplete data. EMCMC combines the advantages of the canonical genetic algorithm and the Markov chain Monte Carlo algorithm. Laskey and Myers [53] applied a hybrid algorithm called population Markov chain Monte Carlo (popMCMC) to induce Bayesian networks from data sets with missing observations and hidden variables. PopMCMC increases the rate o...

11 | Towards a more efficient evolutionary induction of Bayesian networks
- Cotta, Muruzábal
Citation Context ...btain a network. Cotta and Muruzábal proposed a number of recombination operators for Bayesian networks and applied these operators in some steady-state genetic algorithms to induce Bayesian networks [50]. Myers et al. [51] proposed a GA that learns Bayesian networks from incomplete data. This algorithm evolves the Bayesian network structures and the values of the missing data simultaneously. Myers et...

10 | Approximating causal orderings for Bayesian networks using genetic algorithms and simulated annealing
- Campos, Huete
Citation Context ...et need to be examined which would require an exponential number of tests. Second, results from CI test may not be reliable for high-order CI tests when the size of the conditioning set is large [1], [32]. Hence, for algorithms that require high-order CI where is a constant denoting the number of bits used to store a numerical value. Intuitively, the network description length represents the structura...

10 | Evolutionary Learning of Dynamic Probabilistic Models with Large Time Lags
- Tucker, Liu, et al.
- 2001
Citation Context ... (in comparison with the original network), and smaller MDL scores. In addition, MDLEP is also faster as it requires fewer generations to converge and generates less invalid structures. Tucker et al. [54] extended MDLEP to find good dynamic Bayesian network structures that can have large time lags. C. Problems of the Previous Approaches As reported in Wong et al.’s work [20], the EP formulation seems ...
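
The context notes that MDLEP generates fewer invalid structures. Assuming "invalid" means a directed graph that contains a cycle (and so cannot be a Bayesian network), the sketch below checks validity with Kahn's topological-sort algorithm and uses it to guard a hypothetical add-edge mutation. `is_acyclic` and `add_edge_if_valid` are illustrative names, not MDLEP's actual operators.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

DAG = Dict[str, Tuple[str, ...]]  # node -> parents


def is_acyclic(g: DAG) -> bool:
    """Kahn's algorithm: the graph is acyclic iff every node can be removed
    in an order where each removed node has no remaining parents."""
    children: Dict[str, List[str]] = {v: [] for v in g}
    indegree = {v: len(ps) for v, ps in g.items()}
    for child, ps in g.items():
        for p in ps:
            children[p].append(child)
    queue = deque(v for v, deg in indegree.items() if deg == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return removed == len(g)


def add_edge_if_valid(g: DAG, parent: str, child: str) -> Optional[DAG]:
    """Return a copy of g with parent -> child added, or None if the edge
    already exists or adding it would create a cycle."""
    if parent in g[child]:
        return None
    candidate = dict(g)
    candidate[child] = g[child] + (parent,)
    return candidate if is_acyclic(candidate) else None


chain = {"A": (), "B": ("A",), "C": ("B",)}
print(add_edge_if_valid(chain, "A", "C") is not None)  # True: A -> C keeps it acyclic
print(add_edge_if_valid(chain, "C", "A") is not None)  # False: would close A -> B -> C -> A
```

Rejecting a cyclic offspring at creation time, as here, is one design choice; a repair step that deletes an edge on the cycle is another.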

9 | Knowledge-intensive genetic discovery in foreign exchange markets
- Bhattacharyya, Pictet, et al.
- 2002

8 | Learning Bayesian networks is NP-hard. Microsoft Research Technical Report
- Chickering, Geiger, et al.
- 1994
Citation Context ... the problem is difficult as the search space, which contains all possible network structures, is huge. Chickering et al. proved that the search problem is NP-hard with the use of a particular metric [37]. Some algorithms, therefore, resort to greedy search heuristics [17], [19]. However, the drawback of these algorithms is that suboptimal solutions may be obtained. Some others use systematic and exha...
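
The claim that the space of network structures is huge can be made concrete. Robinson's recurrence counts labelled DAGs on n nodes: a(0) = 1 and a(n) = Σ_{k=1}^{n} (-1)^{k+1} C(n, k) 2^{k(n-k)} a(n-k). The short sketch below evaluates it; it only illustrates the size of the search space and says nothing about the particular metric used in the NP-hardness proof.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labelled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))


for n in range(1, 8):
    print(n, count_dags(n))
```

At seven variables there are already 1,138,779,265 structures, which is why exhaustive search quickly becomes impractical and greedy, branch-and-bound, or evolutionary search is used instead.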

7 | A branch-and-bound algorithm for MDL learning Bayesian networks
- Tian
Citation Context ...to greedy search heuristics [17], [19]. However, the drawback of these algorithms is that suboptimal solutions may be obtained. Some others use systematic and exhaustive search, like branch-and-bound [38], to find the optimal solution. In the worst case, the time consumed would be considerable. Recently, some researchers attempt to use evolutionary computation to tackle the problem [20], [39]. III. LE...

7 | Building a GA from design principles for learning Bayesian networks
- Dijk, Thierens, et al.
- 2003

6 | Bayesian Belief Networks: from Inference to Construction
- Bouckaert
- 1995
Citation Context ...s of such encoding. For instance, the measure could be derived from Bayesian statistics, information theory or the MDL principle [34]. Though their theoretical foundations are different, some studies [35], [36] show that different metrics are asymptotically equivalent under certain conditions. Since we employ the MDL metric [19] in our work, we take it as an example for illustration. Basically, the me...

5 | Discovering knowledge from medical databases using evolutionary algorithms
- Wong, Lam, et al.
- 2000

5 | Learning Bayesian belief networks: An approach based on the MDL principle
- Lam, Bacchus
- 1994
Citation Context ...scribes conditional independence, we could make use of dependency test results to construct a Bayesian network that conforms to our findings. The second one, called the score-and-search approach [17]–[19], uses a metric to evaluate a candidate network structure. With the metric, a search algorithm is employed to find a network structure which has the best score. Thus, the learning problem becomes a se...

5 | Evolutionary Computation 2: Advanced Algorithms and Operators
- Bäck, Fogel, et al.
- 2000
Citation Context ... gives satisfactory results for various optimization problems in different areas. For example, it is applied in data mining, image processing, pattern recognition, and signal processing [3]–[6], [40]–[43]. Recently, there are two approaches [20], [39] that apply evolutionary computation to tackle the problem of learning Bayesian networks using the search-and-scoring approach. The first one uses geneti...

4 | A semantically guided and domain-independent evolutionary model for knowledge discovery from texts
- Atkinson-Abutridy, Mellish, et al.
Citation Context ...ome nonevolutionary instance selection algorithms. The experimental results suggested that the evolutionary instance selection algorithms outperform the nonevolutionary ones. Atkinson–Abutridy et al. [6] proposed a novel approach for knowledge discovery from texts. The approach uses natural language techniques and genetic algorithms to generate novel explanatory hypotheses. Bayesian networks are popu...

4 | Parallel learning of belief networks in large and difficult domains. Data Mining and Knowledge Discovery
- Xiang, Chu
- 1999
Citation Context ...constructing a Bayesian network from collected data or past observations in the domain. Recently, there is also increasing interest in applying Bayesian networks for data mining [11]–[15]. In the literature, there are two main approaches to this network learning problem [16]. The first one is the dependency analysis approach [1], [16]. Since a Bayesian network describes conditional in...

2 | Improving the efficiency of using evolutionary programming for Bayesian network learning
- Lee, Leung, et al.

2 | A hybrid approach to discover Bayesian networks from databases using evolutionary programming
- Wong, Lee, et al.

2 | Statistical Tests: An introduction with MINITAB commentary
- Beaumont, Knowles
- 1996
Citation Context ...er of possible instantiations of the variables , , and are, respectively, , , and , follows a χ² distribution with degree-of-freedom. Checking our computed against the χ² distribution, we obtain the p-value [31]. If the p-value is less than a predefined cutoff value , the test shows strong evidence to reject the hypothesis; otherwise, the hypothesis cannot be rejected. For example, the SGS algorithm [1] begin...
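
The decision rule in the context above compares the p-value of the test statistic with a predefined cutoff. A dependency-free sketch follows: it evaluates the χ² upper tail for integer degrees of freedom using the classical closed forms (a finite Poisson-type sum for even df, and erfc plus a finite series for odd df), then applies the rule. The cutoff 0.05 and the names `chi2_sf` and `reject_independence` are assumptions; in practice a library routine such as SciPy's `scipy.stats.chi2.sf` computes the same tail probability.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom >= x), for integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus
    # sqrt(2/pi) * exp(-x/2) * sum_{r=1}^{(df-1)/2} x^(r-1/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    total = math.erfc(root / math.sqrt(2.0))
    term = math.sqrt(2.0 / math.pi) * math.exp(-half) * root
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def reject_independence(g2: float, df: int, alpha: float = 0.05) -> bool:
    """True when the p-value falls below alpha, i.e. the data give strong
    evidence against the conditional-independence hypothesis."""
    return chi2_sf(g2, df) < alpha


print(round(chi2_sf(3.841458820694124, 1), 6))  # 0.05 (the 95% point for df = 1)
print(reject_independence(8 * math.log(2), 1), reject_independence(0.5, 1))  # True False
```

A smaller alpha makes the test more reluctant to declare dependence, so more edges would be judged independent; how the paper picks its cutoff is not visible in this snippet.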

1 | A hybrid approach to learn Bayesian networks using evolutionary programming