Learning Bayesian Networks from Data: An Information-Theory Based Approach
Cited by 67 (4 self)
This paper provides algorithms that use an information-theoretic analysis to learn Bayesian network structures from data. Based on our three-phase learning framework, we develop efficient algorithms that can effectively learn Bayesian networks, requiring only polynomial numbers of conditional independence (CI) tests in typical cases. We provide precise conditions that specify when these algorithms are guaranteed to be correct as well as empirical evidence (from real world applications and simulation tests) that demonstrates that these systems work efficiently and reliably in practice.
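To make the notion of an information-theoretic CI test concrete, the following is a minimal sketch, not the paper's algorithm. It estimates the conditional mutual information I(X;Y|Z) from discrete samples and treats X and Y as conditionally independent when the estimate falls below a small threshold. The function names and the threshold `eps` are assumptions chosen for illustration.

```python
from collections import Counter
from math import log

def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X;Y|Z) in nats for three equally long discrete samples."""
    n = len(xs)
    xyz = Counter(zip(xs, ys, zs))
    xz = Counter(zip(xs, zs))
    yz = Counter(zip(ys, zs))
    z = Counter(zs)
    total = 0.0
    for (x, y, zv), c in xyz.items():
        # p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ];
        # the sample size cancels inside the ratio, so raw counts suffice.
        total += (c / n) * log(c * z[zv] / (xz[(x, zv)] * yz[(y, zv)]))
    return total

def looks_independent(xs, ys, zs, eps=0.01):
    """Hypothetical decision rule: independent if the estimate is below eps."""
    return conditional_mutual_information(xs, ys, zs) < eps
```

For example, if X and Y are both exact copies of Z, they are strongly dependent marginally, yet the estimate of I(X;Y|Z) is zero, so the test reports conditional independence given Z.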
Learning Bayesian Networks from Data: An Efficient Approach Based on Information Theory
, 1997
Cited by 31 (0 self)
This paper addresses the problem of learning Bayesian network structures from data using an information-theoretic dependency analysis approach. Based on our three-phase construction mechanism, two efficient algorithms have been developed. The first handles the special case where the node ordering is given; it requires only O(N^2) conditional independence (CI) tests and is correct given that the underlying model is DAG-Faithful [Spirtes et al., 1996]. The second handles the general case and requires O(N^4) CI tests; it is correct given that the underlying model is monotone DAG-Faithful (see Section 4.4). A system based on these algorithms has been developed and distributed through the Internet. The empirical results show that our approach is efficient and reliable. 1 Introduction The Bayesian network is a powerful knowledge representation and reasoning tool under conditions of uncertainty. A Bayesian network is a directed acyclic graph ...
Towards better support for spatial decision-making: defining the characteristics
- Geomatica
, 2001
Cited by 24 (4 self)
To exploit the full potential of the spatial and temporal dimensions of a data warehouse, new tools are needed. It has been shown that OLAP possesses a certain potential to support spatio-temporal analysis. However, without a spatial interface for viewing and manipulating the geometric component of the spatial data, the analysis may be incomplete. A new category of OLAP tools, SOLAP (Spatial OLAP) tools, is presented. SOLAP tools are defined and the associated concepts are presented. A series of essential features, as well as desirable characteristics, are then described. Finally, application prototypes are described and an example of spatio-temporal analysis is presented.
Parcel: Feature Subset Selection in Variable Cost Domains
, 1998
Cited by 20 (1 self)
The vast majority of classification systems are designed with a single set of features, and optimised to a single specified cost. However, in examples such as medical and financial risk modelling, costs are known to vary subsequent to system design. In this paper, we present a design method for feature selection in the presence of varying costs. Starting from the Wilcoxon nonparametric statistic for the performance of a classification system, we introduce a concept called the maximum realisable receiver operating characteristic (MRROC), and prove a related theorem. A novel criterion for feature selection, based on the area under the MRROC curve, is then introduced. This leads to a framework which we call Parcel. This has the flexibility to use different combinations of features at different operating points on the resulting MRROC curve. Empirical support for each stage in our approach is provided by experiments on real world problems, with Parcel achieving superior results.
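The link between the Wilcoxon statistic and the area under a ROC curve can be sketched as follows. This is a generic illustration of the standard Wilcoxon-Mann-Whitney equivalence, not the MRROC construction, which the abstract does not spell out. The function name `auc_wilcoxon` and its arguments are assumptions made for this example.

```python
def auc_wilcoxon(pos_scores, neg_scores):
    """Area under the ROC curve via the Wilcoxon-Mann-Whitney statistic.

    Returns the fraction of (positive, negative) score pairs in which the
    positive example scores higher; ties contribute one half.
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

A classifier that separates the classes perfectly gets 1.0, and one whose scores carry no ranking information gets about 0.5. The double loop is quadratic in the sample size; a rank-based version would be preferable for large data sets.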
DATA MINING FOR INTRUSION DETECTION -- A Critical Review
Cited by 13 (0 self)
Data mining techniques have been successfully applied in many different fields including marketing, manufacturing, process control, fraud detection, and network management. Over the past five years, a growing number of research projects have applied data mining to various problems in intrusion detection. This chapter surveys a representative cross section of these research efforts. Moreover, four characteristics of contemporary research are identified and discussed in a critical manner. Conclusions are drawn and directions for future research are suggested. Note: This article is an excerpt of the original work published in D. Barbara and S. Jajodia, editors, Applications of Data Mining in Computer Security, Kluwer Academic Publisher, Boston, 2002.
Evolutionary Model Selection in Unsupervised Learning
, 2002
Cited by 10 (0 self)
Feature subset selection is important not only for the insight gained from determining relevant modeling variables but also for the improved understandability, scalability, and possibly, accuracy of the resulting models. Feature selection has traditionally been studied in supervised learning situations, with some estimate of accuracy used to evaluate candidate subsets. However, we often cannot apply supervised learning for lack of a training signal. For these cases, we propose a new feature selection approach based on clustering. A number of heuristic criteria can be used to estimate the quality of clusters built from a given feature subset. Rather than combining such criteria, we use ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions that approximate the Pareto front in a multi-dimensional objective space. Each evolved solution represents a feature subset and a number of clusters; two representative clustering algorithms, K-means and EM, are applied to form the given number of clusters based on the selected features. Experimental results on both real and synthetic data show that the method can consistently find approximate Pareto-optimal solutions through which we can identify the significant features and an appropriate number of clusters. This results in models with better and clearer semantic relevance.
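The idea of a Pareto front in a multi-dimensional objective space can be illustrated with a small non-dominance filter. This is only a sketch of the concept, not ELSA itself. It assumes every objective is to be maximised, and the names `dominates` and `pareto_front` are chosen for this example.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Given objective vectors such as `[(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]`, the filter keeps the first three, since each trades one objective against the other, and discards the last two, which are beaten on both.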
An Integrated Data Mining System to Automate Discovery of Measures of Association
- in 33rd Hawaii International Conference on System Sciences - Volume 2. 2000. Maui
Cited by 1 (0 self)
Many data analysts require tools which can integrate their database management packages (e.g. Microsoft Access) with their data analysis ones (e.g. SAS, SPSS), and provide guidance for the selection of appropriate mining algorithms. In addition, the analysts need to extract and validate statistical results to facilitate data mining. In this paper, we describe an integrated data mining system called the Linear Correlation Discovery System (LCDS) that meets the above requirement. LCDS consists of four major subcomponents, two of which, the selection assistant and the statistics coupler, are discussed in this paper. The former examines the schema and instances to determine appropriate association measurement functions (e.g. chi-square, linear regression, ANOVA). The latter invokes the appropriate statistical function on a sample data set, and extracts relevant statistical output such as η² and R² for effective mining of data. We also describe a new validation algorithm based on measuring the consistency of mining results applied to multiple test sets.
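The two association measures named in the abstract have standard textbook definitions, sketched below. This is not LCDS code; the abstract does not describe how the system computes them. The function names are assumptions, and inputs with zero variance will raise `ZeroDivisionError`.

```python
def eta_squared(groups):
    """One-way ANOVA effect size: between-group sum of squares divided by
    the total sum of squares. `groups` is a list of lists of numbers."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total

def r_squared(xs, ys):
    """Coefficient of determination for a simple least-squares line of ys
    on xs, which equals the squared Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)
```

Both measures lie between 0 and 1, which is what makes them convenient to compare across candidate attribute pairs: values near 1 indicate that the grouping variable or the linear fit accounts for most of the variance.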
ADVANCES IN VARIABLE SELECTION AND VISUALIZATION METHODS FOR ANALYSIS OF MULTIVARIATE DATA
"... ISBN 978-951-22-8929-5 (printed version) ..."

