Results 1–10 of 15
Scalable Techniques for Mining Causal Structures
 Data Mining and Knowledge Discovery
, 1998
"... Mining for association rules in market basket data has proved a fruitful area of research. Measures such as conditional probability (confidence) and correlation have been used to infer rules of the form "the existence of item A implies the existence of item B." However, such rules indicate only a st ..."
Abstract

Cited by 88 (1 self)
Mining for association rules in market basket data has proved a fruitful area of research. Measures such as conditional probability (confidence) and correlation have been used to infer rules of the form "the existence of item A implies the existence of item B." However, such rules indicate only a statistical relationship between A and B. They do not specify the nature of the relationship: whether the presence of A causes the presence of B, or the converse, or some other attribute or phenomenon causes both to appear together. In applications, knowing such causal relationships is extremely useful for enhancing understanding and effecting change. While distinguishing causality from correlation is a truly difficult problem, recent work in statistics and Bayesian learning provides some avenues of attack. In these fields, the goal has generally been to learn complete causal models, which are essentially impossible to learn in large-scale data mining applications with a large number of variab...
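As a toy illustration of the measures this abstract mentions, the sketch below computes the confidence and a correlation-style measure (lift) of a rule "A implies B" over a handful of hypothetical baskets; the item names and data are invented for illustration only.

```python
# Hypothetical market-basket data; item names are illustrative.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "butter"},
]

def support(items):
    """Fraction of baskets containing all the given items."""
    hits = sum(1 for b in baskets if set(items) <= b)
    return hits / len(baskets)

def confidence(a, b):
    """P(b in basket | a in basket): the 'confidence' of rule a => b."""
    return support({a, b}) / support({a})

def lift(a, b):
    """Correlation-style measure: >1 means a and b co-occur more than chance."""
    return support({a, b}) / (support({a}) * support({b}))
```

Both measures are high for `bread => butter` here, yet, as the abstract stresses, neither says whether bread causes butter purchases, the converse, or whether some third factor drives both.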
Learning Classifiers from Semantically Heterogeneous Data
 In: Proceedings of the International Conference on Ontologies, Databases, and Applications of Semantics (ODBASE 2004), Agia
, 2004
"... Abstract. Semantically heterogeneous and distributed data sources are quite common in several application domains such as bioinformatics and security informatics. In such a setting, each data source has an associated ontology. Different users or applications need to be able to query such data source ..."
Abstract

Cited by 21 (16 self)
Semantically heterogeneous and distributed data sources are quite common in several application domains such as bioinformatics and security informatics. In such a setting, each data source has an associated ontology. Different users or applications need to be able to query such data sources for statistics of interest (e.g., statistics needed to learn a predictive model from data). Because no single ontology meets the needs of all applications or users in every context, or, for that matter, even a single user in different contexts, there is a need for principled approaches to acquiring statistics from semantically heterogeneous data. In this paper, we introduce ontology-extended data sources and define a user perspective consisting of an ontology and a set of interoperation constraints between data source ontologies and the user ontology. We show how these constraints can be used to derive mappings from source ontologies to the user ontology. We observe that most of the learning algorithms use only certain statistics computed from data in the process of generating the hypothesis that they output. We show how the ontology mappings can be used to answer statistical queries needed by algorithms for learning classifiers from data viewed from a certain user perspective. The resulting algorithms offer a powerful approach to data-driven knowledge acquisition over the Semantic Web.
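The core idea of answering statistical queries across ontologies can be sketched in a few lines: each source's terms are translated into the user ontology before the statistics are pooled. The mappings, term names, and data below are all hypothetical, invented purely to illustrate the mechanism.

```python
from collections import Counter

# Hypothetical interoperation mappings from two source ontologies
# into a shared user ontology.
mapping_src1 = {"Influenza-A": "Flu", "Common-Cold": "Cold"}
mapping_src2 = {"flu": "Flu", "cold": "Cold"}

# Records held at each (hypothetical) data source.
src1 = ["Influenza-A", "Common-Cold", "Influenza-A"]
src2 = ["flu", "cold", "cold"]

def mapped_counts(rows, mapping):
    """Translate each record into the user ontology, then count."""
    return Counter(mapping[r] for r in rows)

# A statistical query ("count per diagnosis") answered in the user's terms:
stats = mapped_counts(src1, mapping_src1) + mapped_counts(src2, mapping_src2)
```

The sources never exchange raw records, only counts expressed in the user ontology, which is what lets a learner consume them uniformly.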
Bayesian Nets Are All There Is To Causal Dependence
 STOCHASTIC DEPENDENCE AND CAUSALITY, CSLI
, 2001
"... ..."
Learning classifiers from distributed, semantically heterogeneous, autonomous data sources
, 2004
"... Recent advances in computing, communications, and digital storage technologies, together with development of high throughput data acquisition technologies have made it possible to gather and store large volumes of data in digital form. These developments have resulted in unprecedented opportunities ..."
Abstract

Cited by 9 (3 self)
Recent advances in computing, communications, and digital storage technologies, together with the development of high-throughput data acquisition technologies, have made it possible to gather and store large volumes of data in digital form. These developments have resulted in unprecedented opportunities for large-scale data-driven knowledge acquisition, with the potential for fundamental gains in scientific understanding (e.g., characterization of macromolecular structure-function relationships in biology) in many data-rich domains. In such applications,
the data sources of interest are typically physically distributed, semantically heterogeneous and autonomously owned and operated, which makes it impossible to use traditional machine learning algorithms for knowledge acquisition.
However, we observe that most of the learning algorithms use only certain statistics computed from data in the process of generating the hypothesis that they output and we use this observation to design a general strategy for transforming traditional algorithms for learning from data into algorithms for learning from distributed data. The resulting algorithms are provably exact in that the classifiers produced by them are identical to those obtained by the corresponding algorithms in the centralized setting (i.e., when all of the data is available in a central location) and they compare favorably to their centralized counterparts in terms of time and communication complexity.
To deal with the semantic heterogeneity problem, we introduce ontology-extended data sources and define a user perspective consisting of an ontology and a set of interoperation constraints between data source ontologies and the user ontology. We show how these constraints can be used to define mappings and conversion functions needed to answer statistical queries from semantically heterogeneous data viewed from a certain user perspective. This is further used to extend our approach for learning from distributed data into a theoretically sound approach to learning from semantically heterogeneous data.
The work described above contributed to the design and implementation of AirlDM, a collection of data-source-independent machine learning algorithms built by means of sufficient statistics and data source wrappers, and to the design of INDUS, a federated, query-centric system for knowledge acquisition from distributed, semantically heterogeneous, autonomous data sources.
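The "provably exact" claim in this abstract rests on a simple observation: when a learner needs only counts, each source can compute its counts locally and the sums are identical to the counts over the pooled data. The sketch below demonstrates this with hypothetical (feature, label) records and a count-based sufficient statistic; all names and data are illustrative.

```python
from collections import Counter

# Two hypothetical distributed data sources holding (feature, label) pairs.
source_a = [("sunny", "play"), ("rainy", "stay"), ("sunny", "play")]
source_b = [("rainy", "stay"), ("sunny", "stay")]

def local_counts(rows):
    """The sufficient statistics a count-based learner needs: joint counts."""
    return Counter(rows)

# Each source ships only its counts; the learner sums them.
combined = local_counts(source_a) + local_counts(source_b)

# The same statistics computed as if all data sat in one place.
centralized = local_counts(source_a + source_b)
```

Because `combined == centralized`, any classifier trained from these statistics (e.g., a count-based Naive Bayes) is identical to its centralized counterpart, while only small count tables cross the network.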
Analysis of HIV-1 pol sequences using Bayesian networks: implications for drug resistance
 Bioinformatics
, 2006
"... implications for drug resistance ..."
Visualization of Bayesian Belief Networks
 IEEE Visualization 1999 Late Breaking Hot Topics Proceedings
, 1999
"... Concepts like marginal probability, changes in probability, probability propagation and causeeffect relationships are important when reasoning about causality and uncertainty. ..."
Abstract

Cited by 7 (2 self)
Concepts like marginal probability, changes in probability, probability propagation and cause-effect relationships are important when reasoning about causality and uncertainty.
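The concepts this abstract lists can be made concrete on the smallest possible network. The sketch below uses a hypothetical two-node model (Rain → WetGrass, with invented probabilities) to compute a marginal by summing out the parent, and a change in probability when evidence arrives.

```python
# Hypothetical two-node Bayesian network: Rain -> WetGrass.
p_rain = 0.2
p_wet_given_rain = {True: 0.9, False: 0.1}

# Marginal probability of wet grass, obtained by summing out Rain.
p_wet = sum(
    (p_rain if rain else 1 - p_rain) * p_wet_given_rain[rain]
    for rain in (True, False)
)

# Change in probability: posterior of Rain after observing wet grass
# (Bayes' rule), illustrating how evidence propagates against the arrow.
p_rain_given_wet = (p_rain * p_wet_given_rain[True]) / p_wet
```

Observing wet grass raises the probability of rain from 0.2 to roughly 0.69; a visualization of the network would display exactly this kind of shift.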
Visualization of Bayesian Learner Models
 In Proceedings of the workshop ‘Open, Interactive, and
, 1999
"... Abstract. Bayesian Belief Networks (BBNs) have been suggested as a suitable representation and inference mechanism for learner models (Reye, 1998). Having the goal to construct inspectable Bayesian learner models, we have encountered two problems. The first is the problem of visualization of BBNs an ..."
Abstract

Cited by 4 (2 self)
Bayesian Belief Networks (BBNs) have been suggested as a suitable representation and inference mechanism for learner models (Reye, 1998). In working toward the goal of constructing inspectable Bayesian learner models, we have encountered two problems. The first is the problem of visualization of BBNs, and the second is the problem of using BBNs to model changes in knowledge state over time due to gradual learning and forgetting. Both of these problems are being addressed in current research at the ARIES Laboratory, and both are briefly discussed in this paper.
Short-Term Load Forecasting in Air-Conditioned Non-Residential Buildings
 In Proceedings of the 20th IEEE International Symposium on Industrial Electronics (ISIE
, 2011
"... Abstract—Shortterm load forecasting (STLF) has become an essential tool in the electricity sector. It has been classically object of vast research since energy load prediction is known to be nonlinear. In a previous work, we focused on nonresidential building STLF, an special case of STLF where we ..."
Abstract

Cited by 4 (4 self)
Short-term load forecasting (STLF) has become an essential tool in the electricity sector. It has classically been the object of vast research, since energy load prediction is known to be nonlinear. In a previous work, we focused on non-residential building STLF, a special case of STLF where weather has negligible influence on the load. Now we tackle more modern buildings in which the temperature does alter energy consumption; that is, we address here fully HVAC (Heating, Ventilating, and Air Conditioning) buildings. Still, in this problem domain, the forecasting method selected must be simple, work without tedious trial-and-error configuring or parametrising procedures, cope with scarce (or no) training data, and be able to predict an evolving demand curve. Following our preceding research, we have avoided the inherent nonlinearity by using the work day schedule as a day-type classifier. We have evaluated the most popular STLF systems in the literature, namely ARIMA (autoregressive integrated moving average) time series and neural networks (NN), together with an autoregressive model (AR) time series and a Bayesian network (BN), concluding that the autoregressive time series outperforms its counterparts and suffices to fulfil the addressed requirements, even at a six-day-ahead horizon.
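To give a feel for the winning method in this comparison, the sketch below fits the simplest possible autoregressive model, an AR(1) with its coefficient estimated by least squares, and makes a one-step-ahead forecast. The load series is synthetic and the model order is chosen for brevity; the paper's actual models and horizons may differ.

```python
# Illustrative AR(1) fit: y_t ~ a * y_{t-1}, with 'a' chosen by least squares.
# The series below is synthetic; real building-load data would replace it.
series = [10.0, 11.0, 10.5, 11.5, 11.0, 12.0, 11.8, 12.5]

# Closed-form least-squares solution for a single-lag model:
# a = sum(y_t * y_{t-1}) / sum(y_{t-1}^2)
num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
a = num / den

# One-step-ahead prediction from the last observation.
forecast = a * series[-1]
```

The appeal the abstract describes is visible even here: a single parameter, no iterative configuration, and an estimate that tracks a slowly drifting demand curve.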
Holistic Query Expansion Using Graphical Models
, 2004
"... this paper we present a method for answering relationship questions, as posed for example in the spring of 2003 evaluation exercise of the AQUAINT program, which has funded this research ..."
Abstract

Cited by 2 (0 self)
In this paper we present a method for answering relationship questions, as posed, for example, in the Spring 2003 evaluation exercise of the AQUAINT program, which has funded this research.
Bayesian Network Development
"... Bayesian networks are a popular mechanism for dealing with uncertainty in complex situations. They are a fundamental probabilistic representation mechanism that subsumes a great variety of other stochastic modeling methods, such as hidden Markov models, stochastic dynamic systems. Bayesian networks, ..."
Abstract

Cited by 2 (0 self)
Bayesian networks are a popular mechanism for dealing with uncertainty in complex situations. They are a fundamental probabilistic representation mechanism that subsumes a great variety of other stochastic modeling methods, such as hidden Markov models and stochastic dynamic systems. Bayesian networks, in principle, make it possible to build large, complex stochastic models from standard components. Development methodologies for Bayesian networks have been introduced based on software engineering methodologies. However, this is complicated by the significant differences between the crisp, logical foundations of modern software and the fuzzy, empirical nature of stochastic modeling. Conversely, software engineering would benefit from better integration with Bayesian networks, so that uncertainty and stochastic inference can be introduced in a more systematic and formal manner than at present. In this paper, Bayesian networks and stochastic inference are briefly introduced, and the development of Bayesian networks is compared with the development of object-oriented software. The challenges involved in Bayesian network development are then discussed.
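The claim that Bayesian networks let one "build large, complex stochastic models from standard components" amounts to the chain-rule factorization: a joint distribution assembled from local conditional tables. The sketch below composes a hypothetical three-node chain A → B → C from invented component tables and checks that the pieces form a valid distribution.

```python
# Hypothetical local components (conditional probability tables)
# for a three-node chain A -> B -> C; all numbers are invented.
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}
p_c_given_b = {True: {True: 0.5, False: 0.5}, False: {True: 0.4, False: 0.6}}

def joint(a, b, c):
    """Joint probability assembled from the standard components:
    P(A, B, C) = P(A) * P(B | A) * P(C | B)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# The factored pieces compose into a proper distribution: it sums to 1.
total = sum(joint(a, b, c) for a in (True, False)
            for b in (True, False) for c in (True, False))
```

Each table can be built, reviewed, and revised independently, which is exactly the modularity that invites a software-engineering-style development methodology.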