Results 1–10 of 21
Toward optimal feature selection
In 13th International Conference on Machine Learning, 1996
Abstract

Cited by 361 (10 self)
In this paper, we examine a method for feature subset selection based on Information Theory. Initially, a framework for defining the theoretically optimal, but computationally intractable, method for feature subset selection is presented. We show that our goal should be to eliminate a feature if it gives us little or no additional information beyond that subsumed by the remaining features. In particular, this will be the case for both irrelevant and redundant features. We then give an efficient algorithm for feature selection which computes an approximation to the optimal feature selection criterion. The conditions under which the approximate algorithm is successful are examined. Empirical results are given on a number of data sets, showing that the algorithm effectively handles datasets with a very large number of features.
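The elimination criterion this abstract describes can be illustrated with a toy sketch (not the paper's algorithm): estimate the conditional entropy of the label given a feature subset from counts, and drop a feature when removing it leaves that entropy unchanged. The data, threshold, and elimination order below are all invented for illustration.

```python
from collections import Counter
from math import log2

def cond_entropy(rows, feat_idx, label_idx):
    """Estimate H(label | selected features) from empirical counts."""
    joint = Counter((tuple(r[i] for i in feat_idx), r[label_idx]) for r in rows)
    marg = Counter(tuple(r[i] for i in feat_idx) for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / marg[k]) for (k, _), c in joint.items())

# invented toy data: column 1 duplicates column 0, column 2 determines the label
rows = [(a, a, b, b) for a in (0, 1) for b in (0, 1) for _ in range(5)]

kept = [0, 1, 2]
for f in (1, 0):  # candidate features to try eliminating
    remaining = [i for i in kept if i != f]
    # drop f only if it adds no information beyond the remaining features
    if abs(cond_entropy(rows, remaining, 3) - cond_entropy(rows, kept, 3)) < 1e-9:
        kept = remaining
print(kept)  # → [2]
```

The redundant duplicate and the now-subsumed original are both eliminated, leaving only the feature that carries the label's information.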
Markov properties for acyclic directed mixed graphs
Scandinavian Journal of Statistics, 2003
Abstract

Cited by 36 (5 self)
We consider acyclic directed mixed graphs, in which directed edges (x → y) and bidirected edges (x ↔ y) may occur. A simple extension of Pearl’s d-separation criterion, called m-separation, is applied to these graphs. We introduce a local Markov property which is equivalent to the global property resulting from the m-separation criterion.
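For intuition about the criterion involved, here is a toy sketch of plain d-separation, i.e. the special case of m-separation when no bidirected edges are present; it uses the standard moralized-ancestral-graph test, not anything from this paper, and the tiny collider graph is invented.

```python
from itertools import combinations

def ancestors(dag, nodes):
    """All nodes in `nodes` plus their ancestors; dag maps child -> set of parents."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in dag.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(dag, x, y, z):
    """True iff x and y are d-separated given z in the DAG."""
    keep = ancestors(dag, {x, y} | set(z))
    adj = {v: set() for v in keep}      # moralize the ancestral subgraph:
    for v in keep:
        ps = [p for p in dag.get(v, ()) if p in keep]
        for p in ps:                    # undirect parent-child edges
            adj[v].add(p); adj[p].add(v)
        for a, b in combinations(ps, 2):  # marry co-parents
            adj[a].add(b); adj[b].add(a)
    stack, seen = [x], {x} | set(z)     # search for a path avoiding z
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in adj[v] - seen:
            seen.add(w)
            stack.append(w)
    return True

# collider a -> c <- b: a, b marginally independent; conditioning on c opens the path
dag = {"c": {"a", "b"}}
print(d_separated(dag, "a", "b", []))     # → True
print(d_separated(dag, "a", "b", ["c"]))  # → False
```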
Constructional belief and rational representation
Computational Intelligence, 1989
Abstract

Cited by 27 (10 self)
It is commonplace in artificial intelligence to divide an agent’s beliefs into two parts: the beliefs explicitly represented or manifest in memory, and the implicitly represented or constructive beliefs that are repeatedly reconstructed when needed rather than memorized. Many theories of knowledge view the relation between manifest and constructive beliefs as a logical relation, with the manifest beliefs representing the constructive beliefs through a logic of belief. This view, however, limits the ability of a theory to treat incomplete or inconsistent sets of beliefs in useful ways. We argue that a more illuminating view is that belief is the result of rational representation. In this theory, the agent obtains its constructive beliefs by using its manifest beliefs and preferences to rationally (in the sense of decision theory) choose the most useful conclusions indicated by the manifest beliefs.
Stochastic reasoning, free energy, and information geometry
Neural Computation, 2004
Abstract

Cited by 20 (2 self)
Belief propagation (BP) is a universal method of stochastic reasoning. It gives exact inference for stochastic models with tree interactions and works surprisingly well even if the models have loopy interactions. Its performance has been analyzed separately in many fields, such as AI, statistical physics, information theory, and information geometry. The present paper gives a unified framework to understand BP and related methods, and to summarize the results obtained in many fields. In particular, BP and its variants, including tree reparameterization (TRP) and the concave-convex procedure (CCCP), are reformulated in information-geometrical terms, and their relations to the free-energy function are elucidated from the information-geometrical viewpoint. We then propose a family of new algorithms. The stabilities of the algorithms are analyzed, and methods to accelerate them are investigated.
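The exactness of BP on trees that the abstract mentions can be checked on a minimal invented example (a three-node chain with made-up potentials; this is a sketch of textbook sum-product, not the paper's geometric reformulation): the marginal from the two leaf-to-center messages matches brute-force enumeration.

```python
from itertools import product

psi = {0: [1.0, 2.0], 1: [1.0, 1.0], 2: [3.0, 1.0]}  # invented unary potentials
phi = [[2.0, 1.0], [1.0, 2.0]]                       # pairwise potential on both edges

def msg(unary):
    """Sum-product message from a leaf: sum over its state of unary * pairwise."""
    return [sum(unary[s] * phi[s][t] for s in (0, 1)) for t in (0, 1)]

# belief at the center node x1 = unary * product of incoming messages, normalized
belief = [psi[1][t] * msg(psi[0])[t] * msg(psi[2])[t] for t in (0, 1)]
bp_marginal = [b / sum(belief) for b in belief]

# brute-force check over all eight joint assignments
raw = [0.0, 0.0]
for x0, x1, x2 in product((0, 1), repeat=3):
    raw[x1] += psi[0][x0] * psi[1][x1] * psi[2][x2] * phi[x0][x1] * phi[x1][x2]
exact = [r / sum(raw) for r in raw]
print(max(abs(a - b) for a, b in zip(bp_marginal, exact)))  # negligible on a tree
```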
An introduction and survey of estimation of distribution algorithms
Swarm and Evolutionary Computation, 2011
I don’t want to think about it now: Decision theory with costly computation
In KR’10, 2010
Abstract

Cited by 6 (5 self)
Computation plays a major role in decision making. Even if an agent is willing to ascribe a probability to all states and a utility to all outcomes, and maximize expected utility, doing so might present serious computational problems. Moreover, computing the outcome of a given act might be difficult. In a companion paper we develop a framework for game theory with costly computation, where the objects of choice are Turing machines. Here we apply that framework to decision theory. We show how well-known phenomena like first-impression-matters biases (i.e., people tend to put more weight on evidence they hear early on), belief polarization (two people with different prior beliefs, hearing the same evidence, can end up with diametrically opposed conclusions), and the status quo bias (people are much more likely to stick with what they already have) can be easily captured in that framework. Finally, we use the framework to define some new notions: value of computational information (a computational variant of value of information) and computational value of conversation.
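As background for the "value of computational information" the paper defines, here is a toy calculation of the classical value of perfect information it varies: the gain from observing the state before acting versus committing to one act up front. The two-state world and payoffs below are invented.

```python
priors = {"s0": 0.5, "s1": 0.5}                 # hypothetical two-state world
utility = {("a", "s0"): 10, ("a", "s1"): 0,     # hypothetical act payoffs
           ("b", "s0"): 0,  ("b", "s1"): 6}

# commit to one act now: best expected utility without information
best_without = max(sum(priors[s] * utility[(act, s)] for s in priors)
                   for act in ("a", "b"))
# observe the state first, then pick the best act for it
best_with = sum(priors[s] * max(utility[(act, s)] for act in ("a", "b"))
                for s in priors)
print(best_with - best_without)  # → 3.0, the value of perfect information
```

The computational variant the paper introduces additionally charges for the computation needed to exploit the information, which this classical version ignores.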
The ABC's of Online Community
In Web Intelligence: Research and Development, Springer-Verlag, LNAI 2198, 2001
Abstract

Cited by 5 (2 self)
This article addresses these questions by articulating an evidential conceptual model of community, synthesizing earlier definitions drawn from the literature and adding new conditions. It illustrates, by means of a small case study, how the level of community can be gauged based on evidence of core community conditions. We believe that the four conditions, purpose, commitment, context, and infrastructure, are necessary and sufficient for modeling and gauging intra-community "glue", and that without this glue sustainable community cannot manifest.
1 Introduction
Many definitions of online community (OC), or virtual, e-, or network community, have been described. Broadly, publications can be divided into three areas. The first is sociological research and is well represented in the work of Barry Wellman [1,2]. Wellman contends that social network analysis, which examines community in terms of the social network of participants rather than in terms of space (neighborhoods),
A Neural Network Model for Monotonic Diagnostic Problem Solving
In Proceedings of the 2nd IEEE International Conference on Intelligent Processing Systems, Cold, 1998
Abstract

Cited by 4 (4 self)
The task of diagnosis is to find a hypothesis that best explains a set of manifestations (observations). Generally, it is computationally expensive to find a hypothesis because the number of potential hypotheses is exponentially large. Recently, many efforts have been made to find parallel processing methods to overcome this difficulty. In this paper, we propose a neural network model for diagnostic problem solving in which a diagnostic problem is treated as a combinatorial optimisation problem. One feature of the model is that the causal network is directly used as the network. Another feature is that the errors between the observations and the current activations of manifestation nodes are used to guide the network computation toward optimal diagnostic hypotheses.
1 Introduction
For a set of manifestations (observations), diagnostic inference is to find the most plausible faults or disorders which can explain why the manifestations are present. In general, an individual d...
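The underlying combinatorial problem, covering observed manifestations with a small set of disorders, can be sketched with a simple greedy stand-in (illustrative only, not the authors' neural model; the causal links and observations are invented):

```python
causes = {                       # hypothetical causal links: disorder -> manifestations
    "d1": {"m1", "m2"},
    "d2": {"m2", "m3"},
    "d3": {"m3"},
}
observed = {"m1", "m2", "m3"}

hypothesis, uncovered = set(), set(observed)
while uncovered:
    # greedily add the disorder explaining the most still-unexplained manifestations
    best = max(causes, key=lambda d: len(causes[d] & uncovered))
    hypothesis.add(best)
    uncovered -= causes[best]
print(sorted(hypothesis))  # → ['d1', 'd2']
```

The neural model in the paper instead searches this space in parallel, using the mismatch between observations and manifestation-node activations as its guidance signal.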
Information Fusion, Causal Probabilistic Network And Probanet II: Inference Algorithms and Probanet System
In Proc. 1st Intl. Workshop on Image Analysis and Information Fusion, 1997
Abstract

Cited by 2 (2 self)
As an extension of an overview paper [Pan and McMichael, 1997] on information fusion and Causal Probabilistic Networks (CPN), this paper formalizes kernel algorithms for probabilistic inference upon CPNs. Information fusion is realized through updating the joint probabilities of the variables upon the arrival of new evidence or new hypotheses. Kernel algorithms for some dominant methods of inference are formalized from a discontiguous, mathematics-oriented literature, with gaps filled in with regard to computability and completeness. In particular, possible optimizations of the causal-tree algorithm, graph triangulation, and the junction-tree algorithm are discussed. Probanet has been designed and developed as a generic shell, or mother system, for CPN construction and application. The design aspects and current status of Probanet are described. A few directions for research and system development are pointed out, including hierarchical structuring of networks, structure decomposition, and adaptive inference algorithms. This paper thus integrates a literature review, algorithm formalization, and future perspectives.
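At the heart of the inference algorithms mentioned (causal tree, junction tree) is the step of summing a variable out of a product of factors. A minimal sketch on an invented two-node network A → B, not Probanet's implementation:

```python
p_a = {0: 0.3, 1: 0.7}                       # hypothetical prior P(A)
p_b_given_a = {(0, 0): 0.9, (0, 1): 0.1,     # hypothetical CPT P(B | A), keyed (a, b)
               (1, 0): 0.2, (1, 1): 0.8}

# marginalize A out of the joint P(A) * P(B | A) to obtain P(B)
p_b = {b: sum(p_a[a] * p_b_given_a[(a, b)] for a in p_a) for b in (0, 1)}
print(p_b)
```

Junction-tree algorithms organize many such sum-outs over cliques so that each evidence update propagates consistently through the whole network.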
Adapting Bayes Network Structures to Nonstationary Domains
Abstract

Cited by 2 (0 self)
When an incremental structural learning method gradually modifies a Bayesian network (BN) structure to fit observations as they are read from a database, we call the process structural adaptation. Structural adaptation is useful when the learner is set to work in an unknown environment, where a BN is to be gradually constructed as observations of the environment are made. Existing algorithms for incremental learning assume that the samples in the database have been drawn from a single underlying distribution. In this paper we relax this assumption, so that the underlying distribution can change during the sampling of the database. The method that we present can thus be used in unknown environments, where it is not even known whether the dynamics of the environment are stable. We briefly state formal correctness results for our method, and demonstrate its feasibility experimentally.
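The core difficulty, detecting that the sampling distribution has changed so the structure should adapt, can be sketched with a toy monitor (not the paper's method): track the average log-likelihood of a sliding window under the current model and re-estimate when it collapses. The one-parameter Bernoulli "model", window size, and drop threshold are all invented.

```python
from collections import deque
from math import log

def bernoulli_ll(p, x):
    return log(p if x == 1 else 1 - p)

def adapt(stream, window=50, drop=0.5):
    p, baseline, refits = None, None, []
    recent = deque(maxlen=window)
    for i, x in enumerate(stream):
        recent.append(x)
        if len(recent) < window:
            continue
        if p is None:  # (re)estimate the model from the current window
            p = min(0.99, max(0.01, sum(recent) / window))
        ll = sum(bernoulli_ll(p, v) for v in recent) / window
        if baseline is None:
            baseline = ll
        elif ll < baseline - drop:  # fit quality collapsed: suspect a shift
            refits.append(i)
            p, baseline = None, None
    return refits

stream = [1] * 200 + [0] * 200  # the sampling distribution flips at i = 200
refits = adapt(stream)
print(refits)  # refit indices, all shortly after the change point
```

A full structural-adaptation method would re-learn graph structure rather than a single parameter, but the trigger logic (monitor fit, rebuild on sustained degradation) is the same shape.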