Results 1–10 of 127
Collective multi-label classification
In CIKM, 2005
"... Common approaches to multilabel classification learn independent classifiers for each category, and employ ranking or thresholding schemes for classification. Because they do not exploit dependencies between labels, such techniques are only wellsuited to problems in which categories are independen ..."
Abstract

Cited by 71 (1 self)
Common approaches to multi-label classification learn independent classifiers for each category, and employ ranking or thresholding schemes for classification. Because they do not exploit dependencies between labels, such techniques are only well-suited to problems in which categories are independent. However, in many domains labels are highly interdependent. This paper explores multi-label conditional random field (CRF) classification models that directly parameterize label co-occurrences in multi-label classification. Experiments show that the models outperform their single-label counterparts on standard text corpora. Even when multi-labels are sparse, the models improve subset classification error by as much as 40%.
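As a toy illustration of the point this abstract makes, that ignoring label dependencies can hurt subset (zero-one) accuracy, consider two correlated binary labels with a hand-picked joint distribution. The numbers are invented for illustration and are not taken from the paper; this is not the CRF model itself, only the marginal-vs-joint prediction gap it exploits.

```python
import numpy as np

# Invented joint distribution over two binary labels (y1, y2) for a fixed input x.
# Rows index y1, columns index y2. The labels are positively correlated.
P = np.array([[0.40, 0.05],
              [0.15, 0.40]])

# Independent per-label prediction: threshold each marginal at 0.5.
p_y1 = P[1, :].sum()          # P(y1 = 1) = 0.55
p_y2 = P[:, 1].sum()          # P(y2 = 1) = 0.45
indep_pred = (int(p_y1 > 0.5), int(p_y2 > 0.5))     # -> (1, 0)

# Joint prediction: argmax over the four label subsets.
joint_pred = np.unravel_index(P.argmax(), P.shape)  # -> (0, 0)

# Probability that the predicted label *subset* is exactly right:
indep_acc = P[indep_pred]     # 0.15
joint_acc = P[joint_pred]     # 0.40
```

The independent classifiers combine two individually reasonable marginal decisions into a joint labeling that is in fact the least likely of the correlated configurations, which is exactly the failure mode a model of label co-occurrence avoids.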
Graph-cover decoding and finite-length analysis of message-passing iterative decoding of LDPC codes
IEEE Trans. Inform. Theory, 2005
"... The goal of the present paper is the derivation of a framework for the finitelength analysis of messagepassing iterative decoding of lowdensity paritycheck codes. To this end we introduce the concept of graphcover decoding. Whereas in maximumlikelihood decoding all codewords in a code are comp ..."
Abstract

Cited by 68 (12 self)
The goal of the present paper is the derivation of a framework for the finite-length analysis of message-passing iterative decoding of low-density parity-check codes. To this end we introduce the concept of graph-cover decoding. Whereas in maximum-likelihood decoding all codewords in a code are competing to be the best explanation of the received vector, under graph-cover decoding all codewords in all finite covers of a Tanner graph representation of the code are competing to be the best explanation. We are interested in graph-cover decoding because it is a theoretical tool that can be used to show connections between linear programming decoding and message-passing iterative decoding. Namely, on the one hand it turns out that graph-cover decoding is essentially equivalent to linear programming decoding. On the other hand, because iterative, locally operating decoding algorithms like message-passing iterative decoding cannot distinguish the underlying Tanner graph from any covering graph, graph-cover decoding can serve as a model to explain the behavior of message-passing iterative decoding. Understanding the behavior of graph-cover decoding is tantamount to understanding
Simulation-based computation of information rates for channels with memory
IEEE Trans. Inform. Theory, 2006
"... The information rate of finitestate source/channel models can be accurately estimated by sampling both a long channel input sequence and the corresponding channel output sequence, followed by a forward sum–product recursion on the joint source/channel trellis. This method is extended to compute up ..."
Abstract

Cited by 55 (11 self)
The information rate of finite-state source/channel models can be accurately estimated by sampling both a long channel input sequence and the corresponding channel output sequence, followed by a forward sum–product recursion on the joint source/channel trellis. This method is extended to compute upper and lower bounds on the information rate of very general channels with memory by means of finite-state approximations. Further upper and lower bounds can be computed by reduced-state methods.
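A minimal sketch of this estimator: a scaled forward sum–product recursion computes log2 p(y) from a finite-state trellis, and the information rate is estimated as (log2 p(y|x) - log2 p(y))/n from one long sampled sequence pair. To make the result checkable, the sketch instantiates the recursion on a memoryless BSC (a degenerate one-state trellis), where the true rate with uniform inputs is 1 - H2(ε); all names and the trellis encoding are my own, not the paper's notation.

```python
import numpy as np

def forward_log2_prob(y, T):
    """Scaled forward sum-product recursion: returns log2 p(y_1..y_n) for a
    finite-state model. T[sym] is an S x S matrix with
    T[sym][s, s'] = P(next state = s', output = sym | current state = s)."""
    S = next(iter(T.values())).shape[0]
    alpha = np.full(S, 1.0 / S)          # uniform prior over initial states
    log2p = 0.0
    for sym in y:
        alpha = alpha @ T[sym]
        scale = alpha.sum()
        log2p += np.log2(scale)          # accumulate log-probability
        alpha /= scale                   # rescale to avoid underflow
    return log2p

# Sanity check on a memoryless special case: BSC(0.1) with uniform inputs,
# where the information rate is 1 - H2(0.1) ~ 0.531 bits/use.
rng = np.random.default_rng(0)
eps, n = 0.1, 100_000
x = rng.integers(0, 2, n)
y = np.where(rng.random(n) < eps, 1 - x, x)

# p(y): one trellis state; with the uniform input marginalized out,
# each output symbol has probability 1/2.
T_y = {0: np.array([[0.5]]), 1: np.array([[0.5]])}
log2_py = forward_log2_prob(y, T_y)

# p(y | x): memoryless channel law, evaluated symbol by symbol.
log2_py_given_x = np.sum(np.where(y == x, np.log2(1 - eps), np.log2(eps)))

rate = (log2_py_given_x - log2_py) / n   # Monte Carlo estimate of I(X; Y), bits/use
```

For channels with genuine memory the same recursion runs over a multi-state trellis (and a second recursion over the joint source/channel trellis replaces the direct p(y|x) evaluation); only the matrices in `T` change.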
Collaborative Multiagent Reinforcement Learning by Payoff Propagation
Journal of Machine Learning Research, 2006
"... In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a) which exploits the dependencies between agents to decompose t ..."
Abstract

Cited by 33 (2 self)
In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a) which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the single-state case and describe a payoff propagation algorithm that computes the individual actions that approximately maximize the global payoff function. The method can be viewed as the decision-making analogue of belief propagation in Bayesian networks. Second, we focus on learning the behavior of the agents in sequential decision-making tasks. We introduce different model-free reinforcement-learning techniques, unitedly called Sparse Cooperative Q-learning, which approximate the global action-value function based on the topology of a coordination graph, and perform updates using the contribution of the individual agents to the maximal global action value. The combined use of an edge-based decomposition of the action-value function and the payoff propagation algorithm for efficient action selection results in an approach that scales only linearly in the problem size. We provide experimental evidence that our method outperforms related multiagent reinforcement-learning methods based on temporal differences.
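The payoff propagation idea can be sketched on the smallest interesting coordination graph: a three-agent chain with edge-based payoffs, where max-plus message passing recovers the exact joint maximum (exactness holds on trees; the payoff tables below are invented for the example).

```python
import itertools
import numpy as np

# Local payoffs on the edges of a chain coordination graph: 1 -- 2 -- 3.
# f12[a1, a2] and f23[a2, a3]; two actions per agent. Values are arbitrary.
f12 = np.array([[1, 5],
                [3, 2]])
f23 = np.array([[4, 0],
                [1, 6]])

# Max-plus messages toward agent 2, the "root" of this tiny tree:
mu_1to2 = f12.max(axis=0)            # best payoff agent 1 can add, per a2
mu_3to2 = f23.max(axis=1)            # best payoff agent 3 can add, per a2

# Agent 2 maximizes the summed incoming messages; agents 1 and 3 then
# pick their maximizing responses to agent 2's choice.
a2 = int(np.argmax(mu_1to2 + mu_3to2))
a1 = int(np.argmax(f12[:, a2]))
a3 = int(np.argmax(f23[a2, :]))
value = f12[a1, a2] + f23[a2, a3]

# Brute force over all 8 joint actions confirms the decomposition is exact here.
brute = max(f12[b1, b2] + f23[b2, b3]
            for b1, b2, b3 in itertools.product(range(2), repeat=3))
```

The message computation touches only one edge table at a time, which is what makes the approach scale linearly in the number of edges rather than exponentially in the number of agents.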
Log-Determinant Relaxation for Approximate Inference in Discrete Markov Random Fields
2006
"... Graphical models are well suited to capture the complex and nonGaussian statistical dependencies that arise in many realworld signals. A fundamental problem common to any signal processing application of a graphical model is that of computing approximate marginal probabilities over subsets of nod ..."
Abstract

Cited by 27 (3 self)
Graphical models are well suited to capture the complex and non-Gaussian statistical dependencies that arise in many real-world signals. A fundamental problem common to any signal processing application of a graphical model is that of computing approximate marginal probabilities over subsets of nodes. This paper proposes a novel method, applicable to discrete-valued Markov random fields (MRFs) on arbitrary graphs, for approximately solving this marginalization problem. The foundation of our method is a reformulation of the marginalization problem as the solution of a low-dimensional convex optimization problem over the marginal polytope. Exactly solving this problem for general graphs is intractable; for binary Markov random fields, we describe how to relax it by using a Gaussian bound on the discrete entropy and a semidefinite outer bound on the marginal polytope. This combination leads to a log-determinant maximization problem that can be solved efficiently by interior-point methods, thereby providing approximations to the exact marginals. We show how a slightly weakened log-determinant relaxation can be solved even more efficiently by a dual reformulation. When applied to denoising problems in a coupled mixture-of-Gaussian model defined on a binary MRF with cycles, we find that the performance of this log-determinant relaxation is comparable or superior to the widely used sum–product algorithm over a range of experimental conditions.
On the Relationship between Linear Programming Decoding and Min-Sum Algorithm Decoding
2004
"... We are interested in the characterization of the decision regions when decoding a lowdensity paritycheck code with the minsum algorithm. Observations made in [1] and experimental evidence suggest that these decision regions are tightly related to the decision regions obtained when decoding the co ..."
Abstract

Cited by 25 (8 self)
We are interested in the characterization of the decision regions when decoding a low-density parity-check code with the min-sum algorithm. Observations made in [1] and experimental evidence suggest that these decision regions are tightly related to the decision regions obtained when decoding the code with the linear programming decoder. We introduce a family of quadratic programming decoders that aims at explaining this behavior. Moreover, we also point out connections to electrical networks.
R. Koetter, Towards Low-Complexity Linear-Programming Decoding
 Proc. 4th Int. Symposium on Turbo Codes and Related Topics
"... ..."
A generalization of the Blahut–Arimoto algorithm to finite-state channels
IEEE Trans. Inform. Theory, 2008
"... Abstract—The classical Blahut–Arimoto algorithm (BAA) is a wellknown algorithm that optimizes a discrete memoryless source (DMS) at the input of a discrete memoryless channel (DMC) in order to maximize the mutual information between channel input and output. This paper considers the problem of opti ..."
Abstract

Cited by 15 (5 self)
The classical Blahut–Arimoto algorithm (BAA) is a well-known algorithm that optimizes a discrete memoryless source (DMS) at the input of a discrete memoryless channel (DMC) in order to maximize the mutual information between channel input and output. This paper considers the problem of optimizing finite-state machine sources (FSMSs) at the input of finite-state machine channels (FSMCs) in order to maximize the mutual information rate between channel input and output. Our main result is an algorithm that efficiently solves this problem numerically; thus, we call the proposed procedure the generalized BAA. It includes as special cases not only the classical BAA but also an algorithm that solves the problem of finding the capacity-achieving input distribution for finite-state channels with no noise. While we present theorems that characterize the local behavior of the generalized BAA, there are still open questions
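The classical BAA that this paper generalizes is compact enough to sketch directly: alternate between computing the induced output distribution and reweighting the input distribution by the exponentiated per-input KL divergence, with the standard lower/upper capacity bounds as a stopping rule. The sanity check against the closed-form BSC capacity is mine; variable names are not from the paper.

```python
import numpy as np

def blahut_arimoto(W, tol=1e-12, max_iter=1000):
    """Classical Blahut-Arimoto: capacity (in bits) of a DMC with transition
    matrix W[i, j] = P(output j | input i), plus the optimal input distribution."""
    p = np.full(W.shape[0], 1.0 / W.shape[0])    # start from the uniform input
    for _ in range(max_iter):
        q = p @ W                                 # induced output distribution
        # Per-input KL divergence D( W(.|i) || q ) in bits; the inner where
        # keeps log2 away from zero entries of W.
        d = np.where(W > 0,
                     W * np.log2(np.where(W > 0, W, 1.0) / q),
                     0.0).sum(axis=1)
        c = np.exp2(d)
        lower, upper = np.log2(p @ c), np.log2(c.max())   # capacity bounds
        p = p * c / (p @ c)                       # multiplicative update of p
        if upper - lower < tol:
            break
    return lower, p

# Sanity check: BSC(0.1), whose capacity is 1 - H2(0.1) ~ 0.5310 bits.
eps = 0.1
W = np.array([[1 - eps, eps], [eps, 1 - eps]])
C, p_star = blahut_arimoto(W)                    # p_star is uniform by symmetry
```

The generalized BAA of the paper replaces the single-letter distributions with finite-state machine sources and channels and mutual information with a rate; that machinery is not reproduced here.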
Measuring neural synchrony by message passing
In Advances in Neural Information Processing Systems 20 (NIPS), 2007
"... A novel approach to measure the interdependence of two time series is proposed, referred to as “stochastic event synchrony ” (SES); it quantifies the alignment of two point processes by means of the following parameters: time delay, variance of the timing jitter, fraction of “spurious ” events, and ..."
Abstract

Cited by 13 (10 self)
A novel approach to measure the interdependence of two time series is proposed, referred to as “stochastic event synchrony” (SES); it quantifies the alignment of two point processes by means of the following parameters: time delay, variance of the timing jitter, fraction of “spurious” events, and average similarity of events. SES may be applied to generic one-dimensional and multi-dimensional point processes; however, the paper mainly focuses on point processes in the time-frequency domain. The average event similarity is in that case described by two parameters: the average frequency offset between events in the time-frequency plane, and the variance of the frequency offset (“frequency jitter”); SES then consists of five parameters in total. Those parameters quantify the synchrony of oscillatory events, and hence they provide an alternative to existing synchrony measures that quantify amplitude or phase synchrony. The pairwise alignment of point processes is cast as a statistical inference problem, which is solved by applying the max-product algorithm on a graphical model. The SES parameters are determined from the resulting pairwise alignment by maximum a posteriori (MAP) estimation. The proposed interdependence measure is applied to the problem of detecting anomalies in EEG synchrony of Mild Cognitive Impairment (MCI) patients; the results indicate that SES significantly improves the sensitivity of EEG in detecting MCI.
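The core trade-off in this alignment problem, matching events at a timing-jitter cost versus declaring them spurious at a fixed penalty, can be illustrated with a much simpler one-dimensional sketch. The paper solves the inference by max-product on a graphical model with five SES parameters; the edit-distance-style dynamic program below, with invented event times and penalty, is only a toy stand-in for that inference and reproduces none of the paper's estimation machinery.

```python
# Align two 1-D point processes: each event is either matched to one event of
# the other process (cost = squared timing offset) or declared "spurious"
# (fixed penalty). A simple edit-distance-style dynamic program.
def align(a, b, spurious_penalty=1.0):
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:                       # event a[i-1] left unmatched
                D[i][j] = min(D[i][j], D[i - 1][j] + spurious_penalty)
            if j > 0:                       # event b[j-1] left unmatched
                D[i][j] = min(D[i][j], D[i][j - 1] + spurious_penalty)
            if i > 0 and j > 0:             # match a[i-1] with b[j-1]
                D[i][j] = min(D[i][j],
                              D[i - 1][j - 1] + (a[i - 1] - b[j - 1]) ** 2)
    return D[n][m]

# Two nearly synchronous event trains: the cheapest alignment matches all
# three pairs, since each squared offset is far below the spurious penalty.
cost = align([1.0, 2.0, 4.0], [1.1, 2.2, 3.9])
```

Raising the penalty makes the aligner tolerate larger jitter before declaring events spurious, which is the same knob the SES parameters control in the full model.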