Feature dynamic Bayesian networks
In AGI, 2009
Abstract

Cited by 10 (7 self)
Feature Markov Decision Processes (ΦMDPs) [Hut09] are well-suited for learning agents in general environments. Nevertheless, unstructured (Φ)MDPs are limited to relatively simple environments. Structured MDPs like Dynamic Bayesian Networks (DBNs) are used for large-scale real-world problems. In this article I extend ΦMDP to ΦDBN. The primary contribution is to derive a cost criterion that allows the most relevant features to be extracted automatically from the environment, leading to the “best” DBN representation. I discuss all building blocks required for a complete general learning algorithm.
PAC-Bayesian Analysis of Co-clustering and Beyond
Abstract

Cited by 10 (5 self)
We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models, graph clustering, and pairwise clustering. We begin with the analysis of co-clustering, which is a widely used approach to the analysis of data matrices. We distinguish between two tasks in matrix data analysis: discriminative prediction of the missing entries in data matrices and estimation of the joint probability distribution of row and column variables in co-occurrence matrices. We derive PAC-Bayesian generalization bounds for the expected out-of-sample performance of co-clustering-based solutions for these two tasks. The analysis yields regularization terms that were absent in previous formulations of co-clustering. The bounds suggest that the expected performance of co-clustering is governed by a trade-off between its empirical performance and the mutual information preserved by the cluster variables on row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this trade-off for discriminative prediction tasks. This algorithm achieved state-of-the-art performance in the MovieLens collaborative filtering task. Our co-clustering model can also be seen as matrix tri-factorization, and the results provide generalization bounds, regularization
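The trade-off the bound suggests, between empirical performance and the mutual information preserved by the cluster variables, can be sketched in a few lines. Everything here (function names, the `beta` knob, the uniform-ID assumption) is illustrative and not taken from the paper:

```python
import numpy as np

def cluster_information(assignment):
    """I(ID; cluster) in nats for a hard clustering of row (or column) IDs,
    assuming the IDs themselves are uniformly distributed (a simplification;
    the paper works with distributions over the sample)."""
    _, counts = np.unique(assignment, return_counts=True)
    p = counts / counts.sum()
    # For a deterministic map ID -> cluster with uniform IDs, I(ID; C) = H(C).
    return float(-(p * np.log(p)).sum())

def tradeoff_objective(empirical_loss, row_assign, col_assign, n, beta=1.0):
    """Hypothetical objective with the shape the bound suggests: empirical loss
    plus the mutual information retained by the row and column cluster
    variables, discounted by sample size. The paper's constants differ."""
    mi = cluster_information(row_assign) + cluster_information(col_assign)
    return empirical_loss + beta * mi / np.sqrt(n)
```

Coarser clusterings retain less information about the IDs, so the regularizer rewards them; the empirical-loss term pulls the other way.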
Approximating rate-distortion graphs of individual data
Experiments in lossy compression and denoising, IEEE Trans. Comput., submitted. Also: arXiv preprint cs.IT/0609121, 2006
Abstract

Cited by 10 (1 self)
Classical rate-distortion theory requires specifying a source distribution. Instead, we analyze rate-distortion properties of individual objects using the recently developed algorithmic rate-distortion theory. The latter is based on the non-computable notion of Kolmogorov complexity. To apply the theory we approximate the Kolmogorov complexity by standard data compression techniques, and perform a number of experiments with lossy compression and denoising of objects from different domains. We also introduce a natural generalization to lossy compression with side information. To maintain full generality we need to address a difficult search problem. While our solutions are therefore not time-efficient, we do observe good denoising and compression performance. Index Terms: Compression, denoising, rate-distortion, structure function, Kolmogorov complexity.
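The abstract's core move, replacing the non-computable Kolmogorov complexity with the length of a standard compressor's output, is easy to sketch. The quantization scheme and function names below are assumptions for illustration, not the paper's actual search procedure:

```python
import zlib
import numpy as np

def K(data: bytes) -> int:
    """Upper bound on Kolmogorov complexity via a standard compressor."""
    return len(zlib.compress(data, 9))

def rate_distortion_point(x: np.ndarray, n_levels: int):
    """One point on an approximate rate-distortion curve for an 8-bit signal:
    quantize x to n_levels values (the lossy representation y), take its
    compressed size as the rate and the mean squared error as the distortion."""
    step = 256 // n_levels
    y = (x // step) * step + step // 2              # quantized representative
    rate = K(y.astype(np.uint8).tobytes())          # bytes, up to constants
    distortion = float(np.mean((x.astype(float) - y.astype(float)) ** 2))
    return rate, distortion
```

Sweeping `n_levels` traces out an approximate rate-distortion curve: coarser quantization lowers the rate and raises the distortion.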
Finding good itemsets by packing data
In ICDM, 2008
Abstract

Cited by 9 (5 self)
The problem of selecting small groups of itemsets that represent the data well has recently gained a lot of attention. We approach the problem by searching for the itemsets that compress the data efficiently. As a compression technique we use decision trees combined with a refined version of MDL. More formally, assuming that the items are ordered, we create a decision tree for each item that may only depend on the previous items. Our approach allows us to find complex interactions between the attributes, not just co-occurrences of 1s. Further, we present a link between the itemsets and the decision trees and use this link to export the itemsets from the decision trees. In this paper we present two algorithms. The first one is a simple greedy approach that builds a family of itemsets directly from data. The second one, given a collection of candidate itemsets, selects a small subset of these itemsets. Our experiments show that these approaches result in compact and high-quality descriptions of the data.
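The encoding idea, a per-item decision tree that may only condition on earlier items, can be sketched with depth-1 trees (stumps) and idealized code lengths. This toy version ignores the model-cost side of the paper's refined MDL and is not the authors' algorithm:

```python
import numpy as np

def code_length(bits):
    """Idealized cost in bits of encoding a 0/1 column at its empirical
    entropy (parameter cost ignored in this sketch)."""
    n = len(bits)
    if n == 0:
        return 0.0
    p = bits.mean()
    if p in (0.0, 1.0):
        return 0.0
    return n * (-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def best_stump_cost(target, predictors):
    """Cheapest encoding of `target` using a depth-1 tree that may split on
    any one earlier column -- a stand-in for the per-item decision trees."""
    best = code_length(target)                      # empty tree: marginal code
    for j in range(predictors.shape[1]):
        mask = predictors[:, j] == 1
        best = min(best, code_length(target[mask]) + code_length(target[~mask]))
    return best

def total_mdl_cost(data):
    """Items are ordered; item i is encoded by a tree over items < i."""
    total = code_length(data[:, 0])
    for i in range(1, data.shape[1]):
        total += best_stump_cost(data[:, i], data[:, :i])
    return total
```

When one column determines another, the dependent column becomes free to encode, which is exactly the kind of interaction plain co-occurrence counting misses.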
Self-Correlating Predictive Information Tracking for Large-Scale Production Systems
In Proc. of ICAC, 2009
Abstract

Cited by 7 (4 self)
Automatic management of large-scale production systems requires a continuous monitoring service to keep track of the states of the managed system. However, it is challenging to achieve both scalability and high information precision while continuously monitoring a large number of distributed and time-varying metrics in large-scale production systems. In this paper, we present a new self-correlating, predictive information tracking system called InfoTrack, which employs lightweight temporal and spatial correlation discovery methods to minimize continuous monitoring cost. InfoTrack combines both metric value prediction within individual nodes and adaptive clustering among distributed nodes to suppress remote information updates in distributed system monitoring. We have implemented a prototype of the InfoTrack system and deployed it on PlanetLab. We evaluated its performance using both real system traces and micro-benchmark prototype experiments. The experimental results show that InfoTrack can reduce the continuous monitoring cost by 50-90% while maintaining high information precision (i.e., within a 0.01-0.05 error bound).
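The per-node half of this idea, suppressing updates whose value the server can already predict within the error bound, fits in a few lines. The class below is a toy last-value predictor, not InfoTrack's actual predictor, and its adaptive clustering layer is not shown:

```python
class SuppressingReporter:
    """Toy prediction-based update suppression: node and monitoring server
    share the same predictor ("next value equals last reported value"), so
    the node only sends an update when the metric drifts out of bound."""

    def __init__(self, error_bound):
        self.error_bound = error_bound
        self.last_reported = None
        self.messages = 0

    def observe(self, value):
        if self.last_reported is None or abs(value - self.last_reported) > self.error_bound:
            self.last_reported = value   # send an update to the server
            self.messages += 1
        return self.last_reported        # the server's current view
```

For slowly drifting metrics most samples fall inside the bound, so the message count grows far more slowly than the sample count while the server's view stays within the guaranteed error.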
Bayesian Network Structure Learning using Factorized NML Universal Models
2008
Abstract

Cited by 7 (4 self)
Universal codes/models can be used for data compression and model selection by the minimum description length (MDL) principle. For many interesting model classes, such as Bayesian networks, the minimax regret optimal normalized maximum likelihood (NML) universal model is computationally very demanding. We suggest a computationally feasible alternative to NML for Bayesian networks, the factorized NML universal model, where the normalization is done locally for each variable. This can be seen as an approximate sum-product algorithm. We show that this new universal model performs extremely well in model selection, compared to the existing state of the art, even for small sample sizes.
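The local normalization can be made concrete for a single variable: each variable (per parent configuration) is scored by its maximized log-likelihood minus the log of a multinomial NML normalizer. The sketch below computes that normalizer exactly with the linear-time recurrence of Kontkanen and Myllymäki; the function names are mine, and assembling these terms into a full structure score is not shown:

```python
import math

def multinomial_nml_regret(n, K):
    """Exact NML normalizer C_K(n) for a K-valued multinomial with n
    observations, via the Kontkanen-Myllymaki recurrence
    C_k(n) = C_{k-1}(n) + (n/(k-2)) * C_{k-2}(n)."""
    if K == 1 or n == 0:
        return 1.0
    c_prev = 1.0                                     # C_1(n)
    c = sum(math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
            for h in range(n + 1))                   # C_2(n); 0**0 == 1
    for k in range(3, K + 1):
        c, c_prev = c + (n / (k - 2)) * c_prev, c
    return c

def fnml_local_score(counts):
    """fNML contribution of one variable under one parent configuration:
    maximized log-likelihood minus the log local normalizer."""
    n = sum(counts)
    loglik = sum(c * math.log(c / n) for c in counts if c > 0)
    return loglik - math.log(multinomial_nml_regret(n, len(counts)))
```

The point of the factorization is that each of these local normalizers depends only on one variable's arity and count, so they are cheap, whereas the global NML normalizer sums over all joint data sets.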
Directly Mining Descriptive Patterns
In SIAM SDM, 2012
Abstract

Cited by 7 (2 self)
Mining small, useful, and high-quality sets of patterns has recently become an important topic in data mining. The standard approach is to first mine many candidates and then select a good subset. However, the pattern explosion generates such enormous amounts of candidates that by post-processing it is virtually impossible to analyse dense or large databases in any detail. We introduce Slim, an anytime algorithm for mining high-quality sets of itemsets directly from data. We use MDL to identify the best set of itemsets as the set that describes the data best. To approximate this optimum, we iteratively use the current solution to determine which itemset would provide the most gain, estimating quality using an accurate heuristic. Without requiring a pre-mined candidate collection, Slim is parameter-free in both theory and practice. Experiments show we mine high-quality pattern sets; while evaluating orders of magnitude fewer candidates than our closest competitor, Krimp, we obtain much better compression ratios, closely approximating the locally optimal strategy. Classification experiments independently verify that we characterise data very well.
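Slim's key step, estimating from current usage counts how many bits joining two code-table patterns would save, can be caricatured as follows. The formula is a deliberately simplified stand-in (it drops the code-table cost and the cover order), so treat it as the shape of the heuristic, not the paper's specification:

```python
import math

def code_len(usage, total):
    """Shannon-optimal code length (bits) for a pattern used `usage` times
    out of `total` code usages."""
    return -math.log2(usage / total)

def estimated_gain(ux, uy, uxy, total):
    """Rough gain (bits) of adding the joined pattern X union Y: the uxy
    occurrences it would cover previously paid for both X's and Y's codes;
    afterwards they pay for one code, and total usage shrinks by uxy."""
    if uxy == 0:
        return 0.0
    old = uxy * (code_len(ux, total) + code_len(uy, total))
    new = uxy * code_len(uxy, total - uxy)
    return old - new
```

Ranking candidate joins by such an estimate, materializing only the best one, and repeating is what makes the search direct: no pre-mined candidate collection is ever enumerated.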
Incremental Learning of System Log Formats
Abstract

Cited by 6 (2 self)
System logs come in a large and evolving variety of formats, many of which are semi-structured and/or non-standard. As a consequence, off-the-shelf tools for processing such logs often do not exist, forcing analysts to develop their own tools, which is costly and time-consuming. In this paper, we present an incremental algorithm that automatically infers the format of system log files. From the resulting format descriptions, we can generate a suite of data processing tools automatically. The system can handle large-scale data sources whose formats evolve over time. Furthermore, it allows analysts to modify inferred descriptions as desired and incorporates those changes in future revisions.
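A drastically simplified version of incremental format inference: keep one description per column and generalize it (literal, then token type, then anything) as new lines arrive. The token grammar and tuple encoding are invented for this sketch, and the paper's description language is far richer:

```python
import re

TOKEN_TYPES = [
    ("INT", re.compile(r"\d+$")),
    ("HEX", re.compile(r"0x[0-9a-fA-F]+$")),
    ("WORD", re.compile(r"[A-Za-z]+$")),
]

def token_type(tok):
    for name, rx in TOKEN_TYPES:
        if rx.match(tok):
            return name
    return "STR"

def infer_format(lines, fmt=None):
    """Refine a per-column description (assumes a fixed column count): a
    column that has shown one literal stays literal; on conflict it widens
    to its token type, or to ANY if even the types disagree."""
    for line in lines:
        toks = line.split()
        if fmt is None:
            fmt = [("LIT", t) for t in toks]
            continue
        for i, t in enumerate(toks):
            kind, val = fmt[i]
            if kind == "LIT" and val != t:
                same = token_type(t) == token_type(val)
                fmt[i] = ("TYPE", token_type(t)) if same else ("ANY", None)
            elif kind == "TYPE" and token_type(t) != val:
                fmt[i] = ("ANY", None)
    return fmt
```

Because `infer_format` accepts a previous description, each new batch of log lines refines the same format, which is the incremental aspect the abstract emphasizes.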
Feature Markov Decision Processes
Abstract

Cited by 6 (5 self)
General purpose intelligent learning agents cycle through (complex, non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well-developed for small finite-state Markov Decision Processes (MDPs). So far it has been an art performed by human designers to extract the right state representation out of the bare observations, i.e. to reduce the agent setup to the MDP framework. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in the companion article [Hut09].