Results 1–10 of 35
Item sets that compress
2006
Abstract

Cited by 44 (23 self)
One of the major problems in frequent item set mining is the explosion of the number of results: it is difficult to find the most interesting frequent item sets. The cause of this explosion is that large sets of frequent item sets describe essentially the same set of transactions. In this paper we approach this problem using the MDL principle: the best set of frequent item sets is that set that compresses the database best. We introduce four heuristic algorithms for this task, and the experiments show that these algorithms give a dramatic reduction in the number of frequent item sets. Moreover, we show how our approach can be used to determine the best value for the minsup threshold.
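The two-part MDL score at the heart of this abstract can be sketched in a few lines. This is a simplified illustration, not the paper's actual encoder: the dictionary-based code table, the covers, and the one-bit-per-item model cost are all assumptions made for the sketch; real Krimp code tables handle singleton fallbacks and cover construction more carefully.

```python
import math

def total_encoded_length(code_table, covers):
    """Two-part MDL score L(CT) + L(D | CT) for a code table (sketch).

    code_table: dict mapping itemset (frozenset) -> usage count in the cover.
    covers: the database, one list of code-table itemsets per transaction.
    More frequently used itemsets get shorter codes:
    each code costs -log2(usage / total_usage) bits.
    """
    total_usage = sum(code_table.values())

    def code_len(itemset):
        return -math.log2(code_table[itemset] / total_usage)

    # L(D | CT): cost of the data encoded with the code table.
    data_bits = sum(code_len(s) for cover in covers for s in cover)
    # L(CT): each entry costs its code plus (simplified) 1 bit per item.
    model_bits = sum(code_len(s) + len(s) for s in code_table)
    return model_bits + data_bits
```

Under this score, the best set of itemsets is the one whose code table minimizes the total; the paper's heuristics search over candidate tables.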
Learning shape-classes using a mixture of tree-unions
IEEE Trans. PAMI, 2006
Abstract

Cited by 27 (8 self)
This paper poses the problem of tree clustering as that of fitting a mixture of tree-unions to a set of sample trees. The tree-unions are structures from which the individual data samples belonging to a cluster can be obtained by edit operations. The distribution of observed tree nodes in each cluster sample is assumed to be governed by a Bernoulli distribution. The clustering method is designed to operate when the correspondences between nodes are unknown and must be inferred as part of the learning process. We adopt a minimum description length approach to the problem of fitting the mixture model to data. We make maximum-likelihood estimates of the Bernoulli parameters. The tree-unions and the mixing proportions are sought so as to minimize the description length criterion. This is the sum of the negative logarithm of the Bernoulli distribution, and a message-length criterion that encodes both the complexity of the union-trees and the number of mixture components. We locate node correspondences by minimizing the edit distance with the current tree-unions, and show that the edit distance is linked to the description length criterion. The method can be applied to both unweighted and weighted trees. We illustrate the utility of the resulting algorithm on the problem of classifying 2D shapes using a shock graph representation. Index Terms—Structural learning, tree clustering, mixture modeling, minimum description length, model codes, shock graphs.
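The description length criterion described above can be illustrated with a stripped-down sketch: per-node maximum-likelihood Bernoulli estimates for one cluster, plus a structural cost for the union. The fixed bits-per-node structure term is an assumption made for illustration; it stands in for the paper's full message-length criterion.

```python
import math

def cluster_description_length(samples, bits_per_node=8.0):
    """Description length of one tree-union cluster (illustrative sketch).

    samples: list of 0/1 lists; samples[i][j] = 1 if node j of the union
    appears in sample tree i. Bernoulli parameters are the ML estimates
    (per-node observation frequencies); the union's structural cost is
    approximated here as a fixed number of bits per node.
    """
    n, m = len(samples), len(samples[0])
    neg_log_lik = 0.0
    for j in range(m):
        theta = sum(s[j] for s in samples) / n   # ML Bernoulli estimate
        for s in samples:
            p = theta if s[j] else 1.0 - theta
            if p > 0.0:                          # zero-prob terms cost 0 bits here
                neg_log_lik += -math.log2(p)
    return neg_log_lik + bits_per_node * m       # data bits + structure bits
```

Minimizing this quantity trades goodness of fit (the Bernoulli negative log-likelihood) against union complexity, which is the shape of the trade-off the paper exploits.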
Model Selection by Normalized Maximum Likelihood
2005
Abstract

Cited by 24 (9 self)
The Minimum Description Length (MDL) principle is an information-theoretic approach to inductive inference that originated in algorithmic coding theory. In this approach, data are viewed as codes to be compressed by the model. From this perspective, models are compared on their ability to compress a data set by extracting useful information in the data apart from random noise. The goal of model selection is to identify the model, from a set of candidate models, that permits the shortest description length (code) of the data. Since Rissanen originally formalized the problem using the crude 'two-part code' MDL method in the 1970s, many significant strides have been made, especially in the 1990s, culminating in the development of the refined 'universal code' MDL method, dubbed Normalized Maximum Likelihood (NML). It represents an elegant solution to the model selection problem. The present paper provides a tutorial review of these latest developments with a special focus on NML. An application example of NML in cognitive modeling is also provided.
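For small samples, NML can be computed exactly by brute force, which makes the definition concrete. A minimal sketch for the Bernoulli model class (the function name is ours, not from the paper):

```python
import math

def nml_bernoulli(x):
    """P_NML(x) for a binary sequence under the Bernoulli model class:
    the maximized likelihood of x, normalized over all sequences of the
    same length."""
    n = len(x)

    def max_lik(k):
        # Likelihood of a sequence with k ones at its ML estimate k/n.
        # Python's 0.0 ** 0 == 1.0 handles the k = 0 and k = n boundaries.
        return (k / n) ** k * ((n - k) / n) ** (n - k)

    # Group the 2^n sequences by their count of ones for the normalizer.
    normalizer = sum(math.comb(n, j) * max_lik(j) for j in range(n + 1))
    return max_lik(sum(x)) / normalizer
```

The stochastic complexity of the data under the model class is then -log2 of this value; model selection picks the class with the smallest such code length.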
Characterising the difference
In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007
Abstract

Cited by 16 (9 self)
Characterising the differences between two databases is an often-occurring problem in Data Mining. Detection of change over time is a prime example; comparing databases from two branches is another. The key problem is to discover the patterns that describe the difference. Emerging patterns provide only a partial answer to this question. In previous work, we showed that the data distribution can be captured in a pattern-based model using compression [12]. Here, we extend this approach to define a generic dissimilarity measure on databases. Moreover, we show that this approach can identify those patterns that characterise the differences between two distributions. Experimental results show that our method provides a well-founded way to independently measure database dissimilarity that allows for thorough inspection of the actual differences. This illustrates the use of our approach in real-world data mining.
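A dissimilarity measure in this spirit can be approximated with an off-the-shelf compressor. The sketch below substitutes zlib for the Krimp code tables of the paper, so the numbers are only indicative; the `max(..., 1)` clamps are our own guard against noisy byte-level deltas.

```python
import zlib

def cost_given(x, model):
    """Bytes needed to encode x once model has been seen (zlib standing
    in for a code table learned from model)."""
    return len(zlib.compress(model + x)) - len(zlib.compress(model))

def dissimilarity(a, b):
    """How much worse each database compresses under the other's model,
    relative to its own model. a, b: byte strings for the two databases."""
    own_a = max(cost_given(a, a), 1)      # clamp to avoid zero denominators
    own_b = max(cost_given(b, b), 1)
    cross_a = max(cost_given(a, b), 1)
    cross_b = max(cost_given(b, a), 1)
    return max((cross_a - own_a) / own_a,
               (cross_b - own_b) / own_b)
```

Identical databases score 0; the further apart the underlying distributions, the more the cross-encoding costs exceed the own-model costs.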
Robust inference with simple cognitive models
In C. Lebiere & R. Wray (Eds.), AAAI spring symposium: Cognitive science principles meet AI-hard problems (pp. 17–22), 2006
Abstract

Cited by 14 (1 self)
Developing theories of how information is processed to yield inductive inferences is a key step in understanding intelligence in humans and machines. Humans, across tasks as diverse as vision and decision making, appear to be extremely adaptive and successful in dealing with uncertainty in the world. Yet even a cursory examination of the books and journals covering machine learning reveals that this branch of AI rarely draws on the cognitive system as a source of insight. In this article I show how fast and frugal heuristics – cognitive process models of inductive inference – frequently outperform a wide selection of standard machine learning algorithms. This finding suggests a cognitively inspired route toward robust inference in the context of meta-learning.
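As an example of the fast-and-frugal family discussed here, the take-the-best heuristic can be stated in a few lines. The cue names in the usage example are hypothetical; the article itself does not prescribe this particular implementation.

```python
def take_the_best(cues_a, cues_b, cue_order):
    """Take-the-best for paired comparison: inspect binary cues in order
    of (assumed known) validity; the first cue on which the two objects
    differ decides, and all remaining cues are ignored."""
    for cue in cue_order:
        if cues_a[cue] != cues_b[cue]:
            return "a" if cues_a[cue] > cues_b[cue] else "b"
    return "guess"  # no cue discriminates: choose at random in practice

# Which city is larger? Hypothetical cues, ordered by assumed validity.
berlin = {"capital": 1, "exposition_site": 1}
hamburg = {"capital": 0, "exposition_site": 1}
print(take_the_best(berlin, hamburg, ["capital", "exposition_site"]))  # prints "a"
```

The frugality lies in stopping at the first discriminating cue rather than weighting and integrating all of them, which is what gives such heuristics their robustness under scarce data.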
PSI-BLAST pseudocounts and the minimum description length principle
Nucleic Acids Res, 2009
Low-Entropy Set Selection
Abstract

Cited by 8 (4 self)
Most pattern discovery algorithms easily generate very large numbers of patterns, making the results impossible to understand and hard to use. Recently, the problem of instead selecting a small subset of informative patterns from a large collection of patterns has attracted a lot of interest. In this paper we present a succinct way of representing data on the basis of itemsets that identify strong interactions. This new approach, LESS, provides a more powerful and more general technique for data description than existing approaches. Low-entropy sets consider the data symmetrically and as such identify strong interactions between attributes, not just between items that are present. Selection of these patterns is executed through the MDL criterion. This results in only a handful of sets that together form a compact lossless description of the data. By using entropy-based elements for the data description, we can successfully apply the maximum likelihood principle to locally cover the data optimally. Further, it allows for a fast, natural and well-performing heuristic. Based on these approaches we present two algorithms that provide high-quality descriptions of the data in terms of strongly interacting variables. Experiments on these methods show that high-quality results are mined: very small pattern sets are returned that are easily interpretable and understandable descriptions of the data, and can be straightforwardly visualized. Swap randomization experiments and high compression ratios show that they capture the structure of the data well.
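The central quantity here, the joint entropy of an attribute set, is easy to compute. A minimal sketch (attribute names and data layout are illustrative, not from the paper):

```python
import math
from collections import Counter

def set_entropy(rows, attrs):
    """Empirical joint entropy (in bits) of a set of binary attributes.
    Low-entropy sets have highly predictable value combinations; note
    that 0s count as much as 1s, which is the symmetric treatment the
    paper contrasts with frequency-based itemset selection.

    rows: list of dicts mapping attribute name -> 0/1."""
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

Candidate attribute sets below an entropy threshold are then the low-entropy sets, and the MDL criterion picks the handful that together describe the data compactly.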
Calculating the normalized maximum likelihood distribution for Bayesian forests
In Proc. IADIS International Conference on Intelligent Systems and Agents, 2007
Abstract

Cited by 8 (6 self)
When learning Bayesian network structures from sample data, an important issue is how to evaluate the goodness of alternative network structures. Perhaps the most commonly used model (class) selection criterion is the marginal likelihood, which is obtained by integrating over a prior distribution for the model parameters. However, the problem of determining a reasonable prior for the parameters is a highly controversial issue, and no completely satisfying Bayesian solution has yet been presented in the noninformative setting. The normalized maximum likelihood (NML), based on Rissanen's information-theoretic MDL methodology, offers an alternative, theoretically solid criterion that is objective and noninformative, and requires no parameter prior. It has been previously shown that for discrete data, this criterion can be computed in linear time for Bayesian networks with no arcs, and in quadratic time for the so-called Naive Bayes network structure. Here we extend the previous results by showing how to compute the NML criterion in polynomial time for tree-structured Bayesian networks. The order of the polynomial depends on the number of values of the variables, but neither on the number of variables itself, nor on the sample size.
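The efficient computability this abstract refers to rests on recurrences for the multinomial normalizing sum of NML. The sketch below contrasts brute-force enumeration with the Kontkanen–Myllymäki recurrence for a single K-valued multinomial; it is a simplified building block, not the paper's full Bayesian-forest computation.

```python
import math

def norm_direct(K, n):
    """Normalizing sum of the NML distribution for a K-valued multinomial
    with n observations, by brute-force enumeration over count vectors."""
    def compositions(slots, left):
        if slots == 1:
            yield (left,)
            return
        for h in range(left + 1):
            for rest in compositions(slots - 1, left - h):
                yield (h,) + rest

    total = 0.0
    for counts in compositions(K, n):
        coef = math.factorial(n)
        lik = 1.0
        for h in counts:
            coef //= math.factorial(h)   # multinomial coefficient, exactly
            if h:
                lik *= (h / n) ** h      # maximized likelihood term
        total += coef * lik
    return total

def norm_recurrence(K, n):
    """The same sum via C(K+2, n) = C(K+1, n) + (n/K) * C(K, n)
    (Kontkanen & Myllymaki), avoiding the exponential enumeration."""
    if K == 1 or n == 0:
        return 1.0
    c_prev = 1.0                                       # C(1, n)
    c_cur = sum(math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
                for h in range(n + 1))                 # C(2, n), binomial sum
    for k in range(1, K - 1):
        c_prev, c_cur = c_cur, c_cur + (n / k) * c_prev
    return c_cur
```

The recurrence takes O(K) steps after the O(n) binomial sum, which is the kind of structure that makes the tree-network NML computation polynomial.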
Filling in the blanks: Krimp minimisation for missing data
In Proceedings of ICDM'08, 2008
Abstract

Cited by 8 (6 self)
Many data sets are incomplete. For correct analysis of such data, one can either use algorithms that are designed to handle missing data or use imputation. Imputation has the benefit that it allows for any type of data analysis. Obviously, this can only lead to proper conclusions if the provided data completion is both highly accurate and maintains all statistics of the original data. In this paper, we present three data completion methods that are built on the MDL-based KRIMP algorithm. Here, we also follow the MDL principle: the completed database that can be compressed best is the best completion, because it adheres best to the patterns in the data. By using local patterns, as opposed to a global model, KRIMP captures the structure of the data in detail. Experiments show that, both in terms of accuracy and expected differences of any marginal, better data reconstructions are provided than by the state of the art, Structural EM.
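The completion-by-compression idea can be sketched with a generic compressor standing in for KRIMP. This is an illustrative approximation of the principle, not one of the paper's three methods: zlib replaces the code tables, and candidates are simply the values seen elsewhere in the same column.

```python
import zlib

def impute(rows, missing=None):
    """MDL-flavoured imputation sketch: fill each missing cell with the
    candidate value that minimizes the compressed size of the whole table."""
    def size(table):
        return len(zlib.compress(
            "\n".join(",".join(map(str, r)) for r in table).encode()))

    rows = [list(r) for r in rows]   # work on a copy
    for row in rows:
        for j, v in enumerate(row):
            if v == missing:
                # Candidates: values observed elsewhere in this column.
                candidates = sorted({r[j] for r in rows if r[j] != missing},
                                    key=str)
                best, best_size = None, None
                for c in candidates:
                    row[j] = c           # try the candidate in place
                    s = size(rows)
                    if best_size is None or s < best_size:
                        best, best_size = c, s
                row[j] = best            # keep the best-compressing value
    return rows
</imports>```

Because a value that fits the table's patterns makes the table more compressible, the compressor acts as a stand-in for "adheres best to the patterns in the data".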
StreamKrimp: Detecting change in data streams
In ECML PKDD, 2008
Abstract

Cited by 7 (2 self)
Data streams are ubiquitous. Examples range from sensor networks to financial transactions and website logs. In fact, even market basket data can be seen as a stream of sales. Detecting changes in the distribution a stream is sampled from is one of the most challenging problems in stream mining, as only limited storage can be used. In this paper we analyse this problem for streams of transaction data from an MDL perspective. Based on this analysis we introduce the STREAMKRIMP algorithm, which uses the KRIMP algorithm to characterise probability distributions with code tables. With these code tables, STREAMKRIMP partitions the stream into a sequence of substreams. Each switch of code table indicates a change in the underlying distribution. Experiments on both real and artificial streams show that STREAMKRIMP detects the changes while using only a very limited amount of data storage.
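The block-wise detection scheme can be caricatured with zlib in place of Krimp code tables. Everything below is an assumption-laden sketch: the block size, the threshold, and the use of a general-purpose compressor are our illustrative choices, not the paper's algorithm.

```python
import zlib

def detect_changes(stream, block=64, threshold=1.3):
    """StreamKrimp-flavoured change detection sketch: a block's encoding
    cost under the current model is compared with its cost under a model
    built from the block itself; a large gap flags a distribution change
    and the model is replaced from that point on."""
    def cost_given(chunk, model):
        # Extra bytes needed for chunk once model has been seen.
        return len(zlib.compress(model + chunk)) - len(zlib.compress(model))

    changes, model = [], None
    for start in range(0, len(stream), block):
        chunk = stream[start:start + block]
        if model is None:
            model = chunk                       # bootstrap from first block
        else:
            own = max(cost_given(chunk, chunk), 0)  # clamp noisy tiny deltas
            if cost_given(chunk, model) > threshold * own:
                changes.append(start)           # distribution changed here
                model = chunk                   # restart the model
    return changes
```

As in the paper, a switch of model marks the boundary between substreams, and only one model-sized summary needs to be stored at any time.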