Results

**1 - 3**of**3**### A Framework for Evaluating the Smoothness of Data-Mining Results

"... Abstract. The data-mining literature is rich in problems that are formalized as combinatorial-optimization problems. An indicative example is the entity-selection formulation that has been used to model the problem of selecting a subset of representative reviews from a review corpus [11, 22] or impo ..."

Abstract
- Add to MetaCart

(Show Context)
Abstract. The data-mining literature is rich in problems that are formalized as combinatorial-optimization problems. An indicative example is the entity-selection formulation that has been used to model the problem of selecting a subset of representative reviews from a review corpus [11, 22] or important nodes in a social network [10]. Existing combinatorial algorithms for solving such entity-selection problems identify a set of entities (e.g., reviews or nodes) as important. Here, we consider the following question: how do small or large changes in the input dataset change the value or the structure of the such reported solutions? We answer this question by developing a general framework for evaluating the smoothness (i.e, consistency) of the data-mining results obtained for the input dataset X. We do so by comparing these results with the results obtained for datasets that are within a small or a large distance from X. The algorithms we design allow us to perform such comparisons effectively and thus, approximate the results ’ smoothness efficiently. Our experimental evaluation on real datasets demonstrates the efficacy and the practical utility of our framework in a wide range of applications. 1

### Kleanthis-Nikolaos

"... This paper suggests a framework for mining subjectively interesting pattern sets that is based on two components: (1) the encoding of prior information in a model for the data miner’s state of mind; (2) the search for a pattern set that is maximally informative while efficient to convey to the data ..."

Abstract
- Add to MetaCart

(Show Context)
This paper suggests a framework for mining subjectively interesting pattern sets that is based on two components: (1) the encoding of prior information in a model for the data miner’s state of mind; (2) the search for a pattern set that is maximally informative while efficient to convey to the data miner. We illustrate the framework with an instantiation for tile patterns in binary databases where prior information on the row and column marginals is available. This approach implements step (1) above by constructing the MaxEnt model with respect to the prior information [2, 3], and step (2) by relying on concepts from information and coding theory. We provide a brief overview of a number of possible extensions and future research challenges, including a key challenge related to the design of empirical evaluations for subjective interestingness measures.