Results 1–10 of 18
Mining quantitative correlated patterns using an information-theoretic approach
In KDD, 2006
"... Existing research on mining quantitative databases mainly focuses on mining associations. However, mining associations is too expensive to be practical in many cases. In this paper, we study mining correlations from quantitative databases and show that it is a more effective approach than mining ass ..."
Abstract

Cited by 12 (1 self)
 Add to MetaCart
Existing research on mining quantitative databases mainly focuses on mining associations. However, mining associations is too expensive to be practical in many cases. In this paper, we study mining correlations from quantitative databases and show that it is a more effective approach than mining associations. We propose a new notion of Quantitative Correlated Patterns (QCPs), which is founded on two formal concepts, mutual information and all-confidence. We first devise a normalization on mutual information and apply it to QCP mining to capture the dependency between the attributes. We further adopt all-confidence as a quality measure to control, at a finer granularity, the dependency between the attributes with specific quantitative intervals. We also propose a supervised method to combine the consecutive intervals of the quantitative attributes based on mutual information, such that the interval combining is guided by the dependency between the attributes. We develop an algorithm, QCoMine, to efficiently mine QCPs by utilizing normalized mutual information and all-confidence to perform a two-level pruning. Our experiments verify the efficiency of QCoMine and the quality of the QCPs.
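The normalized mutual information the abstract refers to can be illustrated with a small sketch. Normalizing by the smaller marginal entropy, as below, is one common choice and not necessarily the exact normalization used by QCoMine:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discretized attributes."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of xs
    py = Counter(ys)            # marginal counts of ys
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # (c/n) * log2( p(x,y) / (p(x) * p(y)) ), with counts rearranged
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def entropy(xs):
    """Empirical Shannon entropy (in bits) of a discretized attribute."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def normalized_mi(xs, ys):
    """Normalize MI into [0, 1] by the smaller marginal entropy."""
    h = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / h if h > 0 else 0.0
```

Perfectly dependent attributes score 1, independent ones score 0, which is what makes the measure usable as a pruning threshold.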
Maximum Independent Set of Rectangles
"... We study the Maximum Independent Set of Rectangles (MISR) problem: given a collection R of n axisparallel rectangles, find a maximumcardinality subset of disjoint rectangles. MISR is a special case of the classical Maximum Independent Set problem, where the input is restricted to intersection grap ..."
Abstract

Cited by 8 (0 self)
 Add to MetaCart
We study the Maximum Independent Set of Rectangles (MISR) problem: given a collection R of n axis-parallel rectangles, find a maximum-cardinality subset of disjoint rectangles. MISR is a special case of the classical Maximum Independent Set problem, where the input is restricted to intersection graphs of axis-parallel rectangles. Due to its many applications, ranging from map labeling to data mining, MISR has received a significant amount of attention from various research communities. Since the problem is NP-hard, the main focus has been on the design of approximation algorithms. Several groups of researchers have independently suggested O(log n)-approximation algorithms for MISR, and this has remained the best known approximation factor for the problem. The main result of our paper is an O(log log n)-approximation algorithm for MISR. Our algorithm combines existing approaches for solving special cases of the problem, in which the input set of rectangles is restricted to specific intersection types, with new insights into the combinatorial structure of sets of intersecting rectangles in the plane. We also consider a generalization of MISR to higher dimensions, where rectangles are replaced by d-dimensional hyper-rectangles. Our results for MISR imply an O((log n)^(d−2) · log log n)-approximation algorithm for this problem, improving upon the best previously known O((log n)^(d−1))-approximation.
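For intuition on the problem statement (not on the O(log log n) algorithm, which is far more involved), here is a hypothetical brute-force exact solver usable only on tiny instances:

```python
from itertools import combinations

def disjoint(r1, r2):
    """Rectangles as (x1, y1, x2, y2); True if their interiors do not intersect."""
    ax1, ay1, ax2, ay2 = r1
    bx1, by1, bx2, by2 = r2
    return ax2 <= bx1 or bx2 <= ax1 or ay2 <= by1 or by2 <= ay1

def misr_bruteforce(rects):
    """Exact maximum independent set of rectangles by exhaustive search.

    O(2^n) time, so this only demonstrates the objective, not a practical
    algorithm; the NP-hardness of MISR is why approximation is the focus.
    """
    for k in range(len(rects), 0, -1):  # try largest subsets first
        for subset in combinations(rects, k):
            if all(disjoint(a, b) for a, b in combinations(subset, 2)):
                return list(subset)
    return []
```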
Exploratory Visualization for Association Rule Rummaging
In KDD'03 Workshop on Multimedia Data Mining (MDM'03), 2003
"... On account of the enormous amounts of rules that can be produced by data mining algorithms, knowledge validation is one of the most problematic steps in an association rule discovery process. In order to comprehend this bulk of rules and to find relevant knowledge for decisionmaking, the user needs ..."
Abstract

Cited by 7 (1 self)
 Add to MetaCart
Because of the enormous number of rules that data mining algorithms can produce, knowledge validation is one of the most problematic steps in an association rule discovery process. In order to make sense of this bulk of rules and to find knowledge relevant to decision-making, the user needs to rummage through the rules. Visualization can strongly support him/her in this task by improving the intelligibility of large rule sets and enabling the user to navigate inside them. In this article, we address the association rule validation problem by designing a visualization for the rule rummaging task. This new approach, based on a specific rummaging model, relies on interactive rule focusing and on rule quality measures. A first prototype implementing our representation has been developed. It allows the user to home in on the interesting rules of a voluminous rule set by trial and error, via successive limited subsets.
A Linear Time Algorithm for the k Maximal Sums Problem
"... Abstract. Finding the subvector with the largest sum in a sequence of n numbers is known as the maximum sum problem. Finding the k subvectors with the largest sums is a natural extension of this, and is known as the k maximal sums problem. In this paper we design an optimal O(n+k) time algorithm f ..."
Abstract

Cited by 6 (2 self)
 Add to MetaCart
Finding the subvector with the largest sum in a sequence of n numbers is known as the maximum sum problem. Finding the k subvectors with the largest sums is a natural extension of this, and is known as the k maximal sums problem. In this paper we design an optimal O(n + k) time algorithm for the k maximal sums problem. We use this algorithm to obtain algorithms solving the two-dimensional k maximal sums problem in O(m^2·n + k) time, where the input is an m × n matrix with m ≤ n. We generalize this algorithm to solve the d-dimensional problem in O(n^(2d−1) + k) time. The space usage of all the algorithms can be reduced to O(n^(d−1) + k). This leads to the first algorithm for the k maximal sums problem in one dimension using O(n + k) time and O(k) space.
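The optimal O(n + k) algorithm is intricate; a much simpler O(n² log k) baseline conveys the problem definition (the function name here is illustrative, not from the paper):

```python
import heapq

def k_maximal_sums(a, k):
    """Return the k largest sums over all contiguous subvectors of a.

    Simple quadratic baseline: enumerate every subvector sum and keep the
    k best in a min-heap. The paper's algorithm achieves O(n + k) instead.
    """
    heap = []  # min-heap holding the k best sums seen so far
    n = len(a)
    for i in range(n):
        s = 0
        for j in range(i, n):
            s += a[j]              # sum of a[i..j]
            if len(heap) < k:
                heapq.heappush(heap, s)
            elif s > heap[0]:
                heapq.heapreplace(heap, s)
    return sorted(heap, reverse=True)
```

With k = 1 this degenerates to the classic maximum sum problem.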
Using Hierarchical Data Mining to Characterize Performance of Wireless System Configurations
2002
"... This paper presents a statistical framework for assessing wireless systems performance using hierarchical data mining techniques. We consider WCDMA (wideband code division multiple access) systems with twobranch STTD (space time transmit diversity) and 1/2 rate convolutional coding (forward error c ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
This paper presents a statistical framework for assessing wireless system performance using hierarchical data mining techniques. We consider WCDMA (wideband code division multiple access) systems with two-branch STTD (space-time transmit diversity) and 1/2-rate convolutional coding (forward error correction codes). Monte Carlo simulation estimates the bit error probability (BEP) of the system across a wide range of signal-to-noise ratios (SNRs). A performance database of simulation runs is collected over a targeted space of system configurations. This database is then mined to obtain regions of the configuration space that exhibit acceptable average performance. The shape of the mined regions illustrates the joint influence of configuration parameters on system performance. The role of data mining in this application is to provide explainable and statistically valid design conclusions. The research issue is to define statistically meaningful aggregations of data in a manner that permits efficient and effective data mining algorithms. We achieve a good compromise between these goals and help establish the applicability of data mining for characterizing wireless system performance.
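As a minimal stand-in for the kind of link-level simulation described (uncoded BPSK over AWGN rather than WCDMA with STTD and convolutional coding, so the numbers are not comparable to the paper's), a Monte Carlo BEP estimate looks like:

```python
import math
import random

def monte_carlo_bep(snr_db, n_bits=100_000, seed=0):
    """Estimate bit error probability for BPSK over AWGN by Monte Carlo.

    Illustrative sketch only: real WCDMA/STTD simulations model fading,
    diversity combining, and coding on top of this basic loop.
    """
    rng = random.Random(seed)
    snr = 10 ** (snr_db / 10)          # Eb/N0 as a linear ratio
    sigma = math.sqrt(1 / (2 * snr))   # noise std dev for unit-energy symbols
    errors = 0
    for _ in range(n_bits):
        bit = rng.randint(0, 1)
        tx = 1.0 if bit else -1.0      # BPSK mapping
        rx = tx + rng.gauss(0, sigma)  # AWGN channel
        if (rx > 0) != bool(bit):      # hard-decision detector
            errors += 1
    return errors / n_bits
```

Sweeping `snr_db` over a grid and storing the estimates per configuration is how a performance database like the one mined in the paper would be populated.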
An Information-Theoretic Approach to Quantitative Association Rule Mining
2007
"... Quantitative Association Rule (QAR) mining has been recognized an influential research problem over the last decade due to the popularity of quantitative databases and the usefulness of association rules in real life. Unlike Boolean Association Rules (BARs), which only consider boolean attributes, Q ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
Quantitative Association Rule (QAR) mining has been recognized as an influential research problem over the last decade due to the popularity of quantitative databases and the usefulness of association rules in real life. Unlike Boolean Association Rules (BARs), which only consider boolean attributes, QARs consist of quantitative attributes, which contain much richer information than boolean attributes. However, the combination of these quantitative attributes and their value intervals always gives rise to an explosively large number of itemsets, thereby severely degrading the mining efficiency. In this paper, we propose an information-theoretic approach to avoid generating unrewarding combinations of both the attributes and their value intervals in the mining process. We study the mutual information between the attributes in a quantitative database and devise a normalization on the mutual information to make it applicable in the context of QAR mining. To indicate the strong informative relationships among the …
Space-Time Trade-offs for Stack-Based Algorithms. E-print arXiv:1208.3663
2012
"... In memoryconstrained algorithms we have readonly access to the input, and the number of additional variables is limited. In this paper we introduce the compressed stack technique, a method that allows to transform algorithms whose space bottleneck is a stack into memoryconstrained algorithms. Give ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
In memory-constrained algorithms we have read-only access to the input, and the number of additional variables is limited. In this paper we introduce the compressed stack technique, a method that allows us to transform algorithms whose space bottleneck is a stack into memory-constrained algorithms. Given an algorithm A that runs in O(n) time using a stack of length Θ(n), we can modify it so that it runs in O(n^2 / 2^s) time using a workspace of O(s) variables (for any s ∈ o(log n)), or in O(n log n / log p) time using O(p log n / log p) variables (for any 2 ≤ p ≤ n). We also show how the technique can be applied to solve various geometric problems, namely computing the convex hull of a simple polygon, a triangulation of a monotone polygon, the shortest path between two points inside a monotone polygon, a 1-dimensional pyramid approximation of a 1-dimensional vector, and the visibility profile of a point inside a simple polygon. Our approach exceeds or matches the best-known results for these problems in constant-workspace models (when they exist), and gives a trade-off between the size of the workspace and the running time. To the best of our knowledge, this is the first general framework for obtaining memory-constrained algorithms.
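A concrete example of the class of stack-based scans the technique targets is the classic upper-hull computation: the explicit Θ(n) stack below is exactly what a compressed stack would emulate in O(s) variables. This sketch shows the input algorithm class only, not the compression itself:

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive means a left turn at a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def upper_hull(points):
    """Upper convex hull of a point set via the classic stack-based scan.

    Each point is pushed once and popped at most once, so the scan runs in
    O(n log n) time (dominated by sorting) with a stack of up to n points.
    """
    stack = []
    for p in sorted(points):
        # Pop while the last two stacked points and p fail to turn right.
        while len(stack) >= 2 and cross(stack[-2], stack[-1], p) >= 0:
            stack.pop()
        stack.append(p)
    return stack
```

Because each element is pushed and popped at most once, the algorithm fits the "stack is the only space bottleneck" shape that the compressed stack technique exploits.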
Data and Computation Modeling for Scientific Problem Solving Environments
2002
"... This thesis investigates several issues in data and computation modeling for scientific problem solving environments (PSEs). A PSE is viewed as a software system that provides (i) a library of simulation components, (ii) experiment management, (ii) reasoning about simulations and data, and (iv) prob ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
This thesis investigates several issues in data and computation modeling for scientific problem solving environments (PSEs). A PSE is viewed as a software system that provides (i) a library of simulation components, (ii) experiment management, (iii) reasoning about simulations and data, and (iv) problem solving abstractions. Three specific ideas, in functionalities (ii)–(iv), form the contributions of this thesis. These include the EMDAG system for experiment management, the BSML markup language for data interchange, and the use of data mining for conducting nontrivial parameter studies. This work emphasizes data modeling and management, two important aspects that have been largely neglected in modern PSE research. All studies are performed in the context of S4W, a sophisticated PSE for wireless system design.