Results 1-10 of 16
A Statistical Theory for Quantitative Association Rules
 Journal of Intelligent Information Systems
, 1999
Cited by 86 (0 self)
Association rules are a key data-mining tool and as such have been well researched. So far, this research has focused predominantly on databases containing categorical data only. However, many real-world databases contain quantitative attributes, and current solutions for this case are so far inadequate. We introduce a new definition of quantitative association rules based on statistical inference theory. Our definition reflects the intuition that the goal of association rules is to find extraordinary and therefore interesting phenomena in databases. We also introduce the concept of sub-rules, which can be applied to any type of association rule. Rigorous experimental evaluation on real-world datasets is presented, demonstrating the usefulness and characteristics of rules mined according to our definition.
1 Introduction
Association Rules. The goal of data mining is to extract higher level information from an abundance of raw data. Association rules are a key tool used for this...
On Classification and Regression
 Discovery Science
, 1998
Cited by 26 (4 self)
We address the problem of computing various types of expressive tests for decision trees and regression trees. Using expressive tests is promising because it may improve the prediction accuracy of trees. The drawback is that computing an optimal test could be costly. We present a unified framework to approach this problem, and we revisit the design of efficient algorithms for computing important special cases. We also prove that it is intractable to compute an optimal conjunction or disjunction.
Multivariate Discretization for Set Mining
 Knowledge and Information Systems
, 2000
Cited by 18 (0 self)
Many algorithms in data mining can be formulated as a set mining problem where the goal is to find conjunctions (or disjunctions) of terms that meet user-specified constraints. Set mining techniques have been largely designed for categorical or discrete data where variables can only take on a fixed number of values. However, many data sets also contain continuous variables, and a common method of dealing with these is to discretize them by breaking them into ranges. Most discretization methods are univariate and consider only a single feature at a time (sometimes in conjunction with a class variable). We argue that this is a suboptimal approach for knowledge discovery, as univariate discretization can destroy hidden patterns in data. Discretization should consider the effects on all variables in the analysis, and two regions X and Y should only be in the same interval after discretization if the instances in those regions have similar multivariate distributions (F_x ~ F_y) across all variables and combinations of variables. We present a bottom-up merging algorithm to discretize continuous variables based on this rule. Our experiments indicate that the approach is feasible, that it will not destroy hidden patterns, and that it will generate meaningful intervals.
On the Complexity of Mining Quantitative Association Rules
, 1998
Cited by 14 (2 self)
The discovery of quantitative association rules in large databases is considered an interesting and important research problem. Recently, different aspects of the problem have been studied, and several algorithms have been presented in the literature, among others in (Srikant and Agrawal, 1996; Fukuda et al., 1996a; Fukuda et al., 1996b; Yoda et al., 1997; Miller and Yang, 1997). An aspect of the problem that has so far been ignored is its computational complexity. In this paper, we study the computational complexity of mining quantitative association rules.
Computing Optimal Hypotheses Efficiently for Boosting
Cited by 11 (1 self)
This paper sheds light on a strong connection between AdaBoost and several optimization algorithms for data mining. AdaBoost has been the subject of much interest as an effective methodology for classification tasks. AdaBoost repeatedly generates one hypothesis in each round, and finally it is able to make a highly accurate prediction by taking a weighted majority vote on the resulting hypotheses. Freund and Schapire have remarked that the use of simple hypotheses such as single-test decision trees instead of huge trees would be promising for achieving high accuracy and avoiding overfitting to the training data. One major drawback of this approach, however, is that the accuracies of simple individual hypotheses may not always be high, hence demanding a way of computing more accurate (or the most accurate) simple hypotheses efficiently.
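The round-by-round procedure described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes one numeric feature, labels in {+1, -1}, and a brute-force search for the single-test hypothesis (a threshold "stump") with the lowest weighted error. The names `best_stump`, `adaboost`, and `predict`, and the toy data, are hypothetical.

```python
import math

def stump_predict(x, threshold, sign):
    """A single-test hypothesis: predict `sign` if x > threshold, else -sign."""
    return sign if x > threshold else -sign

def best_stump(xs, ys, w):
    """Brute-force search for the stump with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict(xi, threshold, sign) != yi)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best

def adaboost(xs, ys, rounds):
    """Return a list of (alpha, threshold, sign), one hypothesis per round."""
    n = len(xs)
    w = [1.0 / n] * n  # start from uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, threshold, sign = best_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, threshold, sign))
        # Increase the weight of misclassified examples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, threshold, sign))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted majority vote over all stumps in the ensemble."""
    score = sum(alpha * stump_predict(x, threshold, sign)
                for alpha, threshold, sign in ensemble)
    return 1 if score >= 0 else -1

# Toy data (assumed): no single threshold separates the two classes.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
```

On this toy data a single stump necessarily misclassifies some points, while the weighted vote of three stumps fits all six. The exhaustive search in `best_stump` is the step whose cost motivates looking for more efficient ways to compute optimal simple hypotheses.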
Data Mining: Concepts and
, 2001
Cited by 9 (0 self)
Rule: Basic Concepts
• Given: (1) database of transactions, (2) each transaction is a list of items (purchased by a customer in a visit)
• Find: all rules that correlate the presence of one set of items with that of another set of items
  • E.g., 98% of people who purchase tires and auto accessories also get automotive services done
• Applications
  • * ⇒ Maintenance Agreement (What should the store do to boost Maintenance Agreement sales?)
  • Home Electronics ⇒ * (What other products should the store stock up on?)
  • Attached mailing in direct marketing
  • Detecting ping-ponging of patients, faulty collisions
January 17, 2001, Data Mining: Concepts and Techniques
Rule Measures: Support and Confidence
• Find all the rules X & Y ⇒ Z with minimum confide
Efficient Mining of Association Rules
 in Text Databases, CIKM'99
, 1999
Cited by 5 (0 self)
• Association rule mining
• Mining single-dimensional Boolean association rules from transactional databases
• Mining multilevel association rules from transactional databases
• Mining multidimensional association rules from transactional databases and data warehouse
• From association mining to correlation analysis
• Constraint-based association mining
Using Hierarchical Data Mining to Characterize Performance of Wireless System Configurations
, 2002
Cited by 4 (2 self)
This paper presents a statistical framework for assessing wireless systems performance using hierarchical data mining techniques. We consider WCDMA (wideband code division multiple access) systems with two-branch STTD (space time transmit diversity) and 1/2 rate convolutional coding (forward error correction codes). Monte Carlo simulation estimates the bit error probability (BEP) of the system across a wide range of signal-to-noise ratios (SNRs). A performance database of simulation runs is collected over a targeted space of system configurations. This database is then mined to obtain regions of the configuration space that exhibit acceptable average performance. The shape of the mined regions illustrates the joint influence of configuration parameters on system performance. The role of data mining in this application is to provide explainable and statistically valid design conclusions. The research issue is to define statistically meaningful aggregation of data in a manner that permits efficient and effective data mining algorithms. We achieve a good compromise between these goals and help establish the applicability of data mining for characterizing wireless systems performance.
A Theory of Quantitative Association Rules with Statistical Validation
 Proceedings of SIGKDD Conference
, 1998
Cited by 3 (0 self)
The goal of data mining is to discover knowledge and reveal new, interesting and previously unknown information to the user. A central data mining tool is association rules. For events X and Y, an association rule is a rule of the type X ⇒ Y, with a certain probability. Classical use of association rules is with market-basket data, resulting in rules such as "70% of people who buy beer also buy diapers". Association rules discover patterns and correlations that may be buried deep inside a database. They have therefore become a key data-mining tool and as such have been well researched. This research has focused mainly on the case of databases containing only categorical attributes. However, most real-world databases contain many quantitative attributes, and current solutions for this case are so far inadequate. A satisfactory solution would be of great benefit to many fields, an example of one being medical research. We introduce a new definition of quantitative association rules based ...
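The "certain probability" attached to a classical market-basket rule is usually reported as its confidence, alongside its support. The sketch below shows one common way to compute both for categorical data. It is not the quantitative definition this paper proposes; the transactions and the function names `support` and `confidence` are made up for illustration.

```python
# Hypothetical toy market-basket data; not taken from the paper.
transactions = [
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
    {"beer"},
    {"diapers", "milk"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

# Rule: beer => diapers
beer_diapers_support = support({"beer", "diapers"}, transactions)
beer_to_diapers_conf = confidence({"beer"}, {"diapers"}, transactions)
```

Here two of the five baskets contain both items, and two of the three baskets with beer also contain diapers, so the rule has support 2/5 and confidence 2/3.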
Data and Computation Modeling for Scientific Problem Solving Environments
, 2002
Cited by 2 (1 self)
This thesis investigates several issues in data and computation modeling for scientific problem solving environments (PSEs). A PSE is viewed as a software system that provides (i) a library of simulation components, (ii) experiment management, (iii) reasoning about simulations and data, and (iv) problem solving abstractions. Three specific ideas, in functionalities (ii)-(iv), form the contributions of this thesis. These include the EMDAG system for experiment management, the BSML markup language for data interchange, and the use of data mining for conducting nontrivial parameter studies. This work emphasizes data modeling and management, two important aspects that have been largely neglected in modern PSE research. All studies are performed in the context of S4W, a sophisticated PSE for wireless system design.