Results 1 - 10 of 872
Mining Generalized Association Rules
, 1995
"... We introduce the problem of mining generalized association rules. Given a large database of transactions, where each transaction consists of a set of items, and a taxonomy (is-a hierarchy) on the items, we find associations between items at any level of the taxonomy. For example, given a taxonomy th ..."
Cited by 591 (7 self)
and EstMerge, which run 2 to 5 times faster than Basic (and more than 100 times faster on one real-life dataset). We also present a new interes...
Mining Quantitative Association Rules in Large Relational Tables
, 1996
"... We introduce the problem of mining association rules in large relational tables containing both quantitative and categorical attributes. An example of such an association might be "10% of married people between age 50 and 60 have at least 2 cars". We deal with quantitative attributes by fi ..."
Cited by 444 (3 self)
"greater-than-expected-value" interest measure to identify the interesting rules in the output. We give an algorithm for mining such quantitative association rules. Finally, we describe the results of using this approach on a real-life dataset.
An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset
"... Abstract. It is well accepted that many real-life datasets are full of missing data. In this paper we introduce, analyze and compare several well known treatment methods for missing data handling and propose new methods based on Naive Bayesian classifier to estimate and replace missing data. We cond ..."
Exploratory Multilevel Hot Spot Analysis: Australian Taxation Office Case Study
"... Population based real-life datasets often contain smaller clusters of unusual sub-populations. While these clusters, called ‘hot spots’, are small and sparse, they are usually of special interest to an analyst. In this paper we introduce a visual drill-down Self-Organizing Map (SOM)-based approach t ..."
Being Bayesian about network structure
- Machine Learning
, 2000
"... Abstract. In many multivariate domains, we are interested in analyzing the dependency structure of the underlying distribution, e.g., whether two variables are in direct interaction. We can represent dependency structures using Bayesian network models. To analyze a given data set, Bayesian model sel ..."
Cited by 299 (3 self)
is smaller and more regular than the space of structures, and has a much smoother posterior “landscape”. We present empirical results on synthetic and real-life datasets that compare our approach to full model averaging (when possible), to MCMC over network structures, and to a non-Bayesian bootstrap
Interactive Deduplication using Active Learning
, 2002
"... Deduplication is a key operation in integrating data from multiple sources. The main challenge in this task is designing a function that can resolve when a pair of records refer to the same entity in spite of various data inconsistencies. Most existing systems use hand-coded functions. One way to ov ..."
Cited by 242 (5 self)
experiments on real-life datasets show that active learning significantly reduces the number of instances needed to achieve high accuracy. We investigate various design issues that arise in building a system to provide interactive response, fast convergence, and interpretable output.
Collaborative Filtering on Skewed Datasets
"... Many real life datasets have skewed distributions of events when the probability of observing few events far exceeds the others. In this paper, we observed that in skewed datasets the state of the art collaborative filtering methods perform worse than a simple probabilistic model. Our test bench inc ..."
Cited by 2 (0 self)
Learning Bayesian network structure from massive datasets: the “sparse candidate” algorithm
- In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI)
, 1999
"... Learning Bayesian networks is often cast as an optimization problem, where the computational task is to find a structure that maximizes a statistically motivated score. By and large, existing learning tools address this optimization problem using standard heuristic search techniques. Since the sear ..."
Cited by 247 (7 self)
candidates for the next iteration. We evaluate this algorithm both on synthetic and real-life data. Our results show that it is significantly faster than alternative search procedures without loss of quality in the learned structures.
A Linear Method for Deviation Detection in Large Databases
, 1996
"... We describe the problem of finding deviations in large data bases. Normally, explicit information outside the data, like integrity constraints or predefined patterns, is used for deviation detection. In contrast, we approach the problem from the inside of the data, using the implicit redundancy of t ..."
Cited by 101 (1 self)
results from the application of this algorithm on real-life datasets showing its effectiveness.
Improving Categorical Data Clustering Algorithm by Weighting Uncommon Attribute Value Matches
"... Abstract. This paper presents an improved Squeezer algorithm for categorical data clustering by giving greater weight to uncommon attribute value matches in similarity computations. Experimental results on real life datasets show that, the modified algorithm is superior to the original Squeezer algo ..."
Cited by 1 (0 self)