Results 1 - 10 of 872

Mining Generalized Association Rules

by Ramakrishnan Srikant, Rakesh Agrawal, 1995
"... We introduce the problem of mining generalized association rules. Given a large database of transactions, where each transaction consists of a set of items, and a taxonomy (is-a hierarchy) on the items, we find associations between items at any level of the taxonomy. For example, given a taxonomy th ..."
Cited by 591 (7 self)
... and EstMerge, which run 2 to 5 times faster than Basic (and more than 100 times faster on one real-life dataset). We also present a new interes...
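A minimal sketch of the problem statement above, assuming a simple child-to-parent taxonomy: extending each transaction with the ancestors of its items lets a single itemset count cover any level of the hierarchy. The taxonomy, the toy transactions, and the helper names (`ancestors`, `extend`, `support`, `confidence`) are hypothetical, and this brute-force counting does not reproduce the faster algorithms the abstract compares against Basic.

```python
# Hypothetical is-a taxonomy, stored as child -> parent.
TAXONOMY = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
    "shoes": "footwear",
    "hiking boots": "footwear",
}

def ancestors(item, taxonomy):
    """Walk up the hierarchy and return every ancestor of `item`."""
    found = []
    while item in taxonomy:
        item = taxonomy[item]
        found.append(item)
    return found

def extend(transaction, taxonomy):
    """Return the transaction plus all ancestors of its items."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, taxonomy))
    return extended

def support(itemset, transactions, taxonomy):
    """Fraction of extended transactions containing every item of `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= extend(t, taxonomy))
    return hits / len(transactions)

def confidence(lhs, rhs, transactions, taxonomy):
    """support(lhs and rhs) / support(lhs) for the rule lhs => rhs."""
    return (support(set(lhs) | set(rhs), transactions, taxonomy)
            / support(lhs, transactions, taxonomy))

# Hypothetical toy transactions.
transactions = [
    {"shirt"},
    {"jacket", "hiking boots"},
    {"ski pants", "hiking boots"},
    {"shoes"},
    {"shoes"},
    {"jacket"},
]

print(support({"outerwear", "hiking boots"}, transactions, TAXONOMY))
print(confidence({"outerwear"}, {"hiking boots"}, transactions, TAXONOMY))
print(support({"jacket", "hiking boots"}, transactions, TAXONOMY))
```

On this toy data, "outerwear => hiking boots" has support 1/3 and confidence 2/3, while the more specific "jacket => hiking boots" has support only 1/6: a rule may clear a minimum-support threshold only at a higher level of the taxonomy.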

Mining Quantitative Association Rules in Large Relational Tables

by Ramakrishnan Srikant, Rakesh Agrawal, 1996
"... We introduce the problem of mining association rules in large relational tables containing both quantitative and categorical attributes. An example of such an association might be "10% of married people between age 50 and 60 have at least 2 cars". We deal with quantitative attributes by fi ..."
Cited by 444 (3 self)
... "greater-than-expected-value" interest measure to identify the interesting rules in the output. We give an algorithm for mining such quantitative association rules. Finally, we describe the results of using this approach on a real-life dataset.

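The example association can be read as the rule "married and age in [50, 60] => cars >= 2" with a confidence of 10%. A minimal sketch of how support and confidence are counted for such an interval rule, plus an equal-frequency split of a numeric attribute into intervals. The records and helper names are hypothetical, and equal-frequency partitioning is only one common choice; the abstract is truncated before describing its own scheme.

```python
# Hypothetical records: (married, age, number of cars).
RECORDS = [
    ("yes", 23, 1), ("no", 25, 1), ("no", 29, 0), ("yes", 34, 2),
    ("yes", 38, 2), ("yes", 52, 2), ("yes", 55, 1), ("yes", 58, 1),
    ("no", 57, 3), ("yes", 60, 1),
]

def antecedent(rec):
    """married = yes AND age in [50, 60]."""
    married, age, _ = rec
    return married == "yes" and 50 <= age <= 60

def consequent(rec):
    """cars >= 2."""
    return rec[2] >= 2

def rule_stats(records, lhs, rhs):
    """Return (support, confidence) of the rule lhs => rhs."""
    n_lhs = sum(1 for r in records if lhs(r))
    n_both = sum(1 for r in records if lhs(r) and rhs(r))
    support = n_both / len(records)
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence

def equi_depth_intervals(values, k):
    """Split values into k intervals with roughly equal counts (assumes k <= len(values))."""
    ordered = sorted(values)
    step = len(ordered) / k
    intervals = []
    for i in range(k):
        chunk = ordered[round(i * step):round((i + 1) * step)]
        intervals.append((chunk[0], chunk[-1]))
    return intervals

print(rule_stats(RECORDS, antecedent, consequent))
print(equi_depth_intervals([age for _, age, _ in RECORDS], 2))
```

For these ten made-up records, four match the antecedent and one of them also has at least two cars, so the rule has support 0.1 and confidence 0.25.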
An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset

by Peng Liu, Elia El-darzi, Lei Lei, Christos Vasilakis, Panagiotis Chountas, Wei Huang
"... It is well accepted that many real-life datasets are full of missing data. In this paper we introduce, analyze and compare several well-known treatment methods for missing data handling and propose new methods based on a Naive Bayesian classifier to estimate and replace missing data. We cond ..."
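A minimal sketch of naive Bayes imputation for one categorical column: each missing value is replaced by the class that maximises the prior times the product of per-attribute likelihoods learned from the complete rows. The add-one smoothing, the function name `nb_impute`, and the made-up records are assumptions; this is a generic illustration, not the authors' exact method.

```python
from collections import Counter, defaultdict

def nb_impute(rows, target):
    """Return copies of `rows` with missing (None) `target` values filled in.

    Each gap gets the class c maximising P(c) * prod_a P(a = value | c),
    estimated from the complete rows with add-one (Laplace) smoothing.
    Assumes categorical columns and that only `target` has missing values.
    """
    complete = [r for r in rows if r[target] is not None]
    class_counts = Counter(r[target] for r in complete)
    cond = Counter()           # (attribute, value, class) -> count
    domain = defaultdict(set)  # attribute -> distinct values seen anywhere
    for r in rows:
        for attr, val in r.items():
            if attr != target:
                domain[attr].add(val)
                if r[target] is not None:
                    cond[(attr, val, r[target])] += 1

    def score(row, c):
        n_c = class_counts[c]
        p = n_c / len(complete)
        for attr, val in row.items():
            if attr != target:
                p *= (cond[(attr, val, c)] + 1) / (n_c + len(domain[attr]))
        return p

    filled = []
    for r in rows:
        new = dict(r)
        if new[target] is None:
            new[target] = max(class_counts, key=lambda c: score(r, c))
        filled.append(new)
    return filled

# Hypothetical records; None marks a missing value.
ROWS = [
    {"age": "young", "smoker": "no", "risk": "low"},
    {"age": "young", "smoker": "no", "risk": "low"},
    {"age": "old", "smoker": "yes", "risk": "high"},
    {"age": "old", "smoker": "yes", "risk": "high"},
    {"age": "old", "smoker": "no", "risk": "low"},
    {"age": "old", "smoker": "yes", "risk": None},
    {"age": "young", "smoker": "no", "risk": None},
]

for row in nb_impute(ROWS, "risk"):
    print(row)
```

Here the old smoker is imputed as high risk (score 0.225 versus 0.048 for low) and the young non-smoker as low risk; the input rows are left unchanged.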

Exploratory Multilevel Hot Spot Analysis: Australian Taxation Office Case Study

by Graham J. Williams, Peter Christen
"... Population-based real-life datasets often contain smaller clusters of unusual sub-populations. While these clusters, called ‘hot spots’, are small and sparse, they are usually of special interest to an analyst. In this paper we introduce a visual drill-down Self-Organizing Map (SOM)-based approach t ..."
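A minimal, generic self-organizing map in pure Python, assuming a 1-D chain of nodes, a Gaussian neighbourhood, and a linearly decaying learning rate and radius. It only illustrates how a SOM can give a small, distinct group its own node; it is not the authors' drill-down tool, and the points, initial nodes, and parameter values are hypothetical.

```python
import math
from collections import Counter

def bmu(nodes, x):
    """Index of the best-matching unit, i.e. the node closest to x."""
    return min(range(len(nodes)), key=lambda i: math.dist(nodes[i], x))

def train_som(data, nodes, epochs=50, lr0=0.5, radius0=1.0):
    """Train a 1-D chain of SOM nodes on `data`; nodes are updated in place."""
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.01)  # neighbourhood shrinks over time
        for x in data:
            w = bmu(nodes, x)
            for i, node in enumerate(nodes):
                # Gaussian neighbourhood, measured along the 1-D map
                h = math.exp(-((i - w) ** 2) / (2 * radius ** 2))
                nodes[i] = [n + lr * h * (xi - n) for n, xi in zip(node, x)]
    return nodes

# Hypothetical 2-D population: a large group near the origin, a small "hot spot".
POINTS = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2], [0.0, 0.0],
          [0.15, 0.05], [5.0, 5.1], [5.1, 5.0]]
nodes = train_som(POINTS, [[1.0, 1.0], [2.5, 2.5], [4.0, 4.0]])
sizes = Counter(bmu(nodes, x) for x in POINTS)  # how many inputs each node attracts
print(sizes)
```

After training, a point in the hot spot and a point near the origin have different best-matching units, and `sizes` counts how many inputs each node attracts, which is one way a small sub-population becomes visible on the map.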

Being Bayesian about network structure

by Nir Friedman - Machine Learning, 2000
"... In many multivariate domains, we are interested in analyzing the dependency structure of the underlying distribution, e.g., whether two variables are in direct interaction. We can represent dependency structures using Bayesian network models. To analyze a given data set, Bayesian model sel ..."
Cited by 299 (3 self)
... is smaller and more regular than the space of structures, and has a much smoother posterior “landscape”. We present empirical results on synthetic and real-life datasets that compare our approach to full model averaging (when possible), to MCMC over network structures, and to a non-Bayesian bootstrap ...
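The claim that the space of orders is smaller than the space of structures can be checked by counting: n variables have n! orderings, while the number of labeled DAGs grows much faster. The sketch below uses Robinson's standard recurrence for labeled DAGs; it is a textbook counting result, not code from the paper, and the function names are illustrative.

```python
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
        for k in range(1, n + 1)
    )

def dags_per_order(n):
    """DAGs consistent with one fixed order: each earlier->later edge is in or out."""
    return 2 ** (n * (n - 1) // 2)

for n in range(1, 7):
    print(f"n={n}: orders={factorial(n)}, DAGs={num_dags(n)}, "
          f"DAGs per order={dags_per_order(n)}")
```

For n = 6 this gives 720 orders against 3,781,503 DAGs, and each single order is compatible with 2^15 = 32,768 of them, so a sampler over orders moves through a much smaller space.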

Interactive Deduplication using Active Learning

by Sunita Sarawagi, Anuradha Bhamidipaty, 2002
"... Deduplication is a key operation in integrating data from multiple sources. The main challenge in this task is designing a function that can resolve when a pair of records refer to the same entity in spite of various data inconsistencies. Most existing systems use hand-coded functions. One way to ov ..."
Cited by 242 (5 self)
... experiments on real-life datasets show that active learning significantly reduces the number of instances needed to achieve high accuracy. We investigate various design issues that arise in building a system to provide interactive response, fast convergence, and interpretable output.
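A minimal sketch of active learning for deduplication, assuming a single feature (token Jaccard similarity), a threshold classifier, and uncertainty sampling: each round, the pair whose similarity is closest to the current threshold is sent to the user. The records, the `TRUTH` dictionary standing in for the user, and all function names are hypothetical; uncertainty sampling is a generic technique used here in place of whatever selection strategy the paper describes.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two record strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0

def most_uncertain(pairs, labeled, threshold):
    """Unlabeled pair whose similarity is closest to the decision threshold."""
    unlabeled = [p for p in pairs if p not in labeled]
    return min(unlabeled, key=lambda p: abs(jaccard(*p) - threshold))

def refit_threshold(labeled, current):
    """Midpoint between the labeled classes; keep `current` until both exist."""
    dup = [jaccard(*p) for p, is_dup in labeled.items() if is_dup]
    non = [jaccard(*p) for p, is_dup in labeled.items() if not is_dup]
    if not dup or not non:
        return current
    return (max(non) + min(dup)) / 2

def active_learn(pairs, oracle, threshold, budget):
    """Ask `oracle` (standing in for the user) to label `budget` <= len(pairs) pairs."""
    labeled = {}
    for _ in range(budget):
        pair = most_uncertain(pairs, labeled, threshold)
        labeled[pair] = oracle[pair]
        threshold = refit_threshold(labeled, threshold)
    return threshold, labeled

# Hypothetical record pairs and the user's answers (True = duplicate).
TRUTH = {
    ("John Smith 12 Oak St", "John Smith 12 Oak St."): True,
    ("John Smith 12 Oak St", "J. Smith 12 Oak Street"): True,
    ("John Smith 12 Oak St", "John Smith 40 Pine Ave"): False,
    ("Mary Jones 5 Elm Rd", "John Smith 12 Oak St"): False,
    ("Acme Corp Boston", "Acme Corporation Boston"): True,
}
PAIRS = list(TRUTH)

threshold, labeled = active_learn(PAIRS, TRUTH, threshold=0.3, budget=3)
print(round(threshold, 3), len(labeled))
```

With three labels out of five pairs, the learned threshold (about 0.339) separates all five pairs correctly, including the two "John Smith" records at different addresses.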

Collaborative Filtering on Skewed Datasets

by Somnath Banerjee
"... Many real-life datasets have skewed distributions of events, in which the probability of observing a few events far exceeds that of the others. In this paper, we observed that in skewed datasets the state-of-the-art collaborative filtering methods perform worse than a simple probabilistic model. Our test bench inc ..."
Cited by 2 (0 self)
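The abstract does not say which simple probabilistic model it uses, so this sketch assumes one of the simplest, an item-popularity model: P(item) is estimated from event counts with add-alpha smoothing, and each user's unseen items are ranked by that probability. It also measures skew as the share of events that fall on the k most frequent items. The event log, the function names, and alpha are hypothetical.

```python
from collections import Counter

def item_probabilities(events, alpha=1.0):
    """Add-alpha smoothed estimate of P(item) from (user, item) events."""
    counts = Counter(item for _, item in events)
    total = sum(counts.values()) + alpha * len(counts)
    return {item: (c + alpha) / total for item, c in counts.items()}

def top_share(events, k):
    """Fraction of all events that fall on the k most frequent items."""
    counts = Counter(item for _, item in events)
    return sum(c for _, c in counts.most_common(k)) / len(events)

def recommend(user, events, probs, n=2):
    """Rank items the user has not interacted with by P(item), ties by name."""
    seen = {item for u, item in events if u == user}
    unseen = [i for i in probs if i not in seen]
    return sorted(unseen, key=lambda i: (-probs[i], i))[:n]

# Hypothetical, deliberately skewed event log of (user, item) pairs.
EVENTS = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u4", "a"), ("u5", "a"),
          ("u1", "b"), ("u2", "b"), ("u3", "c"), ("u4", "d"), ("u5", "e")]
probs = item_probabilities(EVENTS)
print(top_share(EVENTS, 1), recommend("u3", EVENTS, probs))
```

Half of the ten made-up events fall on item a, and for user u3 (who has seen a and c) the model recommends b, then d.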

Learning Bayesian network structure from massive datasets: the “sparse candidate” algorithm

by Nir Friedman, Iftach Nachman - In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI), 1999
"... Learning Bayesian networks is often cast as an optimization problem, where the computational task is to find a structure that maximizes a statistically motivated score. By and large, existing learning tools address this optimization problem using standard heuristic search techniques. Since the sear ..."
Cited by 247 (7 self)
... candidates for the next iteration. We evaluate this algorithm both on synthetic and real-life data. Our results show that it is significantly faster than alternative search procedures without loss of quality in the learned structures.
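The sparse candidate idea restricts each variable's possible parents to a small candidate set before searching. A minimal sketch of that restrict step, assuming empirical mutual information as the dependency measure and a fixed candidate-set size k; the structure search over the candidates and the iteration that re-selects them are omitted, and the data and function names are hypothetical.

```python
import math
from collections import Counter

def mutual_information(data, x, y):
    """Empirical mutual information (in nats) between columns x and y."""
    n = len(data)
    cx = Counter(row[x] for row in data)
    cy = Counter(row[y] for row in data)
    cxy = Counter((row[x], row[y]) for row in data)
    return sum(
        (c / n) * math.log((c / n) / ((cx[a] / n) * (cy[b] / n)))
        for (a, b), c in cxy.items()
    )

def candidate_parents(data, k):
    """Restrict step: for each variable keep the k others with the highest MI."""
    variables = range(len(data[0]))
    candidates = {}
    for v in variables:
        others = [u for u in variables if u != v]
        others.sort(key=lambda u: mutual_information(data, u, v), reverse=True)
        candidates[v] = others[:k]
    return candidates

# Hypothetical binary data: X1 copies X0, X3 agrees with X2 in 6 of 8 rows,
# and every cross pair (X0 or X1 with X2 or X3) has zero empirical MI.
DATA = [
    (0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (0, 0, 1, 0),
    (1, 1, 0, 0), (1, 1, 0, 0), (1, 1, 1, 1), (1, 1, 1, 1),
]
print(candidate_parents(DATA, k=1))
```

With k = 1, each variable's single candidate parent is its partner (X0 with X1, X2 with X3); in a real run the candidate sets would then bound the heuristic search.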

A Linear Method for Deviation Detection in Large Databases

by Andreas Arning et al., 1996
"... We describe the problem of finding deviations in large databases. Normally, explicit information outside the data, like integrity constraints or predefined patterns, is used for deviation detection. In contrast, we approach the problem from the inside of the data, using the implicit redundancy of t ..."
Cited by 101 (1 self)
... results from the application of this algorithm on real-life datasets showing its effectiveness.
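A brute-force sketch of detecting a deviation from the data's own redundancy: remove each value in turn and measure how much the set's dissimilarity drops, weighted by the number of values that remain. Population variance as the dissimilarity function, single-element candidate sets, and the helper names are assumptions; this quadratic loop is not the paper's linear algorithm.

```python
from statistics import pvariance

def dissimilarity(values):
    """Dissimilarity of a set of values; population variance is an assumed choice."""
    return pvariance(values) if len(values) > 1 else 0.0

def smoothing_score(values, idx):
    """Drop in dissimilarity when values[idx] is removed, times the remaining count."""
    rest = values[:idx] + values[idx + 1:]
    return len(rest) * (dissimilarity(values) - dissimilarity(rest))

def most_deviant(values):
    """Index of the single value whose removal smooths the data the most."""
    return max(range(len(values)), key=lambda i: smoothing_score(values, i))

# Hypothetical sensor readings with one suspicious value.
READINGS = [10.0, 11.0, 9.0, 10.0, 12.0, 48.0, 10.0]
i = most_deviant(READINGS)
print(i, READINGS[i])
```

For these made-up readings, removing 48.0 reduces the variance far more than removing any other value, so it is reported as the deviation.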

Improving Categorical Data Clustering Algorithm by Weighting Uncommon Attribute Value Matches

by Zengyou He, Xiaofei Xu, Shenchun Deng
"... This paper presents an improved Squeezer algorithm for categorical data clustering by giving greater weight to uncommon attribute value matches in similarity computations. Experimental results on real-life datasets show that the modified algorithm is superior to the original Squeezer algo ..."
Cited by 1 (0 self)
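A minimal sketch of Squeezer-style one-pass clustering of categorical tuples with a pluggable per-value weight. Each tuple joins the most similar existing cluster if the similarity reaches a threshold and otherwise starts a new cluster; similarity sums, per attribute, the fraction of cluster members sharing the tuple's value, scaled by the weight. The rarity weight 1 - freq/N, the thresholds, and the tuples are assumptions for illustration, not the weighting defined in the paper.

```python
from collections import Counter

def squeezer(rows, threshold, weight):
    """One-pass Squeezer-style clustering of categorical tuples.

    weight(i, v) scales how much a match on value v of attribute i counts.
    Returns the clusters as lists of row indices.
    """
    clusters = []  # each: {"size": int, "counts": [Counter per attribute], "members": [...]}
    for idx, row in enumerate(rows):
        best, best_sim = None, -1.0
        for c in clusters:
            # Per attribute: weighted fraction of cluster members sharing this value
            sim = sum(weight(i, v) * c["counts"][i][v] / c["size"]
                      for i, v in enumerate(row))
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None or best_sim < threshold:
            best = {"size": 0, "counts": [Counter() for _ in row], "members": []}
            clusters.append(best)
        best["size"] += 1
        best["members"].append(idx)
        for i, v in enumerate(row):
            best["counts"][i][v] += 1
    return [c["members"] for c in clusters]

def rarity_weights(rows):
    """Assumed weighting: 1 - (global frequency of the value / number of rows)."""
    n = len(rows)
    freq = [Counter(column) for column in zip(*rows)]
    return lambda i, v: 1.0 - freq[i][v] / n

# Hypothetical categorical tuples.
TUPLES = [("a", "x", "p"), ("a", "x", "p"), ("a", "y", "q"), ("b", "y", "q")]
print(squeezer(TUPLES, 1.0, lambda i, v: 1.0))      # uniform weights
print(squeezer(TUPLES, 0.5, rarity_weights(TUPLES)))  # rarer matches count more
```

With uniform weights and threshold 1.0, tuple 2 joins the first cluster on its single match with the common value "a"; with rarity weights that match is worth only 0.25, so at threshold 0.5 tuple 2 starts a new cluster and tuple 3 joins it on its two matches. The two thresholds are on different scales and were chosen only for illustration.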

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University