Text Mining with Information Extraction
- AAAI 2002 Spring Symposium on Mining Answers from Texts and Knowledge Bases, 2002
Cited by 21 (0 self)
The popularity of the Web and the large number of documents available in electronic form have motivated the search for hidden knowledge in text collections. Consequently, there is growing research interest in the general topic of text mining. In this paper, we develop a text-mining system by integrating methods from Information Extraction (IE) and Data Mining (Knowledge Discovery from Databases or KDD). By utilizing existing IE and KDD techniques, text-mining systems can be developed relatively rapidly and evaluated on existing text corpora for testing IE systems. We present a general text-mining framework called DiscoTEX which employs an IE module for transforming natural-language documents into structured data and a KDD module for discovering prediction rules from the extracted data. When discovering patterns in extracted text, strict matching of strings is inadequate because textual database entries generally exhibit variations due to typographical errors, misspellings, abbreviations, and other
Mining Positive and Negative Association Rules: An Approach for Confined Rules
- IN EUROPEAN PKDD CONF, 2004
Cited by 20 (0 self)
Typical association rules consider only items enumerated in transactions. Such rules are referred to as positive association rules. Negative association rules also consider the same items, but in addition consider negated items (i.e. absent from transactions). Negative association rules are useful in market-basket analysis to identify products that conflict with each other or products that complement each other. They are also very convenient for associative classifiers, classifiers that build their classification model based on association rules. Many other applications would benefit from negative association rules if it were not for the expensive process of discovering them. Indeed, mining for such rules necessitates the examination of an exponentially large search space. Despite their usefulness, and while they were referred to in many publications, very few algorithms to mine them have been proposed to date. In this paper we propose an algorithm that extends the support-confidence framework with a sliding correlation coefficient threshold. In addition to finding confident positive rules that have a strong correlation, the algorithm discovers negative association rules with strong negative correlation between the antecedents and consequents.
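The idea in the abstract above, using a correlation coefficient to separate positive from negative rules, can be illustrated with a minimal sketch. It assumes the phi coefficient on binary presence/absence as the correlation measure and a fixed threshold of 0.5 instead of the sliding threshold the abstract mentions. The function names and the toy transactions are hypothetical, and this is not the authors' algorithm.

```python
from math import sqrt


def phi_coefficient(x, y, transactions):
    """Phi correlation between the presence of itemset x and itemset y."""
    x, y = frozenset(x), frozenset(y)
    n = len(transactions)
    n_x = sum(1 for t in transactions if x <= t)
    n_y = sum(1 for t in transactions if y <= t)
    n_xy = sum(1 for t in transactions if x <= t and y <= t)
    denom = sqrt(n_x * (n - n_x) * n_y * (n - n_y))
    if denom == 0:
        # One of the itemsets is always or never present: correlation undefined,
        # reported here as 0.0 by assumption.
        return 0.0
    return (n * n_xy - n_x * n_y) / denom


def classify(phi, threshold=0.5):
    """Label a pair as a positive-rule or negative-rule candidate (assumed fixed threshold)."""
    if phi >= threshold:
        return "positive"   # candidate rules of the form X -> Y
    if phi <= -threshold:
        return "negative"   # candidate rules of the form X -> not Y, not X -> Y
    return "none"


# Hypothetical toy data: coffee and tea never appear together.
transactions = [frozenset(t) for t in (
    {"coffee", "milk"}, {"coffee", "milk"}, {"coffee"},
    {"tea"}, {"tea"}, {"tea", "milk"},
)]

phi_coffee_tea = phi_coefficient({"coffee"}, {"tea"}, transactions)
phi_coffee_milk = phi_coefficient({"coffee"}, {"milk"}, transactions)
label_coffee_tea = classify(phi_coffee_tea)
label_coffee_milk = classify(phi_coffee_milk)
```

In this toy data the coffee/tea pair is flagged as a negative-rule candidate, while coffee/milk falls below the threshold in both directions. A full miner would still have to check support and confidence for each candidate rule.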
Mining Large Itemsets for Association Rules
- Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 1998
Cited by 11 (0 self)
This paper provides a survey of the itemset method for association rule generation. The paper discusses past research on the topic and also studies the relevance and importance of the itemset method in generating association rules. We discuss a number of variations of the association rule problem which have been proposed in the literature and their practical applications. Some inherent weaknesses of the large itemset method for association rule generation have been explored. We also discuss some other formulations of associations which can be viable alternatives to the traditional association rule generation method.
1 Introduction
Association rules find the relationships between the different items in a database of sales transactions. Such rules track the buying patterns in consumer behavior, e.g., finding how the presence of one item in the transaction affects the presence of another and so forth. The problem of association rule generation has recently gained considerable prominence in ...
Association mining in large databases: A re-examination of its measures
- Mladenic and A. Skowron (eds), Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, 2007
Cited by 7 (0 self)
In the literature of data mining and statistics, numerous interestingness measures have been proposed to disclose succinct object relationships of association patterns. However, it is still not clear when a measure is truly effective in large data sets. Recent studies have identified a critical property, null-(transaction) invariance, for measuring event associations in large data sets, but many existing measures do not have this property. We thus re-examine the null-invariant measures and find, interestingly, that they can be expressed as a generalized mathematical mean, and there exists a total ordering of them. This ordering provides insights into the underlying philosophy of the measures and helps us understand and select the proper measure for different applications.
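The claim above, that null-invariant measures are generalized means with a total ordering, can be illustrated with a minimal sketch. It assumes four commonly cited null-invariant measures (AllConf, Cosine, Kulczynski, MaxConf), written as power means of the two conditional probabilities P(A|B) and P(B|A) with exponents -inf, 0, 1 and +inf. The abstract does not list the measures it covers, so this selection, the function names and the example counts are assumptions.

```python
from math import inf, sqrt


def generalized_mean(a, b, k):
    """Power mean of a and b with exponent k (limits handled explicitly)."""
    if k == -inf:
        return min(a, b)
    if k == inf:
        return max(a, b)
    if k == 0:
        return sqrt(a * b)          # geometric mean is the k -> 0 limit
    if k < 0 and (a == 0 or b == 0):
        return 0.0                  # limit value when one argument is zero
    return ((a ** k + b ** k) / 2) ** (1 / k)


def null_invariant_measures(sup_a, sup_b, sup_ab):
    """Measures computed from sup(A), sup(B), sup(AB) only.

    The total number of transactions is not an argument, which is exactly
    why adding transactions that contain neither A nor B cannot change them.
    """
    p_a_given_b = sup_ab / sup_b
    p_b_given_a = sup_ab / sup_a
    return {
        "AllConf": generalized_mean(p_a_given_b, p_b_given_a, -inf),
        "Cosine": generalized_mean(p_a_given_b, p_b_given_a, 0),
        "Kulczynski": generalized_mean(p_a_given_b, p_b_given_a, 1),
        "MaxConf": generalized_mean(p_a_given_b, p_b_given_a, inf),
    }


# Hypothetical counts: A occurs in 1000 transactions, B in 100, both in 100.
measures = null_invariant_measures(sup_a=1000, sup_b=100, sup_ab=100)
ordered_values = list(measures.values())
```

Because a power mean never decreases as its exponent grows, the values come out in non-decreasing order from AllConf to MaxConf for any choice of counts. This is one way to read the total ordering mentioned in the abstract.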
Association Rules Mining: A Recent Overview
Cited by 6 (0 self)
In this paper, we provide the preliminaries of basic concepts about association rule mining and survey the list of existing association rule mining techniques. Of course, a single article cannot be a complete review of all the algorithms, yet we hope that the references cited will cover the major theoretical issues, guiding the researcher in interesting research directions that have yet to be explored.
Data Mining: Concepts and
, 2001
Cited by 6 (0 self)
Rule: Basic Concepts
• Given: (1) database of transactions, (2) each transaction is a list of items (purchased by a customer in a visit)
• Find: all rules that correlate the presence of one set of items with that of another set of items
• E.g., 98% of people who purchase tires and auto accessories also get automotive services done
• Applications
• * ⇒ Maintenance Agreement (What the store should do to boost Maintenance Agreement sales)
• Home Electronics ⇒ * (What other products should the store stock up?)
• Attached mailing in direct marketing
• Detecting ping-ponging of patients, faulty collisions
January 17, 2001 Data Mining: Concepts and Techniques 5
Rule Measures: Support and Confidence
• Find all the rules X & Y ⇒ Z with minimum confide
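The two rule measures named in the slide text above, support and confidence, can be sketched as follows. This is a minimal illustration using the standard definitions: support is the fraction of transactions containing all items of the rule, and confidence is the fraction of transactions containing the antecedent that also contain the consequent. The function names and the toy transactions are hypothetical and do not come from the slides.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Fraction of transactions containing antecedent that also contain consequent."""
    ante = frozenset(antecedent)
    both = ante | frozenset(consequent)
    ante_count = sum(1 for t in transactions if ante <= t)
    if ante_count == 0:
        return 0.0  # rule never applies; reported as 0.0 by assumption
    both_count = sum(1 for t in transactions if both <= t)
    return both_count / ante_count


# Hypothetical toy database of five customer visits.
transactions = [frozenset(t) for t in (
    {"tires", "accessories", "service"},
    {"tires", "accessories", "service"},
    {"tires", "accessories"},
    {"tires"},
    {"service"},
)]

# Rule: tires & accessories => service
rule_support = support({"tires", "accessories", "service"}, transactions)
rule_confidence = confidence({"tires", "accessories"}, {"service"}, transactions)
```

Here two of five visits contain all three items, so the rule's support is 0.4. Two of the three visits with tires and accessories also include service, so its confidence is 2/3. A miner keeps only rules that meet user-chosen minimums for both values.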
Efficient Mining of Association Rules in Text Databases
- CIKM'99, 1999
Cited by 5 (0 self)
• Association rule mining
• Mining single-dimensional Boolean association rules from transactional databases
• Mining multilevel association rules from transactional databases
• Mining multidimensional association rules from transactional databases and data warehouse
• From association mining to correlation analysis
• Constraint-based association mining
BLOSOM: A Framework for Mining Arbitrary Boolean Expressions over Attribute Sets
- IN: PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD 2006), 2006
Cited by 5 (3 self)
We introduce a novel framework (BLOSOM) for mining (frequent) boolean expressions over binary-valued datasets. We organize the space of boolean expressions into four categories: pure conjunctions, pure disjunctions, conjunction of disjunctions, and disjunction of conjunctions. For each category, we propose a closure operator that naturally leads to the concept of a closed boolean expression. The closed expressions and their minimal generators give the most specific and most general boolean expressions that are satisfied by their corresponding object set. Further, the closed/minimal generator expressions form a lossless representation of all possible boolean expressions. BLOSOM efficiently
Privacy-maxent: integrating background knowledge in privacy quantification
- in SIGMOD, 2008
Cited by 5 (2 self)
Privacy-Preserving Data Publishing (PPDP) deals with the publication of microdata while preserving people's private information in the data. To measure how much private information can be preserved, privacy metrics are needed. An essential element for privacy metrics is the measure of how much adversaries can know about an individual's sensitive attributes (SA) if they know the individual's quasi-identifiers (QI), i.e., we need to measure P(SA | QI). Such a measure is hard to derive when adversaries' background knowledge has to be considered. We propose a systematic approach, Privacy-MaxEnt, to integrate background knowledge in privacy quantification. Our approach is based on the maximum entropy principle. We treat all the conditional probabilities P(SA | QI) as unknown variables; we treat the background knowledge as the constraints of these variables; in addition, we also formulate constraints from the published data. Our goal becomes finding a solution to those variables (the probabilities) that satisfies all these constraints. Although many solutions may exist, the most unbiased estimate of P(SA | QI) is the one that achieves the maximum entropy.
Mining for Contiguous Frequent Itemsets in Transaction Databases
- In Proceedings of the IEEE 3rd International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, IEEE, 2005
Cited by 2 (2 self)
Mining a transaction database for association rules is a particularly popular data mining task, which involves the search for frequent co-occurrences among items. One of the problems often encountered is the large number of weak rules extracted. Item taxonomies, when available, can be used to reduce them to a more usable volume. In this paper we introduce a new data mining paradigm, which involves the discovery of contiguous frequent itemsets. We formulate the problem of mining contiguous frequent itemsets in a transaction database and we present a level-wise algorithm for finding these itemsets. Contiguous frequent itemsets may contain important knowledge about the dataset that cannot be exposed by the use of classic association rule mining approaches. This knowledge may well include serious hints for the generation of a taxonomy for all or part of the items.

