Results 1-10 of 60
Rule discovery from time series
In Proceedings of the 1997 ACM SIGKDD International Conference, ACM SIGKDD, 1997
Cited by 142 (0 self)
Abstract
We consider the problem of finding rules relating patterns in a time series to other patterns in that series, or patterns in one series to patterns in another series. A simple example is a rule such as "a period of low telephone call activity is usually followed by a sharp rise in call volume". Examples of rules relating two or more time series are "if the Microsoft stock price goes up and Intel falls, then IBM goes up the next day," and "if Microsoft goes up strongly for one day, then declines strongly on the next day, and on the same days Intel stays about level, then IBM stays about level." Our emphasis is on the discovery of local patterns in multivariate time series, in contrast to traditional time series analysis, which largely focuses on global models. Thus, we search for rules whose conditions refer to patterns in time series. However, we do not want to define beforehand which patterns are to be used; rather, we want the patterns to be formed from the data in the context of rule discovery. We describe adaptive methods for finding rules of the above type from time-series data. The methods are based on discretizing the sequence by methods resembling vector quantization. We first form subsequences by sliding a window through the time series, and then cluster these subsequences by using a suitable measure of time-series similarity. The discretized version of the time series is obtained by taking the cluster identifiers corresponding to the subsequences. Once the time series is discretized, we use simple rule-finding methods to obtain rules from the sequence. We present empirical results on the behavior of the method.
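The discretize-then-mine pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method. Plain k-means with Euclidean distance stands in for the paper's clustering and similarity measure, windows are not normalized, and the helper names (`windows`, `kmeans`, `discretize`, `rule_confidence`) are hypothetical. A rule "A is followed by B within `lag` steps" is scored by the fraction of occurrences of symbol A that are followed by B within that horizon.

```python
from collections import Counter
from math import dist


def windows(series, w):
    """All contiguous subsequences of length w (sliding window, step 1)."""
    return [tuple(series[i:i + w]) for i in range(len(series) - w + 1)]


def kmeans(points, k, iters=20):
    """Plain k-means (assumed stand-in for the paper's clustering).
    Centroids are seeded deterministically from the first k distinct points."""
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(p)
        if len(centroids) == k:
            break
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each window to its nearest centroid (Euclidean distance).
        labels = [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def discretize(series, w, k):
    """Replace each window by the identifier of the cluster it falls into."""
    return kmeans(windows(series, w), k)


def rule_confidence(symbols, lag):
    """Confidence of 'A is followed by B within `lag` steps' for every observed pair.
    Occurrences of A near the end of the sequence still count in the denominator."""
    first, both = Counter(), Counter()
    for i, a in enumerate(symbols):
        first[a] += 1
        for b in set(symbols[i + 1:i + 1 + lag]):
            both[(a, b)] += 1
    return {(a, b): n / first[a] for (a, b), n in both.items()}
```

For example, `discretize([0, 0, 0, 5, 5, 5, 0, 0, 0, 5, 5, 5], w=3, k=2)` turns the series into a string of two cluster symbols, and `rule_confidence(labels, lag=1)` then scores every one-step rule between those symbols.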
Unexpectedness as a Measure of Interestingness in Knowledge Discovery
In Proceedings of the First International Conference on Knowledge Discovery and Data Mining, 1999
Cited by 140 (9 self)
Abstract
Organizations are taking advantage of "data-mining" techniques to leverage the vast amounts of data captured as they process routine transactions. Data mining is the process of discovering hidden structure or patterns in data. However, several of the pattern discovery methods in data-mining systems have two drawbacks: they discover too many obvious or irrelevant patterns, and they do not fully leverage the valuable prior domain knowledge that managers have. This research addresses these drawbacks by developing ways to generate interesting patterns by incorporating managers' prior knowledge into the process of searching for patterns in data. Specifically, we focus on providing methods that generate patterns that are unexpected with respect to managerial intuition, by eliciting managers' beliefs about the domain and using these beliefs to seed the search for unexpected patterns in data. Our approach should lead to the development of decision support systems that provide managers with mor...
Error Reduction through Learning Multiple Descriptions
1996
Cited by 126 (3 self)
Abstract
Learning multiple descriptions for each class in the data has been shown to reduce generalization error, but the amount of error reduction varies greatly from domain to domain. This paper presents a novel empirical analysis that helps to understand this variation. Our hypothesis is that the amount of error reduction is linked to the "degree to which the descriptions for a class make errors in a correlated manner." We present a precise and novel definition for this notion and use twenty-nine data sets to show that the amount of observed error reduction is negatively correlated with the degree to which the descriptions make errors in a correlated manner. We empirically show that it is possible to learn descriptions that make less correlated errors in domains in which many ties in the search evaluation measure (e.g., information gain) are experienced during learning. The paper also presents results that help to understand when and why multiple descriptions are a help (irrelevant attribute...
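One simple way to make the notion of correlated errors concrete is sketched below. This is an assumption, not the paper's exact definition. We measure correlation as the average, over pairs of models, of the fraction of examples on which both predict the same wrong class. The models are combined by unweighted majority vote. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def same_error_rate(preds_a, preds_b, truth):
    """Fraction of examples on which both models predict the same *wrong* class."""
    both = sum(1 for a, b, t in zip(preds_a, preds_b, truth) if a == b != t)
    return both / len(truth)


def mean_same_error(all_preds, truth):
    """Average pairwise same-error rate over all pairs of models (assumed measure)."""
    pairs = list(combinations(all_preds, 2))
    return sum(same_error_rate(a, b, truth) for a, b in pairs) / len(pairs)


def vote_error(all_preds, truth):
    """Error of an unweighted majority vote (ties go to the first most common label)."""
    wrong = 0
    for i, t in enumerate(truth):
        winner, _ = Counter(p[i] for p in all_preds).most_common(1)[0]
        wrong += winner != t
    return wrong / len(truth)
```

Take three models that each err on a third of the examples. If all three err on the *same* examples, the vote is no better than any single model. If their errors are disjoint, the vote corrects every error. This mirrors the negative correlation the abstract reports.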
On biases in estimating multivalued attributes
In Proceedings of the 14th International Joint Conference on Artificial Intelligence, 1995
Cited by 77 (5 self)
Abstract
We analyse the biases of eleven measures for estimating the quality of multivalued attributes. The values of information gain, J-measure, Gini index, and relevance tend to increase linearly with the number of values of an attribute. The values of gain ratio, distance measure, Relief, and the weight of evidence decrease for informative attributes and increase for irrelevant attributes. The bias of the statistical tests based on the chi-square distribution is similar, but these functions are not able to discriminate among attributes of different quality. We also introduce a new function based on the MDL principle whose value decreases slightly as the number of attribute values increases.
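The many-values bias of information gain is easy to reproduce with the textbook definitions. In this sketch, the function names are hypothetical and gain ratio follows the usual normalization by split information. An irrelevant attribute that takes a different value on every example receives the maximum possible information gain, the same as a perfectly predictive binary attribute. Gain ratio separates the two.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels):
    """Class entropy minus the weighted class entropy within each attribute value."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain divided by the entropy of the attribute's own value distribution."""
    split_info = entropy(values)
    return info_gain(values, labels) / split_info if split_info > 0 else 0.0
```

With eight examples split 4/4 between two classes, consider an ID-like attribute with eight distinct values. It scores an information gain of 1 bit, exactly like a perfect binary split. Its gain ratio, however, drops to 1/3 because its split information is 3 bits.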
Fast Sequential and Parallel Algorithms for Association Rule Mining: A Comparison
1995
Cited by 69 (0 self)
Abstract
The field of knowledge discovery in databases, or "Data Mining", has received increasing attention in recent years as large organizations have begun to realize the potential value of the information that is stored implicitly in their databases. One specific data mining task is the mining of Association Rules, particularly from retail data. The task is to determine patterns (or rules) that characterize the shopping behavior of customers from a large database of previous consumer transactions. The rules can then be used to focus marketing efforts such as product placement and sales promotions. Because early algorithms required an unpredictably large number of I/O operations, reducing I/O cost has been the primary target of the algorithms presented in the literature. One of the most recently proposed algorithms, called PARTITION, uses a new TID-list data representation and a new partitioning technique. The partitioning technique reduces I/O cost to a constant amount by processing one datab...
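The TID-list and partitioning ideas mentioned above can be sketched as follows. This is a simplified in-memory illustration under assumptions, not the implementation compared in the paper. Partitions are slices of a Python list, minimum support is given as a fraction, candidate generation is a plain level-wise join without extra pruning, and the function names are hypothetical. The property that makes two passes sufficient is this: an itemset frequent in the whole database must be frequent in at least one partition.

```python
from itertools import combinations


def tidlists(transactions):
    """TID-list representation: map each 1-itemset to the set of transaction IDs containing it."""
    tids = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tids.setdefault(frozenset([item]), set()).add(tid)
    return tids


def local_frequent(transactions, min_frac):
    """Level-wise search inside one partition; an itemset's support is the size of its TID-list."""
    min_count = min_frac * len(transactions)
    level = {s: t for s, t in tidlists(transactions).items() if len(t) >= min_count}
    frequent = set(level)
    while level:
        nxt = {}
        for (a, ta), (b, tb) in combinations(level.items(), 2):
            union = a | b
            if len(union) == len(a) + 1 and union not in nxt:
                tids = ta & tb  # transactions containing both a and b, i.e. the union
                if len(tids) >= min_count:
                    nxt[union] = tids
        frequent |= set(nxt)
        level = nxt
    return frequent


def partition_mine(transactions, min_frac, n_parts):
    """Two-pass mining: local candidates per partition, then one global counting pass."""
    size = -(-len(transactions) // n_parts)  # ceiling division
    candidates = set()
    # Pass 1: a globally frequent itemset is locally frequent in at least one partition.
    for start in range(0, len(transactions), size):
        candidates |= local_frequent(transactions[start:start + size], min_frac)
    # Pass 2: count each candidate's support over the whole database.
    counts = {c: sum(1 for t in transactions if c <= set(t)) for c in candidates}
    return {c: n for c, n in counts.items() if n >= min_frac * len(transactions)}
```

Intersecting TID-lists replaces rescanning transactions when counting longer itemsets. That rescanning is the I/O cost the abstract says PARTITION bounds.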
The k-server problem
Computer Science Review
Cited by 66 (5 self)
Abstract
The k-server problem is perhaps the most influential online problem: natural, crisp, and with a surprising technical depth that manifests the richness of competitive analysis. The k-server conjecture, which was posed more than two decades ago when the problem was first studied within the competitive analysis framework, is still open and has been a major driving force for the development of the area of online algorithms. This article surveys some major results for the k-server problem.
Inductive and Bayesian learning in medical diagnosis
Applied Artificial Intelligence, 1993
Cited by 65 (11 self)
Abstract
Although successful in medical diagnostic problems, inductive learning systems were not widely accepted in medical practice. In this paper, two different approaches to machine learning in medical applications are compared: Assistant, a system for inductive learning of decision trees, and the naive Bayesian classifier. Both methodologies were tested on four medical diagnostic problems: localization of primary tumor, prognosis of recurrence of breast cancer, diagnosis of thyroid diseases, and rheumatology. The accuracy of the diagnostic knowledge automatically acquired from stored data records is compared, and the interpretation of the knowledge and the explanation ability of each system's classification process are discussed. Surprisingly, the naive Bayesian classifier is superior to Assistant in classification accuracy and explanation ability, while the interpretation of the acquired knowledge seems to be equally valuable. In addition, two extensions to the naive Bayesian classifier are briefly described: dealing with continuous attributes, and discovering the dependencies among attributes.
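For reference, a naive Bayesian classifier over discrete attributes fits in a few lines. This sketch is an illustration under assumptions, not the system evaluated in the paper. It uses Laplace smoothing for unseen attribute values and handles only categorical attributes, so the continuous-attribute extension mentioned above is not shown. The function names and the toy data in the usage note are made up.

```python
from collections import Counter, defaultdict
from math import log


def train(examples, labels):
    """Collect the class counts and per-class attribute-value counts."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> Counter of values
    values_seen = defaultdict(set)       # attribute index -> set of observed values
    for x, y in zip(examples, labels):
        for i, v in enumerate(x):
            value_counts[(i, y)][v] += 1
            values_seen[i].add(v)
    return class_counts, value_counts, values_seen


def predict(model, x):
    """Return the class maximizing log P(c) + sum_i log P(x_i | c), assuming independence."""
    class_counts, value_counts, values_seen = model
    n = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c, nc in class_counts.items():
        score = log(nc / n)  # log prior
        for i, v in enumerate(x):
            # Laplace-smoothed conditional probability of this attribute value.
            score += log((value_counts[(i, c)][v] + 1) / (nc + len(values_seen[i])))
        if score > best_score:
            best, best_score = c, score
    return best
```

Working in log space avoids underflow when many attributes are multiplied together. Each attribute contributes an additive term, which is one reason the classifier's decisions are easy to explain attribute by attribute.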
HYDRA: A Noise-tolerant Relational Concept Learning Algorithm
In Proceedings of the 8th International Workshop on Machine Learning, 1993
Cited by 62 (5 self)
Abstract
Many learning algorithms form concept descriptions composed of clauses, each of which covers some proportion of the positive training data and a small to zero proportion of the negative training data. This paper presents a method using likelihood ratios attached to clauses to classify test examples. One concept description is learned for each class. Each concept description competes to classify the test example using the likelihood ratios assigned to clauses of that concept description. By testing on several artificial and "real world" domains, we demonstrate that attaching weights and allowing concept descriptions to compete to classify examples reduces an algorithm's susceptibility to noise.
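The competition between per-class concept descriptions can be illustrated with the sketch below. It shows the general idea only, under assumptions that are ours rather than the paper's:

- Clauses are modeled as Python predicates.
- Each clause is weighted by a Laplace-smoothed likelihood ratio of its coverage rate on positive versus negative training examples. The smoothing is an assumption.
- Each class is scored by the best weight among its clauses that the test example satisfies.

The function names are hypothetical.

```python
def likelihood_ratio(clause, pos, neg):
    """Smoothed coverage rate on positives divided by smoothed coverage rate on negatives."""
    p = sum(clause(x) for x in pos)
    n = sum(clause(x) for x in neg)
    return ((p + 1) / (len(pos) + 2)) / ((n + 1) / (len(neg) + 2))


def weight_descriptions(descriptions, data):
    """descriptions: class -> list of clauses; data: class -> list of training examples.
    Each class's clauses are weighted against the examples of all other classes."""
    weighted = {}
    for cls, clauses in descriptions.items():
        pos = data[cls]
        neg = [x for other, xs in data.items() if other != cls for x in xs]
        weighted[cls] = [(c, likelihood_ratio(c, pos, neg)) for c in clauses]
    return weighted


def classify(weighted, x):
    """Each class competes with its strongest satisfied clause; unmatched classes score 0."""
    def score(cls):
        return max((w for c, w in weighted[cls] if c(x)), default=0.0)
    return max(weighted, key=score)
```

Because a clause that also covers many negative examples gets a low ratio, a noisy, overly general clause loses the competition to a more reliable clause from another class. It is not simply accepted because it fires.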
Selecting the right objective measure for association analysis
Information Systems
Cited by 61 (6 self)
Abstract
Objective measures such as support, confidence, interest factor, correlation, and entropy are often used to evaluate the interestingness of association patterns. However, in many situations, these measures may provide conflicting information about the interestingness of a pattern. Data mining practitioners also tend to apply an objective measure without realizing that better alternatives may be available for their application. In this paper, we describe several key properties one should examine in order to select the right measure for a given application. A comparative study of these properties is made using twenty-one measures that were originally developed in diverse fields such as statistics, social science, machine learning, and data mining. We show that, depending on its properties, each measure is useful for some applications but not for others. We also demonstrate two scenarios in which many existing measures become consistent with each other, namely, when support-based pruning and a technique known as table standardization are applied. Finally, we present an algorithm for selecting a small set of patterns such that domain experts can find a measure that best fits their requirements by ranking this small set of patterns.
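Several of the measures named above can be computed from a 2x2 contingency table. This sketch uses the standard definitions: support, confidence, interest factor (also called lift), and the phi coefficient. The function name and the two example tables in the usage note are made up. They show the kind of conflict the abstract describes: both tables have the same phi, yet confidence and interest rank them in opposite order.

```python
from math import sqrt


def measures(f11, f10, f01, f00):
    """Objective measures for a rule A -> B from 2x2 contingency counts.
    f11: A and B; f10: A but not B; f01: B but not A; f00: neither."""
    n = f11 + f10 + f01 + f00
    fa, fb = f11 + f10, f11 + f01   # number of transactions containing A, and B
    support = f11 / n
    confidence = f11 / fa           # estimate of P(B | A)
    interest = n * f11 / (fa * fb)  # P(A, B) / (P(A) P(B))
    # n*f11 - fa*fb equals f11*f00 - f10*f01, the usual phi numerator.
    phi = (n * f11 - fa * fb) / sqrt(fa * fb * (n - fa) * (n - fb))
    return {"support": support, "confidence": confidence, "interest": interest, "phi": phi}
```

For example, `measures(880, 50, 50, 20)` and `measures(20, 50, 50, 880)` give identical phi values. The first table has much higher confidence, while the second has much higher interest.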
Mining for Strong Negative Associations in a Large Database of Customer Transactions
1998
Cited by 51 (2 self)
Abstract
Mining for association rules is considered an important data mining problem. Many different variations of this problem have been described in the literature. In this paper, we introduce the problem of mining for negative associations. A naive approach to finding negative associations leads to a very large number of rules with low interest measures. We address this problem by combining previously discovered positive associations with domain knowledge to constrain the search space, so that fewer but more interesting negative rules are mined. We describe an algorithm that efficiently finds all such negative associations and present the experimental results.
1 Introduction
Widespread use of computers in business operations and the availability of cheap storage devices have led to an explosive growth in the amount of data gathered and stored by most business organizations today. There has been a trend in recent years to search for interesting patterns in the data and use them for improve...