A survey of evolutionary algorithms for data mining and knowledge discovery
- In: A. Ghosh and S. Tsutsui (Eds.), Advances in Evolutionary Computation
, 2002
Cited by 73 (3 self)
This chapter discusses the use of evolutionary algorithms, particularly genetic algorithms and genetic programming, in data mining and knowledge discovery. We focus on the data mining task of classification. In addition, we discuss some preprocessing and postprocessing steps of the knowledge discovery process, focusing on attribute selection and pruning of an ensemble of classifiers. We show how the requirements of data mining and knowledge discovery influence the design of evolutionary algorithms. In particular, we discuss how individual representation, genetic operators and fitness functions have to be adapted for extracting high-level knowledge from data.
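As a rough illustration of how a fitness function can be adapted for rule discovery, the sketch below assumes (hypothetically) that each individual encodes one conjunctive classification rule as a list of (attribute index, required value) conditions, and scores the rule "IF conditions THEN target" by sensitivity times specificity, one fitness choice seen in the GA rule-induction literature. The names `covers` and `rule_fitness` are illustrative, and the chapter's own representation and fitness may differ.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: one individual = one conjunctive rule, stored as
# (attribute_index, required_value) conditions that must all hold.
Rule = List[Tuple[int, object]]


def covers(rule: Rule, example: Sequence[object]) -> bool:
    """True if every condition of the rule matches the example."""
    return all(example[i] == value for i, value in rule)


def rule_fitness(rule: Rule, examples: Sequence[Sequence[object]],
                 labels: Sequence[object], target: object) -> float:
    """Assumed fitness: sensitivity * specificity of 'IF rule THEN target'."""
    tp = fp = tn = fn = 0
    for example, label in zip(examples, labels):
        predicted = covers(rule, example)
        actual = label == target
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity * specificity
```

An empty rule covers every example, so its specificity (and therefore its fitness) is zero; the product form penalizes rules that are too general as well as rules that are too specific.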
Maximizing text-mining performance
- IEEE Intelligent Systems
, 1999
Cited by 63 (8 self)
...data warehouses, where data might be stored as electronic documents or as text fields in databases, text mining has increased in importance and economic value. One important goal in text mining is automatic classification of electronic documents. Computer programs scan text in a document and apply a model that assigns the document to one or more prespecified topics. Researchers have used benchmark data, such as the Reuters-21578 test collection, to measure advances in automated text categorization. Conventional methods such as decision trees have had competitive, but not optimal, predictive performance. Using the Reuters collection, we show that adaptive resampling techniques can improve decision-tree performance and that relatively small, pooled local dictionaries are effective. We have applied these techniques to online banking applications to enhance automated e-mail routing.
Text categorization
Many automated prediction methods exist for extracting patterns from sample cases.[1] These patterns can be used to classify new cases. In text mining, specifically text categorization, ...
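The abstract does not spell out which adaptive resampling rule is used. The sketch below assumes an arc-x4-style scheme (after Breiman), in which a training case that has been misclassified m times so far is drawn with probability proportional to 1 + m^4, and the final prediction is an unweighted vote. `train` stands for any learner (for example a decision-tree inducer) that returns a prediction function; all names here are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Model = Callable[[object], object]
Learner = Callable[[List[object], List[object]], Model]


def arc_x4_probabilities(miss_counts: Sequence[int]) -> List[float]:
    """arc-x4 rule: p(i) is proportional to 1 + m(i)**4."""
    weights = [1 + m ** 4 for m in miss_counts]
    total = sum(weights)
    return [w / total for w in weights]


def adaptive_resampling(xs: Sequence[object], ys: Sequence[object],
                        train: Learner, rounds: int = 10, seed: int = 0) -> Model:
    """Train `rounds` models on resamples that favour often-misclassified cases."""
    rng = random.Random(seed)
    n = len(xs)
    misses = [0] * n  # cumulative error count per training case
    models: List[Model] = []
    for _ in range(rounds):
        idx = rng.choices(range(n), weights=arc_x4_probabilities(misses), k=n)
        model = train([xs[i] for i in idx], [ys[i] for i in idx])
        models.append(model)
        # Update error counts on the full training set, not just the resample.
        for i in range(n):
            if model(xs[i]) != ys[i]:
                misses[i] += 1

    def ensemble(x: object) -> object:
        # Unweighted majority vote over all trained models.
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return ensemble
```

Cases that keep being misclassified quickly dominate later resamples, which is what lets a fixed learner such as a decision tree concentrate on the hard documents.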
Evaluation of Machine Learning Methods for Natural Language Processing Tasks
- In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002). Las
, 2002
Cited by 19 (6 self)
We show that the methodology currently in use for comparing symbolic supervised learning methods applied to human language technology tasks is unreliable. We show that the interaction between algorithm parameter settings and feature selection within a single algorithm often accounts for a higher variation in results than differences between different algorithms or information sources. We illustrate this with experiments on a number of linguistic datasets. The consequences of this phenomenon are far-reaching, and we discuss possible solutions to this methodological problem.
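One simple way to check the claim on one's own experiments is to compare how much scores vary within a single algorithm across (parameter setting, feature subset) configurations with how much the best scores differ between algorithms. The sketch below assumes results are collected as a mapping from algorithm name to a list of scores; the function name and the use of max-minus-min as the spread measure are illustrative choices, not the paper's procedure.

```python
from typing import Dict, List, Tuple


def variation_report(results: Dict[str, List[float]]) -> Tuple[Dict[str, float], float]:
    """Compare score spread within each algorithm against spread between algorithms.

    results maps an algorithm name to its scores over all tried
    (parameter setting, feature subset) configurations.
    """
    # Spread caused purely by configuration choices inside one algorithm.
    within = {alg: max(scores) - min(scores) for alg, scores in results.items()}
    # Spread between algorithms, each represented by its best configuration.
    best_per_algorithm = [max(scores) for scores in results.values()]
    between = max(best_per_algorithm) - min(best_per_algorithm)
    return within, between
```

If the within-algorithm spreads exceed the between-algorithm spread, a comparison that fixes one configuration per algorithm can easily rank the algorithms differently depending on which configuration happened to be chosen.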
Time series data mining: Identifying temporal patterns for characterization and prediction of time series events
- Marquette University
, 1999
Cited by 19 (6 self)
This work is dedicated to my wife, Christine, our son, Christopher, and his brother, who will arrive shortly. Acknowledgment I would like to thank Dr. Xin Feng for the encouragement, support, and direction he has provided during the past three years. His insightful suggestions, enthusiastic endorsement, and shrewd proverbs have made the completion of this research possible. They provide an example to emulate. I owe a debt of gratitude to my committee members, Drs. Naveen Bansal, Ronald Brown, George Corliss, and James Heinen, who each have helped me to expand the breadth of my research by providing me insights into their areas of expertise. I am grateful to Marquette University for its financial support of this research, and the faculty of the Electrical and Computer Engineering Department for providing a rigorous and stimulating environment that exemplifies cura personalis. I thank Mark Palmer for many interesting, insightful, and thought-provoking ...
Identifying Temporal Patterns for Characterization and Prediction of Financial Time Series Events
, 2000
Cited by 12 (3 self)
The novel Time Series Data Mining (TSDM) framework is applied to analyzing financial time series. The TSDM framework adapts and extends data mining concepts for the analysis of time series data. In particular, it creates a set of methods that reveal hidden temporal patterns that are characteristic and predictive of time series events. This contrasts with other time series analysis techniques, which typically characterize and predict all observations. The TSDM framework and concepts are reviewed, and the applicable TSDM method is discussed. Finally, the TSDM method is applied to time series generated by a basket of financial securities. The results show that statistically significant temporal patterns that are both characteristic and predictive of events in financial time series can be identified.
1 Introduction
The Time Series Data Mining (TSDM) framework [1-4] is applied to the prediction of financial time series. TSDM-based methods can successfully characterize and predict...
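TSDM works in a reconstructed phase space obtained by time-delay embedding and looks for temporal pattern clusters whose points tend to precede events. The sketch below assumes a spherical cluster (a center pattern plus radius delta) and the event characterization g(t) = x[t+1]; both are illustrative assumptions, and the search over candidate patterns (which TSDM performs with an optimization method) is omitted. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def embed(series: Sequence[float], q: int, tau: int = 1) -> List[Tuple[int, Tuple[float, ...]]]:
    """Time-delay embedding: at index t, (x[t-(q-1)tau], ..., x[t-tau], x[t])."""
    start = (q - 1) * tau
    return [(t, tuple(series[t - (q - 1 - k) * tau] for k in range(q)))
            for t in range(start, len(series))]


def cluster_event_stats(series: Sequence[float], pattern: Sequence[float],
                        delta: float, tau: int = 1) -> Tuple[int, float, float]:
    """Count embedded points within delta of `pattern`; return that count, their
    mean event value, and the mean event value over all points.
    Assumed event characterization: g(t) = x[t+1]."""
    inside: List[float] = []
    everything: List[float] = []
    for t, point in embed(series, len(pattern), tau):
        if t + 1 >= len(series):
            continue  # last point has no future value to characterize
        g = series[t + 1]
        everything.append(g)
        if math.dist(point, pattern) <= delta:
            inside.append(g)
    if not everything:
        raise ValueError("series too short for this embedding")
    mean_inside = sum(inside) / len(inside) if inside else float("nan")
    return len(inside), mean_inside, sum(everything) / len(everything)
```

A pattern is interesting when the mean event value inside its cluster differs markedly from the overall mean; a statistical test on the two groups would then decide whether the difference is significant.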
A Classification-based Methodology for Planning Audit Strategies in Fraud Detection
- Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 1999
Cited by 12 (2 self)
Planning adequate audit strategies is a key success factor in a posteriori fraud detection, e.g., in the fiscal and insurance domains, where audits are intended to detect tax evasion and fraudulent claims. A case study is presented in this paper, which illustrates how techniques based on classification can be used to support the task of planning audit strategies. The proposed approach is sensitive to some conflicting issues of audit planning, e.g., the trade-off between maximizing audit benefits and minimizing audit costs. A methodological scenario, common to a whole class of similar applications, is then abstracted from the case study. The limitations of available systems in supporting the identified overall KDD process lead us to point out the key aspects of a logic-based database language, integrated with mining mechanisms, which is used to provide a uniform, highly expressive environment for the various steps in the construction of the case study.
Keywords: knowledge discovery in databases, data mining, classification, decision trees, fraud detection, logic-based database languages, integration of querying and mining.
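To make the benefit-versus-cost trade-off concrete, the sketch below assumes a classifier has already produced a fraud probability for each case, and that an expected recovery and an audit cost are known per case. It greedily audits the cases with the highest positive expected net benefit until a budget is exhausted. The `TaxCase` fields and the greedy rule are hypothetical illustrations, not the paper's method (which builds decision trees within a logic-based database language).

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TaxCase:
    case_id: str
    p_fraud: float             # classifier's estimated probability of evasion
    expected_recovery: float   # hypothetical: amount recovered if evasion is confirmed
    audit_cost: float


def expected_net_benefit(case: TaxCase) -> float:
    return case.p_fraud * case.expected_recovery - case.audit_cost


def plan_audits(cases: Sequence[TaxCase], budget: float) -> List[str]:
    """Greedily pick cases with the highest positive expected net benefit within budget."""
    chosen: List[str] = []
    spent = 0.0
    for case in sorted(cases, key=expected_net_benefit, reverse=True):
        if expected_net_benefit(case) <= 0:
            break  # remaining cases cost more to audit than they are expected to recover
        if spent + case.audit_cost <= budget:
            chosen.append(case.case_id)
            spent += case.audit_cost
    return chosen
```

Greedy selection is not guaranteed to be optimal under a hard budget (that is a knapsack problem), but it shows how lowering the budget shifts the plan from maximizing recovered amounts toward minimizing audit costs.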
Lightweight Rule Induction
, 2000
Cited by 11 (1 self)
A lightweight rule induction method is described that generates compact Disjunctive Normal Form (DNF) rules. Each class has an equal number of unweighted rules. A new example is classified by applying all rules and assigning the example to the class with the most satisfied rules. The induction method attempts to minimize the training error with no pruning. An overall design is specified by setting limits on the size and number of rules. During training, cases are adaptively weighted using a simple cumulative-error method. The induction method is nearly linear in time relative to an increase in the number of induced rules or the number of cases. Experimental results on large benchmark data sets demonstrate that predictive performance can rival the best reported results in the literature.
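The classification step described above can be sketched as follows: a DNF rule is satisfied when any one of its conjunctions holds in full, and each class receives one vote per satisfied rule. The representation of conditions as Python predicates, the tie-breaking order, and the example rule set are assumptions for illustration; rule induction and case weighting are not shown.

```python
from typing import Callable, Dict, List

Condition = Callable[[dict], bool]
Conjunction = List[Condition]
DNFRule = List[Conjunction]  # satisfied if any one conjunction holds in full


def satisfied(rule: DNFRule, example: dict) -> bool:
    return any(all(cond(example) for cond in conj) for conj in rule)


def classify(rules_by_class: Dict[str, List[DNFRule]], example: dict) -> str:
    """Assign the class whose (unweighted) rules are satisfied most often.
    Ties go to the class listed first in rules_by_class."""
    votes = {cls: sum(satisfied(rule, example) for rule in rules)
             for cls, rules in rules_by_class.items()}
    return max(votes, key=votes.get)


# Hypothetical rule set: two DNF rules per class, since each class has an equal number of rules.
rules = {
    "pos": [[[lambda e: e["x"] > 5]],
            [[lambda e: e["y"] == "a"], [lambda e: e["x"] > 8]]],
    "neg": [[[lambda e: e["x"] <= 5]],
            [[lambda e: e["y"] == "b"]]],
}
```

Because every class has the same number of unweighted rules, the vote counts are directly comparable across classes without any per-rule weights.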
Classification with Degree of Membership: A Fuzzy Approach
, 2001
Cited by 9 (5 self)
...research. It is concerned with the prediction of the values of some attribute in a database based on other attributes. To tackle this problem, most of the existing data mining algorithms adopt either a decision-tree-based approach or an approach that requires users to provide thresholds to guide the search for interesting rules. In this paper, we propose a new approach based on an objective interestingness measure that distinguishes interesting rules from uninteresting ones. Because it uses linguistic terms to represent the revealed regularities and exceptions, this approach is especially useful when the discovered rules are presented to human experts for examination, owing to its affinity with human knowledge representation. The use of fuzzy techniques allows the prediction of attribute values to be associated with a degree of membership. Our approach is, therefore, able to handle cases in which an object belongs to more than one class. For example, a person can suffer from a cold and a fever, each to a certain extent, at the same time. Furthermore, our approach is more resilient to noise and missing data values because of its use of fuzzy techniques. To evaluate the performance of our approach, we tested it on several real-life databases. The experimental results show that it can be very effective at data mining tasks. In fact, when compared with popular data mining algorithms, our approach can be better at uncovering useful rules hidden in databases.
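Degrees of membership of the kind described above are commonly computed with membership functions attached to linguistic terms. The sketch below assumes triangular membership functions and hypothetical terms for a body-temperature attribute; the paper's own membership functions and interestingness measure are not reproduced here.

```python
from typing import Callable, Dict

Membership = Callable[[float], float]


def triangular(a: float, b: float, c: float) -> Membership:
    """Membership rising from 0 at a to 1 at b, then falling to 0 at c (assumes a < b < c)."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


def fuzzify(x: float, terms: Dict[str, Membership]) -> Dict[str, float]:
    """Degree to which x belongs to each linguistic term."""
    return {name: mu(x) for name, mu in terms.items()}


# Hypothetical linguistic terms for a body-temperature attribute (degrees Celsius).
temperature_terms = {
    "normal": triangular(35.0, 36.8, 37.6),
    "high": triangular(37.2, 39.0, 41.0),
}
```

With these assumed terms, a reading of 37.4 belongs to "normal" with degree 0.25 and to "high" with degree of about 0.11, so a rule's conclusion can carry a graded membership instead of a single crisp class.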
Real Option Models for Managing Manufacturing System Changes in the New Economy
, 2000
Cited by 7 (3 self)
The manufacturing environment is becoming increasingly dynamic with the growth of electronic commerce, supply chain management, forecasting, and procurement and resource planning. It also includes trends toward more process data acquisition and analysis, shorter production runs, and more stringent quality requirements. These drivers create an opportunity for companies to collect and use information to identify changes that will affect their manufacturing systems. In conjunction with an industry partner that produces home fashion products, we developed a case study that highlights four major manufacturing transitions: new product introduction; moving a product from research and development (R&D) to commercialization; new plant location; and starting or restarting production of existing products. These types of changes cut across many levels of the operation, including the product, plant, and organizational levels, and typically present significant operational challenges. We use this case study to motivate the theoretical and applied research needed to support a real option framework for system changes in manufacturing. The key elements of our framework are to quantify manufacturing changes, develop a real option model for these activities, value the options to identify the best scenarios, and integrate these elements so that we can monitor and manage the overall process. The advantage of this approach is that it allows us to directly incorporate a market-driven perspective, tying manufacturing operations to the organization's economic goals.
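The abstract does not specify the option model. As one textbook way to "value the options", the sketch below uses a Cox-Ross-Rubinstein binomial lattice for a European-style option to invest: `s0` is read as the present value of the cash flows a manufacturing change would generate and `strike` as its investment cost. This mapping, the European (exercise-at-expiry) assumption, and the function name are all illustrative, not the authors' framework.

```python
import math


def binomial_option_value(s0: float, strike: float, rate: float,
                          sigma: float, years: float, steps: int) -> float:
    """European call value on a Cox-Ross-Rubinstein lattice (assumes sigma > 0)."""
    dt = years / steps
    u = math.exp(sigma * math.sqrt(dt))          # up factor per step
    d = 1 / u                                    # down factor per step
    p = (math.exp(rate * dt) - d) / (u - d)      # risk-neutral up probability
    disc = math.exp(-rate * dt)
    # Payoff at expiry for j up-moves out of `steps`.
    values = [max(s0 * u ** j * d ** (steps - j) - strike, 0.0) for j in range(steps + 1)]
    # Roll back through the lattice, discounting risk-neutral expectations.
    for step in range(steps, 0, -1):
        values = [disc * (p * values[j + 1] + (1 - p) * values[j]) for j in range(step)]
    return values[0]
```

Comparing such option values across candidate scenarios (for example, different timings of a plant start-up) is one way to rank them; an American-style lattice that allows early exercise would take the maximum of continuation and exercise value at each node.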
Utility-Based Data Mining for Time Series Analysis: Cost-Sensitive Learning for Neural Network Predictors
- The Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2005
Cited by 5 (0 self)
In corporate data mining applications, cost-sensitive learning is firmly established for predictive classification algorithms. Conversely, data mining methods for regression and time series analysis generally disregard economic utility and apply simple accuracy measures. Methods from statistics and computational intelligence alike minimise a symmetric statistical error, such as the sum of squared errors, to model ordinary least squares predictors. However, business applications show that real forecasting problems involve asymmetric error costs: the costs arising from over- and underprediction differ for errors of identical magnitude, so the prediction requires an ex-post correction before valid decisions can be derived. To reflect this, an asymmetric cost function is developed and employed as the objective function for neural network training, yielding superior forecasts and cost-efficient decisions. Experimental results for a business scenario of inventory levels are computed using a multilayer perceptron trained with different objective functions, and its performance is evaluated against statistical forecasting methods.
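A minimal sketch of an asymmetric objective, assuming a piecewise-linear ("LINLIN") cost in which each unit of underprediction costs `under_cost` and each unit of overprediction costs `over_cost`; the paper's exact cost function may differ. To keep the example short, a single constant forecast stands in for the multilayer perceptron and is fitted by subgradient descent; all names are hypothetical.

```python
from typing import Sequence


def linlin_cost(actual: float, predicted: float, under_cost: float, over_cost: float) -> float:
    """Piecewise-linear asymmetric cost: under- and overprediction priced differently."""
    error = actual - predicted
    return under_cost * error if error > 0 else -over_cost * error


def linlin_subgradient(actual: float, predicted: float, under_cost: float, over_cost: float) -> float:
    """Subgradient of linlin_cost with respect to the prediction."""
    return -under_cost if actual > predicted else over_cost


def fit_constant_forecast(demands: Sequence[float], under_cost: float, over_cost: float,
                          lr: float = 0.01, epochs: int = 2000) -> float:
    """Fit a single forecast value by subgradient descent on the mean asymmetric cost."""
    forecast = 0.0
    for _ in range(epochs):
        grad = sum(linlin_subgradient(y, forecast, under_cost, over_cost)
                   for y in demands) / len(demands)
        forecast -= lr * grad
    return forecast
```

With demands 1 to 10 and underprediction three times as costly as overprediction, the fitted forecast settles near 8, the 0.75 quantile, whereas minimizing squared error would return the mean, 5.5. The asymmetric objective thus builds the inventory-safety correction into training instead of applying it ex post.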

