
General and efficient multisplitting of numerical attributes (1999)

by T Elomaa, J Rousu

Results 1 - 10 of 21

Multivariate Discretization for Set Mining

by Stephen D. Bay - Knowledge and Information Systems, 2000
Cited by 14 (0 self)
Many algorithms in data mining can be formulated as a set mining problem where the goal is to find conjunctions (or disjunctions) of terms that meet user-specified constraints. Set mining techniques have been largely designed for categorical or discrete data where variables can only take on a fixed number of values. However, many data sets also contain continuous variables, and a common method of dealing with these is to discretize them by breaking them into ranges. Most discretization methods are univariate and consider only a single feature at a time (sometimes in conjunction with a class variable). We argue that this is a sub-optimal approach for knowledge discovery, as univariate discretization can destroy hidden patterns in data. Discretization should consider the effects on all variables in the analysis, and two regions X and Y should only be in the same interval after discretization if the instances in those regions have similar multivariate distributions (F_x ∼ F_y) across all variables and combinations of variables. We present a bottom-up merging algorithm to discretize continuous variables based on this rule. Our experiments indicate that the approach is feasible, that it will not destroy hidden patterns, and that it will generate meaningful intervals.

Adaptive fastest path computation on a road network: A traffic mining approach

by Hector Gonzalez, Jiawei Han, Xiaolei Li, Margaret Myslinska, John Paul Sondag - In Proc. 2007 Int. Conf. on Very Large Data Bases (VLDB’07), 2007
Cited by 13 (1 self)
Efficient fastest path computation in the presence of varying speed conditions on a large scale road network is an essential problem in modern navigation systems. Factors affecting road speed, such as weather, time of day, and vehicle type, need to be considered in order to select fast routes that match current driving conditions. Most existing systems compute fastest paths based on road Euclidean distance and a small set of predefined road speeds. However, “History is often the best teacher”. Historical traffic data or driving patterns are often more useful than the simple Euclidean distance-based computation because people must have good reasons to choose these routes, e.g., they may want to avoid those that pass through high crime areas at night or that likely encounter accidents, road construction, or traffic jams. In this paper, we present an adaptive fastest path algorithm capable of efficiently accounting for important driving and speed patterns mined from a large set of traffic data. The algorithm is based on the following observations: (1) The hierarchy of roads can be used to partition the road network into areas, and different path pre-computation strategies can be used at the area level, (2) we can limit our route search strategy to edges and path segments that are actually frequently traveled in the data, and (3) drivers usually traverse the road network through the largest roads available given the distance of the trip, except if there are small roads with a significant speed advantage over the large ones. Through an extensive experimental evaluation on real road networks we show that our algorithm provides desirable (short and well-supported) routes, and that it is significantly faster than competing methods.

Analysis of interpretability-accuracy tradeoff of fuzzy systems by multiobjective fuzzy genetics-based machine learning

by Hisao Ishibuchi, Yusuke Nojima - International Journal of Approximate Reasoning, 2007
Cited by 11 (4 self)
This paper examines the interpretability-accuracy tradeoff in fuzzy rule-based classifiers using a multiobjective fuzzy genetics-based machine learning (GBML) algorithm. Our GBML algorithm is a hybrid version of Michigan and Pittsburgh approaches, which is implemented in the framework of evolutionary multiobjective optimization (EMO). Each fuzzy rule is represented by its antecedent fuzzy sets as an integer string of fixed length. Each fuzzy rule-based classifier, which is a set of fuzzy rules, is represented as a concatenated integer string of variable length. Our GBML algorithm simultaneously maximizes the accuracy of rule sets and minimizes their complexity. The accuracy is measured by the number of correctly classified training patterns while the complexity is measured by the number of fuzzy rules and/or the total number of antecedent conditions of fuzzy rules. We examine the interpretability-accuracy tradeoff for training patterns through computational experiments on some benchmark data sets. A clear tradeoff structure is visualized for each data set. We also examine the interpretability-accuracy tradeoff for test patterns. Due to the overfitting to training patterns, a clear tradeoff structure is not always obtained in computational experiments for test patterns.
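
The tradeoff described above can be illustrated with a minimal sketch of non-dominated filtering over two objectives: the number of correctly classified training patterns (to maximize) and the number of fuzzy rules (to minimize). This is not the authors' GBML algorithm; the function names and the candidate numbers are hypothetical and serve only to show what a non-dominated set looks like.

```python
def dominates(a, b):
    """Return True if candidate a dominates candidate b.

    Each candidate is a (num_correct, num_rules) pair (hypothetical encoding):
    more correctly classified patterns is better, fewer rules is better.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up rule sets: (correctly classified patterns, number of rules).
rule_sets = [(180, 10), (170, 4), (160, 6), (140, 2), (180, 12)]
front = non_dominated(rule_sets)
```

Each rule set left in `front` can only be improved in one objective by giving up something in the other, which is the tradeoff structure the abstract refers to.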

Effects of Three-Objective Genetic Rule Selection on the Generalization Ability of Fuzzy Rule-based Systems

by Hisao Ishibuchi, Takashi Yamamoto - Lecture Notes in Computer Science 2632: Evolutionary Multi-Criterion Optimization, 2003
Cited by 6 (5 self)
One advantage of evolutionary multiobjective optimization (EMO) algorithms over classical approaches is that many non-dominated solutions can be simultaneously obtained by their single run. This paper shows how this advantage can be utilized in genetic rule selection for the design of fuzzy rule-based classification systems. Our genetic rule selection is a two-stage approach.

Generalizing Boundary Points

by Tapio Elomaa, Juho Rousu - Proceedings of the Seventeenth National Conference on Artificial Intelligence (pp. 570-576). Menlo Park, 2000
Cited by 5 (4 self)
The complexity of numerical domain partitioning depends on the number of potential cut points. In multiway partitioning this dependency is often quadratic, even exponential. Therefore, reducing the number of candidate cut points is important. For a large family of attribute evaluation functions only boundary points need to be considered as candidates. We prove that an even more general property holds for many commonly used functions. Their optima are located on the borders of example segments in which the relative class frequency distribution is static. These borders are a subset of boundary points. Thus, even fewer cut points need to be examined for these functions. The results shed new light on the splitting properties of common attribute evaluation functions, and they have practical value as well. The functions that are examined also include non-convex ones. Hence, the property introduced is not just another consequence of the convexity of a function.
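
The difference between boundary points and segment borders can be sketched as follows. This is a minimal illustration under the usual definitions, not code from the paper: examples are grouped by attribute value, a boundary point is assumed to lie between two adjacent value groups unless both groups are pure in the same class, and a segment border is assumed to lie between two adjacent groups whose relative class frequencies differ. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def class_counts_by_value(values, labels):
    """Group examples by attribute value and count the classes in each group."""
    groups = defaultdict(Counter)
    for value, label in zip(values, labels):
        groups[value][label] += 1
    return sorted(groups.items(), key=lambda item: item[0])


def relative_distribution(counts):
    """Exact relative class frequencies of one group."""
    total = sum(counts.values())
    return {cls: Fraction(n, total) for cls, n in counts.items()}


def boundary_points(values, labels):
    """Midpoints between adjacent groups that are not both pure in one class."""
    bins = class_counts_by_value(values, labels)
    cuts = []
    for (v1, c1), (v2, c2) in zip(bins, bins[1:]):
        same_pure_class = len(c1) == 1 and len(c2) == 1 and set(c1) == set(c2)
        if not same_pure_class:
            cuts.append((v1 + v2) / 2)
    return cuts


def segment_borders(values, labels):
    """Midpoints between adjacent groups with different class distributions."""
    bins = class_counts_by_value(values, labels)
    cuts = []
    for (v1, c1), (v2, c2) in zip(bins, bins[1:]):
        if relative_distribution(c1) != relative_distribution(c2):
            cuts.append((v1 + v2) / 2)
    return cuts


# Toy data: values 1 and 2 both hold one "a" and one "b"; values 3 and 4 are pure "a".
xs = [1, 1, 2, 2, 3, 3, 4, 4]
ys = ["a", "b", "a", "b", "a", "a", "a", "a"]
bp = boundary_points(xs, ys)
sb = segment_borders(xs, ys)
```

On this toy data the cut between values 1 and 2 is a boundary point but not a segment border, because both groups have the same relative class distribution, so the segment borders form a smaller candidate set.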

Hybridization of fuzzy GBML approaches for pattern classification problems

by Hisao Ishibuchi, Takashi Yamamoto, Tomoharu Nakashima - IEEE Trans. on Systems, Man, and Cybernetics - Part B, 2005
Cited by 5 (3 self)
We propose a hybrid algorithm of two fuzzy genetics-based machine learning approaches (i.e., Michigan and Pittsburgh) for designing fuzzy rule-based classification systems. First, we examine the search ability of each approach to efficiently find fuzzy rule-based systems with high classification accuracy. It is clearly demonstrated that each approach has its own advantages and disadvantages. Next, we combine these two approaches into a single hybrid algorithm. Our hybrid algorithm is based on the Pittsburgh approach where a set of fuzzy rules is handled as an individual. Genetic operations for generating new fuzzy rules in the Michigan approach are utilized as a kind of heuristic mutation for partially modifying each rule set. Then, we compare our hybrid algorithm with the Michigan and Pittsburgh approaches. Experimental results show that our hybrid algorithm has higher search ability. The necessity of a heuristic specification method of antecedent fuzzy sets is also demonstrated by computational experiments on high-dimensional problems. Finally, we examine the generalization ability of fuzzy rule-based classification systems designed by our hybrid algorithm. Index Terms: Pattern classification, fuzzy rules, genetic algorithms, machine learning.

On the Well-Behavedness of Important Attribute Evaluation Functions

by Tapio Elomaa, Juho Rousu - In G. Grahne (Ed.), Proceedings of the Sixth Scandinavian Conference on Artificial Intelligence (pp. 95--106). Frontiers in Artificial Intelligence and Applications (Vol , 1997
Cited by 4 (2 self)
The class of well-behaved evaluation functions simplifies and makes efficient the handling of numerical attributes; for them it suffices to concentrate on the boundary points in searching for the optimal partition. This holds always for binary partitions and also for multisplits if only the function is cumulative in addition to being well-behaved. The class of well-behaved evaluation functions is a proper superclass of convex evaluation functions. Thus, a large proportion of the most important attribute evaluation functions are well-behaved. This paper explores the extent and boundaries of well-behaved functions. In particular, we examine C4.5's default attribute evaluation function gain ratio, which has been known to have problems with numerical attributes. We show that gain ratio is not convex, but is still well-behaved with respect to binary partitioning. However, it cannot handle higher arity partitioning well. Our empirical experiments show that a very simple cumulative rectifi...

A Bayesian Approach to Discretization

by Petri Kontkanen, Petri Myllymäki, Tomi Silander, Henry Tirri - Proceedings of the European Symposium on Intelligent Techniques, 1997
Cited by 4 (0 self)
The performance of many machine learning algorithms can be substantially improved with a proper discretization scheme. In this paper we describe a theoretically rigorous approach to discretization of continuous attribute values, based on a Bayesian clustering framework. The method produces a probabilistic scoring metric for different discretizations, and it can be combined with various types of learning algorithms working on discrete data. The approach is validated by demonstrating empirically the performance improvement of the Naive Bayes classifier when Bayesian discretization is used instead of the standard equal frequency interval discretization.

1 INTRODUCTION
Many algorithms developed in the machine learning and uncertain reasoning community focus on learning in nominal feature bases. On the other hand, many real world tasks involve continuous attribute domains. Consequently, in order to be able to use such algorithms, a discretization process is needed. Continuous variable d...
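
For reference, the equal frequency interval discretization that the abstract uses as its baseline can be sketched as below. This shows only the baseline, not the Bayesian method of the paper. The function names, the midpoint placement of cut points, and the handling of tied values are assumptions made for illustration.

```python
import bisect


def equal_frequency_cut_points(values, num_intervals):
    """Cut points that split the sorted values into roughly equal-sized groups.

    Cuts are placed midway between neighbouring sorted values (an assumption);
    duplicate cuts caused by tied values are dropped.
    """
    ordered = sorted(values)
    n = len(ordered)
    if num_intervals < 1 or n < num_intervals:
        raise ValueError("need at least one value per interval")
    cuts = []
    for k in range(1, num_intervals):
        i = k * n // num_intervals
        cut = (ordered[i - 1] + ordered[i]) / 2
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def discretize(value, cuts):
    """Index of the interval that a value falls into (0-based)."""
    return bisect.bisect_right(cuts, value)


data = [5, 1, 9, 3, 7, 11]
cuts = equal_frequency_cut_points(data, 3)
codes = [discretize(v, cuts) for v in data]
```

With six distinct values and three intervals, each interval receives two values. A value exactly equal to a cut point is assigned to the upper interval under this choice of `bisect_right`.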

Evolutionary multiobjective optimization for generating an ensemble of fuzzy rule-based classifiers

by Hisao Ishibuchi, Takashi Yamamoto - In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2003), Lecture Notes in Computer Science (LNCS , 2003
Cited by 4 (2 self)
One advantage of evolutionary multiobjective optimization (EMO) algorithms over classical approaches is that many non-dominated solutions can be simultaneously obtained by their single run. In this paper, we propose an idea of using EMO algorithms for constructing an ensemble of fuzzy rule-based classifiers with high diversity. The classification of new patterns is performed based on the vote of multiple classifiers generated by a single run of EMO algorithms. Even when the classification performance of individual classifiers is not high, their ensemble often works well. The point is to generate multiple classifiers with high diversity. We demonstrate the ability of EMO algorithms to generate various non-dominated fuzzy rule-based classifiers with high diversity by their single run. Through computational experiments on some well-known benchmark data sets, it is shown that the vote of generated fuzzy rule-based classifiers leads to high classification performance on test patterns.

Comparison of Heuristic Criteria for Fuzzy Rule Selection in Classification Problems

by Hisao Ishibuchi, Takashi Yamamoto
Cited by 3 (1 self)
This paper compares heuristic criteria used for extracting a pre-specified number of fuzzy classification rules from numerical data. We examine the performance of each heuristic criterion through computational experiments on well-known test problems. Experimental results show that better results are obtained from composite criteria of confidence and support measures than from their individual use. It is also shown that genetic algorithm-based rule selection can improve the classification ability of extracted fuzzy rules by searching for good rule combinations. This observation suggests the importance of taking into account the combinatorial effect of fuzzy rules (i.e., the interaction among them).