## Algorithm Selection for Sorting and Probabilistic Inference: A Machine Learning-Based Approach (2003)

Venue: | KANSAS STATE UNIVERSITY |

Citations: | 8 - 0 self |

### BibTeX

@TECHREPORT{Guo03algorithmselection,
    author = {Haipeng Guo},
    title = {Algorithm Selection for Sorting and Probabilistic Inference: A Machine Learning-Based Approach},
    institution = {KANSAS STATE UNIVERSITY},
    year = {2003}
}

### Abstract

The algorithm selection problem aims at selecting the best algorithm for a given computational problem instance according to some characteristics of the instance. In this dissertation, we first introduce some results from a theoretical investigation of the algorithm selection problem. We show, by Rice's theorem, the nonexistence of an automatic algorithm selection program based only on the description of the input instance and the competing algorithms. We also describe an abstract theoretical framework of instance hardness and algorithm performance based on Kolmogorov complexity to show that algorithm selection for search is also incomputable. Driven by the theoretical results, we propose a machine learning-based inductive approach using experimental algorithmic methods and machine learning techniques to solve the algorithm selection problem. Experimentally, we have

### Citations

11367 |
Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman
- Garey, Johnson
- 1979
Citation Context ...explicitly saying so, assuming that any problem instances and algorithms can be encoded into some strings of 0s and 1s. We assume the encoding scheme is “reasonable” in the sense of Garey and Johnson [GJ79]. Once we have fixed the representation, an algorithm for a decision problem can be found by simply creating a Turing machine that decides the corresponding language. That is, the Turing machine accep... |

9088 | Elements of information theory - Cover, Thomas - 1991 |

8066 |
Genetic Algorithms
- Goldberg
- 1989
Citation Context ...sults alone cannot tell the full story about an algorithm’s performance. Moreover, many of the recently invented algorithms, especially randomized and heuristic algorithms such as the Genetic Algorithm (GA) [Gol89], are too complex for a detailed mathematical analysis to be reasonable. Usually for most such algorithms in practice, some facts are known about their performance but they have really not been fully ... |

7414 |
Probabilistic reasoning in intelligent systems: Networks of plausible inference
- Pearl
- 1988
Citation Context ...ly, automating the experimental aspect is more feasible because of the progress that has been made in experimental algorithmics [Joh02], machine learning [Mit97], and uncertain reasoning techniques [Pea88]. The difficulty of automatic algorithm selection is largely due to the uncertainty in the input problem space, the lack of understanding of the working mechanism of the algorithm space, and the uncer... |

6862 |
The Mathematical Theory of Communication
- Shannon, Weaver
- 1949
Citation Context ...ons, this definition of information has the advantage that it refers to individual objects, not to objects that are treated as elements of a set of objects with a probability distribution given on it [Sha48]. Definition 13 (Algorithmic Information in y about x) Let x and y be two strings. The algorithmic information in y about x is defined by I(y : x) = K(x) − K(x|y). The value K(x) can be interpreted of ... |

5328 |
C4.5: Programs for Machine Learning
- Quinlan
- 1993
Citation Context ...ms: ID3 ID3 and its successor, C4.5, are two of the most important algorithms for learning a decision tree from data. ID3 was developed by Quinlan in 1986 [Qui86]. It was extended to C4.5 later in 1993 [Qui93]. The basic algorithm, ID3, learns a decision tree by constructing it top-down. At each node, the following question is asked: “which attribute should be tested here?” The question is answered usin... |

4099 |
Artificial Intelligence: a modern approach
- Russell, Norvig
- 2002
Citation Context ...irectly. BNs sidestep the JPD and work directly with conditional probabilities in order to reduce the representational and computational complexity. The Syntax of Bayesian Networks A Bayesian network [RN95] is a graph in which the following holds: 1. A set of random variables corresponds to the nodes of the network. [Footnote 1: For simplicity and without losing generality, we only consider discrete variables.] ... |

3990 | Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images - Geman, Geman - 1984 |

3857 | Optimization by simulated annealing
- Kirkpatrick, Gelatt, et al.
- 1983
Citation Context ... use some meta-heuristic to guide the search in order to avoid getting stuck in a local optimum. The most popular meta-heuristics include various hill-climbing algorithms [Hro01], Simulated Annealing [KGV83], Tabu Search [GLW93], Genetic Algorithms [Gol89], etc. Random-restart Hill Climbing Hill climbing [Hro01] is a greedy local search method. It goes to its best neighbor whenever possible. It is easy t... |

3591 | Induction of Decision Trees - Quinlan - 1986 |

3304 |
Data mining: Practical machine learning tools and techniques
- Witten, Frank
- 2005
Citation Context ...a on problem characteristics and algorithm performance, and to induce a predictive algorithm selection model from the training data. This is a typical task in machine learning [Mit97] and data mining [WF99]. The goal of machine learning is to build computer programs that improve automatically with experience. Data mining is about automatically analyzing data and discovering valuable implicit patterns ... |

2985 |
Adaptation in Natural and Artificial Systems
- Holland
- 1975
Citation Context ...d into a broad range of applications in search, optimization, and machine learning. However, GAs do fail at times. GAs are complex systems and are hard to design and analyze. Ever since their invention [Hol75], researchers have put a lot of effort into understanding how GAs work and what makes a function or problem hard for GAs to optimize. There are still quite a few open problems despite more than 30 yea... |

2695 | Bagging Predictors
- Breiman
- 1996
Citation Context ...ve Bayes, and Bayesian networks. We will also look at three meta-learning schemes: bagging, boosting, and stacking. In Bayesian network learning we need to first discretize the input data. In bagging [Bre96], we use C4.5 as the base classifier to bag. The bag size was the same as Dsort4B. The number of bagging iterations was set to 10. The boosting method used was Freund & Schapire’s Adaboost M1 [FS96] m... |

2408 | Computational Complexity
- Papadimitriou
- 1994
Citation Context ... R is called a reduction from L1 to L2. To make it more meaningful, we usually require R to be computable by a deterministic Turing machine in space O(log n) and time O(p(n)) where p(n) is polynomial [Pap94]. Reduction is transitive. If R is a reduction from language L1 to L2 and R′ is a reduction from L2 to L3, then the composition R · R′ is a reduction from L1 to L3. This fact orders problems with re... |

1915 |
The Fractal Geometry of Nature
- Mandelbrot
- 1983
Citation Context ...egular coastline may be greater than 1 but less than 2, indicating that it is not simply a “line” but has some space-filling characteristics in the plane. [Figure 4.1: The Cantor Set] The Cantor set [Man82] is the oldest and simplest man-made fractal. As shown in Figure 4.1, it is constructed by removing the middle third from the unit interval and the remaining two subintervals have their middle third... |

1772 | An Introduction to Kolmogorov Complexity and Its Applications, 2nd Edition
- Li, Vitányi
- 1997
Citation Context ...It guarantees that some properly defined concepts using UTM have some sort of invariant property [LV93]. Kolmogorov Complexity Kolmogorov complexity, also called descriptive or algorithmic complexity [LV90], was developed by Solomonoff [Sol64], Kolmogorov [Kol65], and Chaitin [Cha66]. The basic idea is to measure the complexity of a string by the size in bits of the smallest program that can produce it. ... |

1741 | Experiments with a new boosting algorithm
- Freund, Schapire
- 1996
Citation Context ... [Bre96], we use C4.5 as the base classifier to bag. The bag size was the same as Dsort4B. The number of bagging iterations was set to 10. The boosting method used was Freund & Schapire’s Adaboost M1 [FS96] method. The base classifier used was C4.5. The maximum number of boost iterations was set to 10. In stacking [Wol92], the three base classifiers were C4.5, the naive Bayes classifier and Bayesian ne... |

1380 | A coefficient of agreement for nominal scales - Cohen - 1960 |

1334 | Local computations with probabilities on graphical structures and their applications to expert systems - Lauritzen, Spiegelhalter - 1988 |

1271 |
On computable numbers, with an application to the Entscheidungsproblem
- Turing
- 1936
Citation Context ...atic checker (a program or a Turing machine) that can read two algorithms and decide which is better, i.e., the problem is undecidable in general. This is easy to understand because Turing has shown [Tur36] that we cannot even decide if an algorithm actually halts (the halting problem). Then I turned to the inductive direction without much doubt. At UAI02 at Edmonton, I managed to co-chair a workshop o... |

1127 | A Bayesian method for the induction of probabilistic networks from data. Machine Learning 9:309–347
- Cooper, Herskovits
- 1992
Citation Context ...good general algorithms are known for this kind of problem. The Bayesian network learning algorithm we will use is a search-and-scoring based algorithm, namely K2, using a Bayesian score developed by [CH92]. The algorithm searches through the space of all possible structures looking for a structure that best fits the data according to some scoring criteria. Since the search space is usually huge, the a... |

1116 | Wrappers for feature subset selection
- Kohavi, John
- 1997
Citation Context ...s up the learning process, although this may be outweighed by the computation used for feature selection. There are two important feature selection approaches: the filter method and the wrapper method [KJ97]. The former evaluates the worth of feature subsets based on general characteristics of the data. The latter wraps the machine learning algorithm that will ultimately be used into the feature selectio... |

891 | A tutorial on learning with Bayesian networks - Heckerman - 1998 |

821 | The complexity of theorem-proving procedures
- Cook
- 1971
Citation Context ...ence and difficulty of a complexity class. NP-complete problems are complete problems of class NP. The SATISFIABILITY problem is the first problem that was proven to be NP-complete by Cook in 1971 [Coo71]. It is specified as follows: SATISFIABILITY (SAT) INSTANCE: A set of boolean variables V and a collection of clauses C over V. QUESTION: Is there a satisfying truth assignment for C? The famous Cook... |

696 | No free lunch theorems for optimization - Wolpert, Macready - 1997 |

693 |
Multi-interval discretization of continuous-valued attributes for classification learning
- Fayyad, Irani
- 1993
Citation Context ...nformation gain associated with each and the best one is used to discretize A into a binary value. Extensions that split numeric attributes into multiple intervals rather than binary are discussed in [FI93]. Missing values Some datasets may contain missing values for certain attributes. One strategy to deal with this issue is to assign it the most common value at the current node. A more complex procedur... |

612 | Markov Chain Monte Carlo in Practice - Gilks, Richardson, et al. - 1996 |

604 | The Computational Complexity of Probabilistic Inference Using Bayesian Belief Networks - Cooper - 1990 |

603 | Where the really hard problems are
- Cheeseman, Kanefsky, et al.
- 1991
Citation Context ...O(n log n). In the community of NP-hard optimization problem-solving, researchers have long noticed that the NP-complete result is just a worst-case result; i.e., not all instances are equally hard [CKT91]. Algorithms that exploit some features of the input instances can perform on the particular class of instances better than the worst-case scenario. In light of this, two of the main directions of thi... |

576 | Stacked generalization
- Wolpert
- 1990
Citation Context ...ations was set to 10. The boosting method used was Freund & Schapire’s Adaboost M1 [FS96] method. The base classifier used was C4.5. The maximum number of boost iterations was set to 10. In stacking [Wol92], the three base classifiers were C4.5, the naive Bayes classifier and Bayesian network learning. The meta-classifier used was C4.5. We run these 6 learning schemes on Dsort4B and compare the results ... |

553 |
Three approaches to the quantitative definition of information
- Kolmogorov
- 1965
Citation Context ... UTM have some sort of invariant property [LV93]. Kolmogorov Complexity Kolmogorov complexity, also called descriptive or algorithmic complexity [LV90], was developed by Solomonoff [Sol64], Kolmogorov [Kol65], and Chaitin [Cha66]. The basic idea is to measure the complexity of a string by the size in bits of the smallest program that can produce it. The idea comes from the observation that random strings ... |

422 |
A Formal Theory of Inductive Inference
- Solomonoff
- 1964
Citation Context ...fined concepts using UTM have some sort of invariant property [LV93]. Kolmogorov Complexity Kolmogorov complexity, also called descriptive or algorithmic complexity [LV90], was developed by Solomonoff [Sol64], Kolmogorov [Kol65], and Chaitin [Cha66]. The basic idea is to measure the complexity of a string by the size in bits of the smallest program that can produce it. The idea comes from the observation ... |

393 | Fusion, propagation, and structuring in belief networks - Pearl - 1986 |

360 | Ant algorithms for discrete optimization
- Dorigo, Caro, et al.
- 1999
Citation Context ... used to solve hard combinatorial optimization problems. The ACO meta-heuristic was first introduced by Dorigo in his Ph.D. thesis [Dor92], and was recently defined by Dorigo, Di Caro and Gambardella [DCG99]. Ant algorithm optimization was inspired by the observation of real ant colonies’ foraging behavior; in particular, how ants can find the shortest paths between food sources and their nest. Ants depo... |

341 | Estimating continuous distributions in Bayesian classifiers - John, Langley - 1995 |

334 | Algorithmic Information Theory - Chaitin - 1977 |

324 |
A catalog of complexity classes, in
- Johnson
- 1990
Citation Context ...es for all of its parameters. Formally, a problem is a total function on strings of an alphabet, such as {0, 1}. Let {0, 1}∗ or Σ∗ represent the set of all finite strings made up of 0s and 1s. From [Joh90], we have the following definition: Definition 1 (Problem and Instance) A problem is a set X of ordered pairs (I, A) of strings in {0, 1}∗, where I is called an instance. A is called an answer for t... |

272 |
The design of innovation - lessons from and for competent genetic algorithms
- Goldberg
- 2002
Citation Context ...ut the usefulness of schema theory and the BB hypothesis in explaining the effectiveness of GAs on structured problems [Gol89]. We just should not force the BB hypothesis to apply to random problems as well. In [Gol02], Goldberg points out a similar concept of a “design envelope” to emphasize the importance of bounding problem difficulty: “. . . no BB searcher should be expected to solve problems that are in some w... |

261 | Approximating probabilistic inference in Bayesian belief networks is NP-hard - Dagum, Luby - 1993 |

254 | No free lunch theorems for search - Wolpert, Macready - 1995 |

253 | Bayesian networks without tears
- Charniak
- 1991
Citation Context ... my dissertation research that started 3 and a half years ago when I joined Dr. Hsu’s KDD group at K-State. I remember the first paper he handed to me was Charniak’s “Bayesian networks without tears” [Cha91]. I found my interests in uncertain reasoning using Bayesian networks soon after I read the paper and implemented a Bayesian network learning algorithm, K2. When the time came for me to choose a disse... |

239 | On the Length of Programs for Computing Finite Binary Sequences
- Chaitin
- 1969
Citation Context ...f invariant property [LV93]. Kolmogorov Complexity Kolmogorov complexity, also called descriptive or algorithmic complexity [LV90], was developed by Solomonoff [Sol64], Kolmogorov [Kol65], and Chaitin [Cha66]. The basic idea is to measure the complexity of a string by the size in bits of the smallest program that can produce it. The idea comes from the observation that random strings are more difficult to... |

211 | Fitness distance correlation as a measure of problem difficulty for genetic algorithms
- Jones, Forrest
- 1995
Citation Context ... predictive GA-hardness measure. These include deception [Gol87, Gol89, Whi90], isolation, noise [Gol93, Gol02], multimodality [GDH92, HG95, RW99a], landscape ruggedness, fitness distance correlation [JF95], epistasis variance [Dav91a, Nau98], epistasis correlation [Nau98, KNR01], site-wise optimization measure [Nau98], etc. 3.3.2 No Free Lunch Theorems Wolpert and Macready’s No Free Lunch (NFL) [WM95, ... |

204 |
Probabilistic reasoning in expert systems: Theory and algorithms
- Neapolitan
- 1990
Citation Context ...ple networks: Asia and ALARM13. [Figure 4.6: JPDs at the First Three Steps of the Multiplicative Cascade of the Asia Network] 4.4.1 The Self-similarity of the JPD: Asia. We first use the Asia network [Nea90] to demonstrate the process of an agent’s incremental understanding of an uncertain domain as a binomial multiplicative cascade process, and to show the JPD’s self-similarity property. Asia is a small... |

201 |
Optimization, Learning and Natural Algorithms
- Dorigo
- 1992
Citation Context ... that take inspiration from the behavior of real ant colonies and are used to solve hard combinatorial optimization problems. The ACO meta-heuristic was first introduced by Dorigo in his Ph.D. thesis [Dor92], and was recently defined by Dorigo, Di Caro and Gambardella [DCG99]. Ant algorithm optimization was inspired by the observation of real ant colonies’ foraging behavior; in particular, how ants can f... |

196 | Modeling genetic algorithms with markov chains - Nix, Vose - 1992 |

195 |
Propagating Uncertainty in Bayesian Networks by Probabilistic Logic Sampling
- Henrion
- 1988
Citation Context ... the previous samples. The sampling distribution of a variable is computed from its previous sample given the states of its Markov blanket nodes. Importance sampling algorithms include Logic Sampling [Hen88], Forward Sampling (Likelihood Weighting) [FC89, SP89], Backward Sampling [FF94], Self-Importance Sampling and Heuristic Importance Sampling [SP89], Adaptive Importance Sampling [CD00], etc. MCMC meth... |

187 | Ant Colonies for the Traveling Salesman Problem
- Dorigo
- 1997
Citation Context ... samplesNeeded > 0, goto Step 2; else return best so far. Figure 7.5: Tabu Search for Finding the MPE 7.2.4 Hybrid Algorithm: Ant Colony Optimization for the MPE Problem Ant Colony Optimization (ACO) [DG97] studies artificial systems that take inspiration from the behavior of real ant colonies and are used to solve hard combinatorial optimization problems. The ACO meta-heuristic was first introduced by ... |

183 |
Algorithms for Random Generation and Counting: A Markov Chain Approach
- Sinclair
- 1993
Citation Context ...he brute-force algorithm of exhaustively listing all possible elements in S and choosing one at random is often not viable because S can be extremely large. In this section we apply a Markov approach [Sin93] to generate permutations with a given degree of presortedness uniformly at random. At the heart of this Markov approach is a simple algorithmic paradigm that simulates a Markov chain whose states are t... |

167 | The Simple Genetic Algorithm - Vose - 1999 |