Results 1  10
of
18,533
BitOptimal LempelZiv compression
, 2008
"... One of the most famous and investigated lossless datacompression scheme is the one introduced by Lempel and Ziv about 40 years ago [23]. This compression scheme is known as ”dictionarybased compression” and consists of squeezing an input string by replacing some of its substrings with (shorter) ..."
Abstract
 Add to MetaCart
on its generating source. In this paper we provide the first LZbased compressor which computes the bitoptimal parsing of any input string in efficient time and optimal space, for a general class of variablelength codeword encodings which encompasses most of the ones typically used in data compression
A Threshold of ln n for Approximating Set Cover
 JOURNAL OF THE ACM
, 1998
"... Given a collection F of subsets of S = f1; : : : ; ng, set cover is the problem of selecting as few as possible subsets from F such that their union covers S, and max kcover is the problem of selecting k subsets from F such that their union has maximum cardinality. Both these problems are NPhar ..."
Abstract

Cited by 778 (5 self)
 Add to MetaCart
hard. We prove that (1 \Gamma o(1)) ln n is a threshold below which set cover cannot be approximated efficiently, unless NP has slightly superpolynomial time algorithms. This closes the gap (up to low order terms) between the ratio of approximation achievable by the greedy algorithm (which is (1 \Gamma
Wrappers for Feature Subset Selection
 AIJ SPECIAL ISSUE ON RELEVANCE
, 1997
"... In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a ..."
Abstract

Cited by 1522 (3 self)
 Add to MetaCart
, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study
From data mining to knowledge discovery in databases
 AI Magazine
, 1996
"... ■ Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases ..."
Abstract

Cited by 510 (0 self)
 Add to MetaCart
in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular realworld applications, specific datamining techniques, challenges involved in realworld applications of knowledge discovery, and current and future
Planning Algorithms
, 2004
"... This book presents a unified treatment of many different kinds of planning algorithms. The subject lies at the crossroads between robotics, control theory, artificial intelligence, algorithms, and computer graphics. The particular subjects covered include motion planning, discrete planning, planning ..."
Abstract

Cited by 1108 (51 self)
 Add to MetaCart
This book presents a unified treatment of many different kinds of planning algorithms. The subject lies at the crossroads between robotics, control theory, artificial intelligence, algorithms, and computer graphics. The particular subjects covered include motion planning, discrete planning
Dryad: Distributed DataParallel Programs from Sequential Building Blocks
 In EuroSys
, 2007
"... Dryad is a generalpurpose distributed execution engine for coarsegrain dataparallel applications. A Dryad application combines computational “vertices ” with communication “channels ” to form a dataflow graph. Dryad runs the application by executing the vertices of this graph on a set of availa ..."
Abstract

Cited by 730 (27 self)
 Add to MetaCart
failures, and transporting data between vertices.
Graphical models, exponential families, and variational inference
, 2008
"... The formalism of probabilistic graphical models provides a unifying framework for capturing complex dependencies among random variables, and building largescale multivariate statistical models. Graphical models have become a focus of research in many statistical, computational and mathematical fiel ..."
Abstract

Cited by 800 (26 self)
 Add to MetaCart
fields, including bioinformatics, communication theory, statistical physics, combinatorial optimization, signal and image processing, information retrieval and statistical machine learning. Many problems that arise in specific instances — including the key problems of computing marginals and modes
Solving multiclass learning problems via errorcorrecting output codes
 JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 1995
"... Multiclass learning problems involve nding a de nition for an unknown function f(x) whose range is a discrete set containing k>2values (i.e., k \classes"). The de nition is acquired by studying collections of training examples of the form hx i;f(x i)i. Existing approaches to multiclass l ..."
Abstract

Cited by 730 (8 self)
 Add to MetaCart
Multiclass learning problems involve nding a de nition for an unknown function f(x) whose range is a discrete set containing k>2values (i.e., k \classes"). The de nition is acquired by studying collections of training examples of the form hx i;f(x i)i. Existing approaches to multiclass learning problems include direct application of multiclass algorithms such as the decisiontree algorithms C4.5 and CART, application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and application of binary concept learning algorithms with distributed output representations. This paper compares these three approaches to a new technique in which errorcorrecting codes are employed as a distributed output representation. We show that these output representations improve the generalization performance of both C4.5 and backpropagation on a wide range of multiclass learning tasks. We also demonstrate that this approach is robust with respect to changes in the size of the training sample, the assignment of distributed representations to particular classes, and the application of over tting avoidance techniques such as decisiontree pruning. Finally,we show thatlike the other methodsthe errorcorrecting code technique can provide reliable class probability estimates. Taken together, these results demonstrate that errorcorrecting output codes provide a generalpurpose method for improving the performance of inductive learning programs on multiclass problems.
Bayesian Network Classifiers
, 1997
"... Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with stateoftheart classifiers such as C4.5. This fact raises the question of whether a classifier with less restr ..."
Abstract

Cited by 788 (23 self)
 Add to MetaCart
Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with stateoftheart classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we evaluate approaches for inducing classifiers from data, based on the theory of learning Bayesian networks. These networks are factored representations of probability distributions that generalize the naive Bayesian classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes, yet at the same time maintains the computational simplicity (no search involved) and robustness that characterize naive Bayes. We experimentally tested these approaches, using problems from the University of California at Irvine repository, and compared them to C4.5, naive Bayes, and wrapper methods for feature selection.
Fast Effective Rule Induction
, 1995
"... Many existing rule learning systems are computationally expensive on large noisy datasets. In this paper we evaluate the recentlyproposed rule learning algorithm IREP on a large and diverse collection of benchmark problems. We show that while IREP is extremely efficient, it frequently gives error r ..."
Abstract

Cited by 1257 (21 self)
 Add to MetaCart
Many existing rule learning systems are computationally expensive on large noisy datasets. In this paper we evaluate the recentlyproposed rule learning algorithm IREP on a large and diverse collection of benchmark problems. We show that while IREP is extremely efficient, it frequently gives error rates higher than those of C4.5 and C4.5rules. We then propose a number of modifications resulting in an algorithm RIPPERk that is very competitive with C4.5rules with respect to error rates, but much more efficient on large samples. RIPPERk obtains error rates lower than or equivalent to C4.5rules on 22 of 37 benchmark problems, scales nearly linearly with the number of training examples, and can efficiently process noisy datasets containing hundreds of thousands of examples.
Results 1  10
of
18,533