Results 1 - 10 of 9,894

Business Process Understanding: Mining Many Datasets

by Jan M. Zytkow, Arun P. Sanjeev
"... Institutional databases can be instrumental in understanding a business process, but additional data may broaden the empirical perspective on the investigated process. We present a few data mining principles by which a business process can be analyzed and the results represented. Sequential and para ..."
... As an example we use mining for knowledge about student enrollment, which is an essential part of the university educational process. The target of discovery has been the understanding of the university enrollment. Many discoveries have been made. The particularly surprising findings have been presented

Very simple classification rules perform well on most commonly used datasets

by Robert C. Holte - Machine Learning , 1993
"... The classification rules induced by machine learning systems are judged by two criteria: their classification accuracy on an independent test set (henceforth "accuracy"), and their complexity. The relationship between these two criteria is, of course, of keen interest to the machin ..."
Cited by 547 (5 self)
to the machine learning community. There are in the literature some indications that very simple rules may achieve surprisingly high accuracy on many datasets. For example, Rendell occasionally remarks that many real world datasets have "few peaks (often just one)" and so are ...

LabelMe: A Database and Web-Based Tool for Image Annotation

by B. C. Russell, A. Torralba, K. P. Murphy, W. T. Freeman , 2008
"... We seek to build a large collection of images with ground truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a web-based tool that allows easy image annotation and instant sha ..."
Cited by 679 (46 self)
sharing of such annotations. Using this annotation tool, we have collected a large dataset that spans many object categories, often containing multiple instances over a wide variety of images. We quantify the contents of the dataset and compare against existing state of the art datasets used for object

Fitting a mixture model by expectation maximization to discover motifs in biopolymers.

by Timothy L Bailey, Charles Elkan - Proc Int Conf Intell Syst Mol Biol, 1994
"... The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences. Multiple motifs are found by fitting a mixture model to th ..."
Cited by 947 (5 self)
together can be used as a Bayes-optimal classifier for searching for occurrences of the motif in other databases. The algorithm estimates how many times each motif occurs in each sequence in the dataset and outputs an alignment of the occurrences of the motif. The algorithm is capable of discovering

Fast Effective Rule Induction

by William W. Cohen , 1995
"... Many existing rule learning systems are computationally expensive on large noisy datasets. In this paper we evaluate the recently-proposed rule learning algorithm IREP on a large and diverse collection of benchmark problems. We show that while IREP is extremely efficient, it frequently gives error r ..."
Cited by 1274 (21 self)

LOF: Identifying density-based local outliers

by Markus M Breunig , Hans-Peter Kriegel , Raymond T Ng , Jörg Sander - MOD , 2000
"... For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers, can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for ..."
Cited by 516 (13 self)
analysis showing that LOF enjoys many desirable properties. Using real-world datasets, we demonstrate that LOF can be used to find outliers which appear to be meaningful, but can otherwise not be identified with existing approaches. Finally, a careful performance evaluation of our algorithm confirms we show

Support Vector Machine Classification and Validation of Cancer Tissue Samples Using Microarray Expression Data

by Terrence S. Furey, Nello Cristianini, Nigel Duffy, David W. Bednarski, Michèl Schummer, David Haussler , 2000
"... Motivation: DNA microarray experiments generating thousands of gene expression measurements, are being used to gather information from tissue and cell samples regarding gene expression differences that will be useful in diagnosing disease. We have developed a new method to analyse this kind of data ..."
Cited by 569 (1 self)
are analysed. The results are comparable to those previously obtained. We show that other machine learning methods also perform comparably to the SVM on many of those datasets. Availability: The SVM software is available at http://www.cs.columbia.edu/~bgrundy/svm. Contact: booch@cse.ucsc.edu

Supervised and unsupervised discretization of continuous features

by James Dougherty, Ron Kohavi, Mehran Sahami - in A. Prieditis & S. Russell, eds, Machine Learning: Proceedings of the Twelfth International Conference, 1995
"... Many supervised machine learning algorithms require a discrete feature space. In this paper, we review previous work on continuous feature discretization, identify defining characteristics of the methods, and conduct an empirical evaluation of several methods. We compare binning, an unsupervised dis ..."
Cited by 540 (11 self)

Activity recognition from user-annotated acceleration data

by Ling Bao, Stephen S. Intille , 2004
"... In this work, algorithms are developed and evaluated to detect physical activities from data acquired using five small biaxial accelerometers worn simultaneously on different parts of the body. Acceleration data was collected from 20 subjects without researcher supervision or observation. Subjects ..."
Cited by 515 (7 self)
in recognition because conjunctions in acceleration feature values can effectively discriminate many activities. With just two biaxial accelerometers – thigh and wrist – the recognition performance dropped only slightly. This is the first work to investigate performance of recognition algorithms with multiple

Fast approximate nearest neighbors with automatic algorithm configuration

by Marius Muja, David G. Lowe - In VISAPP International Conference on Computer Vision Theory and Applications, 2009
"... Keywords: nearest-neighbors search, randomized kd-trees, hierarchical k-means tree, clustering. For many computer vision problems, the most time consuming component consists of nearest neighbor matching in high-dimensional spaces. There are no known exact algorithms for solving these high-dimensional problems ..."
Cited by 455 (2 self)
that applies priority search on hierarchical k-means trees, which we have found to provide the best known performance on many datasets. After testing a range of alternatives, we have found that multiple randomized k-d trees provide the best performance for other datasets. We are releasing public domain code