Proximity Graphs for Nearest Neighbor Decision Rules: Recent Progress
 Proceedings of the 34th Symposium on the INTERFACE
, 2002
Abstract

Cited by 22 (0 self)
In the typical nonparametric approach to pattern classification, random data (the training set of patterns) are collected and used to design a decision rule (classifier). One of the best-known such rules is the k-nearest-neighbor decision rule (also known as instance-based learning and lazy learning), in which an unknown pattern is classified into the majority class among its k nearest neighbors in the training set. Several questions related to this rule have received considerable attention over the years, including the following. How can the storage of the training set be reduced without degrading the performance of the decision rule? How should the reduced training set be selected to represent the different classes? How large should k be, and how should its value be chosen? Should all k neighbors be weighted equally when deciding the class of an unknown pattern? If not, how should the weights be chosen? Should all the features (attributes) be weighted equally, and if not, how should the feature weights be chosen? What distance metric should be used? How can the rule be made robust to overlapping classes or noise in the training data? How can the rule be made invariant to scaling of the measurements? Geometric proximity graphs such as Voronoi diagrams and their many relatives provide elegant solutions to most of these problems. After a brief, non-exhaustive review of some of the classical canonical approaches to these problems, the methods that use proximity graphs are discussed, some new observations are made, and avenues for further research are proposed.
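A minimal sketch of the k-nearest-neighbor rule described above, assuming Euclidean distance and unweighted votes (two of the open choices the abstract lists). The function name `knn_classify` and its tie-breaking behavior are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
import math


def knn_classify(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training patterns.

    `training` is a list of (feature_vector, label) pairs. Euclidean distance
    is an assumption here; the abstract treats the metric as an open question.
    """
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the training-set size")
    # Sort training patterns by distance to the query and keep the k closest.
    neighbors = sorted(training, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # Among tied classes, most_common returns the one first encountered,
    # i.e. the class whose nearest member is closest to the query.
    return votes.most_common(1)[0][0]


training = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
            ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_classify(training, (0.5, 0.5), k=3))
```

Weighted votes or a learned metric would replace the plain `Counter` and `math.dist` calls; the sketch keeps both fixed to show only the basic rule.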
Estimating And Depicting The Structure Of A Distribution Of Random Functions
, 2000
Abstract

Cited by 9 (0 self)
We suggest a nonparametric approach to making inference about the structure of distributions in a potentially infinite-dimensional space, for example a function space, and to displaying information about that structure. Our methodology is based on nonparametric density estimation and draws inference about the slope of the density. The latter step is implemented in a purely iterative way, using only the elementary operations of addition and multiplication, and requires neither differentiation nor dimension reduction. Nevertheless, it leads in a very simple and reliable manner to "curves" of steepest ascent up the "surface" defined by an estimate of the density of a potentially infinite-dimensional distribution. The projections of these curves into the sample space are always one-dimensional, or more properly one-parameter, structures, and so can be displayed visually even when the sample space is a class of functions. Also, the modes to which the sample space projections lead are themselv...
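The abstract does not give the iteration itself. As an illustrative stand-in, the sketch below uses the mean-shift update, a known scheme that climbs a Gaussian kernel density estimate by repeatedly replacing the current point with a kernel-weighted average of the data; it is not claimed to be the authors' method. Curves are assumed to be sampled on a common grid and treated as vectors, and the bandwidth `h`, the step limit, and the function names are hypothetical. Unlike the scheme in the abstract, evaluating the Gaussian weights here also needs `exp`.

```python
import math


def sq_dist(f, g):
    """Squared L2 distance between two curves sampled on the same grid."""
    return sum((a - b) ** 2 for a, b in zip(f, g))


def mean_shift_path(curves, start, h=1.0, steps=50, tol=1e-9):
    """Follow mean-shift iterates from `start` uphill on a Gaussian KDE.

    Each update replaces the current point by a kernel-weighted average of
    the sample curves. Returns the visited points (a discretized ascent path).
    """
    x = list(start)
    path = [x]
    for _ in range(steps):
        weights = [math.exp(-sq_dist(x, c) / (2 * h * h)) for c in curves]
        total = sum(weights)
        if total == 0.0:
            break  # start is too far from all data for this bandwidth
        new_x = [sum(w * c[j] for w, c in zip(weights, curves)) / total
                 for j in range(len(x))]
        path.append(new_x)
        if sq_dist(new_x, x) < tol:
            break  # converged to (a neighborhood of) a mode
        x = new_x
    return path


# Two groups of short "curves"; each start point climbs to its own group's mode.
curves = [[0, 0, 0], [0.2, 0.1, 0], [-0.1, 0, 0.1], [5, 5, 5], [5.1, 4.9, 5]]
low = mean_shift_path(curves, [0.3, 0.3, 0.3], h=1.0)[-1]
high = mean_shift_path(curves, [4.6, 4.6, 4.6], h=1.0)[-1]
```

The returned path is a one-parameter object (indexed by iteration), which matches the abstract's point that such ascent paths can be displayed even when each point is a whole function.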
On the History of Combinatorial Optimization (till 1960)
Abstract

Cited by 9 (0 self)
Introduction As a coherent mathematical discipline, combinatorial optimization is relatively young. When studying the history of the field, one observes a number of independent lines of research, separately considering problems like optimum assignment, shortest spanning tree, transportation, and the traveling salesman problem. Only in the 1950s, when the unifying tool of linear and integer programming became available and the area of operations research received intensive attention, were these problems put into one framework and the relations between them established. Indeed, linear programming forms the hinge in the history of combinatorial optimization. Its initial conception by Kantorovich and Koopmans was motivated by combinatorial applications, in particular in transportation and transshipment. After the formulation of linear programming as a generic problem, and the development in 1947 by Dantzig of the simplex method as a tool, researchers tried to attack nearly all combinatorial opti...
Statistical Clustering
 Int. Encyc. Social and Behavioral Sciences, 7 August 2000
, 2000
Abstract

Cited by 3 (0 self)
Introduction Every formal inquiry begins with a definition of the objects of the inquiry and of the terms to be used in the inquiry, a classification of objects and operations. Since about 1950, a variety of simple explicit classification methods, called clustering methods, have been proposed for different standard types of data. Most of these methods have no formal probabilistic underpinnings. Indeed, classification is not a subset of statistics; data collection, probability models, and data analysis all require informal subjective prior classifications. For example, in data analysis, some kind of binning or selection operation is performed prior to a formal statistical analysis. However, statistical methods can assist in classification in four ways: a) in devising probability models for data and classes so that probable classifications for a given set of data can be identified; b) in d...
Clustering the Hypercube
 SFB Report Series 93, TU Graz
, 1996
Abstract

Cited by 3 (2 self)
In this paper we consider various clustering methods for objects represented as binary strings of fixed length d. The dissimilarity of two given objects is the number of disagreeing bits, that is, their Hamming distance. Clustering these objects can be seen as clustering a subset of the vertices of a d-dimensional hypercube, and thus is a geometric problem in d dimensions. We give algorithms for various agglomerative hierarchical methods (including single linkage and complete linkage) as well as for two-clusterings and divisive methods. We present only linear-space algorithms, since in most practical applications the number of objects to be clustered is too large for non-linear-space solutions to be practicable. All algorithms are easy to implement, and the constants in their asymptotic running times are small. We give experimental results for all clustering methods considered, both for uniformly distributed hypercube vertices and for specially chosen sets. These experiments indicat...
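For single linkage specifically, one standard linear-space approach (not necessarily the one in the paper) is Prim's minimum-spanning-tree algorithm with distances recomputed on demand, since the single-linkage hierarchy can be read off the minimum spanning tree. The sketch assumes bit strings are given as equal-length Python strings of `'0'` and `'1'`; `hamming` and `single_linkage_mst` are hypothetical names.

```python
def hamming(a, b):
    """Number of disagreeing bits between two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def single_linkage_mst(points):
    """Return MST edges (weight, u, v) sorted by weight, via Prim's algorithm.

    Uses O(n) extra memory: distances are recomputed on the fly instead of
    storing the n x n dissimilarity matrix. Time is O(n^2 d) for length-d strings.
    The sorted edge weights are the single-linkage merge heights.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n  # distance from each vertex to the current tree
    parent = [-1] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax distances of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v]:
                d = hamming(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return sorted(edges)


edges = single_linkage_mst(["0000", "0001", "0011", "1111", "1110"])
```

Cutting every MST edge heavier than a threshold t leaves the single-linkage clusters at level t; complete linkage has no such MST shortcut and needs a different algorithm.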
Evidence for a Relationship Between Algorithmic Scheme And Shape Of Inferred Trees
Abstract

Cited by 3 (2 self)
Agglomeration and addition are the two main algorithmic schemes for constructing a tree distance from a dissimilarity matrix. The former iteratively agglomerates pairs of leaves to form ever larger clusters, while the latter proceeds by stepwise addition of objects to a growing tree. A third approach improves the global fitness of an initial tree by exchanging subtrees. This article suggests that the shape of inferred trees partly depends on the chosen algorithmic scheme: agglomeration tends to produce compact, bushy trees, while addition and exchange favor sparse, chain-like trees. This phenomenon is explained by the difference between the a priori probability distributions induced by each scheme. An illustration is provided with the Mitochondrial Eve data set (Vigilant et al. 1991), and the practical implications are discussed.
On the Complexity of Minimum Sum-of-Squares Clustering
, 2007
Abstract

Cited by 2 (1 self)
The texts published in the HEC research report series are the sole responsibility of their authors. The publication of these research reports is supported by a grant from the Fonds québécois de la recherche sur la nature et les technologies.
Efficient {0,1}-String Searching Based on Preclustering
, 1996
Abstract
In this paper we consider the {0,1}-string searching problem. For a given set S of binary strings of fixed length d and a query string q, one asks for the most similar string in S. Here the dissimilarity of two given strings is the number of disagreeing bits, that is, their Hamming distance. We present an efficient {0,1}-string searching algorithm based on hierarchical preclustering. To this end we give several useful observations on the inter- and intra-cluster distances. The presented algorithms are easy to implement, and we give exhaustive experimental results for uniformly distributed sets as well as for specially chosen strings. These experiments indicate that our algorithms work well in practice.
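A minimal sketch of exact nearest-string search with preclustering, assuming a simple one-level "leader" preclustering rather than the hierarchical scheme of the paper. It relies only on the triangle inequality for Hamming distance: every member of a cluster with center c and radius r lies at distance at least d(q, c) - r from the query q, so whole clusters can be skipped once that bound reaches the best distance found. The functions and the `radius` parameter are hypothetical.

```python
def hamming(a, b):
    """Number of disagreeing bits between two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def precluster(strings, radius):
    """Greedy leader preclustering: each string joins the first center within
    `radius`, otherwise it becomes a new center. Returns [center, members, r]
    entries, where r is the largest center-to-member distance observed."""
    clusters = []
    for s in strings:
        for c in clusters:
            d = hamming(s, c[0])
            if d <= radius:
                c[1].append(s)
                c[2] = max(c[2], d)
                break
        else:
            clusters.append([s, [s], 0])
    return clusters


def nearest(clusters, q):
    """Exact nearest string to q, pruning clusters by triangle-inequality bounds."""
    # Lower bound on d(q, s) for every member s: d(q, center) - r.
    candidates = [(hamming(q, center) - r, members) for center, members, r in clusters]
    candidates.sort(key=lambda t: t[0])
    best, best_d = None, float("inf")
    for bound, members in candidates:
        if bound >= best_d:
            break  # all remaining clusters have bounds at least this large
        for s in members:
            d = hamming(q, s)
            if d < best_d:
                best, best_d = s, d
    return best, best_d


strings = ["0000", "0001", "0011", "1111", "1110", "1100"]
clusters = precluster(strings, 1)
print(nearest(clusters, "0111"))
```

A small radius gives tight bounds but many clusters; a large radius gives few clusters but weak bounds. The paper's hierarchical preclustering presumably addresses this trade-off, which the single-level sketch does not.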
SAS/STAT® 9.2 User’s Guide: The CLUSTER Procedure (Book Excerpt)
, 1247
Abstract
For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication. U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).
Clustering huge data sets for parametric PET imaging
, 2002
Abstract
A new preprocessing clustering technique for quantification of kinetic PET data is presented. A two-stage clustering process, which combines a precluster and a classic hierarchical cluster analysis, provides data clustered according to a distance measure between time activity curves (TACs). The resulting clustered mean TACs can be used directly for estimation of kinetic parameters at the cluster level, or to span a vector space used for subsequent estimation of voxel-level kinetics. The introduction of preclustering significantly reduces the overall time for clustering of multi-frame kinetic data. The efficiency and superiority of the preclustering scheme combined with thresholding are validated by comparing clustering results with and without preclustering on FDG-PET brain data from 13 healthy subjects.
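A rough sketch of the two-stage idea, under assumptions the abstract does not fix: TACs are equal-length lists of activity values, distance is Euclidean, stage 1 is a single-pass leader preclustering with a hypothetical `threshold`, and stage 2 is centroid-linkage agglomeration of the precluster means, weighted by precluster size, down to `k` clusters. The function names are illustrative only. The point is that stage 2 runs on a handful of precluster means instead of on every voxel TAC.

```python
import math


def leader_precluster(curves, threshold):
    """Stage 1: single pass; each curve joins the first precluster whose seed
    curve lies within `threshold` (Euclidean), otherwise it seeds a new one."""
    seeds, groups = [], []
    for curve in curves:
        for i, seed in enumerate(seeds):
            if math.dist(curve, seed) <= threshold:
                groups[i].append(curve)
                break
        else:
            seeds.append(curve)
            groups.append([curve])
    return groups


def mean_curve(group):
    """Pointwise mean of equal-length curves."""
    return [sum(values) / len(group) for values in zip(*group)]


def hierarchical_merge(groups, k):
    """Stage 2: centroid-linkage agglomeration of precluster means until k
    clusters remain. Each cluster is (mean_curve, number_of_curves)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [(mean_curve(g), len(g)) for g in groups]
    while len(clusters) > k:
        # Closest pair of cluster means: O(m^2) in the number m of preclusters.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda p: math.dist(clusters[p[0]][0], clusters[p[1]][0]),
        )
        (mean_i, n_i), (mean_j, n_j) = clusters[i], clusters[j]
        # Size-weighted mean equals the mean of all curves in both clusters.
        merged = ([(n_i * x + n_j * y) / (n_i + n_j) for x, y in zip(mean_i, mean_j)],
                  n_i + n_j)
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return clusters


curves = [[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0], [5, 5, 5], [5.1, 5, 5], [20, 20, 20]]
groups = leader_precluster(curves, 0.5)
merged = hierarchical_merge(groups, 2)
```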