Results 1-10 of 13
An Introduction to Symbolic Data Analysis and the Sodas Software
 Journal of Symbolic Data Analysis
, 2003
An extended version of the k-means method for overlapping clustering
Cited by 3 (0 self)
This paper deals with overlapping clustering, a trade-off between crisp and fuzzy clustering. It has been motivated by recent applications in various domains such as information retrieval or biology. We show that the problem of finding a suitable coverage of data by overlapping clusters is not a trivial task. We propose a new objective criterion and the associated algorithm OKM that generalizes the k-means algorithm. Experiments show that overlapping clustering is a good alternative and indicate that OKM outperforms other existing methods.
Knowledge Discovery From Symbolic Data And The Sodas Software
 Conf. on Principles and Practice of Knowledge Discovery in Databases, PPKDD2000
, 2000
Cited by 1 (0 self)
The data descriptions of the units are called "symbolic" when they are more complex than the standard ones because they contain internal variation and are structured. Symbolic data arise from many sources, for instance when summarising huge relational databases by their underlying concepts. "Extracting knowledge" means getting explanatory results; that is why "symbolic objects" are introduced and studied in this paper. They model concepts and constitute an explanatory output for data analysis. Moreover, they can be used to define queries on a relational database and to propagate concepts between databases. We define "Symbolic Data Analysis" (SDA) as the extension of standard data analysis to symbolic data tables as input in order to find symbolic objects as output. In this paper we give an overview of recent developments in SDA. We present some tools and methods of SDA and introduce the SODAS software prototype (the result of the work of 17 teams from nine countries involved in a European project of EUROSTAT).
Clustering Pairwise Dissimilarity Data into Partially Ordered Sets
Cited by 1 (0 self)
Ontologies represent data relationships as hierarchies of possibly overlapping classes. Ontologies are closely related to clustering hierarchies, and in this article we explore this relationship in depth. In particular, we examine the space of ontologies that can be generated by pairwise dissimilarity matrices. We demonstrate that classical clustering algorithms, which take dissimilarity matrices as inputs, do not incorporate all available information. In fact, only special types of dissimilarity matrices can be exactly preserved by previous clustering methods. We model ontologies as a partially ordered set (poset) over the subset relation. In this paper, we propose a new clustering algorithm that generates a partially ordered set of clusters from a dissimilarity matrix.
PoClustering: Lossless Clustering of Dissimilarity Data
Cited by 1 (0 self)
Given a set of objects V with a dissimilarity measure between pairs of objects in V, a PoCluster is a collection of sets P ⊂ powerset(V) partially ordered by the ⊂ relation such that S ⊂ T iff the maximal dissimilarity among objects in S is less than the maximal dissimilarity among objects in T. PoClusters capture categorizations of objects that are not strictly hierarchical, such as those found in ontologies. PoClusters cannot, in general, be constructed using hierarchical clustering algorithms. In this paper, we examine the relationship between PoClusters and dissimilarity matrices and prove that PoClusters are in one-to-one correspondence with the set of dissimilarity matrices. The PoClustering problem is NP-complete, and we present a heuristic algorithm for it in this paper. Experiments on both synthetic and real datasets demonstrate the quality and scalability of the algorithms.
Statistics in Archaeology: New directions
, 1998
Cited by 1 (0 self)
March 25, 1998, in the opening session of the Computer Applications in Archaeology meeting (CAA'98). Special thanks are due to J. A. Barcelo, local organizer of the meeting. C. C. Beardah is thanked for allowing me to use data sets from Baxter and Beardah (1997). This work was partially supported by the Spanish DGES grant PB960300. Connections between Statistics and Archaeology have always proved very fruitful. The objective of this paper is to offer an overview of some statistical techniques that have been developed in recent years and that may be of interest to archaeologists in the short run.
Combinatorial Optimization in Clustering
Contents
1 Introduction
2 Types of Data
3 Cluster Structures
4 Clustering Criteria
5 Single Cluster Clustering
  5.1 Clustering Approaches
    5.1.1 Definition-based Clusters
    5.1.2 Direct Algorithms
    5.1.3 Optimal Clusters
  5.2 Single and Monotone Linkage Clusters
    5.2.1 MST and Single Linkage Clustering
    5.2.2 Monotone Linkage Clusters
    5.2.3 Modeling Skeletons in Digital Image Processing
    5.2.4 Linkage-based Convex Criteria
  5.3 Moving Center and Approximation Clusters
    5.3.1 Criteria for Moving Center Methods
    5.3.2 Principal Cluster
    5.3.3 Additive Cluster
    5.3.4 Seriation with Returns
6 Partitioning
Seriation in the Presence of Errors: A Factor 16 Approximation Algorithm for l∞-Fitting Robinson Structures to Distances
 ALGORITHMICA
, 2007
The classical seriation problem consists in finding a permutation of the rows and the columns of the distance (or, more generally, dissimilarity) matrix d on a finite set X so that small values are concentrated around the main diagonal as closely as possible, whereas large values fall as far from it as possible. This goal is best achieved by considering the Robinson property: a distance dR on X is Robinsonian if its matrix can be symmetrically permuted so that its elements do not decrease when moving away from the main diagonal along any row or column. If the distance d fails to satisfy the Robinson property, then we are led to the problem of finding a reordering of d which is as close as possible to a Robinsonian distance. In this paper, we present a factor 16 approximation algorithm for the following NP-hard fitting problem: given a finite set X and a dissimilarity d on X, we wish to find a Robinsonian dissimilarity dR on X minimizing the l∞-error $\|d - d_R\|_\infty = \max_{x,y \in X} |d(x,y) - d_R(x,y)|$ between d and dR.
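To make the Robinson property and the l∞ error concrete, here is a minimal Python sketch. It is not the paper's factor 16 approximation algorithm: it only checks the property for a given ordering, tests every ordering by brute force (feasible for tiny matrices only), and measures the largest entrywise gap between two matrices. The function names and the toy matrix are illustrative assumptions, and the input is assumed to be a symmetric square list of lists.

```python
from itertools import permutations


def is_robinson(d):
    """True if the symmetric matrix d, in its current row/column order,
    never decreases when moving away from the main diagonal."""
    n = len(d)
    for i in range(n):
        for j in range(i, n - 1):
            # Moving right along row i, away from the diagonal.
            if d[i][j] > d[i][j + 1]:
                return False
    for j in range(n):
        for i in range(1, j + 1):
            # Moving up along column j, away from the diagonal.
            if d[i][j] > d[i - 1][j]:
                return False
    return True  # the lower triangle follows by symmetry


def permute(d, order):
    """Apply the same ordering to the rows and the columns of d."""
    return [[d[a][b] for b in order] for a in order]


def is_robinsonian(d):
    """Brute force over all orderings (illustration only, O(n!))."""
    return any(is_robinson(permute(d, p)) for p in permutations(range(len(d))))


def linf_error(d, d_r):
    """Largest absolute entrywise difference between d and d_r."""
    n = len(d)
    return max(abs(d[x][y] - d_r[x][y]) for x in range(n) for y in range(n))


# Toy example (hypothetical data): three points on a line at 0, 1, 2.
line = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
shuffled = permute(line, [0, 2, 1])
print(is_robinson(line), is_robinson(shuffled), is_robinsonian(shuffled))
print(linf_error(line, shuffled))
```

The shuffled matrix fails the check in its given order but is still Robinsonian, since some symmetric permutation restores the property.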
www.stacsconf.org AN APPROXIMATION ALGORITHM FOR l∞-FITTING ROBINSON STRUCTURES TO DISTANCES
In this paper, we present a factor 16 approximation algorithm for the following NP-hard distance fitting problem: given a finite set X and a distance d on X, find a Robinsonian distance dR on X minimizing the l∞-error $\|d - d_R\|_\infty = \max_{x,y \in X} |d(x,y) - d_R(x,y)|$. A distance dR on a finite set X is Robinsonian if its matrix can be symmetrically permuted so that its elements do not decrease when moving away from the main diagonal along any row or column. Robinsonian distances generalize ultrametrics and line distances, and occur in seriation problems and in classification.
Overlapping Patterns Recognition with Linear and Non-Linear Separations using Positive Definite Kernels
, 2012