Results 1 - 10 of 1,755,313

BIRCH: an efficient data clustering method for very large databases

by Tian Zhang, Raghu Ramakrishnan, Miron Livny - In Proc. of the ACM SIGMOD Intl. Conference on Management of Data (SIGMOD , 1996
"... Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of ..."
Cited by 557 (2 self)
of large datasets and minimization of I/O costs. This paper presents a data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), and demonstrates that it is especially suitable for very large databases. BIRCH incrementally and dynamically clusters incoming

Community detection in graphs

by Santo Fortunato , 2009
"... The modern science of networks has brought significant advances to our understanding of complex systems. One of the most relevant features of graphs representing real systems is community structure, or clustering, i. e. the organization of vertices in clusters, with many edges joining vertices of th ..."
Cited by 801 (1 self)

CURE: An Efficient Clustering Algorithm for Large Data sets

by Sudipto Guha, Rajeev Rastogi, Kyuseok Shim - Published in the Proceedings of the ACM SIGMOD Conference , 1998
"... Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We propose a new clustering ..."
Cited by 713 (5 self)

GPFS: A Shared-Disk File System for Large Computing Clusters

by Frank Schmuck, Roger Haskin - In Proceedings of the 2002 Conference on File and Storage Technologies (FAST , 2002
"... GPFS is IBM's parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. GPFS is used on many of the largest supercomputers in the world. GPFS was built on many of the ideas that were developed in the academic community ove ..."
Cited by 518 (3 self)
existing ideas scaled well, new approaches were necessary in many key areas. This paper describes GPFS, and discusses how distributed locking and recovery techniques were extended to scale to large clusters.

Hierarchically Classifying Documents Using Very Few Words

by Daphne Koller, Mehran Sahami , 1997
"... The proliferation of topic hierarchies for text documents has resulted in a need for tools that automatically classify new documents within such hierarchies. Existing classification schemes which ignore the hierarchical structure and treat the topics as separate classes are often inadequate in text ..."
Cited by 521 (8 self)
tree. As we show, each of these smaller problems can be solved accurately by focusing only on a very small set of features, those relevant to the task at hand. This set of relevant features varies widely throughout the hierarchy, so that, while the overall relevant feature set may be large, each

Implementing data cubes efficiently

by Venky Harinarayan, Anand Rajaraman, Jeffrey D. Ullman - In SIGMOD , 1996
"... Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total ..."
Cited by 545 (1 self)

The program dependence graph and its use in optimization

by Jeanne Ferrante, Karl J. Ottenstein, Joe D. Warren - ACM Transactions on Programming Languages and Systems , 1987
"... In this paper we present an intermediate program representation, called the program dependence graph (PDG), that makes explicit both the data and control dependences for each operation in a program. Data dependences have been used to represent only the relevant data flow relationships of a program. ..."
Cited by 989 (3 self)
Control dependences are introduced to analogously represent only the essential control flow relationships of a program. Control dependences are derived from the usual control flow graph. Many traditional optimizations operate more efficiently on the PDG. Since dependences in the PDG connect

Good Error-Correcting Codes based on Very Sparse Matrices

by David J.C. MacKay , 1999
"... We study two families of error-correcting codes defined in terms of very sparse matrices. "MN" (MacKay--Neal) codes are recently invented, and "Gallager codes" were first investigated in 1962, but appear to have been largely forgotten, in spite of their excellent properties. The ..."
Cited by 741 (23 self)

Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering

by Mikhail Belkin, Partha Niyogi - Advances in Neural Information Processing Systems 14 , 2001
"... Drawing on the correspondence between the graph Laplacian, the Laplace-Beltrami operator on a manifold, and the connections to the heat equation, we propose a geometrically motivated algorithm for constructing a representation for data sampled from a low dimensional manifold embedded in a higher ..."
Cited by 664 (8 self)
higher dimensional space. The algorithm provides a computationally efficient approach to nonlinear dimensionality reduction that has locality preserving properties and a natural connection to clustering. Several applications are considered.

Pregel: A system for large-scale graph processing

by Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski - In SIGMOD , 2010
"... Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs—in some cases billions of vertices, trillions of edges—poses challenges to their efficient processing. In this paper we present a computational model ..."
Cited by 472 (0 self)