Results 1 - 4 of 4
Scaling Clustering Algorithms to Large Databases, Microsoft Research Report, 1998
Cited by 197 (5 self)
Practical clustering algorithms require multiple data scans to achieve convergence. For large databases, these scans become prohibitively expensive. We present a scalable clustering framework applicable to a wide class of iterative clustering algorithms. We require at most one scan of the database. In this work, the framework is instantiated and numerically justified with the popular K-Means clustering algorithm. The method is based on identifying regions of the data that are compressible, regions that must be maintained in memory, and regions that are discardable. The algorithm operates within the confines of a limited memory buffer. Empirical results demonstrate that the scalable scheme outperforms a sampling-based approach. In our scheme, data resolution is preserved to the extent possible based upon the size of the allocated memory buffer and the fit of the current clustering model to the data. The framework is naturally extended to update multiple clustering models simultaneously. We empirically evaluate the approach on synthetic and publicly available data sets.
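The single-scan idea in this abstract can be illustrated with a small sketch. It is a deliberate simplification, not the authors' algorithm: every row is treated as discardable and folded into per-cluster sufficient statistics (count, linear sum, sum of squares), whereas the paper also keeps compressible sub-clusters and a set of retained points. The names `ClusterStats` and `one_scan_kmeans` are hypothetical.

```python
import math


class ClusterStats:
    """Sufficient statistics that stand in for rows already discarded."""

    def __init__(self, dim):
        self.n = 0
        self.linear_sum = [0.0] * dim
        self.square_sum = [0.0] * dim

    def add(self, point):
        # Fold one row into the summary; the raw row is not kept.
        self.n += 1
        for j, x in enumerate(point):
            self.linear_sum[j] += x
            self.square_sum[j] += x * x

    def centroid(self):
        return [s / self.n for s in self.linear_sum]


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(point, centroids[k]))


def one_scan_kmeans(chunks, initial_centroids):
    """Visit each chunk (one buffer-load of rows) exactly once."""
    dim = len(initial_centroids[0])
    stats = [ClusterStats(dim) for _ in initial_centroids]
    centroids = [list(c) for c in initial_centroids]
    for chunk in chunks:
        for point in chunk:
            stats[nearest(point, centroids)].add(point)
        # Centroids are recomputed from the summaries alone.
        centroids = [s.centroid() if s.n else c
                     for s, c in zip(stats, centroids)]
    return centroids, stats


# Toy data: two buffer-loads of two-dimensional rows.
chunks = [[(0.0, 0.0), (10.0, 10.0)], [(1.0, 1.0), (9.0, 9.0)]]
centroids, stats = one_scan_kmeans(chunks, [(0.0, 0.0), (10.0, 10.0)])
```

Memory use here depends on the number of clusters and the chunk size rather than on the number of rows, which is the property the limited-buffer setting relies on.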
Scaling EM (Expectation-Maximization) Clustering to Large Databases, 1999
Cited by 35 (0 self)
Practical statistical clustering algorithms typically center upon an iterative refinement optimization procedure to compute a locally optimal clustering solution that maximizes the fit to data. These algorithms typically require many database scans to converge, and within each scan they require access to every record in the data table. For large databases, the scans become prohibitively expensive. We present a scalable implementation of the Expectation-Maximization (EM) algorithm. The database community has focused on distance-based clustering schemes, and methods have been developed to cluster either numerical or categorical data. Unlike distance-based algorithms (such as K-Means), EM constructs proper statistical models of the underlying data source and naturally generalizes to cluster databases containing both discrete-valued and continuous-valued data. The scalable method is based on a decomposition of the basic statistics the algorithm needs: identifying regions of the data that...
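To make the reference to "basic statistics" concrete, here is a minimal sketch of one standard EM iteration for a one-dimensional Gaussian mixture. It is the textbook (non-scalable) update, not the paper's scalable implementation; the function names and toy data are assumptions made for illustration. The point to notice is that the M-step reads only three running sums per component.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_step(data, weights, means, variances):
    """One EM iteration for a 1-D Gaussian mixture (assumes no empty component)."""
    k = len(weights)
    resp_sum = [0.0] * k   # sum of responsibilities
    x_sum = [0.0] * k      # responsibility-weighted sum of x
    xx_sum = [0.0] * k     # responsibility-weighted sum of x squared
    for x in data:
        # E-step: posterior membership of x in each component.
        joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                 for j in range(k)]
        total = sum(joint)
        for j in range(k):
            r = joint[j] / total
            resp_sum[j] += r
            x_sum[j] += r * x
            xx_sum[j] += r * x * x
    # M-step: new parameters depend only on the accumulated sums.
    new_weights = [resp_sum[j] / len(data) for j in range(k)]
    new_means = [x_sum[j] / resp_sum[j] for j in range(k)]
    new_vars = [max(xx_sum[j] / resp_sum[j] - new_means[j] ** 2, 1e-6)
                for j in range(k)]
    return new_weights, new_means, new_vars


# Toy data with two well-separated groups.
data = [0.0, 0.2, -0.2, 5.0, 5.2, 4.8]
params = ([0.5, 0.5], [1.0, 4.0], [1.0, 1.0])
for _ in range(20):
    params = em_step(data, *params)
weights, means, variances = params
```

Because the update touches the data only through these sums, a block of records can in principle be replaced by its contribution to them, which is the kind of decomposition the abstract alludes to.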
Global Optimization of RBF Networks, 2000
Cited by 1 (1 self)
Several modifications to parameter estimation in a Radial Basis Functions network are introduced. These include a better initializing clustering algorithm and a full gradient descent on centers and weights after the weights have been found via a matrix inversion. A performance comparison with other RBF algorithms is given on several data sets. The proposed method was found superior to Bishop's EM training algorithm, to Orr's method [1], and to a conventional implementation.
I. Introduction
Radial basis functions have been extensively used for interpolation [2], [3], [4], [5], [6], [7], regression, and classification due to their universal approximation properties and simple parameter estimation. The parameter estimation requires a (pseudo) inversion of a (possibly sparse) matrix. The possible numerical instability of the inversion (which is aggravated when the number of training patterns is small compared to the dimensionality) may be partially alleviated by further parame...
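The step in which the weights are found by inverting a matrix can be sketched as follows. This is a simplified illustration, not the paper's method: it assumes Gaussian basis functions of a fixed width and one center per training point, so the design matrix is square and a plain linear solve replaces the pseudo-inverse. No clustering initialization or gradient refinement is included, and all names are hypothetical.

```python
import math


def rbf(x, center, width):
    """Gaussian basis function (an assumed choice of kernel)."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(xs, ys, centers, width):
    """Output weights for fixed centers; needs len(xs) == len(centers)."""
    design = [[rbf(x, c, width) for c in centers] for x in xs]
    return solve(design, ys)


def predict(x, centers, width, weights):
    return sum(w * rbf(x, c, width) for w, c in zip(weights, centers))


# Toy one-dimensional data, with the training inputs reused as centers.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]
weights = fit_weights(xs, ys, xs, 1.0)
fitted = [predict(x, xs, 1.0, weights) for x in xs]
```

With fewer centers than training points the system becomes overdetermined and would need a least-squares (pseudo-inverse) solution instead, which is where the numerical instability mentioned in the introduction can arise.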
Automatic Refinement of User Requirements: A Case Study in Software Tool Evaluation, 2002
This paper presented a systematic approach to evaluating outcomes of the SDRM research project (Software Development Research Method - Nunamaker, Chen, et al. 1991). The empirical method adopted in this evaluation is grounded in the belief that simple system testing is insufficient in development-based research. Typically, testing focuses on showing the method or tool to be correct or efficient; however, the investigation of the impact of the proposed method or tool on its target audience is frequently either missing or conducted in an indirect way. As an alternative to the commonly used approaches, this paper explored and fused a number of different empirical methods to define and develop a testing environment for the proposed method/tool, to calibrate test data so that the test results could be assessed and compared, and finally to test and evaluate the method/tool in the well-designed and calibrated environment. As the RARE IDIOM project focuses on using information retrieval techniques to support requirements refinement into reusable designs, the evaluation process involved the definition of suitable problem (for requirements) and solution (for designs) domains, populating these domains with fully described and classified concepts and artefacts, calibrating the quality measures of requirements refinement with respect to expert and novice design decisions, and finally evaluating the method/tool by showing its performance to be approaching that of experts and significantly exceeding that of novices. The paper thus demonstrated the effectiveness of the RARE IDIOM method/tool. Even more importantly, it also illustrated that development-based research, as exemplified by the SDRM method, can be grounded in the sound empirical proc...

