Results 1 - 10 of 118,706
CURE: An Efficient Clustering Algorithm for Large Data sets
- Published in the Proceedings of the ACM SIGMOD Conference
, 1998
"... Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We propose a new clustering ..."
Cited by 722 (5 self)
Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We propose a new
Implementing data cubes efficiently
- In SIGMOD
, 1996
"... Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total ..."
Cited by 548 (1 self)
Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like
Verbal reports as data
- Psychological Review
, 1980
"... The central proposal of this article is that verbal reports are data. Accounting for verbal reports, as for other kinds of data, requires explication of the mechanisms by which the reports are generated, and the ways in which they are sensitive to experimental factors (instructions, tasks, etc.). W ..."
Cited by 513 (3 self)
The central proposal of this article is that verbal reports are data. Accounting for verbal reports, as for other kinds of data, requires explication of the mechanisms by which the reports are generated, and the ways in which they are sensitive to experimental factors (instructions, tasks, etc
The Lorel Query Language for Semistructured Data
- International Journal on Digital Libraries
, 1997
"... We present the Lorel language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inapprop ..."
Cited by 731 (29 self)
applicability, the simple object model underlying Lorel can be viewed as an extension of ODMG and the language as an extension of OQL. The main novelties of the Lorel language are: (i) extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data
The use of the area under the ROC curve in the evaluation of machine learning algorithms
- PATTERN RECOGNITION
, 1997
"... In this paper we investigate the use of the area under the receiver operating characteristic (ROC) curve (AUC) as a performance measure for machine learning algorithms. As a case study we evaluate six machine learning algorithms (C4.5, Multiscale Classifier, Perceptron, Multi-layer Perceptron, k-Ne ..."
Cited by 685 (3 self)
In this paper we investigate the use of the area under the receiver operating characteristic (ROC) curve (AUC) as a performance measure for machine learning algorithms. As a case study we evaluate six machine learning algorithms (C4.5, Multiscale Classifier, Perceptron, Multi-layer Perceptron, k
Taming the Underlying Challenges of Reliable Multihop Routing in Sensor Networks
- In SenSys
, 2003
"... The dynamic and lossy nature of wireless communication poses major challenges to reliable, self-organizing multihop networks. These non-ideal characteristics are more problematic with the primitive, low-power radio transceivers found in sensor networks, and raise new issues that routing protocols mu ..."
Cited by 781 (20 self)
with constant space regardless of cell density. We study and evaluate link estimator, neighborhood table management, and reliable routing protocol techniques. We focus on a many-to-one, periodic data collection workload. We narrow the design space through evaluations on large-scale, high-level simulations to 50
Longitudinal data analysis using generalized linear models
- Biometrika
, 1986
"... SUMMARY This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating ..."
Cited by 1526 (8 self)
SUMMARY This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence
Calibrating noise to sensitivity in private data analysis
- In Proceedings of the 3rd Theory of Cryptography Conference
, 2006
"... Abstract. We continue a line of research initiated in [10, 11] on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the so-called true answer is the result of applying f to the datab ..."
Cited by 649 (60 self)
the ith row of the database and g maps database rows to [0, 1]. We extend the study to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f. Roughly speaking, this is the amount that any single
BIRCH: an efficient data clustering method for very large databases
- In Proc. of the ACM SIGMOD Intl. Conference on Management of Data (SIGMOD
, 1996
"... Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of ..."
Cited by 576 (2 self)
is also the first clustering algorithm proposed in the database area to handle “noise” (data points that are not part of the underlying pattern) effectively. We evaluate BIRCH’s time/space efficiency, data input order sensitivity, and clustering quality through several experiments. We also present a
Space-time codes for high data rate wireless communication: Performance criterion and code construction
- IEEE TRANS. INFORM. THEORY
, 1998
"... We consider the design of channel codes for improving the data rate and/or the reliability of communications over fading channels using multiple transmit antennas. Data is encoded by a channel code and the encoded data is split into n streams that are simultaneously transmitted using n transmit ant ..."
Cited by 1782 (28 self)
We consider the design of channel codes for improving the data rate and/or the reliability of communications over fading channels using multiple transmit antennas. Data is encoded by a channel code and the encoded data is split into n streams that are simultaneously transmitted using n transmit