Results 1 - 10 of 118,706

CURE: An Efficient Clustering Algorithm for Large Data sets

by Sudipto Guha, Rajeev Rastogi, Kyuseok Shim - Published in the Proceedings of the ACM SIGMOD Conference, 1998
"... Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We propose a new clustering ..."
Cited by 722 (5 self)

Implementing data cubes efficiently

by Venky Harinarayan, Anand Rajaraman, Jeffrey D. Ullman - In SIGMOD, 1996
"... Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total ..."
Cited by 548 (1 self)
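To make the idea of a cube "view" concrete, the following is a minimal Python sketch that computes every group-by view of a tiny sales table. The table, the dimension names `part`, `supplier` and `customer`, and the helper `cube` are hypothetical illustrations, not taken from the paper. With d dimensions there are 2^d such views; deciding which of them to precompute is the problem the paper studies, and that selection step is not shown here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (part, supplier, customer, sales)
ROWS = [
    ("p1", "s1", "c1", 10),
    ("p1", "s2", "c1", 5),
    ("p2", "s1", "c2", 7),
    ("p2", "s1", "c1", 3),
]
DIMS = ("part", "supplier", "customer")


def cube(rows, dims):
    """Return {grouping dims: {key tuple: total sales}} for every subset of dims."""
    views = {}
    for k in range(len(dims) + 1):
        for group in combinations(range(len(dims)), k):
            totals = defaultdict(int)
            for row in rows:
                # Project the row onto the grouping dimensions; sales is the last column.
                key = tuple(row[i] for i in group)
                totals[key] += row[-1]
            views[tuple(dims[i] for i in group)] = dict(totals)
    return views


views = cube(ROWS, DIMS)
print(len(views))          # 8 views for 3 dimensions
print(views[("part",)])    # {('p1',): 15, ('p2',): 10}
print(views[()])           # {(): 25}, the grand total
```

Computing each view directly from the base table, as above, is the naive approach; the point of materializing some views is that coarser views can then be derived from smaller, finer ones instead.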

Verbal reports as data

by K. Anders Ericsson, Herbert A. Simon - Psychological Review , 1980
"... The central proposal of this article is that verbal reports are data. Accounting for verbal reports, as for other kinds of data, requires explication of the mechanisms by which the reports are generated, and the ways in which they are sensitive to experimental factors (instructions, tasks, etc.). W ..."
Cited by 513 (3 self)

The Lorel Query Language for Semistructured Data

by Serge Abiteboul, Dallan Quass, Jason McHugh, Jennifer Widom, Janet Wiener - International Journal on Digital Libraries, 1997
"... We present the Lorel language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inapprop ..."
Cited by 731 (29 self)
... applicability, the simple object model underlying Lorel can be viewed as an extension of ODMG and the language as an extension of OQL. The main novelties of the Lorel language are: (i) extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data ...

The use of the area under the ROC curve in the evaluation of machine learning algorithms

by Andrew P. Bradley - Pattern Recognition, 1997
"... In this paper we investigate the use of the area under the receiver operating characteristic (ROC) curve (AUC) as a performance measure for machine learning algorithms. As a case study we evaluate six machine learning algorithms (C4.5, Multiscale Classifier, Perceptron, Multi-layer Perceptron, k-Ne ..."
Cited by 685 (3 self)
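As an illustration of what the AUC measures, here is a minimal Python sketch using the rank-based (Mann-Whitney / Wilcoxon) interpretation: the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. This is one common way to compute it and is not presented as the paper's exact procedure; the function name `auc` and the example scores are hypothetical.

```python
def auc(scores, labels):
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # Compare every positive/negative pair (O(P * N); fine for a sketch).
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores and true labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))  # 8 of 9 pairs ranked correctly, about 0.889
```

The pairwise loop keeps the definition visible; for large data sets, sorting the scores once and summing ranks gives the same value in O(n log n) time.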

Taming the Underlying Challenges of Reliable Multihop Routing in Sensor Networks

by Alec Woo, Terence Tong, David Culler - In SenSys , 2003
"... The dynamic and lossy nature of wireless communication poses major challenges to reliable, self-organizing multihop networks. These non-ideal characteristics are more problematic with the primitive, low-power radio transceivers found in sensor networks, and raise new issues that routing protocols mu ..."
Cited by 781 (20 self)
... with constant space regardless of cell density. We study and evaluate link estimator, neighborhood table management, and reliable routing protocol techniques. We focus on a many-to-one, periodic data collection workload. We narrow the design space through evaluations on large-scale, high-level simulations to 50 ...

Longitudinal data analysis using generalized linear models

by Kung-Yee Liang, Scott L. Zeger - Biometrika, 1986
"... SUMMARY This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating ..."
Cited by 1526 (8 self)

Calibrating noise to sensitivity in private data analysis

by Cynthia Dwork, Frank McSherry, Kobbi Nissim, Adam Smith - In Proceedings of the 3rd Theory of Cryptography Conference, 2006
"... Abstract. We continue a line of research initiated in [10, 11] on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the so-called true answer is the result of applying f to the datab ..."
Cited by 649 (60 self)
... the i-th row of the database and g maps database rows to [0, 1]. We extend the study to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f. Roughly speaking, this is the amount that any single ...
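The calibration idea above is usually instantiated as the Laplace mechanism: release f(x) + Lap(Δf/ε), where Δf is the sensitivity of f (the most a single row can change its value) and ε is the privacy parameter. The Python sketch below assumes that instantiation and a counting query, whose sensitivity is 1. The toy database and the helper names `laplace_noise` and `laplace_mechanism` are hypothetical. Laplace noise is sampled as the difference of two exponential variates, which has the Laplace distribution with the same scale.

```python
import random
import statistics


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Release true_answer + Lap(sensitivity / epsilon)."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return true_answer + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical database: one row per person, 1 if that person has some attribute.
db = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
true_count = sum(db)  # a counting query has sensitivity 1

rng = random.Random(0)
noisy = [laplace_mechanism(true_count, 1.0, 0.5, rng) for _ in range(20000)]
print(true_count)                # 6
print(statistics.mean(noisy))    # close to 6: the noise has mean zero
print(statistics.stdev(noisy))   # close to sqrt(2) * (1 / 0.5), about 2.83
```

Halving ε or doubling the sensitivity doubles the noise scale, which is the sense in which the noise is "calibrated" to the function being queried. Many answers are drawn here only to show the noise statistics; a real release would answer each query once, since repeated answers consume privacy budget.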

BIRCH: an efficient data clustering method for very large databases

by Tian Zhang, Raghu Ramakrishnan, Miron Livny - In Proc. of the ACM SIGMOD Intl. Conference on Management of Data (SIGMOD), 1996
"... Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of ..."
Cited by 576 (2 self)
... is also the first clustering algorithm proposed in the database area to handle "noise" (data points that are not part of the underlying pattern) effectively. We evaluate BIRCH's time/space efficiency, data input order sensitivity, and clustering quality through several experiments. We also present a ...

Space-time codes for high data rate wireless communication: Performance criterion and code construction

by Vahid Tarokh, Nambi Seshadri, A. R. Calderbank - IEEE Trans. Inform. Theory, 1998
"... We consider the design of channel codes for improving the data rate and/or the reliability of communications over fading channels using multiple transmit antennas. Data is encoded by a channel code and the encoded data is split into n streams that are simultaneously transmitted using n transmit ant ..."
Cited by 1782 (28 self)

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University