Results 1-10 of 31
Privacy-Preserving Data Mining
, 2000
Abstract

Cited by 615 (3 self)
A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We consider the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed. The resulting data records look very different from the original records and the distribution of data values is also very different from the original distribution. While it is not possible to accurately estimate original values in individual data records, we propose a novel reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data.
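The reconstruction idea above can be sketched as an iterative Bayesian re-estimation: each pass redistributes every perturbed observation over candidate value bins in proportion to its posterior probability under the current distribution estimate. This is a minimal illustration in the spirit of the paper, not its exact algorithm; the noise model, binning, and the function name `reconstruct_distribution` are assumptions made here.

```python
import numpy as np

def reconstruct_distribution(perturbed, noise_pdf, bin_centers, iters=50):
    """Estimate the distribution of original values from perturbed records.

    A sketch of iterative Bayesian reconstruction (assumed form, not the
    paper's exact procedure): each pass spreads every perturbed value over
    the bins in proportion to its posterior under the current estimate.
    """
    f = np.full(len(bin_centers), 1.0 / len(bin_centers))  # uniform start
    for _ in range(iters):
        new_f = np.zeros_like(f)
        for w in perturbed:
            posterior = noise_pdf(w - bin_centers) * f
            total = posterior.sum()
            if total > 0:
                new_f += posterior / total
        f = new_f / new_f.sum()
    return f

# Illustrative data: originals ~ N(5, 1), additive noise uniform on [-2, 2].
rng = np.random.default_rng(0)
original = rng.normal(5.0, 1.0, size=2000)
perturbed = original + rng.uniform(-2.0, 2.0, size=2000)

uniform_pdf = lambda d: np.where(np.abs(d) <= 2.0, 0.25, 0.0)
centers = np.linspace(0.0, 10.0, 41)
f = reconstruct_distribution(perturbed, uniform_pdf, centers)
```

Although no individual record can be recovered, the estimated distribution `f` tracks the original population closely enough to drive the split-point selection of a decision-tree learner.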
Using Name-Based Mappings to Increase Hit Rates
 IEEE/ACM TRANSACTIONS ON NETWORKING
, 1997
Abstract

Cited by 70 (6 self)
Clusters of identical intermediate servers are often created to improve availability and robustness in many domains. The use of proxy servers for the WWW and of Rendezvous Points in multicast routing are two such situations. However, this approach can be inefficient if identical requests are received and processed by multiple servers. We present an analysis of this problem, and develop a method called the Highest Random Weight (HRW) Mapping that eliminates these difficulties. Given an object name and a set of servers, HRW maps a request to a server using the object name, rather than any a priori knowledge of server states. Since HRW always maps a given object name to the same server within a given cluster, it may be used locally at client sites to achieve consensus on object-server mappings. We present an analysis of HRW and validate it with simulation results showing that it gives faster service times than traditional request allocation schemes such as round-robin or least-loaded, and...
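The core of HRW can be sketched in a few lines: hash the (object name, server) pair to a pseudo-random weight and pick the server with the highest weight. The paper defines its own weight function; SHA-256 below is an illustrative stand-in, and the server names and `hrw_server` function are hypothetical.

```python
import hashlib

def hrw_server(object_name, servers):
    """Map an object name to one server via Highest Random Weight hashing.

    Every client computes the same winner from the name alone, so all
    clients agree on the object-server mapping without sharing state.
    (SHA-256 stands in for the paper's weight function.)
    """
    def weight(server):
        key = f"{object_name}:{server}".encode()
        return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return max(servers, key=weight)

servers = ["proxy-a", "proxy-b", "proxy-c", "proxy-d"]
chosen = hrw_server("http://example.com/index.html", servers)
```

A useful consequence of taking the maximum: if a server other than the winner leaves the cluster, the mapping for that object is unchanged, so only objects assigned to a removed server are remapped.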
Candid covariance-free incremental principal component analysis
 IEEE Trans. Pattern Analysis and Machine Intelligence
, 2003
Abstract

Cited by 55 (9 self)
Abstract—Appearance-based image analysis techniques require fast computation of principal components of high-dimensional image vectors. We introduce a fast incremental principal component analysis (IPCA) algorithm, called candid covariance-free IPCA (CCIPCA), used to compute the principal components of a sequence of samples incrementally without estimating the covariance matrix (hence covariance-free). The new method is motivated by the concept of statistical efficiency (the estimate has the smallest variance given the observed data). To do this, it keeps the scale of observations and computes the mean of observations incrementally, which is an efficient estimate for some well-known distributions (e.g., Gaussian), although the highest possible efficiency is not guaranteed in our case because of the unknown sample distribution. The method is for real-time applications and, thus, it does not allow iterations. It converges very fast for high-dimensional image vectors. Some links between IPCA and the development of the cerebral cortex are also discussed. Index Terms—Principal component analysis, incremental principal component analysis, stochastic gradient ascent (SGA), generalized Hebbian algorithm (GHA), orthogonal complement.
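The core CCIPCA update for the first principal component can be sketched as follows. This is a simplified reading of the method: the samples are assumed already zero-mean, and the paper's amnesic averaging and the deflation step for higher-order components are omitted; the function name is hypothetical.

```python
import numpy as np

def ccipca_first_component(samples):
    """Incrementally estimate the first principal component without
    forming a covariance matrix (sketch of the core CCIPCA update;
    amnesic averaging and higher-order deflation are omitted).

    `samples` is an iterable of zero-mean vectors. The running vector v
    converges toward eigenvalue * eigenvector of the sample covariance,
    so its norm estimates the eigenvalue and its direction the component.
    """
    v = None
    for n, u in enumerate(samples, start=1):
        u = np.asarray(u, dtype=float)
        if v is None:
            v = u.copy()  # initialise with the first observation
            continue
        # Weighted average of the old estimate and the new sample's
        # contribution u * (u . v) / ||v||, one pass per sample.
        v = ((n - 1) / n) * v + (1 / n) * u * (u @ v) / np.linalg.norm(v)
    return v / np.linalg.norm(v)  # unit-length direction estimate
```

Because each sample is touched exactly once and only a single d-dimensional vector is stored per component, the update suits the real-time, no-iteration setting the abstract describes.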
A Comparison of Dynamic and Non-Dynamic Rough Set Methods for Extracting Laws from Decision Tables
, 1998
Abstract

Cited by 50 (5 self)
We report results of experiments on several data sets, in particular: Monk's problems data (see [58]), medical data (lymphography, breast cancer, primary tumor; see [30]) and StatLog's data (see [32]). We compare standard methods for extracting laws from decision tables (see [43], [52]), based on rough sets (see [42]) and Boolean reasoning (see [8]), with the method based on dynamic reducts and dynamic rules (see [3], [4], [5], [6]). We also compare the results of computer experiments on those data sets, obtained by applying our system based on rough set methods, with the results on the same data sets obtained with the help of several data analysis systems known from the literature.
Value-at-Risk prediction using context modeling
 IN PSYCHOLOGY & MARKETING
, 2000
Abstract

Cited by 32 (0 self)
In financial market risk measurement, Value-at-Risk (VaR) techniques have proven to be a very useful and popular tool. Unfortunately, most VaR estimation models suffer from major drawbacks: the lognormal (Gaussian) modeling of the returns does not take into account the observed fat tail distribution, and the non-stationarity of the financial instruments severely limits the efficiency of the VaR predictions. In this paper, we present a new approach to VaR estimation which is based on ideas from the field of information theory and lossless data compression. More specifically, the technique of context modeling is applied to estimate the VaR by conditioning the probability density function on the present context. Tree-structured vector quantization is applied to partition the multidimensional state space of both macroeconomic and microeconomic priors into an increasing but limited number of context classes. Each class can be interpreted as a state of aggregation with its own statistical and dynamic behavior, or as a random walk with its own drift and step size. Results on the US S&P500 index, obtained using several evaluation methods, show the strong potential of this approach and prove that it can be applied successfully for, amongst other useful applications, VaR and volatility prediction. The October 1997 crash is indicated in time.
Vision and Motion Planning for a Mobile Robot under Uncertainty
 Int. J. of Robotics Research
, 1997
Abstract

Cited by 11 (10 self)
This paper describes a framework for vision and motion planning for a mobile robot. The task of the robot is to reach the destination in the minimum time while it detects possible routes by vision. Since visual recognition is computationally expensive and the recognition result includes uncertainty, a tradeoff must be considered between the cost of visual recognition and the effect of the information to be obtained by recognition. Using a probabilistic model of the uncertainty of the recognition result, vision-motion planning is formulated as a recurrence formula. With this formulation, the optimal sequence of observation points is recursively determined. A generated plan is globally optimal because the planner minimizes the total cost. An efficient solution strategy is also described which employs a pruning method based on the lower bound of the total cost calculated by assuming perfect sensor information. Simulation results and experiments with an actual mobile robot demonstrate the feasibility of our approach.
Non-crossing trees revisited: cutting down and spanning subtrees
 Proceedings, Discrete Random Walks 2003, Cyril Banderier and
, 2003
Abstract

Cited by 10 (4 self)
Here we consider two parameters for random non-crossing trees: (i) the number of random cuts to destroy a size-n non-crossing tree and (ii) the spanning subtree size of p randomly chosen nodes in a size-n non-crossing tree. For both quantities, we are able to characterise the limiting distributions for n → ∞. Non-crossing trees are almost conditioned Galton-Watson trees, and it has already been shown that the contour and other usually associated discrete excursions converge, suitably normalised, to the Brownian excursion. We can interpret parameter (ii) as a functional of a conditioned random walk, and although we do not have such an interpretation for parameter (i), we obtain here limiting distributions that also arise as limits of some functionals of conditioned random walks. Keywords: non-crossing trees, generating function, limiting distributions
Dynamic Reducts and Statistical Inference
 In: Proceedings of the Sixth International Conference, Information Procesing and Management of Uncertainty in Knowledge Based Systems (IPMU'96
, 1996
Abstract

Cited by 6 (1 self)
We apply rough set methods and Boolean reasoning for knowledge discovery from decision tables. It is often impossible to extract general laws from experimental data by first computing all reducts (Pawlak 1991) of a data table (decision table) and then decision rules from these reducts. We have developed the idea of dynamic reducts as a tool for finding reducts relevant to decision rule generation (Bazan 1994a), (Bazan 1994b), (Bazan 1994c), (Nguyen 1993). Tests on several data tables show that the application of dynamic reducts increases the classification quality and/or decreases the size of decision rule sets. In this paper we present some statistical arguments showing that the introduced stability coefficients of dynamic reducts are proper measures of their quality. Key words: knowledge discovery, rough sets, decision algorithms, machine learning. 1 INTRODUCTION The aim of the paper is to present a method for extracting laws...
Ascertaining the Underlying Distribution of a Data Set
 In
, 1994
Abstract

Cited by 6 (6 self)
We combine the concept of maximum correlation between two random variables with the Principal Coordinate Analysis technique to propose a descriptive procedure to ascertain the underlying probability distribution of a univariate sample. Keywords and Phrases: Fréchet bounds; Maximum Correlation; Principal Coordinate Analysis; Goodness-of-fit; Continuous Metric Scaling. AMS Subject classification: 62H20, 62H25. 1 Introduction The user of Data Analysis and Statistical Inference often faces the problem of identifying the underlying stochastic structure, given a sample drawn from a population. This is the specification problem. Wrong specification leads to an erroneous inference, which in statistical terminology is called the third kind of error. Fisher [4] was especially sensitive to this problem, and interesting suggestions about specification were made by Rao [17]. Given a univariate sample x_1, x_2, ..., x_N (1) of N independent observations of a random variable X, we propose...
"Statistics of the Microwave Background Anisotropies Caused by the Squeezed Cosmological Perturbation", Report WUGRAV-95-6, gr-qc/9504045
Abstract

Cited by 6 (1 self)
The genuine quantum gravity effects can already be around us. It is likely that the observed large-angular-scale anisotropies in the microwave background radiation are induced by cosmological perturbations of quantum-mechanical origin. Such perturbations are placed in squeezed vacuum quantum states and, hence, are characterized by large variances of their amplitude. The statistical properties of the anisotropies should reflect the underlying statistics of the squeezed vacuum quantum states. In this paper, the theoretical variances for the temperature angular correlation function are described in detail. It is shown that they are indeed large and must be present in the observational data, if the anisotropies are truly caused by the perturbations of quantum-mechanical origin. Unfortunately, these large theoretical statistical uncertainties will make the extraction of cosmological information from the measured anisotropies a much more difficult problem than we wanted it to be. This contribution to the Proceedings is largely based on references [42,8]. The Appendix contains an analysis of the "standard" inflationary formula for density perturbations.