Privacy-Preserving Data Mining
- 2000
Cited by 483 (3 self)
A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We consider the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed. The resulting data records look very different from the original records and the distribution of data values is also very different from the original distribution. While it is not possible to accurately estimate original values in individual data records, we propose a novel reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data.
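The idea of perturbing individual values and then estimating only their distribution can be illustrated with a small sketch. This is not the paper's exact procedure: it assumes additive Gaussian noise with a publicly known standard deviation, a fixed set of candidate bin centers, and an iterative Bayesian update over those bins. The names `gaussian_pdf`, `reconstruct_distribution`, and all parameters are hypothetical.

```python
import math


def gaussian_pdf(d, sigma):
    """Density of zero-mean Gaussian noise with standard deviation sigma."""
    return math.exp(-0.5 * (d / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def reconstruct_distribution(perturbed, noise_pdf, bin_centers, iterations=30):
    """Estimate the probability mass of the ORIGINAL values over bin_centers.

    perturbed   : observed values w_i = x_i + noise_i (the x_i stay hidden)
    noise_pdf   : density of the noise, assumed to be public knowledge
    bin_centers : assumed candidate locations for the original values
    """
    k = len(bin_centers)
    estimate = [1.0 / k] * k  # start from a uniform guess
    for _ in range(iterations):
        updated = [0.0] * k
        for w in perturbed:
            # Posterior over bins for this record, given the current estimate.
            likes = [noise_pdf(w - a) * p for a, p in zip(bin_centers, estimate)]
            total = sum(likes)
            if total == 0.0:
                continue  # record is incompatible with every bin; skip it
            for j in range(k):
                updated[j] += likes[j] / total
        mass = sum(updated)
        if mass == 0.0:
            break  # nothing could be attributed; keep the previous estimate
        estimate = [c / mass for c in updated]
    return estimate
```

The function returns only an aggregate histogram; no individual original value is recovered, which matches the distinction the abstract draws between record-level values and the distribution of values.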
Using Name-Based Mappings to Increase Hit Rates
- IEEE/ACM TRANSACTIONS ON NETWORKING, 1997
Cited by 60 (4 self)
Clusters of identical intermediate servers are often created to improve availability and robustness in many domains. The use of proxy servers for the WWW and of Rendezvous Points in multicast routing are two such situations. However, this approach can be inefficient if identical requests are received and processed by multiple servers. We present an analysis of this problem, and develop a method called the Highest Random Weight (HRW) Mapping that eliminates these difficulties. Given an object name and a set of servers, HRW maps a request to a server using the object name, rather than any a priori knowledge of server states. Since HRW always maps a given object name to the same server within a given cluster, it may be used locally at client sites to achieve consensus on object-server mappings. We present an analysis of HRW and validate it with simulation results showing that it gives faster service times than traditional request allocation schemes such as round-robin or least-loaded, and...
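A minimal sketch of the mapping described in this abstract: each client scores every server against the object name and picks the highest score. The use of SHA-256 as the pseudo-random weight function, and the names `hrw_weight` and `hrw_select`, are assumptions made for illustration; the abstract does not specify the weight function.

```python
import hashlib


def hrw_weight(server: str, object_name: str) -> int:
    """Pseudo-random weight for a (server, object) pair.

    SHA-256 is an illustrative choice, not the paper's weight function.
    """
    digest = hashlib.sha256(f"{server}|{object_name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hrw_select(object_name: str, servers: list) -> str:
    """Return the server with the highest weight for this object name."""
    if not servers:
        raise ValueError("server list must not be empty")
    # The server name breaks ties so every client resolves them identically.
    return max(servers, key=lambda s: (hrw_weight(s, object_name), s))


if __name__ == "__main__":
    cluster = ["proxy-a", "proxy-b", "proxy-c"]
    for name in ["/index.html", "/logo.png", "/news/today"]:
        print(name, "->", hrw_select(name, cluster))
```

Because the choice depends only on the object name and the server list, clients that share the same list agree on the mapping without exchanging state, and removing a server that was not the winner for an object leaves that object's mapping unchanged.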
A Comparison of Dynamic and Non-Dynamic Rough Set Methods for Extracting Laws from Decision Tables
- 1998
Cited by 44 (3 self)
We report results of experiments on several data sets, in particular: Monk's problems data (see [58]), medical data (lymphography, breast cancer, primary tumor; see [30]) and StatLog's data (see [32]). We compare standard methods for extracting laws from decision tables (see [43], [52]), based on rough sets (see [42]) and boolean reasoning (see [8]), with the method based on dynamic reducts and dynamic rules (see [3], [4], [5], [6]). We also compare the results of computer experiments on those data sets obtained by applying our system based on rough set methods with the results on the same data sets obtained with the help of several data analysis systems known from the literature.
Candid covariance-free incremental principal component analysis
- IEEE Trans. Pattern Analysis and Machine Intelligence, 2003
Cited by 40 (7 self)
Appearance-based image analysis techniques require fast computation of principal components of high-dimensional image vectors. We introduce a fast incremental principal component analysis (IPCA) algorithm, called candid covariance-free IPCA (CCIPCA), used to compute the principal components of a sequence of samples incrementally without estimating the covariance matrix (so covariance-free). The new method is motivated by the concept of statistical efficiency (the estimate has the smallest variance given the observed data). To do this, it keeps the scale of observations and computes the mean of observations incrementally, which is an efficient estimate for some well-known distributions (e.g., Gaussian), although the highest possible efficiency is not guaranteed in our case because of unknown sample distribution. The method is for real-time applications and, thus, it does not allow iterations. It converges very fast for high-dimensional image vectors. Some links between IPCA and the development of the cerebral cortex are also discussed. Index Terms: Principal component analysis, incremental principal component analysis, stochastic gradient ascent (SGA), generalized Hebbian algorithm (GHA), orthogonal complement.
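The following is a simplified sketch of a covariance-free incremental update for the first principal component only, written in plain Python. It assumes the update v(n) = ((n-1)/n) v(n-1) + (1/n) u(n) u(n)^T v(n-1) / ||v(n-1)||, where u(n) is the sample minus the running mean; it omits any weighting of recent samples and the extraction of further components, so it should be read as an illustration rather than the published algorithm. The function name `ccipca_first_component` is hypothetical.

```python
import math


def ccipca_first_component(samples):
    """Incrementally estimate the first principal component of `samples`.

    No covariance matrix is formed: each sample updates a running mean and a
    single vector v. The direction of v estimates the leading eigenvector and
    its length tracks the scale of the observations.
    Returns None if `samples` is empty.
    """
    mean = None
    v = None
    for n, x in enumerate(samples, start=1):
        x = [float(c) for c in x]
        # Running mean, updated one sample at a time.
        mean = x[:] if mean is None else [m + (c - m) / n for m, c in zip(mean, x)]
        u = [c - m for c, m in zip(x, mean)]
        norm_v = math.sqrt(sum(c * c for c in v)) if v is not None else 0.0
        if norm_v == 0.0:
            v = u  # (re)initialise until a non-zero centred sample arrives
            continue
        # Scalar u^T v / ||v||: projection of the sample on the current estimate.
        proj = sum(a * b for a, b in zip(u, v)) / norm_v
        v = [((n - 1) / n) * vc + (1.0 / n) * proj * uc for vc, uc in zip(v, u)]
    return v
```

Each sample is visited once and only two vectors are stored, which reflects the abstract's point that the method is meant for real-time use without iterating over the data.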
Value-at-risk prediction using context modeling
- IN PSYCHOLOGY & MARKETING, 2000
Cited by 31 (0 self)
In financial market risk measurement, Value-at-Risk (VaR) techniques have proven to be a very useful and popular tool. Unfortunately, most VaR estimation models suffer from major drawbacks: the lognormal (Gaussian) modeling of the returns does not take into account the observed fat tail distribution and the non-stationarity of the financial instruments severely limits the efficiency of the VaR predictions. In this paper, we present a new approach to VaR estimation which is based on ideas from the field of information theory and lossless data compression. More specifically, the technique of context modeling is applied to estimate the VaR by conditioning the probability density function on the present context. Tree-structured vector quantization is applied to partition the multi-dimensional state space of both macroeconomic and microeconomic priors into an increasing but limited number of context classes. Each class can be interpreted as a state of aggregation with its own statistical and dynamic behavior, or as a random walk with its own drift and step size. Results on the US S&P500 index, obtained using several evaluation methods, show the strong potential of this approach and prove that it can be applied successfully for, amongst other useful applications, VaR and volatility prediction. The October 1997 crash is indicated in time.
Vision and Motion Planning for a Mobile Robot under Uncertainty
- Int. J. of Robotics Research, 1997
Cited by 11 (10 self)
This paper describes a framework for vision and motion planning for a mobile robot. The task of the robot is to reach the destination in the minimum time while it detects possible routes by vision. Since visual recognition is computationally expensive and the recognition result includes uncertainty, a trade-off must be considered between the cost of visual recognition and the effect of information to be obtained by recognition. Using a probabilistic model of the uncertainty of the recognition result, vision-motion planning is formulated as a recurrence formula. With this formulation, the optimal sequence of observation points is recursively determined. A generated plan is globally optimal because the planner minimizes the total cost. An efficient solution strategy is also described which employs a pruning method based on the lower bound of the total cost calculated by assuming perfect sensor information. Simulation results and experiments with an actual mobile robot demonstrate the feasibility of our approach.
Dynamic Reducts and Statistical Inference
- In: Proceedings of the Sixth International Conference, Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU'96), 1996
Cited by 6 (1 self)
We apply rough set methods and boolean reasoning for knowledge discovery from decision tables. It is often impossible to extract general laws from experimental data by first computing all reducts (Pawlak 1991) of a data table (decision table) and then computing decision rules from these reducts. We have developed the idea of dynamic reducts as a tool for finding relevant reducts for decision rule generation (Bazan 1994a), (Bazan 1994b), (Bazan 1994c), (Nguyen 1993). Tests on several data tables show that applying dynamic reducts increases the classification quality and/or decreases the size of decision rule sets. In this paper we present some statistical arguments showing that the introduced stability coefficients of dynamic reducts are proper measures of their quality. Key words: knowledge discovery, rough sets, decision algorithms, machine learning. 1 INTRODUCTION The aim of the paper is to present a method for extracting laws...
Non-crossing trees revisited: cutting down and spanning subtrees
- Proceedings, Discrete Random Walks 2003, Cyril Banderier and ..., 2003
Cited by 6 (2 self)
Here we consider two parameters for random non-crossing trees: (i) the number of random cuts to destroy a size-n non-crossing tree and (ii) the spanning subtree-size of p randomly chosen nodes in a size-n non-crossing tree. For both quantities, we are able to characterise the limiting distributions for n → ∞. Non-crossing trees are almost conditioned Galton-Watson trees, and it has already been shown that the contour and other usually associated discrete excursions converge, suitably normalised, to the Brownian excursion. We can interpret parameter (ii) as a functional of a conditioned random walk, and although we do not have such an interpretation for parameter (i), we obtain here limiting distributions that also arise as limits of some functionals of conditioned random walks. Keywords: Non-crossing trees, generating function, limiting distributions
Ascertaining the Underlying Distribution of a Data Set
- 1994
Cited by 5 (5 self)
We combine the concept of maximum correlation between two random variables with the Principal Coordinate Analysis technique, to propose a descriptive procedure to ascertain the underlying probability distribution of a univariate sample. Keywords and Phrases: Fréchet bounds; Maximum Correlation; Principal Coordinate Analysis; Goodness-of-fit; Continuous Metric Scaling. AMS Subject classification: 62H20, 62H25. 1 Introduction The user of Data Analysis and Statistical Inference often faces the problem of identifying the underlying stochastic structure, given a sample drawn from a population. This is the specification problem. Wrong specification leads to an erroneous inference, which in statistical terminology is called the third kind of error. Fisher [4] was especially sensitive to this problem, and interesting suggestions about specification were made by Rao [17]. Given a univariate sample $x_1, x_2, \ldots, x_N$ (1) of N independent observations of a random variable X, we propose...
Commitment, Trembling Hand Imperfection and Observability in Games
- 1996
Cited by 4 (0 self)
In an important contribution, Bagwell [1995] showed that the value of commitment tends to vanish if the observability of commitments is subject to an arbitrarily small distortion, due to the possibility of misunderstanding or communication error. Bagwell's observation calls into question the many stage games that have been exceedingly popular in economics, especially in theoretical industrial organization. The present paper helps assess the robustness of Bagwell's result. We add other distortions to Bagwell's model, such as "trembles" in players' execution of actions. We show that the unique pure strategy equilibrium of the game converges to the unique equilibrium outcome of the simultaneous move game with perfect observability if the noise associated with the observation of the leader's choice is small relative to the probability of trembles or if there are many other such imperfections. These results suggest that Bagwell's result is driven by his exclusive consideration of a p...

