Results 1–10 of 49
Privacy-Preserving Data Mining
, 2000
"... A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models with ..."
Abstract

Cited by 774 (3 self)
 Add to MetaCart
(Show Context)
A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We consider the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed. The resulting data records look very different from the original records and the distribution of data values is also very different from the original distribution. While it is not possible to accurately estimate original values in individual data records, we propose a novel reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data.
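The reconstruction idea in this abstract, recovering the distribution of original values from additively perturbed records, can be sketched as an iterative Bayesian update over discretized bins. Everything below (the uniform noise, the bin grid, the Gaussian originals) is an illustrative assumption, not the paper's exact procedure:

```python
import numpy as np

def reconstruct_distribution(w, noise_pdf, bins, iters=50):
    """Estimate the distribution of original values x from perturbed
    observations w = x + y, given the known noise density f_Y.
    A discretized sketch: per-record Bayes posterior, averaged."""
    centers = (bins[:-1] + bins[1:]) / 2
    fx = np.full(len(centers), 1.0 / len(centers))   # uniform start
    for _ in range(iters):
        # likelihood that record i arose from bin center a: f_Y(w_i - a)
        lik = noise_pdf(w[:, None] - centers[None, :])
        post = lik * fx[None, :]
        post /= post.sum(axis=1, keepdims=True) + 1e-12  # Bayes per record
        fx = post.mean(axis=0)                           # updated estimate
        fx /= fx.sum()
    return centers, fx

rng = np.random.default_rng(0)
x = rng.normal(5.0, 1.0, size=2000)      # hidden original values
y = rng.uniform(-2.0, 2.0, size=2000)    # perturbation noise
w = x + y                                # what the miner actually sees

uniform_pdf = lambda z: ((z >= -2.0) & (z <= 2.0)) / 4.0
centers, fx = reconstruct_distribution(w, uniform_pdf, np.linspace(0, 10, 41))
```

Although no individual `x` can be recovered, the estimated histogram `fx` concentrates around the true distribution, which is what the decision-tree builder needs.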
Using Name-Based Mappings to Increase Hit Rates
 IEEE/ACM TRANSACTIONS ON NETWORKING
, 1997
"... Clusters of identical intermediate servers are often created to improve availability and robustness in many domains. The use of proxy servers for the WWW and of Rendezvous Points in multicast routing are two such situations. However, this approach can be inefficient if identical requests are receive ..."
Abstract

Cited by 82 (6 self)
 Add to MetaCart
Clusters of identical intermediate servers are often created to improve availability and robustness in many domains. The use of proxy servers for the WWW and of Rendezvous Points in multicast routing are two such situations. However, this approach can be inefficient if identical requests are received and processed by multiple servers. We present an analysis of this problem, and develop a method called the Highest Random Weight (HRW) Mapping that eliminates these difficulties. Given an object name and a set of servers, HRW maps a request to a server using the object name, rather than any a priori knowledge of server states. Since HRW always maps a given object name to the same server within a given cluster, it may be used locally at client sites to achieve consensus on object-server mappings. We present an analysis of HRW and validate it with simulation results showing that it gives faster service times than traditional request allocation schemes such as round-robin or least-loaded, and...
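The HRW mapping described above is straightforward to sketch: hash each (object, server) pair and pick the server with the largest weight. The SHA-256 weight function here is an illustrative choice, not necessarily the one analysed in the paper:

```python
import hashlib

def hrw_server(object_name, servers):
    """Highest Random Weight (rendezvous) mapping: every client
    independently computes the same winner from the object name and
    the server list alone -- no shared state, no server load info."""
    def weight(server):
        h = hashlib.sha256(f"{object_name}:{server}".encode()).hexdigest()
        return int(h, 16)
    return max(servers, key=weight)

servers = ["proxy-a", "proxy-b", "proxy-c"]
winner = hrw_server("index.html", servers)
```

Because the winner depends only on the object name and the surviving server set, removing a non-winning server leaves existing object-server mappings untouched, which is the property that makes HRW attractive for caching clusters.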
Candid covariance-free incremental principal component analysis
 IEEE Trans. Pattern Analysis and Machine Intelligence
, 2003
"... Abstract—Appearancebased image analysis techniques require fast computation of principal components of highdimensional image vectors. We introduce a fast incremental principal component analysis (IPCA) algorithm, called candid covariancefree IPCA (CCIPCA), used to compute the principal components ..."
Abstract

Cited by 76 (9 self)
 Add to MetaCart
(Show Context)
Abstract—Appearance-based image analysis techniques require fast computation of principal components of high-dimensional image vectors. We introduce a fast incremental principal component analysis (IPCA) algorithm, called candid covariance-free IPCA (CCIPCA), used to compute the principal components of a sequence of samples incrementally without estimating the covariance matrix (so covariance-free). The new method is motivated by the concept of statistical efficiency (the estimate has the smallest variance given the observed data). To do this, it keeps the scale of observations and computes the mean of observations incrementally, which is an efficient estimate for some well-known distributions (e.g., Gaussian), although the highest possible efficiency is not guaranteed in our case because of the unknown sample distribution. The method is for real-time applications and, thus, it does not allow iterations. It converges very fast for high-dimensional image vectors. Some links between IPCA and the development of the cerebral cortex are also discussed. Index Terms—Principal component analysis, incremental principal component analysis, stochastic gradient ascent (SGA), generalized Hebbian algorithm (GHA), orthogonal complement.
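The covariance-free update at the heart of CCIPCA can be sketched for the first principal component alone; the full algorithm additionally deflates each sample against earlier components and uses an amnesic averaging parameter, both omitted in this minimal version:

```python
import numpy as np

def ccipca_first_component(samples):
    """First principal component via a CCIPCA-style update: each new
    sample u nudges the eigenvector estimate v without ever forming
    the covariance matrix (hence covariance-free)."""
    v = None
    for n, u in enumerate(samples, start=1):
        if v is None:
            v = u.astype(float)                      # initialise with sample 1
        else:
            # incremental average of u * (u . v / |v|), the power-iteration step
            v = (n - 1) / n * v + (1 / n) * u * (u @ v) / np.linalg.norm(v)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(1)
d = np.array([3.0, 4.0]) / 5.0                        # true dominant direction
data = rng.normal(size=500)[:, None] * d              # samples along that line
v = ccipca_first_component(data)
```

One pass over the data suffices, which is why the method suits the real-time, no-iteration setting the abstract describes.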
A Comparison of Dynamic and Non-Dynamic Rough Set Methods for Extracting Laws from Decision Tables
, 1998
"... We report results of experiments on several data sets, in particular: Monk's problems data (see [58]), medical data (lymphography, breast cancer, primary tumor  see [30]) and StatLog's data (see [32]). We compare standard methods for extracting laws from decision tables (see [43], [52]), ..."
Abstract

Cited by 63 (6 self)
 Add to MetaCart
We report results of experiments on several data sets, in particular: Monk's problems data (see [58]), medical data (lymphography, breast cancer, primary tumor; see [30]) and StatLog's data (see [32]). We compare standard methods for extracting laws from decision tables (see [43], [52]), based on rough sets (see [42]) and Boolean reasoning (see [8]), with the method based on dynamic reducts and dynamic rules (see [3], [4], [5], [6]). We also compare the results of computer experiments on those data sets obtained by applying our system based on rough set methods with the results on the same data sets obtained with the help of several data analysis systems known from the literature.
Value-at-Risk prediction using context modeling
 IN PSYCHOLOGY & MARKETING
, 2000
"... In financial market risk measurement, ValueatRisk (VaR) techniques have proven to be a very useful and popular tool. Unfortunately, most VaR estimation models suffer from major drawbacks: the lognormal (Gaussian) modeling of the returns does not take into account the observed fat tail distribution ..."
Abstract

Cited by 32 (0 self)
 Add to MetaCart
In financial market risk measurement, Value-at-Risk (VaR) techniques have proven to be a very useful and popular tool. Unfortunately, most VaR estimation models suffer from major drawbacks: the lognormal (Gaussian) modeling of the returns does not take into account the observed fat tail distribution, and the non-stationarity of the financial instruments severely limits the efficiency of the VaR predictions. In this paper, we present a new approach to VaR estimation which is based on ideas from the field of information theory and lossless data compression. More specifically, the technique of context modeling is applied to estimate the VaR by conditioning the probability density function on the present context. Tree-structured vector quantization is applied to partition the multidimensional state space of both macroeconomic and microeconomic priors into an increasing but limited number of context classes. Each class can be interpreted as a state of aggregation with its own statistical and dynamic behavior, or as a random walk with its own drift and step size. Results on the US S&P500 index, obtained using several evaluation methods, show the strong potential of this approach and prove that it can be applied successfully for, amongst other useful applications, VaR and volatility prediction. The October 1997 crash is indicated in time.
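The paper's context-modeling estimator conditions the return density on a context class; as a point of reference, the plain historical-simulation VaR that such methods aim to improve upon is just an empirical quantile of past returns. The return series below is made up for illustration:

```python
import numpy as np

def historical_var(returns, confidence=0.95):
    """Empirical (historical-simulation) Value-at-Risk: the loss
    threshold exceeded with probability 1 - confidence. Positive
    result = a loss of that magnitude."""
    return -np.quantile(returns, 1.0 - confidence)

# hypothetical daily returns of some instrument
returns = np.array([-0.03, -0.01, 0.0, 0.005, 0.01,
                    0.012, 0.02, -0.005, 0.007, -0.02])
var95 = historical_var(returns, 0.95)
```

Conditioning the quantile on a context class, as the paper does, amounts to computing this kind of estimate separately per state of the market rather than over the whole unconditional history.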
Non-crossing trees revisited: cutting down and spanning subtrees
 Discrete Mathematics and Theoretical Computer Science, Proceedings AC, 265–276
, 2003
"... Here we consider two parameters for random noncrossing trees: (i) the number of random cuts to destroy a sizen noncrossing tree and (ii) the spanning subtreesize of p randomly chosen nodes in a sizen noncrossing tree. For both quantities, we are able to characterise for n → ∞ the limiting dis ..."
Abstract

Cited by 12 (5 self)
 Add to MetaCart
(Show Context)
Here we consider two parameters for random non-crossing trees: (i) the number of random cuts to destroy a size-n non-crossing tree and (ii) the spanning subtree size of p randomly chosen nodes in a size-n non-crossing tree. For both quantities, we are able to characterise the limiting distributions for n → ∞. Non-crossing trees are almost conditioned Galton-Watson trees, and it has already been shown that the contour and other usually associated discrete excursions converge, suitably normalised, to the Brownian excursion. We can interpret parameter (ii) as a functional of a conditioned random walk, and although we do not have such an interpretation for parameter (i), we obtain limiting distributions here that also arise as limits of some functionals of conditioned random walks.
Vision and Motion Planning for a Mobile Robot under Uncertainty
 Int. J. of Robotics Research
, 1997
"... This paper describes a framework for vision and motion planning for a mobile robot. The task of the robot is to reach the destination in the minimum time while it detects possible routes by vision. Sincevisualrecognition is computationally expensive and the recognition result includes uncertainty, a ..."
Abstract

Cited by 11 (10 self)
 Add to MetaCart
This paper describes a framework for vision and motion planning for a mobile robot. The task of the robot is to reach the destination in the minimum time while it detects possible routes by vision. Since visual recognition is computationally expensive and the recognition result includes uncertainty, a tradeoff must be considered between the cost of visual recognition and the effect of the information to be obtained by recognition. Using a probabilistic model of the uncertainty of the recognition result, vision-motion planning is formulated as a recurrence formula. With this formulation, the optimal sequence of observation points is recursively determined. A generated plan is globally optimal because the planner minimizes the total cost. An efficient solution strategy is also described which employs a pruning method based on the lower bound of the total cost calculated by assuming perfect sensor information. Simulation results and experiments with an actual mobile robot demonstrate the feasibility of our approach.
Quantifying IT forecast quality
, 2008
"... In this paper, we showed how to quantify the quality of IT forecasts based on Boehm’s cone of uncertainty and DeMarco’s Estimating Quality Factor. With these, we support decision making by providing critical information on IT forecasting quality to IT governors. We illustrated that plotting forecast ..."
Abstract

Cited by 9 (5 self)
 Add to MetaCart
In this paper, we showed how to quantify the quality of IT forecasts based on Boehm's cone of uncertainty and DeMarco's Estimating Quality Factor. With these, we support decision making by providing critical information on IT forecasting quality to IT governors. We illustrated that plotting forecast-to-actual ratios against a predefined referential conical shape reveals potential biases, for instance political ones, involved in IT forecasting. The Estimating Quality Factor quantifies the deviation of forecasts from their actual value. Using simulations, we showed that the conical shape of Boehm's cone is not necessarily caused by improved estimation, but can also be found when estimation accuracy decreases. We illustrated our approach by applying it to four real-world case studies (1741 projects, 12187 forecasts, 1059 million Euro). Finally, we surveyed benchmarks related to forecasting and proposed new benchmarks based on our extensive data. Most forecasting benchmarks in the literature turned out to have an unknown bias. As a consequence, we argued that such figures, including Standish's project success benchmarks, are meaningless.
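The Estimating Quality Factor mentioned above can be computed from a sequence of revised forecasts. One common discrete formulation (an assumption here, not taken verbatim from the paper) divides the area under the actual value by the area between the forecast curve and the actual:

```python
def estimating_quality_factor(forecasts, actual):
    """DeMarco-style Estimating Quality Factor, discrete form:
    (actual * duration) / area between forecast curve and actual.
    `forecasts` is a list of (time, value) pairs covering the project
    from t=0 to t=end; a higher EQF means better forecasting."""
    end = forecasts[-1][0]
    deviation_area = 0.0
    for (t0, v), (t1, _) in zip(forecasts, forecasts[1:]):
        deviation_area += abs(v - actual) * (t1 - t0)   # step-function area
    return float("inf") if deviation_area == 0 else actual * end / deviation_area

# hypothetical project: cost forecasts revised over 10 months, actual cost 100
forecasts = [(0, 80), (4, 90), (8, 100), (10, 100)]
eqf = estimating_quality_factor(forecasts, 100)
```

A perfect forecaster, whose estimates equal the actual from day one, gets an infinite EQF; persistent under- or over-estimation shrinks it.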
Dynamic Reducts and Statistical Inference
In: Proceedings of the Sixth International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU'96)
, 1996
"... We apply rough set methods and boolean reasoning for knowledge discovery from decision tables. It is often impossible to extract general laws from experimental data by computing first all reducts (Pawlak 1991) of a data table (decision table) and next decision rules from these reducts. We have devel ..."
Abstract

Cited by 9 (1 self)
 Add to MetaCart
We apply rough set methods and Boolean reasoning for knowledge discovery from decision tables. It is often impossible to extract general laws from experimental data by first computing all reducts (Pawlak 1991) of a data table (decision table) and then decision rules from these reducts. We have developed the idea of dynamic reducts as a tool that allows finding reducts relevant for decision rule generation (Bazan 1994a), (Bazan 1994b), (Bazan 1994c), (Nguyen 1993). Tests on several data tables show that the application of dynamic reducts leads to an increase in classification quality and/or a decrease in the size of decision rule sets. In this paper we present some statistical arguments showing that the introduced stability coefficients of dynamic reducts are proper measures of their quality. Key words: knowledge discovery, rough sets, decision algorithms, machine learning.
1 INTRODUCTION The aim of the paper is to present a method for extracting laws...
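A reduct, as used above, is a minimal attribute subset that classifies a decision table as well as the full attribute set; dynamic reducts are, roughly, reducts that remain reducts across randomly drawn subtables. A brute-force sketch of ordinary reduct extraction on a toy table (the attribute names and values are hypothetical):

```python
from itertools import combinations

def is_consistent(table, attrs, decision):
    """True if no two rows agree on all `attrs` yet disagree on the
    decision -- the rough set positive-region check on a consistent table."""
    seen = {}
    for row in table:
        key = tuple(row[a] for a in attrs)
        if key in seen and seen[key] != row[decision]:
            return False
        seen.setdefault(key, row[decision])
    return True

def reducts(table, attrs, decision):
    """All minimal attribute subsets that still classify correctly,
    found by checking subsets in increasing size."""
    found = []
    for r in range(1, len(attrs) + 1):
        for subset in combinations(attrs, r):
            if is_consistent(table, subset, decision) and \
               not any(set(f) <= set(subset) for f in found):
                found.append(subset)
    return found

table = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]
red = reducts(table, ["a", "b", "c"], "d")
```

The dynamic variant would repeat this over many random subtables and keep the subsets that appear as reducts with high frequency, which is what the stability coefficients measure.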
Neural Network for Estimating Conditional Distributions
 IEEE Trans. Neural Networks
, 1997
"... Neural networks for estimating conditional distributions and their associated quantiles are investigated in this paper. A basic network structure is developed on the basis of kernel estimation theory, and consistency is proved from a mild set of assumptions. A number of applications within statistic ..."
Abstract

Cited by 9 (6 self)
 Add to MetaCart
(Show Context)
Neural networks for estimating conditional distributions and their associated quantiles are investigated in this paper. A basic network structure is developed on the basis of kernel estimation theory, and consistency is proved under a mild set of assumptions. A number of applications within statistics, decision theory and signal processing are suggested, and a numerical example illustrating the capabilities of the elaborated network is given. Keywords: Neural Networks, Conditional Distributions, Kernel Estimation, Optimal Control, Data Transmission.
I. Introduction. Relationships between random variables are most often described by characteristic parameters such as mean vectors and covariance matrices or, in extraordinary cases, moments of higher order. When standard situations are considered, for example, if all variables are jointly Gaussian or when the conditional characteristics are linear or low-degree polynomial functions, that approach is to be recommended. On the other hand, when ...
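A classical kernel estimator of a conditional density, the construction the network structure above builds on, can be sketched directly; the bandwidths and the synthetic data below are illustrative choices, not taken from the paper:

```python
import numpy as np

def kernel_conditional_density(x0, y_grid, x, y, hx=0.5, hy=0.5):
    """Nadaraya-Watson style estimate of f(y | x = x0): a kernel
    density in y whose sample weights come from a kernel in x."""
    gauss = lambda z: np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    wx = gauss((x0 - x) / hx)
    wx /= wx.sum()                       # weights: how relevant is sample i at x0
    return np.array([(wx * gauss((yg - y) / hy) / hy).sum() for yg in y_grid])

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 1000)
y = 2 * x + rng.normal(0, 0.3, 1000)     # y | x is Gaussian around 2x
grid = np.linspace(-6, 6, 241)
dens = kernel_conditional_density(1.0, grid, x, y)
```

Quantiles of the conditional distribution then follow by inverting the cumulative sum of `dens`, which is the kind of output the paper's network is trained to provide.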