Privacy-Preserving Data Mining
, 2000
Cited by 608 (3 self)
A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We consider the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed. The resulting data records look very different from the original records and the distribution of data values is also very different from the original distribution. While it is not possible to accurately estimate original values in individual data records, we propose a novel reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data.
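The abstract above does not spell out the reconstruction procedure, so the following is only a minimal sketch of one way such a distribution estimate could work: an iterative Bayesian update over a fixed set of bins. The Gaussian noise model, the bin layout, the function names, and the synthetic data are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def noise_density(y, noise_sd):
    """Density of the (assumed) zero-mean Gaussian perturbation at y."""
    return math.exp(-0.5 * (y / noise_sd) ** 2) / (noise_sd * math.sqrt(2.0 * math.pi))


def reconstruct_distribution(perturbed, bin_centers, noise_sd, iterations=30):
    """Estimate the probability mass of the original values on bin_centers.

    Only the perturbed values and the noise distribution are used; the
    original values are never seen by this function.
    """
    k = len(bin_centers)
    estimate = [1.0 / k] * k  # start from a uniform guess
    for _ in range(iterations):
        updated = [0.0] * k
        for w in perturbed:
            # Posterior over bins for this one perturbed value (Bayes' rule).
            weights = [noise_density(w - a, noise_sd) * p
                       for a, p in zip(bin_centers, estimate)]
            total = sum(weights)
            if total > 0.0:
                for j in range(k):
                    updated[j] += weights[j] / total
        norm = sum(updated)
        estimate = [u / norm for u in updated]
    return estimate


def demo(seed=0, n=2000, noise_sd=10.0):
    """Perturb hypothetical two-valued data and reconstruct its distribution."""
    rng = random.Random(seed)
    bin_centers = [5.0 + 10.0 * i for i in range(10)]  # 5, 15, ..., 95
    original = [rng.choice([25.0, 75.0]) for _ in range(n)]
    perturbed = [x + rng.gauss(0.0, noise_sd) for x in original]
    return bin_centers, reconstruct_distribution(perturbed, bin_centers, noise_sd)
```

Calling `demo()` returns the bin centers and the estimated mass per bin. In this sketch the individual perturbed values stay uninformative about any single record, while the estimated masses should concentrate around the two values used to generate the data.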
Perspectives on system identification
 In Plenary talk at the proceedings of the 17th IFAC World Congress, Seoul, South Korea
, 2008
Cited by 77 (2 self)
System identification is the art and science of building mathematical models of dynamic systems from observed input-output data. It can be seen as the interface between the real world of applications and the mathematical world of control theory and model abstractions. As such, it is a ubiquitous necessity for successful applications. System identification is a very large topic, with different techniques that depend on the character of the models to be estimated: linear, nonlinear, hybrid, nonparametric, etc. At the same time, the area can be characterized by a small number of leading principles, e.g. to look for sustainable descriptions by proper decisions in the triangle of model complexity, information contents in the data, and effective validation. The area has many facets and there are many approaches and methods. A tutorial or a survey in a few pages is not quite possible. Instead, this presentation aims at giving an overview of the “science” side, i.e. basic principles and results, and at pointing to open problem areas in the practical, “art”, side of how to approach and solve a real problem.
Interpreting neuronal population activity by reconstruction: unified framework with application to hippocampal place cells
 J. Neurophysiol.
, 1998
Cited by 75 (6 self)
... such as the orientation of a line in the visual field or the location of the body in space are coded as activity levels in populations of neurons. Reconstruction or decoding is an inverse problem in which the physical variables are estimated from observed neural activity. Reconstruction is useful first in quantifying how much information about the physical variables is present in the population and, second, in providing insight into how the brain might use distributed representations in solving related computational problems such as visual object recognition and spatial navigation. Two classes of reconstruction methods, namely, probabilistic or Bayesian methods and basis function methods, are discussed. They include important existing methods ...

Two main goals for reconstruction are approached in this paper. The first goal is technical and is exemplified by the population vector method applied to motor cortical activities during various reaching tasks (Georgopoulos et al. 1986, 1989; Schwartz 1994) and the template matching method applied to disparity selective cells in the visual cortex (Lehky and Sejnowski 1990) and hippocampal place cells during rapid learning of place fields in a novel environment (Wilson and McNaughton 1993). In these examples, reconstruction extracts information from noisy neuronal population activity and transforms it to a ...
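To make the idea of probabilistic reconstruction concrete, here is a minimal sketch of Bayesian decoding of position on a one-dimensional track. It assumes independent Poisson spiking, Gaussian place fields, and a flat prior; the tuning parameters, the track layout, and the function names are hypothetical and are not taken from the paper.

```python
import math


def tuning_rate(x, center, peak=20.0, width=0.1, baseline=0.1):
    """Hypothetical Gaussian place field: mean firing rate (Hz) at position x."""
    return baseline + peak * math.exp(-0.5 * ((x - center) / width) ** 2)


def decode_position(spike_counts, centers, grid, window=1.0):
    """Return the grid position with the highest posterior under a flat prior.

    Assumes each cell fires as an independent Poisson process, so the
    log posterior is sum_i n_i * log f_i(x) - window * sum_i f_i(x) + const.
    """
    best_x, best_score = None, -math.inf
    for x in grid:
        rates = [tuning_rate(x, c) for c in centers]
        score = sum(n * math.log(f) for n, f in zip(spike_counts, rates))
        score -= window * sum(rates)
        if score > best_score:
            best_x, best_score = x, score
    return best_x


centers = [i / 10 for i in range(11)]   # place-field centers along a unit track
grid = [i / 100 for i in range(101)]    # candidate positions to score
# Noise-free spike counts for an animal at x = 0.5 (expected counts, rounded).
counts = [round(tuning_rate(0.5, c)) for c in centers]
decoded = decode_position(counts, centers, grid)
```

With noise-free counts the decoded position should land at or very near the true position. With real spike counts the same scoring applies, and the sharpness of the posterior gives a rough sense of how much positional information the population carries.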
Approximate N-View Stereo
 in Proc. European Conf. on Computer Vision
, 2000
Cited by 65 (4 self)
This paper introduces a new multi-view reconstruction problem called approximate N-view stereo. The goal of this problem is to recover a one-parameter family of volumes that are increasingly tighter supersets of an unknown, arbitrarily-shaped 3D scene. By studying 3D shapes that reproduce the input photographs up to a special image transformation called a shuffle transformation, we prove that (1) these shapes can be organized hierarchically into nested supersets of the scene, and (2) they can be computed using a simple algorithm called Approximate Space Carving that is provably correct for arbitrary discrete scenes (i.e., for unknown, arbitrarily-shaped Lambertian scenes that are defined by a finite set of voxels and are viewed from N arbitrarily-distributed viewpoints inside or around them). The approach is specifically designed to attack practical reconstruction problems, including (1) recovering shape from images with inaccurate calibration information, and (2) building ...
The Surprise Element: Jumps in Interest Rates
 Journal of Econometrics
, 2002
Cited by 61 (2 self)
That information surprises result in discontinuous interest rates is no surprise to participants in the bond markets. We develop a class of Poisson-Gaussian models of the Fed Funds rate to capture surprise effects, and show that these models offer a good statistical description of short rate behavior, and are useful in understanding many empirical phenomena. Estimators are used based on analytical derivations of the characteristic functions and moments of jump-diffusion stochastic processes for a range of jump distributions, and are extended to discrete-time models. Jump (Poisson) processes capture empirical features of the data which would not be captured by Gaussian models, and there is strong evidence that existing models would be well-enhanced by jump and ARCH-type processes. The analytical and empirical methods in the paper support many applications, such as testing for Fed intervention effects, which are shown to be an important source of surprise jumps in interest rates. The jump model is shown to mitigate the nonlinearity of interest rate drifts, so prevalent in pure-diffusion models. Day-of-week effects are modelled explicitly, and the jump model provides evidence of bond market overreaction, rejecting the martingale hypothesis for interest rates. Jump models mixed with Markov switching processes predicate that conditioning on regime is important in determining short rate behavior.
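As a rough illustration of what a discrete-time Poisson-Gaussian short-rate model can look like, the sketch below simulates a mean-reverting rate with Gaussian diffusion shocks plus occasional Gaussian-sized jumps. The mean-reverting drift, the one-jump-per-step (Bernoulli) approximation to the Poisson process, and every parameter name and value are assumptions chosen for illustration; the abstract does not give the paper's specification.

```python
import math
import random


def simulate_short_rate(r0, kappa, theta, sigma, jump_prob, jump_mean, jump_sd,
                        dt, steps, seed=0):
    """Simulate a short-rate path with diffusion and jump components.

    Each step applies r += kappa*(theta - r)*dt + sigma*sqrt(dt)*z + J,
    where z is standard normal and J is a Gaussian jump that occurs with
    probability jump_prob per step (assumed Bernoulli approximation).
    Returns the path and the number of jumps that occurred.
    """
    rng = random.Random(seed)
    path = [r0]
    jumps = 0
    for _ in range(steps):
        r = path[-1]
        drift = kappa * (theta - r) * dt
        diffusion = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        jump = 0.0
        if rng.random() < jump_prob:
            jump = rng.gauss(jump_mean, jump_sd)
            jumps += 1
        path.append(r + drift + diffusion + jump)
    return path, jumps


# One year of daily steps with hypothetical parameters.
path, n_jumps = simulate_short_rate(
    r0=0.05, kappa=0.5, theta=0.05, sigma=0.01,
    jump_prob=0.02, jump_mean=0.0, jump_sd=0.0025,
    dt=1.0 / 252, steps=252, seed=1)
```

Setting `jump_prob` to zero recovers a pure-diffusion path, which makes it easy to compare the two model families on the same random seed.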
A Plane Measuring Device
, 1997
Cited by 55 (4 self)
A requirement of a visual measurement device is that both measurements and their uncertainties can be determined. This paper develops an uncertainty analysis which includes both the errors in image localization and the uncertainty in the imaging transformation. The matrix representing the imaging transformation is estimated from image-to-world point correspondences. A general expression is derived for the covariance of this matrix. This expression is valid if the matrix is over-determined and also if the minimum number of correspondences is used. A bound on the errors of the first-order approximations involved is also derived. Armed with this covariance result, the uncertainty of any measurement can be predicted, and furthermore the distribution of correspondences can be chosen to achieve a particular bound on the uncertainty. Examples are given of measurements such as distance and parallelism for several applications. These include indoor scenes and architectural measurements. Key word...
Active Learning in Multilayer Perceptrons
, 1996
Cited by 47 (0 self)
We propose an active learning method with hidden-unit reduction, which is devised specially for multilayer perceptrons (MLP). First, we review our active learning method, and point out that many Fisher-information-based methods applied to MLP have a critical problem: the information matrix may be singular. To solve this problem, we derive the singularity condition of an information matrix, and propose an active learning technique that is applicable to MLP. Its effectiveness is verified through experiments.
1 INTRODUCTION
When one trains a learning machine using a set of data given by the true system, its ability can be improved if one selects the training data actively. In this paper, we consider the problem of active learning in multilayer perceptrons (MLP). First, we review our method of active learning (Fukumizu et al., 1994), in which we prepare a probability distribution and obtain training data as samples from the distribution. This methodology leads us to an information-matrix...
Algebraic analysis for nonidentifiable learning machines
 Neural Computation
Cited by 46 (14 self)
This paper clarifies the relation between the learning curve and the algebraic geometrical structure of a nonidentifiable learning machine such as a multilayer neural network whose true parameter set is an analytic set with singular points. By using a concept in algebraic analysis, we rigorously prove that the Bayesian stochastic complexity or the free energy is asymptotically equal to $\lambda_1 \log n - (m_1 - 1) \log\log n + \text{constant}$, where $n$ is the number of training samples and $\lambda_1$ and $m_1$ are the rational number and the natural number which are determined as the birational invariant values of the singularities in the parameter space. Also we show an algorithm to calculate $\lambda_1$ and $m_1$ based on the resolution of singularities in algebraic geometry. In regular statistical models, $2\lambda_1$ is equal to the number of parameters and $m_1 = 1$, whereas in nonregular models such as multilayer networks, $2\lambda_1$ is not larger than the number of parameters and $m_1 \geq 1$. Since the increase of the stochastic complexity is equal to the learning curve or the generalization error, nonidentifiable learning machines are better models than regular ones if Bayesian ensemble learning is applied.
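As a small worked reading of the asymptotic form stated above, the block below writes it out and specializes it to the regular case. The symbols $F(n)$ for the stochastic complexity and $d$ for the number of parameters are notation introduced here for illustration and do not appear in the abstract.

```latex
% General asymptotic form, as stated in the abstract:
F(n) \simeq \lambda_1 \log n - (m_1 - 1)\log\log n + \text{constant}

% Regular model with d parameters: 2\lambda_1 = d and m_1 = 1,
% so the \log\log n term vanishes:
F(n) \simeq \frac{d}{2}\log n + \text{constant}

% Nonregular model: 2\lambda_1 \le d and m_1 \ge 1, so the leading
% coefficient is at most d/2 and the \log\log n term is nonpositive:
\lambda_1 \log n - (m_1 - 1)\log\log n \;\le\; \frac{d}{2}\log n
```

Read this way, the leading term of the stochastic complexity of a nonregular model grows no faster than that of a regular model with the same number of parameters.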
ON METHODS OF SIEVES AND PENALIZATION
, 1997
Cited by 44 (1 self)
We develop a general theory which provides a unified treatment for the asymptotic normality and efficiency of the maximum likelihood estimates (MLE’s) in parametric, semiparametric and nonparametric models. We find that the asymptotic behavior of substitution estimates for estimating smooth functionals is essentially governed by two indices: the degree of smoothness of the functional and the local size of the underlying parameter space. We show that when the local size of the parameter space is not very large, the substitution standard (nonsieve), substitution sieve and substitution penalized MLE’s are asymptotically efficient in the Fisher sense, under certain stochastic equicontinuity conditions of the log-likelihood. Moreover, when the convergence rate of the estimate is slow, the degree of smoothness of the functional needs to compensate for the slowness of the rate in order to achieve efficiency. When the size of the parameter space is very large, the standard and penalized maximum likelihood procedures may be inefficient, whereas the method of sieves may be able to overcome this difficulty. This phenomenon is particularly manifested when the functional of interest is very smooth, especially in the semiparametric case.