Results 1–10 of 10
Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation
 IEEE Transactions on Knowledge and Data Engineering
, 2006
Abstract
Cited by 116 (14 self)
Abstract—This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the “length” of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel matrix (its elements are inner products closely related to commute times). A principal component analysis (PCA) of the graph is introduced for computing the subspace projection of the node vectors in a manner that preserves as much variance as possible in terms of the Euclidean commute-time distance. This graph PCA provides a nice interpretation of the “Fiedler vector,” widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called “statistical relational learning” framework, could also be used to compute document or word similarities and, more generally, could be applied to machine-learning and pattern-recognition tasks involving a relational database. Index Terms—Graph analysis, graph and database mining, collaborative recommendation, graph kernels, spectral clustering, Fiedler vector, proximity measures, statistical relational learning.
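As a concrete illustration of the quantity this abstract describes, the average commute time between nodes i and j can be computed from the Laplacian pseudoinverse as V_G · (l⁺_ii + l⁺_jj − 2 l⁺_ij), where V_G is the graph's volume. The following minimal sketch (not the authors' code; the toy adjacency matrix is invented for illustration) computes this with NumPy:

```python
import numpy as np

def commute_time_distances(A):
    """Average commute times from a weighted undirected adjacency matrix A.

    Uses the pseudoinverse L+ of the graph Laplacian: the average commute
    time between nodes i and j is V_G * (l+_ii + l+_jj - 2 l+_ij), where
    V_G (the graph "volume") is the sum of all node degrees.
    """
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A        # graph Laplacian
    Lp = np.linalg.pinv(L)                # Moore-Penrose pseudoinverse
    vol = A.sum()                         # V_G = sum of degrees
    d = np.diag(Lp)
    return vol * (d[:, None] + d[None, :] - 2.0 * Lp)

# Toy path graph 0-1-2: node 0 is "closer" to 1 than to 2.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
C = commute_time_distances(A)
```

On this path graph the result matches the known identity C(i, j) = V_G · r(i, j) with effective resistance r: C(0, 1) = 4 and C(0, 2) = 8, and the square roots of these entries form a Euclidean distance, as the abstract states.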
ADMIT: Anomaly-based Data Mining for Intrusions
Abstract
Cited by 40 (1 self)
Security of computer systems is essential to their acceptance and utility. Computer security analysts use intrusion detection systems to assist them in maintaining computer system security. This paper deals with the problem of differentiating between masqueraders and the true user of a computer terminal. Prior efficient solutions are less suited to real-time application, often require all training data to be labeled, and do not inherently provide an intuitive idea of what the data model means. Our system, called ADMIT, relaxes these constraints by creating user profiles using semi-incremental techniques. It is a real-time intrusion detection system with host-based data collection and processing. Our method also suggests ideas for dealing with concept drift, and affords a detection rate as high as 80.3% and a false-positive rate as low as 15.3%.
Spectral Imaging Target Development Based on Hierarchical Cluster Analysis
 in Proceedings of the 12th Color Imaging Conference
, 2004
Abstract
Cited by 4 (2 self)
Agglomerative hierarchical cluster analysis was used to group similar spectra from a large database of samples. Based on the angles between the reflectance vectors of a cluster's members, a reflectance vector was selected as representative of that cluster. Representative samples were grouped together and stored as new calibration targets. Simulated wideband imaging with glass filters was performed using these new calibration targets, and a transformation matrix from digital signals to reflectance was derived. Different verification targets were reconstructed using the transformation matrix; the spectral and colorimetric accuracy of the reconstruction was evaluated. It was shown that beyond a threshold number of samples in the calibration target, the performance of reconstruction became independent of the number of samples used in the calculation. The average spectral RMS for a calibration target consisting of 24 samples selected based on clustering was found to be less than 3.2% for …
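The angle-based representative selection this abstract describes can be sketched minimally; the helper names and the tiny cluster below are illustrative (the agglomerative clustering step itself is omitted):

```python
import numpy as np

def spectral_angle(r1, r2):
    """Angle (radians) between two reflectance vectors."""
    c = np.dot(r1, r2) / (np.linalg.norm(r1) * np.linalg.norm(r2))
    return np.arccos(np.clip(c, -1.0, 1.0))

def representative(cluster):
    """Pick the cluster member whose mean angle to the others is smallest."""
    scores = [np.mean([spectral_angle(r, s) for s in cluster if s is not r])
              for r in cluster]
    return cluster[int(np.argmin(scores))]

# Three toy "reflectance" vectors; the middle one sits between the others,
# so it has the smallest average angle and becomes the representative.
cluster = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.0, 1.0])]
rep = representative(cluster)
```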
Ranking and Selecting Clustering Algorithms Using a Meta-Learning Approach
, 2008
Abstract
Cited by 2 (1 self)
We present a novel framework that applies a meta-learning approach to clustering algorithms. Given a dataset, our meta-learning approach provides a ranking of the candidate algorithms that could be used with that dataset. This ranking could, among other things, support non-expert users in the algorithm selection task. To evaluate the proposed framework, we implement a prototype that employs regression support vector machines as the meta-learner. Our case study is developed in the context of cancer gene-expression microarray datasets.
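The ranking step this abstract describes can be sketched generically. The sketch below uses ordinary least squares as a stand-in for the paper's SVR meta-learner; all names, meta-features, and performance numbers are invented for illustration:

```python
import numpy as np

def rank_algorithms(meta_X, perf, x_new):
    """Rank candidate algorithms for a new dataset.

    Fits one linear meta-model per algorithm on (meta-features ->
    performance) pairs from past datasets, then sorts the algorithms
    by predicted performance on x_new, best first.
    """
    X = np.column_stack([np.ones(len(meta_X)), meta_X])
    x = np.concatenate([[1.0], np.atleast_1d(x_new)])
    preds = []
    for algo, y in perf.items():
        coef, *_ = np.linalg.lstsq(X, np.asarray(y), rcond=None)
        preds.append((algo, float(x @ coef)))
    return sorted(preds, key=lambda t: -t[1])

# Toy history: one meta-feature; "kmeans" improves with it, "hierarchical"
# degrades. For a new dataset with feature 5.0, "kmeans" should rank first.
meta_X = [[0.0], [1.0], [2.0], [3.0]]
perf = {"kmeans": [0.0, 1.0, 2.0, 3.0],
        "hierarchical": [3.0, 2.0, 1.0, 0.0]}
ranking = rank_algorithms(meta_X, perf, [5.0])
```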
Testing Homogeneity in a Mixture Distribution via the L² Distance Between Competing Models
 Journal of the American Statistical Association
, 2004
Abstract
Cited by 1 (0 self)
Ascertaining the number of components in a mixture distribution is an interesting and challenging problem for statisticians. Chen, Chen, and Kalbfleisch (2001) recently proposed a modified likelihood ratio test (MLRT), which is distribution-free and asymptotically locally most powerful. In this paper we present a new method for testing whether a finite mixture distribution is homogeneous. Our method, the D-test, is based on the L² distance between a fitted homogeneous model and a fitted heterogeneous model. For mixture components from standard distributions, our D-test statistic has closed-form expressions in terms of parameter estimates, whereas likelihood ratio-type test statistics do not. Thus, our test has potential for data mining applications. The convergence rate of the D-test statistic under a null hypothesis of homogeneity is established. The D-test is shown to be competitive with the MLRT when the mixture components are normal. The MLRT performs better for small sample sizes when the mixture components are exponential, but in this case there is little visual separation and, hence, little L² separation between the homogeneous and heterogeneous models. Thus, we propose that the measure underlying the L² distance be modified according to a suitable weight function, which is equivalent to transforming the data before applying the D-test. Such a modification produces a generalized D-test that is competitive in the aforementioned case. After applying our method to a data set in which the observations are measurements of firms' financial performance, we conclude with discussion and remarks.
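The closed-form L² computation the abstract highlights exists for normal components because the integral of a product of two normal densities is itself a normal density evaluated at the difference of means. A generic normal-mixture sketch (not the authors' exact statistic; weights, means, and variances below are illustrative):

```python
import numpy as np

def _cross(m1, v1, m2, v2):
    """Closed form for the integral of N(x; m1, v1) * N(x; m2, v2) over x."""
    s = v1 + v2
    return np.exp(-(m1 - m2) ** 2 / (2.0 * s)) / np.sqrt(2.0 * np.pi * s)

def l2_distance(w1, mu1, var1, w2, mu2, var2):
    """Squared L2 distance between two normal mixtures.

    Each mixture is given by lists of weights, means, and variances.
    Expands ||f - g||^2 = integral(f^2) - 2*integral(f*g) + integral(g^2)
    with closed-form cross terms, so no numerical integration is needed.
    """
    def inner(wa, ma, va, wb, mb, vb):
        return sum(wa[i] * wb[j] * _cross(ma[i], va[i], mb[j], vb[j])
                   for i in range(len(wa)) for j in range(len(wb)))
    return (inner(w1, mu1, var1, w1, mu1, var1)
            - 2.0 * inner(w1, mu1, var1, w2, mu2, var2)
            + inner(w2, mu2, var2, w2, mu2, var2))

# Identical models are at distance zero; separated means give a positive value.
d_same = l2_distance([1.0], [0.0], [1.0], [1.0], [0.0], [1.0])
d_far = l2_distance([1.0], [0.0], [1.0], [1.0], [3.0], [1.0])
```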
Preface
, 2006
Abstract
Effective prediction of highway travel time is essential to many advanced traveler information and transportation management systems. This thesis proposes three different schemes to predict highway travel time on a certain stretch of highway in Denmark: a linear model whose coefficients vary as smooth functions of the departure time, as well as principal components regression and partial least squares regression. The methods are straightforward to implement and applicable to different circumstances.
A Concurrent Object-Oriented Approach to the Eigenproblem Treatment in Shared-Memory Multicore Environments
Abstract
Abstract. This work presents an object-oriented approach to the concurrent computation of eigenvalues and eigenvectors of real symmetric and Hermitian matrices on present shared-memory multicore systems. This can be considered the lower-level step in a general framework for dealing with large-size eigenproblems, where the matrices are factorized down to a small enough size. The results show that the proposed parallelization achieves a good speedup on actual systems with up to four cores. It is also observed that the limiting performance factor is the number of threads rather than the size of the matrix. We also find that a reasonable upper limit for a “small” dense matrix to be treated on actual processors lies in the interval 10,000–30,000.
Abstract
Abstract—This paper presents a framework for weighted fusion of several Active Shape and Active Appearance Models. The approach is based on the eigenspace fusion method proposed by Hall et al. [1], which has been extended to fuse more than two weighted eigenspaces using unbiased mean and covariance matrix estimates. To evaluate the performance of the fusion, a comparative assessment of segmentation precision as well as facial-verification tests are performed using the AR, EQUINOX, and XM2VTS databases. Based on the results, it is concluded that fusion is useful when the model needs to be updated online or when the original observations are absent. Index Terms—AAM, ASM, model fusion, statistical model, segmentation.
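The core of eigenspace merging is fusing two models' sufficient statistics without revisiting the original observations. The mean/covariance part can be sketched as follows (an illustrative fragment, not the paper's full eigenspace update; the unbiased pooled covariance includes a correction term for the shift between the two means):

```python
import numpy as np

def fuse_gaussian_models(n1, mu1, S1, n2, mu2, S2):
    """Fuse two (sample size, mean, unbiased covariance) summaries into the
    summary the pooled data would have produced, without the raw data."""
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n
    d = (mu1 - mu2).reshape(-1, 1)
    S = ((n1 - 1) * S1 + (n2 - 1) * S2 + (n1 * n2 / n) * (d @ d.T)) / (n - 1)
    return n, mu, S

# Sanity check: fusing the halves of a dataset reproduces the full-data
# estimates exactly.
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 3))
n, mu, S = fuse_gaussian_models(25, X[:25].mean(0), np.cov(X[:25], rowvar=False),
                                15, X[25:].mean(0), np.cov(X[25:], rowvar=False))
```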
And Regression Analysis
Abstract
In this paper, we consider a large class of computational problems in robust statistics that can be formulated as the selection of optimal subsets of data based on some criterion function. To solve such problems, there are broadly two classes of algorithms available in the literature: one based on purely random search, and the other based on deterministically guided strategies. Though these methods can achieve satisfactory results in some specific examples, none of them can be used satisfactorily for a large class of similar problems, either because of their very long expected waiting time to hit the true optimum or because of their failure to escape a local optimum once trapped there. Here, we propose two probabilistic search algorithms and, under some conditions on the parameters of the algorithms, establish their convergence to the true optimum. We also show some results on the probability of hitting the true optimum when the algorithms are run for a finite number of iterations. Finally, we compare the performance of our algorithms with some commonly available algorithms for computing popular robust multivariate statistics on real data sets.
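The problem framing can be made concrete with a toy sketch of the purely-random-search baseline class the abstract contrasts against (not the authors' proposed algorithms; the data and criterion below are invented for illustration):

```python
import random

def random_subset_search(data, h, criterion, iters=2000, seed=0):
    """Select an optimal size-h subset of data by pure random search:
    repeatedly propose a random subset and keep the best score seen.
    'criterion' maps a list of values to a score to be minimized."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    best, best_score = None, float("inf")
    for _ in range(iters):
        sub = rng.sample(idx, h)
        score = criterion([data[i] for i in sub])
        if score < best_score:
            best, best_score = sorted(sub), score
    return best, best_score

# Toy criterion: spread (max - min) of the chosen subset. With one gross
# outlier present, the best size-4 subset should exclude it.
data = [1.0, 1.2, 0.9, 1.1, 50.0]
best, score = random_subset_search(data, 4, lambda s: max(s) - min(s))
```

On realistic problem sizes this class of search is exactly what the abstract criticizes: the number of size-h subsets grows combinatorially, so the expected waiting time to hit the true optimum becomes very long.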
Multiobjective Genetic Algorithm Optimization of a Neural Network for Estimating Wind Speed Prediction Intervals
, 2013
Abstract
Abstract—In this work, the non-dominated sorting genetic algorithm II (NSGA-II) is applied to determine the weights of a neural network trained for short-term forecasting of wind speed. More precisely, the neural network is trained to produce the lower and upper bounds of the prediction intervals of wind speed. The objectives driving the search for the optimal values of the neural network weights are the coverage of the prediction intervals (to be maximized) and their width (to be minimized). A real application is shown with reference to hourly wind speed, temperature, relative humidity, and pressure data from the region of Regina, Saskatchewan, Canada. Correlation analysis shows that the wind speed has only weak dependence on the above-mentioned meteorological parameters; hence, only hourly historical wind speed is used as input to a neural network model trained to provide as output the one-hour-ahead prediction of wind speed. The originality of the work lies in proposing a multi-objective framework for estimating wind speed prediction intervals (PIs) that are optimal both in terms of accuracy (coverage probability) and efficacy (width). In the case study analyzed, a comparison with two single-objective methods has been made, and the results show that the PIs produced by NSGA-II compare well with those and are satisfactory in both objectives of high coverage and small width.
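The two competing objectives the search trades off are straightforward to compute from a candidate network's interval outputs. A minimal sketch (illustrative names and toy numbers, not the paper's code):

```python
import numpy as np

def picp_and_width(lower, upper, y):
    """Prediction-interval coverage probability (to maximize) and mean
    interval width (to minimize) -- the two objectives driving NSGA-II."""
    lower, upper, y = map(np.asarray, (lower, upper, y))
    covered = (y >= lower) & (y <= upper)
    return covered.mean(), (upper - lower).mean()

# Toy intervals over four time steps: three observations fall inside,
# one (3.0) falls outside, so coverage is 0.75 with mean width 2.0.
lo_b = [0.0, 0.0, 0.0, 0.0]
up_b = [2.0, 2.0, 2.0, 2.0]
obs = [1.0, 1.0, 3.0, 1.0]
coverage, width = picp_and_width(lo_b, up_b, obs)
```

In the multi-objective setting, each candidate weight vector yields one (coverage, width) pair, and NSGA-II's non-dominated sorting keeps the Pareto front of such pairs rather than collapsing them into a single score.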