Results 11-20 of 215
Comparing Anomaly-Detection Algorithms for Keystroke Dynamics
Abstract

Cited by 35 (3 self)
Keystroke dynamics—the analysis of typing rhythms to discriminate among users—has been proposed for detecting impostors (i.e., both insiders and external attackers). Since many anomaly-detection algorithms have been proposed for this task, it is natural to ask which are the top performers (e.g., to identify promising research directions). Unfortunately, we cannot conduct a sound comparison of detectors using the results in the literature because evaluation conditions are inconsistent across studies. Our objective is to collect a keystroke-dynamics data set, to develop a repeatable evaluation procedure, and to measure the performance of a range of detectors so that the results can be compared soundly. We collected data from 51 subjects typing 400 passwords each, and we implemented and evaluated 14 detectors from the keystroke-dynamics and pattern-recognition literature. The three top-performing detectors achieve equal-error rates between 9.6% and 10.2%. The results—along with the shared data and evaluation methodology—constitute a benchmark for comparing detectors and measuring progress.
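The equal-error rate used to rank the detectors can be made concrete with a small sketch (not the paper's implementation): given anomaly scores for genuine and impostor typing samples, sweep a threshold until the miss and false-alarm rates meet. The score distributions below are synthetic toy data.

```python
import random

def equal_error_rate(genuine, impostor):
    """Sweep a threshold over all observed anomaly scores and return the
    rate at the point where miss and false-alarm rates are closest.
    Higher score = more anomalous."""
    best_gap, eer = None, None
    for t in sorted(genuine + impostor):
        miss = sum(g > t for g in genuine) / len(genuine)            # genuine user rejected
        false_alarm = sum(i <= t for i in impostor) / len(impostor)  # impostor accepted
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer

# Synthetic toy scores: genuine samples cluster low, impostors high.
random.seed(0)
genuine = [random.gauss(0.0, 1.0) for _ in range(500)]
impostor = [random.gauss(2.5, 1.0) for _ in range(500)]
print(f"EER = {equal_error_rate(genuine, impostor):.3f}")
```

The more the two score distributions overlap, the higher the equal-error rate; a real evaluation would compute this per subject and average.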
Statistical Inference and Data Mining
, 1996
Abstract

Cited by 31 (3 self)
...es of probability distributions, estimation, hypothesis testing, model scoring, Gibbs sampling, rational decision making, causal inference, prediction, and model averaging. For a rigorous survey of statistics, the mathematically inclined reader should see [7]. Due to space limitations, we must also ignore a number of interesting topics, including time series analysis and meta-analysis.

Probability Distributions
The statistical literature contains mathematical characterizations of a wealth of probability distributions, as well as properties of random variables: functions defined on the "events" to which a probability measure assigns values. Important relations among probability distributions include marginalization (summing over a subset of values) and conditionalization (forming a conditional probability measure from a probability measure on a sample space and some event of positive probability). Essential relations among random variables...
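The two relations named above, marginalization and conditionalization, are easy to make concrete. A minimal sketch with a hypothetical two-variable joint distribution:

```python
# Hypothetical joint distribution P(weather, ground) as a dict.
joint = {("rain", "wet"): 0.30, ("rain", "dry"): 0.05,
         ("sun", "wet"): 0.10, ("sun", "dry"): 0.55}

def marginal(joint, axis):
    """Marginalization: sum the joint probability over the other variable."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

def conditional(joint, ground):
    """Conditionalization: renormalize the slice where the second
    variable equals `ground` (an event of positive probability)."""
    slice_ = {x: p for (x, y), p in joint.items() if y == ground}
    z = sum(slice_.values())
    return {x: p / z for x, p in slice_.items()}

print(marginal(joint, 0))         # P(weather)
print(conditional(joint, "wet"))  # P(weather | ground = wet)
```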
Models for network evolution
 Journal of Mathematical Sociology
, 1996
Abstract

Cited by 25 (4 self)
Abstract: This paper describes mathematical models for network evolution when ties (edges) are directed and the node set is fixed. Each of these models implies a specific type of departure from the standard null binomial model. We provide statistical tests that, in keeping with these models, are sensitive to particular types of departures from the null. Each model (and associated test) discussed follows directly from one or more sociocognitive theories about how individuals alter the colleagues with whom they are likely to interact. The models include triad completion models, degree variance models, polarization and balkanization models, the Holland-Leinhardt models, metric models, and the constructural model. We find that many of these models, in their basic form, tend asymptotically towards an equilibrium distribution centered at the completely connected network (i.e., all individuals are equally likely to interact with all other individuals), a fact that can inhibit the development of satisfactory tests.

Keywords: triad completion, Holland-Leinhardt model, polarization, degree variance, network evolution, constructuralism
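The drift toward the completely connected network can be seen in a toy simulation of a triad completion process (a simplified sketch, not any of the paper's fitted models): with high probability each step closes a random open two-path i -> j -> k by adding the tie i -> k; otherwise it adds a uniformly random tie.

```python
import random

def triad_completion_step(adj, n, rng, p_close=0.9):
    """One evolution step: close a random open two-path i->j->k with
    probability p_close, otherwise add a uniformly random directed tie."""
    open_paths = [(i, k)
                  for i in range(n) for j in range(n) for k in range(n)
                  if i != j != k != i and adj[i][j] and adj[j][k] and not adj[i][k]]
    if open_paths and rng.random() < p_close:
        i, k = rng.choice(open_paths)
    else:
        i, k = rng.sample(range(n), 2)
    adj[i][k] = True

rng = random.Random(1)
n = 8
adj = [[False] * n for _ in range(n)]
for _ in range(10):                  # sparse random seed network
    i, j = rng.sample(range(n), 2)
    adj[i][j] = True
for _ in range(200):
    triad_completion_step(adj, n, rng)
density = sum(map(sum, adj)) / (n * (n - 1))
print(f"density after 200 steps: {density:.2f}")  # drifts toward the complete network
```

Because ties are only ever added, the density is monotone increasing, which is the equilibrium problem the abstract points out.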
Overfitting Explained
, 1997
Abstract

Cited by 24 (2 self)
Overfitting arises when model components are evaluated against the wrong reference distribution. Most modeling algorithms iteratively find the best of several components and then test whether this component is good enough to add to the model. We show that for independently distributed random variables, the reference distribution for any one variable underestimates the reference distribution for the highest-valued variable; thus variate values will appear significant when they are not, and model components will be added when they should not be added. We relate this problem to the well-known statistical theory of multiple comparisons or simultaneous inference.
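The effect described, that the best of several null scores looks significant against a single-variable reference distribution, can be checked with a quick simulation (toy numbers, assuming standard-normal component scores):

```python
import random

random.seed(2)
trials = 20_000
k = 10          # candidate components scored per round
crit = 1.645    # one-sided 5% critical value for a single N(0,1) score

# How often a single null score, vs. the best of k null scores,
# exceeds the single-score critical value.
single = sum(random.gauss(0, 1) > crit for _ in range(trials)) / trials
best_of_k = sum(max(random.gauss(0, 1) for _ in range(k)) > crit
                for _ in range(trials)) / trials
print(f"single: {single:.3f}, best of {k}: {best_of_k:.3f}")
```

The single-score rate sits near the nominal 5%, while the best-of-10 rate is roughly eight times larger (1 - 0.95^10 ≈ 0.40): exactly the wrong-reference-distribution effect the paper analyzes.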
Measures of agreement between computation and experiment: Validation metrics
, 2006
Abstract

Cited by 24 (2 self)
With the increasing role of computational modeling in engineering design, performance estimation, and safety assessment, improved methods are needed for comparing computational results and experimental measurements. Traditional methods of graphically comparing computational and experimental results, though valuable, are essentially qualitative. Computable measures are needed that can quantitatively compare computational and experimental results over a range of input, or control, variables to sharpen assessment of computational accuracy. This type of measure has been recently referred to as a validation metric. We discuss various features that we believe should be incorporated in a validation metric, as well as features that we believe should be excluded. We develop a new validation metric that is based on the statistical concept of confidence intervals. Using this fundamental concept, we construct two specific metrics: one that requires interpolation of experimental data and one that requires regression (curve fitting) of experimental data. We apply the metrics to three example problems: thermal decomposition of a polyurethane foam, a turbulent buoyant plume of helium, and compressibility effects on the growth rate of a turbulent free-shear layer. We discuss how the present metrics are easily interpretable for assessing computational model accuracy, as well as the impact of experimental measurement uncertainty on the accuracy assessment.
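A heavily simplified sketch of a confidence-interval-style comparison (illustrative only; the paper's metrics handle interpolation and regression over a range of control variables): estimate the model-experiment bias and attach a t-based half-width from the scatter of the differences. The data values below are made up.

```python
import math
import statistics

def validation_metric(model, experiment, t_crit=2.776):
    """Estimated model bias (mean of model - experiment) together with a
    95% confidence half-width on that bias (t_crit = 2.776 for n = 5,
    i.e. 4 degrees of freedom)."""
    diffs = [m - e for m, e in zip(model, experiment)]
    bias = statistics.mean(diffs)
    half_width = t_crit * statistics.stdev(diffs) / math.sqrt(len(diffs))
    return bias, half_width

# Made-up predictions and measurements at five settings of a control variable.
model = [10.2, 11.0, 12.1, 13.5, 14.0]
experiment = [10.0, 11.4, 11.8, 13.1, 14.6]
bias, hw = validation_metric(model, experiment)
print(f"estimated bias {bias:+.2f} with 95% half-width {hw:.2f}")
```

A small bias with a wide half-width signals that the experimental scatter, not the model, dominates the accuracy assessment, which is the interpretability property the abstract emphasizes.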
Union-intersection and sample-split methods in econometrics with applications to SURE and MA models
, 1998
Abstract

Cited by 23 (12 self)
In this paper, we develop inference procedures (tests and confidence sets) for two apparently distinct classes of situations: first, problems of comparing or pooling information from several samples whose stochastic relationship is not specified; second, problems where the distributions of standard test statistics are difficult to assess (e.g., because they involve unknown nuisance parameters), while it is possible to obtain more tractable distributional results for statistics based on appropriately chosen subsamples. A large number of econometric models lead to such situations, such as comparisons of regression equations when the relationship between the disturbances across equations is unknown or complicated: seemingly unrelated regression equations (SURE), regressions with moving average (MA) errors, etc. To deal with such problems, we propose a general approach which uses union-intersection techniques to combine tests (or confidence sets) based on different samples. In particular, we make a systematic use of Boole-Bonferroni inequalities to control the overall level of the procedure. This approach is easy to apply and transposable to a wide spectrum of models. In addition to being robust to various misspecifications of interest, the approach...
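The Boole-Bonferroni step can be sketched in a few lines (a generic Bonferroni union-intersection combination, not the paper's full procedure): reject the joint null if any subsample test rejects at level alpha/k, so Boole's inequality bounds the overall level by alpha.

```python
def bonferroni_combine(p_values, alpha=0.05):
    """Union-intersection combination: reject the joint null if any
    individual test rejects at level alpha / k. By Boole's inequality
    the overall probability of a false rejection is at most alpha."""
    k = len(p_values)
    return any(p <= alpha / k for p in p_values)

# p-values from tests run on three separate subsamples (made-up numbers):
print(bonferroni_combine([0.012, 0.30, 0.08]))  # True: 0.012 <= 0.05 / 3
print(bonferroni_combine([0.04, 0.06, 0.10]))   # False: no p <= 0.05 / 3
```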
Testing For Nonlinearity Using Redundancies: Quantitative and Qualitative Aspects
 Physica D
, 1995
Abstract

Cited by 22 (7 self)
A method for testing nonlinearity in time series is described based on information-theoretic functionals (redundancies), linear and nonlinear forms of which allow either qualitative, or, after incorporating the surrogate data technique, quantitative evaluation of dynamical properties of scrutinized data. An interplay of quantitative and qualitative testing on both the linear and nonlinear levels is analyzed and robustness of this combined approach against spurious nonlinearity detection is demonstrated. Evaluation of redundancies and redundancy-based statistics as functions of time lag and embedding dimension can further enhance insight into dynamics of a system under study.

Keywords: time series, nonlinearity, mutual information, redundancy, surrogate data

1 Introduction
The problem of inferring the dynamics of a system from measured data is a perpetual challenge for time series analysts. Ideas and concepts from nonlinear dynamics and theory of deterministic chaos have led to a num...
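The surrogate data idea can be illustrated with a deliberately simplified sketch: instead of the paper's redundancy functionals and spectrum-preserving surrogates, this uses shuffled surrogates (which destroy all temporal structure) and a cubic time-asymmetry statistic, applied to a logistic-map series.

```python
import random
import statistics

def time_asymmetry(x):
    """Cubic statistic of increments; near zero for time-reversible
    (e.g., linear Gaussian) series."""
    return statistics.mean((b - a) ** 3 for a, b in zip(x, x[1:]))

def surrogate_test(x, n_surrogates=200, rng=random):
    """Shuffled surrogates destroy all temporal structure; the p-value
    asks how extreme the observed statistic is relative to them."""
    observed = time_asymmetry(x)
    extreme = 0
    for _ in range(n_surrogates):
        s = x[:]
        rng.shuffle(s)
        if abs(time_asymmetry(s)) >= abs(observed):
            extreme += 1
    return (extreme + 1) / (n_surrogates + 1)

# Logistic-map series: deterministic and nonlinear.
random.seed(3)
x, v = [], 0.4
for _ in range(1000):
    v = 3.9 * v * (1 - v)
    x.append(v)
print(f"p = {surrogate_test(x):.3f}")  # small p: temporal structure detected
```

A real application would use surrogates that preserve the linear (spectral) properties of the data, so that a rejection points specifically at nonlinearity rather than at any temporal dependence.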
Visual recognition in monkeys following rhinal cortical ablations combined with either amygdalectomy or hippocampectomy
 J. Neurosci
, 1986
Abstract

Cited by 22 (0 self)
Performance on a visual recognition task was assessed in cynomolgus monkeys with ablations of rhinal (i.e., ento-, pro-, and perirhinal) cortex in combination with either amygdalectomy or hippocampectomy, as well as in unoperated controls. Removal of the hippocampal formation plus rhinal cortex resulted in a mild recognition deficit, whereas removal of the amygdaloid complex plus rhinal cortex resulted in a severe deficit. Comparison of the results with those of an earlier study (Mishkin, 1978) indicates that adding a rhinal cortical removal to hippocampectomy yields little, if any, additional impairment in recognition. By contrast, adding a rhinal cortical removal to an amygdalectomy has a profound effect; indeed, the recognition impairment in monkeys with amygdaloid plus rhinal removals was at least as severe as that seen in monkeys with combined amygdaloid and hippocampal removals. Taken together, these results support...
Optimized stratified sampling for approximate query processing
 ACM TODS
, 2007
Abstract

Cited by 22 (2 self)
The ability to approximately answer aggregation queries accurately and efficiently is of great benefit for decision support and data mining tools. In contrast to previous sampling-based studies, we treat the problem as an optimization problem where, given a workload of queries, we select a stratified random sample of the original data such that the error in answering the workload queries using the sample is minimized. A key novelty of our approach is that we can tailor the choice of samples to be robust even for workloads that are “similar” but not necessarily identical to the given workload. Finally, our techniques recognize the importance of taking into account the variance in the data distribution in a principled manner. We show how our solution can be implemented on a database system, and present results of extensive experiments on Microsoft SQL Server that demonstrate the superior quality of our method compared to previous work.
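A classical building block behind variance-aware stratified sampling is Neyman allocation, which the following sketch illustrates (a textbook allocation rule, not the paper's workload-driven optimization): samples are allocated proportional to stratum size times stratum standard deviation. The strata sizes and deviations are hypothetical.

```python
def neyman_allocation(strata, total_samples):
    """Allocate a fixed sample budget across strata proportional to
    N_h * S_h (stratum size times standard deviation), which minimizes
    the variance of the stratified estimate of the mean."""
    weights = [n * s for n, s in strata]
    total = sum(weights)
    return [max(1, round(total_samples * w / total)) for w in weights]

# Hypothetical (size, std-dev) per stratum: the small but highly
# variable stratum earns the largest share of the sample.
strata = [(10_000, 1.0), (10_000, 1.0), (1_000, 50.0)]
print(neyman_allocation(strata, 600))  # [86, 86, 429]
```

Uniform sampling would give the skewed third stratum only about 5% of the budget; allocating by size times variability is what lets a small sample answer aggregates over skewed data accurately.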
Environmental scanning: Acquisition and use of information by managers
 In M. E. Williams (Ed.), Annual review of information science and technology (vol. 28)
, 1993
Abstract

Cited by 21 (4 self)
The present study investigates how chief executive officers in the Canadian telecommunications industry acquire and use information about the external business environment, an information-seeking activity known as environmental scanning. Data were collected by a nationwide questionnaire survey and several focused interviews. Of the 113 CEOs in the study population, 67 returned completed questionnaires, thus giving a response rate of 59 percent. Personal interviews were then conducted with eight of the respondents. The chief executives collectively perceive the Technological, Customer, and Competition environmental sectors to have the greatest Perceived Strategic Uncertainty – these sectors were perceived to be the most strategic, variable and complex. For each environmental sector, the Amount of Scanning of the sector is positively correlated with the Perceived Strategic Uncertainty of that sector. Generally, the chief executives use multiple, complementary sources in environmental scanning. Personal sources such as customers and subordinate staff are very important in both scanning and decision making, and they are used more frequently than impersonal sources. Nonetheless, impersonal sources such as publications and reports are also frequently used in scanning. In decision making, environmental information from internal sources is used more frequently than that from external sources. For many of the information sources, the frequency of source use is...