Results 1 - 10 of 33
Synthesis of Wiring Signature-Invariant Equivalence Class Circuit Mutants and Applications to Benchmarking
, 1998
Cited by 27 (16 self)
Abstract
This paper formalizes the synthesis process of wiring signature-invariant (WSI) combinational circuit mutants. The signature $\sigma_0$ is defined by a reference circuit $\eta_0$, which itself is modeled as a canonical form of a directed bipartite graph. A wiring perturbation induces a perturbed reference circuit. A number of mutant circuits $\eta_i$ can be resynthesized from the perturbed circuit. The mutants of interest are the ones that belong to the wiring signature-invariant equivalence class $N_{\sigma_0}$, i.e. the mutants $\eta_i \in N_{\sigma_0}$. Circuit mutants $\eta_i \in N_{\sigma_0}$ have a number of useful properties. For any wiring perturbation, the wiring signature-invariant equivalence class is very large. Notably, circuits in this class are not random, although for unbiased testing and benchmarking purposes, mutants are typically selected from this class at random. For each reference circuit, we synthesized eight equivalence subclasses of circuit mutants, based on 0 to 100 % perturbation. Each subclass contains 100 randomly chosen mutant circuits, each listed in a different random order. The 14,400 benchmarking experiments with 3200 mutants in 4 equivalence classes, covering 13 typical EDA algorithms, demonstrate that an unbiased random selection of such circuits can lead to statistically meaningful differentiation and improvements of existing and new algorithms.
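The abstract does not spell out what the wiring signature contains. As a minimal sketch, assume (our assumption, not the paper's definition) that it is the fanout count of every driver and the fanin count of every gate. Swapping the sinks of two randomly chosen wires then changes the wiring while keeping that signature fixed. The example uses the small ISCAS-85 c17 netlist; `signature`, `is_acyclic`, and `perturb` are hypothetical helpers, and the swap count only stands in for the paper's perturbation level.

```python
import random
from collections import Counter

# ISCAS-85 c17 netlist as (driver, gate) wires; 1, 2, 3, 6, 7 are primary inputs.
C17 = [(1, 10), (3, 10), (3, 11), (6, 11), (2, 16), (11, 16),
       (11, 19), (7, 19), (10, 22), (16, 22), (16, 23), (19, 23)]


def signature(edges):
    """Assumed wiring signature: fanout of every driver and fanin of every gate."""
    return Counter(u for u, _ in edges), Counter(g for _, g in edges)


def is_acyclic(edges):
    """Kahn's topological sort: True if the wiring graph has no cycle."""
    succ, indeg, nodes = {}, Counter(), set()
    for u, g in edges:
        succ.setdefault(u, []).append(g)
        indeg[g] += 1
        nodes.update((u, g))
    ready = [v for v in nodes if indeg[v] == 0]
    visited = 0
    while ready:
        v = ready.pop()
        visited += 1
        for w in succ.get(v, []):
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return visited == len(nodes)


def perturb(edges, n_swaps, seed=0):
    """Hypothetical perturbation: swap the sinks of random wire pairs.

    Each swap keeps every fanout and fanin count, so signature() is unchanged.
    Swaps that would duplicate a wire or create a cycle are rejected."""
    rng = random.Random(seed)
    edges = list(edges)
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (u1, g1), (u2, g2) = edges[i], edges[j]
        if u1 == u2 or g1 == g2:
            continue  # swapping would be a no-op
        candidate = edges.copy()
        candidate[i], candidate[j] = (u1, g2), (u2, g1)
        if len(set(candidate)) < len(candidate) or not is_acyclic(candidate):
            continue
        edges = candidate
        done += 1
    return edges
```

For example, `perturb(C17, 4, seed=1)` returns a rewired netlist whose `signature` equals that of `C17`. Rejecting swaps that create a cycle keeps each mutant combinational.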
Design of Experiments to Evaluate CAD Algorithms: Which Improvements Are Due to Improved Heuristic and Which Are Merely Due to Chance?
, 1998
The Masking Breakdown Point of Multivariate Outlier Identification Rules
- J. Amer. Statist. Assoc.
, 1997
Cited by 19 (8 self)
Abstract
In this paper, we consider one-step outlier identification rules for multivariate data, generalizing the concept of so-called $\alpha$-outlier identifiers, as presented in Davies and Gather (1993) for the case of univariate samples. We investigate how the finite-sample breakdown points of estimators used in these identification rules influence the masking behaviour of the rules.
Keywords: Breakdown points; Outlier identification; Masking; Robust statistics.
1 Introduction
It is well known that outliers, i.e. observations lying "far away" from the main part of a data set and probably not following the assumed model, can strongly influence the statistical analysis of that data and even falsify the results. In particular, some classical parametric tests and estimators, e.g. the arithmetic mean as a location estimate, are prone to the influence of outlying observations. Therefore, one often finds the identification of outliers treated as a means to screen a data set for 'bad' observations fir...
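The masking effect is easiest to see in the univariate setting of Davies and Gather (1993) that the paper generalizes. The sketch below compares a one-step rule built on the mean and standard deviation (breakdown point 0) with one built on the median and MAD (breakdown point 1/2). The fixed cutoff `c=3.0`, the helper names, and the toy data are our assumptions; $\alpha$-outlier rules derive the cutoff from $\alpha$ and the sample size instead.

```python
import statistics


def flag(data, center, scale, c=3.0):
    """One-step rule: flag indices with |x - center| / scale > c (c is an assumed cutoff)."""
    return [i for i, x in enumerate(data) if abs(x - center) / scale > c]


def classical_rule(data, c=3.0):
    # Mean and standard deviation: breakdown point 0, so outliers can mask each other.
    return flag(data, statistics.mean(data), statistics.stdev(data), c)


def robust_rule(data, c=3.0):
    # Median and MAD, scaled by 1.4826 for consistency at the normal: breakdown point 1/2.
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    return flag(data, med, 1.4826 * mad, c)


# Toy sample: eleven clean points in [-0.5, 0.5] plus four identical outliers.
DATA = [0.1 * i for i in range(-5, 6)] + [10.0] * 4
```

With four identical outliers at 10, the classical rule flags nothing, because the outliers inflate the standard deviation and mask themselves, while the robust rule flags exactly those four points (indices 11 to 14).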
A method for simultaneous variable selection and outlier identification in linear regression
- Computational Statistics & Data Analysis
, 1996
Y-means: A Clustering Method for Intrusion Detection
- Proceedings of Canadian Conference on Electrical and Computer Engineering
, 2003
Cited by 17 (1 self)
Abstract
As the Internet spreads to every corner of the world, computers are exposed to miscellaneous intrusions from the World Wide Web. We need effective intrusion detection systems to protect our computers from these unauthorized or malicious actions. Traditional instance-based learning methods for intrusion detection can only detect known intrusions, since these methods classify instances based on what they have learned; they rarely detect intrusions that they have not learned before. In this paper, we present a clustering heuristic for intrusion detection, called Y-means. The proposed heuristic is based on the K-means algorithm and other related clustering algorithms. It overcomes two shortcomings of K-means: dependence on the number of clusters and degeneracy. Simulations on the KDD-99 data set show that Y-means is an effective method for partitioning a large data space. Y-means achieves a detection rate of 89.89 % and a false alarm rate of 1.00 %.
Keywords: Clustering; intrusion detection; K-means
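The abstract names the two K-means weaknesses that Y-means targets but not its exact rules. The following is a minimal sketch, not the published Y-means algorithm: it drops clusters that become empty (degeneracy) and, to reduce dependence on the initial k, repeatedly turns the point farthest from its center into a new center until every point lies within a hypothetical `radius`. The function names, the radius, and the toy data are assumptions; the published method may also merge clusters, which this sketch omits.

```python
import math

# Toy data (not KDD-99): three well-separated groups of four points.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 10), (0, 11), (1, 10), (1, 11)]


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def mean(group):
    return tuple(sum(c) / len(group) for c in zip(*group))


def lloyd(points, centers, max_iter=100):
    """Plain K-means iterations; clusters that become empty are dropped."""
    for _ in range(max_iter):
        labels = assign(points, centers)
        groups = {}
        for p, label in zip(points, labels):
            groups.setdefault(label, []).append(p)
        new_centers = [mean(groups[j]) for j in sorted(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def split_kmeans(points, k_init, radius, max_rounds=20):
    """Grow k: while some point lies farther than `radius` from its center,
    make that point a new center and rerun K-means."""
    centers = list(points[:k_init])
    for _ in range(max_rounds):
        centers, labels = lloyd(points, centers)
        dists = [math.dist(p, centers[l]) for p, l in zip(points, labels)]
        worst = max(range(len(points)), key=dists.__getitem__)
        if dists[worst] <= radius:
            return centers, labels
        centers.append(points[worst])
    return lloyd(points, centers)
```

Starting from a single center with `radius=3.0`, `split_kmeans(POINTS, 1, 3.0)` ends with three clusters, one per group, even though k was never specified as 3.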
Statistical Validation of Engineering and Scientific Models: Background
, 1999
Cited by 16 (4 self)
Abstract
A tutorial is presented discussing the basic issues associated with propagation of uncertainty analysis and statistical validation of engineering and scientific models. The propagation of uncertainty tutorial illustrates the use of the sensitivity method and the Monte Carlo method to evaluate the uncertainty in predictions for linear and nonlinear models. Four example applications are presented: a linear model, a model for the behavior of a damped spring-mass system, a transient thermal conduction model, and a nonlinear transient convective-diffusive model based on Burgers' equation. Correlated and uncorrelated model input parameters are considered. The model validation tutorial builds on the material presented in the propagation of uncertainty tutorial and uses the damped spring-mass system as the example application. The validation tutorial illustrates several concepts associated with the application of statistical inference to test model predictions against experimental observations. Several validation methods are presented, including error-band-based, multivariate, sum-of-squares-of-residuals, and optimization methods. Following the tutorials, a survey of statistical model validation literature is presented and recommendations for future work are made.
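The difference between the sensitivity method and the Monte Carlo method can be shown in a few lines. The sketch below assumes independent normal inputs (the tutorial also treats correlated inputs, which this code does not handle); the function names and the example models are ours, not the tutorial's.

```python
import math
import random


def sensitivity_sd(f, means, sds, h=1e-6):
    """First-order (sensitivity-method) output std. dev. for uncorrelated inputs:
    Var[y] ~= sum_i (df/dx_i)^2 * sd_i^2, with derivatives by central differences."""
    var = 0.0
    for i, sd in enumerate(sds):
        up, down = list(means), list(means)
        up[i] += h
        down[i] -= h
        deriv = (f(*up) - f(*down)) / (2 * h)
        var += (deriv * sd) ** 2
    return math.sqrt(var)


def monte_carlo_sd(f, means, sds, n=20000, seed=0):
    """Sample independent normal inputs, push them through f, return the sample std. dev."""
    rng = random.Random(seed)
    ys = [f(*(rng.gauss(m, s) for m, s in zip(means, sds))) for _ in range(n)]
    mu = sum(ys) / n
    return math.sqrt(sum((y - mu) ** 2 for y in ys) / (n - 1))
```

For the linear model `2*a + 3*b` with input standard deviations 1 and 0.5, both methods give about 2.5. For `a*a` with `a` of mean 1 and standard deviation 0.5, the sensitivity estimate is 1.0, while Monte Carlo approaches the exact value sqrt(1.125) ≈ 1.061, which illustrates how linearization can understate uncertainty for nonlinear models.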
Design of Experiments for Evaluation of BDD Packages Using Controlled Circuit Mutations
, 1998
Cited by 12 (6 self)
Abstract
Despite more than a decade of experience with the use of standardized benchmark circuits, meaningful comparisons of EDA algorithms remain elusive. In this paper, we introduce a new methodology for characterizing the performance of Binary Decision Diagram (BDD) algorithms. Our method involves the synthesis of large equivalence classes of functionally perturbed circuits, based on a known reference circuit. We demonstrate that such classes induce controllable distributions of BDD algorithm performance, which provide the foundation for statistically significant comparison of different algorithms.
1 Introduction
We introduce methods rooted in the Design of Experiments, first formalized in [1], to evaluate the properties of programs which implement Reduced, Ordered Binary Decision Diagrams [2] (hereafter referred to as BDDs). The choice of BDD variable order has a profound impact on the size of the BDD data structure. Determining an optimal variable ordering is an NP-hard problem upon whi...
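The dependence of BDD size on variable order can be checked on the standard textbook function x1·y1 + x2·y2 + x3·y3 (not a circuit from the paper). The sketch builds the reduced, ordered BDD by full Shannon expansion with a unique table, so it is only meant for a handful of variables; `robdd_size` and `pairs_or` are hypothetical helpers.

```python
def robdd_size(f, order):
    """Number of internal nodes in the reduced ordered BDD of f under `order`.

    f maps a dict {variable: bool} to bool. The full Shannon expansion is
    explored, so this is only practical for a handful of variables."""
    unique = {}  # (var, low, high) -> node id; ids 0 and 1 are the terminals

    def mk(var, low, high):
        if low == high:              # redundant test: reuse the child
            return low
        key = (var, low, high)
        if key not in unique:        # hash-consing merges isomorphic subgraphs
            unique[key] = len(unique) + 2
        return unique[key]

    def build(i, env):
        if i == len(order):
            return 1 if f(env) else 0
        var = order[i]
        return mk(var, build(i + 1, {**env, var: False}), build(i + 1, {**env, var: True}))

    build(0, {})
    return len(unique)


def pairs_or(env):
    return (env["x1"] and env["y1"]) or (env["x2"] and env["y2"]) or (env["x3"] and env["y3"])


INTERLEAVED = ["x1", "y1", "x2", "y2", "x3", "y3"]
SEPARATED = ["x1", "x2", "x3", "y1", "y2", "y3"]
```

The interleaved order gives 6 internal nodes, while the separated order gives 14; for n variable pairs the two orders need 2n and 2^(n+1) - 2 nodes respectively.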
Efficient exact p-value computation for small sample, sparse, and surprising categorical data
- J. of Comp. Bio
, 2004
Cited by 8 (1 self)
Abstract
A major obstacle in applying various hypothesis testing procedures to datasets in bioinformatics is the computation of the ensuing p-values. In this paper, we define a generic branch-and-bound approach to efficient exact p-value computation and enumerate the conditions required for its successful application. Explicit procedures are developed for the entire Cressie–Read family of statistics, which includes the widely used Pearson and likelihood ratio statistics in a one-way frequency table goodness-of-fit test. This new formulation constitutes a first practical exact improvement over the exhaustive enumeration performed by existing statistical software. The general techniques we develop to exploit the convexity of many statistics are also shown to carry over to contingency table tests, suggesting that they are readily extendible to other tests and test statistics of interest. Our empirical results demonstrate a speed-up of orders of magnitude over the exhaustive computation, significantly extending the practical range for performing exact tests. We also show that the relative speed-up increases as the null hypothesis becomes sparser, that computational precision increases with the speed-up, and that computation time is only mildly affected by the magnitude of the computed p-value. These qualities make our algorithm especially appealing in the regimes of small samples, sparse null distributions, and rare events, compared to the alternative asymptotic approximations and Monte Carlo samplers. We discuss several established bioinformatics applications where small sample sizes, small expected counts in one or more categories (sparseness), and very small p-values do occur. Our computational framework could be applied in these and similar cases to improve performance.
Key words: p-value, exact tests, branch and bound, real extension, categorical data.
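To make the pruning idea concrete, here is a minimal sketch for the Pearson statistic only. Counts are assigned one category at a time; the remaining terms of the statistic form a convex function on a simplex, so a real-valued lower bound (proportional allocation) and upper bound (all remaining items in one cell) decide many subtrees without enumerating them. These particular bounds and all function names are our illustration, not necessarily the bounds used in the paper.

```python
import math


def pearson(counts, expected):
    """Pearson chi-square statistic for a one-way table."""
    return sum((x - e) ** 2 / e for x, e in zip(counts, expected))


def multinomial_pmf(counts, probs):
    coef = math.factorial(sum(counts))
    for x in counts:
        coef //= math.factorial(x)
    return coef * math.prod(p ** x for x, p in zip(counts, probs))


def compositions(n, k):
    """All k-tuples of non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def p_value_exhaustive(observed, probs, eps=1e-9):
    n = sum(observed)
    expected = [n * p for p in probs]
    t_obs = pearson(observed, expected)
    return sum(multinomial_pmf(c, probs) for c in compositions(n, len(probs))
               if pearson(c, expected) >= t_obs - eps)


def p_value_branch_and_bound(observed, probs, eps=1e-9):
    n, k = sum(observed), len(probs)
    expected = [n * p for p in probs]
    t_obs = pearson(observed, expected)

    def visit(j, prefix, partial):
        m = n - sum(prefix)                   # items not yet assigned
        if j == k - 1:                        # last category takes the rest
            stat = partial + (m - expected[j]) ** 2 / expected[j]
            return multinomial_pmf(prefix + [m], probs) if stat >= t_obs - eps else 0.0
        rest = expected[j:]
        e_rest = sum(rest)
        lower = (m - e_rest) ** 2 / e_rest    # minimum at x_i proportional to e_i
        upper = max((m - e) ** 2 / e + (e_rest - e) for e in rest)  # simplex vertex
        if partial + upper < t_obs - eps:
            return 0.0                        # prune: no completion is extreme enough
        if partial + lower >= t_obs - eps:
            # every completion counts: add the whole subtree's probability at once
            return multinomial_pmf(prefix + [m], probs[:j] + [sum(probs[j:])])
        return sum(visit(j + 1, prefix + [x], partial + (x - expected[j]) ** 2 / expected[j])
                   for x in range(m + 1))

    return visit(0, [], 0.0)
```

On the small table `[2, 4, 4]` with null probabilities `[0.5, 0.3, 0.2]`, the pruned search returns the same p-value as full enumeration while skipping subtrees whose bounds already decide the outcome.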
Cumulative Advantage and Success-Breeds-Success: The Value of Time Pattern Analysis
- Journal of the American Society for Information Science, XLIX
, 1998
Cited by 4 (0 self)
Abstract
Many different theoretical models can be made to fit empirical informetric data. For the case of the distribution of papers across authors, the Success-Breeds-Success or Cumulative Advantage model is a popular candidate. This article shows that examination of the time pattern of production allows independent evaluation of the component processes that generate the distribution of papers across authors. Specifically for inventors, the Cumulative Advantage model for increasing rate of production with experience is not confirmed. Furthermore, the distribution of individual production is Poisson and the distribution of the rate of production across the population fits the Gamma distribution. Thus, the non-uniform giftedness model is more appropriate for inventors.
... model implies that the shape of the distribution of production across a population arises from the distribution of production within each individual's career. It is this application which will be investigated in this article. In the general case, the Simon-Yule framework is expressed as sources which produce items. For clarity, we will express it as authors who produce publications in keeping with the application chosen here. Burrell and Fenton (1993) clearly described the two component processes that must be present to generate data that fits the distribution of publications across authors. Readers who are unfamiliar with the typical distribution of publications across authors may wish to examine Fig-
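The abstract's conclusion (Poisson production for each individual, Gamma-distributed rates across the population) can be simulated directly. In this sketch the shape 2.0 and scale 1.5 are hypothetical parameters, and the Poisson sampler is Knuth's simple method because the standard `random` module provides no Poisson sampler. The resulting counts should have mean shape × scale = 3 and variance 3 + 3²/2 = 7.5, i.e. the overdispersion of a negative binomial rather than the variance-equals-mean of a single Poisson.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_counts(n_people, shape, scale, seed=0):
    """Each person's rate ~ Gamma(shape, scale); their count ~ Poisson(rate)."""
    rng = random.Random(seed)
    return [poisson(rng, rng.gammavariate(shape, scale)) for _ in range(n_people)]


def mean_and_variance(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
```

A sample variance well above the sample mean is the signature of rate heterogeneity across people, which is what the non-uniform giftedness model predicts.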
A Comparison of Marginal Likelihood Computation Methods
- Discussion Paper 2002-084/4, Tinbergen Institute, Faculty of Economics and Business Administration, Vrije Universiteit
, 2002
Cited by 3 (1 self)
Abstract
In a Bayesian analysis, different models can be compared on the basis of the expected or marginal likelihood they attain. Many methods have been devised to compute the marginal likelihood, but simplicity is not the strongest point of most methods. At the same time, the precision of methods is often questionable.
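To see why precision is a concern, compare the simplest estimator, averaging the likelihood over draws from the prior, with a case that has a closed form. The sketch assumes a Bernoulli model with a uniform Beta(1, 1) prior; it is an illustration, not one of the methods as implemented in the paper, and the function names are ours.

```python
import math
import random


def exact_log_marginal(k, n, a=1.0, b=1.0):
    """log p(data) for a fixed sequence with k successes in n trials, Beta(a, b) prior."""
    def log_beta(x, y):
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)
    return log_beta(a + k, b + n - k) - log_beta(a, b)


def prior_mc_marginal(k, n, draws=50000, seed=0):
    """Simplest estimator: average the likelihood over draws from the uniform prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        theta = rng.random()
        total += theta ** k * (1 - theta) ** (n - k)
    return total / draws
```

For 3 successes in 10 trials the exact value is 3!·7!/11! ≈ 7.58e-4, and 50,000 prior draws land within a few percent of it; the same estimator degrades quickly as the likelihood becomes much more concentrated than the prior.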

