Results 11 - 20
of
1,354
Measuring Link Bandwidths Using a Deterministic Model of Packet Delay
- in Proceedings of ACM SIGCOMM
, 2000
"... We describe a deterministic model of packet delay and use it to derive both the packet pair [2] property of FIFO-queueing networks and a new technique (packet tailgating) for actively measuring link bandwidths. Compared to previously known techniques, packet tailgating usually consumes less network ..."
Abstract
-
Cited by 170 (3 self)
- Add to MetaCart
We describe a deterministic model of packet delay and use it to derive both the packet pair [2] property of FIFO-queueing networks and a new technique (packet tailgating) for actively measuring link bandwidths. Compared to previously known techniques, packet tailgating usually consumes less network bandwidth, does not rely on consistent behavior of routers handling ICMP packets, and does not rely on timely delivery of acknowledgments. Preliminary empirical measurements in the Internet indicate that compared to current measurement tools, packet tailgating sends an order of magnitude fewer packets, while maintaining approximately the same accuracy. Unfortunately, for all currently available measurement tools, including our prototype implementation of packet tailgating, accuracy is low for paths longer than a few hops. 1. INTRODUCTION As long as Internet bandwidth has increased, the amount of trac sent over the Internet has grown to consume it. This means that despite the increasing li...
Popular ensemble methods: an empirical study
- Journal of Artificial Intelligence Research
, 1999
"... An ensemble consists of a set of individually trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances. Previous research has shown that an ensemble is often more accurate than any of the single classifiers in the ensemble. Baggi ..."
Abstract
-
Cited by 151 (3 self)
- Add to MetaCart
An ensemble consists of a set of individually trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances. Previous research has shown that an ensemble is often more accurate than any of the single classifiers in the ensemble. Bagging (Breiman, 1996c) and Boosting (Freund & Schapire, 1996; Schapire, 1990) are two relatively new but popular methods for producing ensembles. In this paper we evaluate these methods on 23 data sets using both neural networks and decision trees as our classification algorithm. Our results clearly indicate a number of conclusions. First, while Bagging is almost always more accurate than a single classifier, it is sometimes much less accurate than Boosting. On the other hand, Boosting can create ensembles that are less accurate than a single classifier – especially when using neural networks. Analysis indicates that the performance of the Boosting methods is dependent on the characteristics of the data set being examined. In fact, further results show that Boosting ensembles may overfit noisy data sets, thus decreasing its performance. Finally, consistent with previous studies, our work suggests that most of the gain in an ensemble’s performance comes in the first few classifiers combined; however, relatively large gains can be seen up to 25 classifiers when Boosting decision trees. 1.
Bias plus variance decomposition for zero-one loss functions
- In Machine Learning: Proceedings of the Thirteenth International Conference
, 1996
"... We present a bias-variance decomposition of expected misclassi cation rate, the most commonly used loss function in supervised classi cation learning. The bias-variance decomposition for quadratic loss functions is well known and serves as an important tool for analyzing learning algorithms, yet no ..."
Abstract
-
Cited by 144 (3 self)
- Add to MetaCart
We present a bias-variance decomposition of expected misclassi cation rate, the most commonly used loss function in supervised classi cation learning. The bias-variance decomposition for quadratic loss functions is well known and serves as an important tool for analyzing learning algorithms, yet no decomposition was o ered for the more commonly used zero-one (misclassi cation) loss functions until the recent work of Kong & Dietterich (1995) and Breiman (1996). Their decomposition su ers from some major shortcomings though (e.g., potentially negative variance), which our decomposition avoids. We show that, in practice, the naive frequency-based estimation of the decomposition terms is by itself biased and show how to correct for this bias. We illustrate the decomposition on various algorithms and datasets from the UCI repository. 1
Locality Preserving Projections
, 2002
"... Many problems in information processing involve some form of dimensionality reduction. In this paper, we introduce Locality Preserving Projections (LPP). These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data s ..."
Abstract
-
Cited by 142 (15 self)
- Add to MetaCart
Many problems in information processing involve some form of dimensionality reduction. In this paper, we introduce Locality Preserving Projections (LPP). These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data set. LPP should be seen as an alternative to Principal Component Analysis (PCA) -- a classical linear technique that projects the data along the directions of maximal variance. When the high dimensional data lies on a low dimensional manifold embedded in the ambient space, the Locality Preserving Projections are obtained by finding the optimal linear approximations to the eigenfunctions of the Laplace Beltrami operator on the manifold. As a result, LPP shares many of the data representation properties of nonlinear techniques such as Laplacian Eigenmaps or Locally Linear Embedding. Yet LPP is linear and more crucially is defined everywhere in ambient space rather than just on the training data points. This is borne out by illustrative examples on some high dimensional data sets.
Predicting fault incidence using software change history
- IEEE Transactions on Software Engineering
, 2000
"... AbstractÐThis paper is an attempt to understand the processes by which software ages. We define code to be aged or decayed if its structure makes it unnecessarily difficult to understand or change and we measure the extent of decay by counting the number of faults in code in a period of time. Using ..."
Abstract
-
Cited by 134 (2 self)
- Add to MetaCart
AbstractÐThis paper is an attempt to understand the processes by which software ages. We define code to be aged or decayed if its structure makes it unnecessarily difficult to understand or change and we measure the extent of decay by counting the number of faults in code in a period of time. Using change management data from a very large, long-lived software system, we explore the extent to which measurements from the change history are successful in predicting the distribution over modules of these incidences of faults. In general, process measures based on the change history are more useful in predicting fault rates than product metrics of the code: For instance, the number of times code has been changed is a better indication of how many faults it will contain than is its length. We also compare the fault rates of code of various ages, finding that if a module is, on the average, a year older than an otherwise similar module, the older module will have roughly a third fewer faults. Our most successful model measures the fault potential of a module as the sum of contributions from all of the times the module has been changed, with large, recent changes receiving the most weight. Index TermsÐFault potential, code decay, change management data, metrics, statistical analysis, generalized linear models. 1
Statistical Significance Tests for Machine Translation Evaluation
, 2004
"... If two translation systems differ differ in performance on a test set, can we trust that this indicates a difference in true system quality? To answer this question, we describe bootstrap resampling methods to compute statistical significance of test results, and validate them on the concrete exampl ..."
Abstract
-
Cited by 102 (0 self)
- Add to MetaCart
If two translation systems differ differ in performance on a test set, can we trust that this indicates a difference in true system quality? To answer this question, we describe bootstrap resampling methods to compute statistical significance of test results, and validate them on the concrete example of the BLEU score. Even for small test sizes of only 300 sentences, our methods may give us assurances that test result differences are real.
Evaluating intrusion detection systems: The 1998 darpa off-line intrusion detection evaluation
- in Proceedings of the 2000 DARPA Information Survivability Conference and Exposition
, 2000
"... A intrusion detection evaluation test bed was developed which generated normal traffic similar to that on a government site containing 100’s of users on 1000’s of hosts. More than 300 instances of 38 different automated attacks were launched against victim UNIX hosts in seven weeks of training data ..."
Abstract
-
Cited by 101 (2 self)
- Add to MetaCart
A intrusion detection evaluation test bed was developed which generated normal traffic similar to that on a government site containing 100’s of users on 1000’s of hosts. More than 300 instances of 38 different automated attacks were launched against victim UNIX hosts in seven weeks of training data and two weeks of test data. Six research groups participated in a blind evaluation and results were analyzed for probe, denialof-service (DoS), remote-to-local (R2L), and user to root (U2R) attacks. The best systems detected old attacks included in the training data, at moderate detection rates ranging from 63 % to 93 % at a false alarm rate of 10 false alarms per day. Detection rates were much worse for new and novel R2L and DoS attacks included only in the test data. The best systems failed to detect roughly half these new attacks which included damaging access to root-level privileges by remote users. These results suggest that further research should focus on developing techniques to find new attacks instead of extending existing rule-based approaches. 1.
BOAT -- Optimistic Decision Tree Construction
, 1999
"... Classification is an important data mining problem. Given a training database of records, each tagged with a class label, the goal of classification is to build a concise model that can be used to predict the class label of future, unlabeled records. A very popular class of classifiers are decision ..."
Abstract
-
Cited by 97 (1 self)
- Add to MetaCart
Classification is an important data mining problem. Given a training database of records, each tagged with a class label, the goal of classification is to build a concise model that can be used to predict the class label of future, unlabeled records. A very popular class of classifiers are decision trees. All current algorithms to construct decision trees, including all main-memory algorithms, make one scan over the training database per level of the tree. We introduce a new algorithm (BOAT) for decision tree construction that improves upon earlier algorithms in both performance and functionality. BOAT constructs several levels of the tree in only two scans over the training database, resulting in an average performance gain of 300% over previous work. The key to this performance improvement is a novel optimistic approach to tree construction in which we construct an initial tree using a small subset of the data and refine it to arrive at the final tree. We guarantee that any differen...
Nonparametric estimation of average treatment effects under exogeneity: a review
- Review of Economics and Statistics
, 2004
"... Abstract—Recently there has been a surge in econometric work focusing on estimating average treatment effects under various sets of assumptions. One strand of this literature has developed methods for estimating average treatment effects for a binary treatment under assumptions variously described a ..."
Abstract
-
Cited by 97 (6 self)
- Add to MetaCart
Abstract—Recently there has been a surge in econometric work focusing on estimating average treatment effects under various sets of assumptions. One strand of this literature has developed methods for estimating average treatment effects for a binary treatment under assumptions variously described as exogeneity, unconfoundedness, or selection on observables. The implication of these assumptions is that systematic (for example, average or distributional) differences in outcomes between treated and control units with the same values for the covariates are attributable to the treatment. Recent analysis has considered estimation and inference for average treatment effects under weaker assumptions than typical of the earlier literature by avoiding distributional and functional-form assumptions. Various methods of semiparametric estimation have been proposed, including estimating the unknown regression functions, matching, methods using the propensity score such as weighting and blocking, and combinations of these approaches. In this paper I review the state of this
Wrappers For Performance Enhancement And Oblivious Decision Graphs
, 1995
"... In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are stu ..."
Abstract
-
Cited by 94 (6 self)
- Add to MetaCart
In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are: decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs).

