### Table 7-8. Selection of Appropriate Data Collection Design (columns: Decision, Type of Design, Optimum Number of Samples)

"... In PAGE 72: ...). Table 7-1. Data Collection Design Determination.... In PAGE 72: ...type of non-statistical design is appropriate (e.g., haphazard or judgmental). If the design is non-statistical, fill in the following table and skip to worksheet activity 5; if the design is statistical, proceed to the next table in this worksheet activity. Table 7-2. Data Collection Design Alternatives.... In PAGE 73: ...Rev. 1 A-37 Table 7-3. Statistical Design Determination.... In PAGE 73: ... Define a suggested method for testing the statistical hypothesis defined above and define a sample size formula that corresponds to the method, if one exists. Table 7-4. Mathematical Formula Expressions Needed to Solve Design Problems.... In PAGE 73: ... For example, if a mean concentration of a COPC will be measured by a field screening instrument rather than through laboratory analyses, the model that relates the field screening results to the concentration results must be specified, along with any assumptions upon which the model is based. Table 7-5. Relationships and Assumptions Between True and Measured Values.... In PAGE 74: ... Vary the Type I and Type II error rates (and other inputs in the equations) to examine the relationship between the number of samples and the inputs. Table 7-6. Calculation of Number of Samples for Each Design Alternative.... In PAGE 74: ...e.g., spatial and temporal boundaries or scope of the project). Table 7-7. Results of Trade-Off Analysis.... In PAGE 75: ...conceptual site model, historical data, or unforeseen implementation problems is to plan an alternative course of action that may be appropriate. Table 7-9. Outline of Alternative Strategies.... In PAGE 76: ...Rev. 1 A-40 Table 7-10. Key Features of Selected Design....
In PAGE 76: ...distribution of the parameter of interest (e.g., the mean concentration is assumed Gaussian); statistical independence; distribution of the population of interest; a model that shows the relationship between the variable being measured and the variable of interest. Table 7-11. Documentation on Theoretical Assumptions.... ..."

### Table 1. Performance comparison of clustering algorithms with and without iterative feature selection

"... In PAGE 8: ... Detailed analyses not given here also showed that the Markov Blanket filter imposes more influence on the stability and correctness of the clustering than the information gain filter, especially when the number of features to be finally used is small. In a comparison of the clustering result using different approaches (Table 1), we can see that CLIFF outperforms both K-means with feature selection and NCut without feature selection. The number of features selected and used to compute the affinity matrix during each iteration is chosen empirically, and the clustering result is sensitive to different choices of this number.... ..."
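The filter-then-recluster loop the snippet describes relies on ranking features by how informative they are about the current cluster labels. A minimal information-gain ranking step can be sketched as follows (a generic sketch, not the CLIFF implementation; the function names and the median-split binarization are illustrative assumptions):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a discrete label vector, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(feature, labels):
    """Gain from splitting a continuous feature at its median."""
    mask = feature > np.median(feature)
    h_cond = 0.0
    for side in (mask, ~mask):
        if side.any():
            h_cond += side.mean() * entropy(labels[side])
    return entropy(labels) - h_cond

def top_k_features(X, labels, k):
    """Indices of the k features most informative about the labels."""
    gains = np.array([information_gain(X[:, j], labels)
                      for j in range(X.shape[1])])
    return np.argsort(gains)[::-1][:k]
```

In an iterative scheme of this kind, one would alternate `top_k_features` with a clustering pass over the reduced matrix; as the snippet notes, the choice of `k` is empirical and the result is sensitive to it.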

### Table 10. Comparison between various wrapper approaches for feature subset selection (columns: Data set, FSS, BSS, RC, MFCMS)

in C-means

2002

"... In PAGE 18: ... The three algorithms use the same lazy inductive learning algorithm. Table 10 reports the results shown in [18] and, for convenience, those achieved by MFCMS on the same subset of data sets in Table 7. It can be observed that MFCMS shows the best compromise between recognition rate and number of selected features.... ..."

### Table 3 summarizes the results. Our primary finding is that, when feature selection is useful and when different features are selected for different bit functions, ECOCs can improve the accuracy of the local classifier IB1. Feature selection was not always appropriate; it increased accuracy on only 10

1997

"... In PAGE 11: ...

| Data Set | Size | C | F | Method |
|---|---|---|---|---|
| Glass (GL) | 214 | 6 | 9C | 10-fold CV |
| Clouds98 (CL98) | 69 | 4 | 98C | 10-fold CV |
| Clouds99 (CL99) | 321 | 7 | 99C | 10-fold CV |
| Clouds204 (CL204) | 500 | 10 | 204C | 10-fold CV |
| Vowel (VO) | 990 | 11 | 10C | 528+462 |
| Isolet (IS) | 7797 | 26 | 617C | 520+156 |
| Letter (LE) | 20000 | 26 | 16C | 1600+400 |
| Satellite (SA) | 6435 | 6 | 36C | 4435+2000 |
| Segmentation (SE) | 2310 | 7 | 19C | 210+2100 |
| Zoo (ZO) | 101 | 7 | 16S | 10-fold CV |

Table 3: Average Percent Accuracies (and Standard Deviations, for 10-fold CV runs) (M=mixed feature selection criterion; F=forward; B=backward)... In PAGE 14: ... The forward variant discards features in races where inclusion and exclusion have similar performance, whereas backward selects these features. Table 3 lists the variant that performs best for each data set. As expected, forward works better than mixed and backward for the high-dimensional data sets and when a small number of features is relevant for each bit function.... ..."

Cited by 10
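The forward wrapper search the snippet describes, with the local classifier IB1 (1-nearest neighbour) as the evaluation function, can be sketched as follows (a generic greedy forward sketch under a leave-one-out criterion; it is not the paper's race-based variant, and the function names are illustrative):

```python
import numpy as np

def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (IB1-style)."""
    correct = 0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf               # a point may not be its own neighbour
        correct += y[d.argmin()] == y[i]
    return correct / len(X)

def forward_select(X, y, max_features):
    """Greedy forward wrapper: add the feature that most improves accuracy."""
    selected, best_acc = [], 0.0
    while len(selected) < max_features:
        scores = {j: loo_1nn_accuracy(X[:, selected + [j]], y)
                  for j in range(X.shape[1]) if j not in selected}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_acc:
            break                   # no candidate improves the wrapper criterion
        selected.append(j_best)
        best_acc = scores[j_best]
    return selected, best_acc
```

A backward variant would start from the full feature set and greedily drop features under the same criterion; as the snippet notes, the two differ mainly in how they treat features whose inclusion and exclusion perform similarly.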

### Table 1: Previous approaches evaluated on narrative data from Biohazard

in Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in NLP, pages 32--39,

2007

"... In PAGE 6: ... 3-fold cross validation was done by training on two-thirds of the data and testing on the other third. Recalling from Table 1 that both PLSA and TextTiling result in performance similar to random even when given the correct number of segments, we note that all of the single train/test splits performed better than any of the naive algorithms and previous methods examined. To examine the ability of our algorithm to perform on unseen data, we trained on the entire Biohazard book and tested on Demon in the Freezer.... In PAGE 7: ... For our selected features, boosted stump performance is similar to using an SVM, which reinforces our intuition that the selected features (and not just the classification method) are appropriate for this problem. Table 1 indicates that the previous TextTiling and PLSA-based approaches perform close to random on narrative text. Our experiments show a performance improvement of >24% by our feature-based system, and significant improvement over other methods on the Groliers data.... ..."

Cited by 2
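The boosted decision stumps compared against the SVM above can be sketched with standard AdaBoost (a generic sketch, not the paper's system; labels are assumed to be coded in {-1, +1}):

```python
import numpy as np

def best_stump(X, y, w):
    """Weighted-error-minimizing decision stump over all features and thresholds."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(sign * (X[:, j] - t) > 0, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def adaboost_stumps(X, y, rounds=10):
    """AdaBoost ensemble of decision stumps."""
    n = len(y)
    w = np.full(n, 1.0 / n)                    # uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, j, t, sign = best_stump(X, y, w)
        err = max(err, 1e-12)                  # avoid division by zero
        if err >= 0.5:
            break                              # stump no better than chance
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(sign * (X[:, j] - t) > 0, 1, -1)
        w *= np.exp(-alpha * y * pred)         # upweight misclassified examples
        w /= w.sum()
        ensemble.append((alpha, j, t, sign))
    return ensemble

def predict(ensemble, X):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * np.where(s * (X[:, j] - t) > 0, 1, -1)
                for a, j, t, s in ensemble)
    return np.where(score >= 0, 1, -1)
```

Because each round picks a single feature and threshold, the learned ensemble doubles as an implicit feature ranking, which is one reason boosted stumps are a common baseline in feature-engineering studies like this one.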


### Table 2: Characterization of recent work on wrapper approaches to feature selection in terms of

1997

"... In PAGE 14: ...selection focus on finding attributes that are useful for performance (in the sense of Definition 5), rather than necessarily finding the relevant ones. Table 2 characterizes the recent efforts on wrapper methods in terms of the dimensions discussed earlier, as well as the induction method used in each case to direct the search process. The table shows the diversity of techniques that researchers have developed, and the heavy reliance on the experimental comparison of variant methods.... ..."

Cited by 269

### Table 3. Promising approaches

"... In PAGE 4: ...ng data skewness. If one step makes little difference (e.g., feature selection for DT), we just set No as default to save computation time. Table 3 lists the 12 promising approaches to tackle data skewness. The approaches in Table 3 are derived from bias analysis.... In PAGE 4: ... Table 3 lists the 12 promising approaches to tackle data skewness. The approaches in Table 3 are derived from bias analysis. We now further evaluate them through comparative experiments to investigate whether they can improve the performance of classifiers for text classification, and which one is more appropriate for highly skewed data.... ..."

### Table 1 Alternatives for Selected Features of Conjoint Analysis

2007

"... In PAGE 8: ...Current approaches for implementing a conjoint analysis project differ in terms of several features; some main features are: stimulus representation, formats of data collection, nature of data collection, and estimation methods. Table 1 lays out some alternatives for these features. The approaches that are more commonly used are: ratings-based (or Full-profile) Conjoint Analysis; Choice-based Conjoint Analysis; Adaptive Conjoint Analysis; Self-explicated Conjoint Analysis.... In PAGE 9: ... I refer the reader to Green and Srinivasan (1978, 1990), Green and Carroll (1995), and Hauser and Rao (2003) for various details of these approaches. Insert Table 1 about Here. Typically, a linear, additive model is used to describe the evaluations (preferences) in a ratings-based conjoint study, while a multinomial logit model is used to model the probability of choice of a profile for choice-based conjoint studies. Undoubtedly, there are several variations of these basic models used in practice.... ..."
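The linear, additive model for ratings-based conjoint mentioned in the snippet can be estimated by ordinary least squares on a dummy-coded design matrix. A minimal sketch, with entirely hypothetical attributes (brand A/B/C and price low/high) and made-up respondent ratings:

```python
import numpy as np

# Hypothetical profiles: brand x price, full factorial.
profiles = [("A", "low"), ("A", "high"), ("B", "low"),
            ("B", "high"), ("C", "low"), ("C", "high")]
ratings = np.array([9.0, 6.0, 7.0, 4.0, 5.0, 2.0])  # made-up ratings

def design_row(brand, price):
    """Dummy coding with the last level of each attribute as baseline."""
    return [1.0,                                 # intercept (brand C, high price)
            1.0 if brand == "A" else 0.0,
            1.0 if brand == "B" else 0.0,        # brand C is the baseline
            1.0 if price == "low" else 0.0]      # high price is the baseline

X = np.array([design_row(b, p) for b, p in profiles])
# Least-squares part-worths: [intercept, brand A, brand B, low price]
partworths, *_ = np.linalg.lstsq(X, ratings, rcond=None)
```

Each estimated coefficient is the part-worth utility of a level relative to its baseline; summing the relevant part-worths reproduces a profile's predicted rating. For choice-based data one would instead fit the multinomial logit model the snippet mentions, typically by maximum likelihood.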

### Table 7-3 identifies the statistical design determination.

"... In PAGE 95: ...1 PURPOSE The purpose of DQO Step 7 is to identify the most resource-effective design that does not exceed the tolerable false-positive and false-negative decision error rates (which were specified in DQO Step 6 for generating data to support decisions), while maintaining the desired degree of precision and accuracy. Table 7-1 identifies the data collection design determination. Table 7-1.... In PAGE 95: ... Table 7-1 identifies the data collection design determination. Table 7-1. Data Collection Design Determination.... In PAGE 96: ...Rev. 0 7-2 Table 7-1. Data Collection Design Determination.... In PAGE 97: ...Rev. 0 7-3 Table 7-2. Data Collection Design Alternatives.... In PAGE 97: ...Table 7-3 identifies the statistical design determination. Table 7-3. Statistical Design Determination.... In PAGE 98: ...Rev. 0 7-4 Table 7-3. Statistical Design Determination.... In PAGE 98: ... Random sampling. Ho for DS #7: The soils in the transition zone exceed site cleanup criteria identified in the interim remedial action ROD. Table 7... In PAGE 99: ...Rev. 0 7-5 Table 7-4. Sampling Strategies.... In PAGE 99: ... Sampling Strategies. (6 pages; columns: DS #, Decision Statement, WS #, Geographical Area of Interest, Strata, Rationale, Data and Decision Type, Sampling/Measurement Design (see Table 7-3b), Number of Measurements To Be Taken.) Layer of contaminated boulders and cobbles: boulders and cobbles have a much lower surface-area-to-volume ratio than underlying soils. If underlying soils meet ERDF waste acceptance criteria, boulders and cobbles will also meet the waste acceptance criteria.... In PAGE 100: ...Rev. 0 7-6 Table 7-4. Sampling Strategies.... In PAGE 100: ... Sampling Strategies. UPR-100-N-31, contaminated native soil: excavated materials will be screened on a bucket-by-bucket basis for health and safety. This screening, correlated with analytical laboratory results, is sufficient to satisfy ERDF waste acceptance criteria.... In PAGE 101: ...Rev. 0 7-7 Table 7-4. Sampling Strategies.... In PAGE 101: ... Sampling Strategies. Contaminated native soil: excavated materials will be screened on a bucket-by-bucket basis for health and safety. This screening, correlated with analytical laboratory results, is sufficient to satisfy ERDF waste acceptance criteria.... In PAGE 102: ...Rev. 0 7-8 Table 7-4. Sampling Strategies.... In PAGE 102: ... Sampling Strategies. Surface soil remaining after excavation: analytical laboratory results, RESRAD analysis of data to determine if the remediated site presents a direct exposure threat. Random sampling and statistical decision; Design E1: 116-N-1 surface soil closeout; to be calculated, with a minimum of 10 samples. 116-N-1 Crib and associated pipelines, subsurface soil remaining after excavation: analytical laboratory results, RESRAD analysis of data to determine if the remediated site presents a direct exposure/groundwater protection threat.... In PAGE 103: ...Rev. 0 7-9 Table 7-4. Sampling Strategies.... In PAGE 103: ... Sampling Strategies. Surface soil remaining after excavation: analytical laboratory results, RESRAD analysis of data to determine if the remediated site presents a direct exposure threat. Random sampling and statistical decision; Design E3: 116-N-3 surface soils; to be calculated, with a minimum of 10 samples. 3: 116-N-3 Crib and Trench, cover panels, and associated pipelines (upstream of the first dam), subsurface soil remaining after excavation: analytical laboratory results, RESRAD analysis of data to determine if the remediated site presents a direct exposure/groundwater protection threat.... In PAGE 104: ...Rev. 0 7-10 Table 7-4. Sampling Strategies.... In PAGE 104: ... Sampling Strategies. 5: Determine if contamination levels of borrow pit soil meet site criteria for use as backfill or if alternate backfill material must be used. 1, 2, 3, and 4: 116-N-1, 116-N-3, UPR-100-N-31, 120-N-1, 120-N-2, 100-N-58 Crib, and associated pipelines; borrow pit soil: process knowledge and field screening.... In PAGE 105: ...Rev. 0 7-11 Table 7-4a. Sampling Designs.... In PAGE 106: ...Rev. 0 7-12 Table 7-4a. Sampling Designs.... In PAGE 107: ...Rev. 0 7-13 Table 7-4a. Sampling Designs.... In PAGE 108: ...0 7-14 The mathematical formula expressions needed to solve the design problems are identified in Table 7-5. Table 7-5. Mathematical Formula Expressions Needed to Solve Design Problems.... In PAGE 109: ...Rev. 0 7-15 Table 7-5. Mathematical Formula Expressions Needed to Solve Design Problems.... In PAGE 109: ... The relationships and assumptions between true and measured values are identified in Table 7-6. Table 7-6. Relationships and Assumptions Between True and Measured Values.... In PAGE 110: ...Rev. 0 7-16 Table 7-7 includes the calculation of the number of samples for each design alternative. Using the equations outlined in DQO Step 3, the number of samples for each design alternative is calculated.... In PAGE 110: ... With these estimates of the variances, it is inappropriate to calculate the number of samples needed for closeout. Table 7-7. Calculation of Theoretical Number of Samples for Each Design Alternative.... In PAGE 110: ...the width of the gray region or Type I and Type II error rates), and other factors (e.g., spatial and temporal boundaries or scope of the project). Table 7-8 provides the results of the trade-off... In PAGE 111: ...Rev. 0 7-17 Table 7-8. Results of Trade-Off Analysis.... In PAGE 111: ... The recommended approach to verification sampling is to collect preliminary screening samples and analyze them using gamma energy analysis. Then, using the equation shown in Table 7-7, calculate the number of verification samples that should be collected. This strategy has worked in past remediation in the 100 Areas.... In PAGE 111: ... The results of the trade-off analyses should lead to one of two outcomes: (1) the selection of a design that most efficiently meets all of the DQO constraints, or (2) the modification of one or more outputs from DQO Steps 1 through 6 and the selection of a design that meets the new constraints. Table 7-9 identifies the selection of the appropriate data collection design. Table 7-9.... In PAGE 111: ... Table 7-9 identifies the selection of the appropriate data collection design. Table 7-9. Selection of Appropriate Data Collection Design.... In PAGE 111: ... 7 Systematic 12 samples. An outline of alternative strategies is presented in Table 7-10. Table 7-10.... In PAGE 111: ... An outline of alternative strategies is presented in Table 7-10. Table 7-10. Outline of Alternative Strategies.... In PAGE 111: ...able 7-10. Outline of Alternative Strategies. Decision Alternative 3 and 4: If the analytical results are not sufficient to demonstrate that cleanup levels are met based on the sample design, a combination of statistical analysis, professional judgment, and balancing factors (agreed to by the regulators) will be used to determine if the site should be further excavated. Table 7... In PAGE 112: ...Rev. 0 7-18 Table 7-11. Key Features of Selected Design.... In PAGE 112: ...able 7-11. Key Features of Selected Design. Decisions 2, 3, and 4: strata of interest should be randomly sampled. Table 7-12 documents the theoretical assumption. Table 7-12.... In PAGE 112: ... Table 7-12 documents the theoretical assumption. Table 7-12. Documentation on Theoretical Assumptions.... ..."
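The "calculate the number of samples" step that this worksheet repeats (varying the Type I and Type II error rates against the width of the gray region) follows the standard normal-approximation sample-size formula for a one-sample test of a mean. A simplified sketch (EPA DQO guidance adds a small correction term; the function name and example values are illustrative):

```python
from math import ceil
from statistics import NormalDist

def n_samples(sigma, delta, alpha=0.05, beta=0.20):
    """Normal-approximation number of samples for a one-sample test of a mean.

    sigma -- estimated standard deviation of the measured concentration
    delta -- width of the gray region (smallest difference worth detecting)
    alpha -- tolerable Type I (false-positive) decision error rate
    beta  -- tolerable Type II (false-negative) decision error rate
    """
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(1 - beta)
    return ceil(((z_a + z_b) * sigma / delta) ** 2)

# Varying the error rates, as the worksheet suggests, shows how the
# required number of samples trades off against alpha and beta:
for a in (0.01, 0.05, 0.10):
    print(a, n_samples(sigma=2.0, delta=1.0, alpha=a))
```

Tightening either error rate or narrowing the gray region drives the required number of samples up quadratically, which is the trade-off the worksheet's Tables 7-7 and 7-8 document.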