### Table 1: Categorization of feature selection algorithms in a three-dimensional framework

2005

"... In PAGE 9: ...There exists a vast body of available feature selection algorithms. In order to better understand the inner workings of each algorithm and the commonalities and differences among them, we develop a three-dimensional categorizing framework (shown in Table 1) based on the previous discussions. We understand that search strategies and evaluation criteria are two dominating factors in designing a feature selection algorithm, so they are chosen as two dimensions in the framework.... In PAGE 9: ... We understand that search strategies and evaluation criteria are two dominating factors in designing a feature selection algorithm, so they are chosen as two dimensions in the framework. In Table 1, under Search Strategies, algorithms are categorized into Complete, Sequential, and Random. Under Evaluation Criteria, algorithms are categorized into Filter, Wrapper, and Hybrid.... In PAGE 10: ... Within the Wrapper category, Predictive Accuracy is used for Classification, and Cluster Goodness for Clustering. Many feature selection algorithms collected in Table 1 can be grouped into distinct categories according to these characteristics. The categorizing framework serves three roles.... In PAGE 10: ...nd Random. Both groups have more than one algorithm available 1. Third, the framework also reveals what is missing in the current collection of feature selection algorithms. As we can see, there are many empty blocks in Table 1, indicating that no feature selection algorithm exists for these combinations, which might be suitable for potential future work. In particular, for example, current feature selection algorithms for clustering are limited to sequential search.... ..."
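The two framework dimensions in this excerpt can be made concrete. Below is a minimal, hypothetical sketch (not code from the cited paper) that pairs a Sequential search strategy (greedy forward selection) with a Filter evaluation criterion (mean absolute correlation with the label); the function names and the scoring choice are illustrative assumptions.

```python
import numpy as np

def forward_select(X, y, num_features, score):
    """Sequential search: greedily grow the feature subset one feature
    at a time, each step keeping the feature that maximizes `score`."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < num_features:
        best = max(remaining, key=lambda j: score(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

def correlation_score(Xs, y):
    """Filter criterion: mean absolute Pearson correlation with the label,
    computed without running any learning algorithm (hence 'filter')."""
    return np.mean([abs(np.corrcoef(Xs[:, j], y)[0, 1])
                    for j in range(Xs.shape[1])])
```

Swapping `correlation_score` for a cross-validated classifier accuracy would turn the same search loop into a Wrapper method, which is exactly the separation of dimensions the framework describes.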

Cited by 27

### Table 1 Properties of techniques for dimensionality reduction.

"... In PAGE 11: ...2. General properties In Table 1, the thirteen dimensionality reduction techniques are listed by four general properties: (1) the convexity of the optimization problem, (2) the main free... In PAGE 11: ... We discuss the four general properties below. For property 1, Table 1 shows that most techniques for dimensionality reduction optimize a convex cost function. This is advantageous, because it allows for finding the global optimum of the cost function.... In PAGE 11: ... Because of their nonconvex cost functions, autoencoders, LLC, and manifold charting may suffer from getting stuck in local optima. For property 2, Table 1 shows that most nonlinear techniques for dimensionality reduction have free parameters that need to be optimized. By free parameters, we mean parameters that directly influence the cost function that is optimized.... In PAGE 11: ... The main advantage of the presence of free parameters is that they provide more flexibility to the technique, whereas their main disadvantage is that they need to be tuned to optimize the performance of the dimensionality reduction technique. For properties 3 and 4, Table 1 provides insight into the computational and memory complexities of the computationally most expensive algorithmic components of the techniques. The computational complexity of a dimensionality reduction technique is of importance to its applicability.... In PAGE 12: ...duction technique is determined by data properties such as the number of datapoints n, the original dimensionality D, the target dimensionality d, and by parameters of the techniques, such as the number of nearest neighbors k (for techniques based on neighborhood graphs) and the number of iterations i (for iterative techniques).
In Table 1, p denotes the ratio of nonzero elements in a sparse matrix to the total number of elements, m indicates the number of local models in a mixture of factor analyzers, and w is the number of weights in a neural network. Below, we discuss the computational complexity and the memory complexity of each of the entries in the table.... ..."
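To make the complexity parameters in this excerpt concrete, here is a hedged sketch (not from the surveyed paper): a brute-force k-nearest-neighbor graph, whose pairwise-distance step is the O(n²·D) bottleneck the table attributes to neighborhood-graph techniques, together with the sparsity ratio p of the resulting adjacency matrix. Function names are illustrative.

```python
import numpy as np

def knn_graph(X, k):
    """Build a k-nearest-neighbor adjacency matrix by brute force.
    The pairwise-distance computation costs O(n^2 * D) time and O(n^2)
    memory, typically the dominant term for graph-based techniques."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    np.fill_diagonal(d2, np.inf)                          # exclude self-edges
    nbrs = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    for i in range(n):
        W[i, nbrs[i]] = 1.0
    return W

def sparsity_ratio(W):
    """The p of the table: fraction of nonzero entries in a sparse matrix."""
    return np.count_nonzero(W) / W.size
```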

### Table 1: Advantages of aggressive dimensionality reduction

2001

"... In PAGE 10: ... The rationale behind these methods is that any change in the nearest neighbor from the full dimensionality leads to loss of information; the rationale behind our approach is to be aggressive in removing the dimensions which have low coherence as noise; thus, on an overall basis the aggressiveness of a dimensionality reduction process which uses the coherence probability of the dimensions may lead to very low precision with respect to the original data but much higher effectiveness and coherence. In order to illustrate our point, we have indicated (in Table 1) the prediction accuracy using a 1%-thresholding technique in which only those eigenvalues which are less than 1% of the largest eigenvalue are discarded. This prediction accuracy is typically very close to the full dimensional accuracy and is significantly lower than the optimal accuracy for all 3 data sets (as illustrated in the accuracy charts of Figures 5, 8, 11).... In PAGE 10: ... Thus, such a drastic reduction in dimensionality does not attempt to mirror the original nearest neighbors in the data; but rather improves their quality by removing the noise effects in high dimensionality. It is also clear from Table 1 that the optimal accuracy dimensionality is significantly lower than the 1%-thresholding method. In fact, the dimensionality for the 1%-thresholding method is quite close to the full dimensionality.... ..."
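The 1%-thresholding rule described in this excerpt (discard eigenvalues below 1% of the largest) can be sketched as follows. This is an illustrative reconstruction in NumPy, not the authors' code; the function name and interface are assumptions.

```python
import numpy as np

def threshold_pca(X, frac=0.01):
    """Project data onto the principal directions whose eigenvalue is at
    least `frac` of the largest eigenvalue; the rest are discarded as
    low-variance noise dimensions."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(vals)[::-1]            # sort descending
    vals, vecs = vals[order], vecs[:, order]
    keep = vals >= frac * vals[0]
    return Xc @ vecs[:, keep], int(keep.sum())
```

The paper's point is that this mild rule retains close to the full dimensionality; a more aggressive cut (larger `frac`, or the coherence-based criterion the authors propose) removes far more dimensions.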

Cited by 13

### Table 6 shows timings for the solution of the three-dimensional problem

"... In PAGE 22: ... Table 6: Solution times (in seconds) and speedups for the restricted weakly overlapping algorithm on the three-dimensional problem (58) (taken from [40]). equations based upon a geometric decomposition of the problem.... ..."

### Table 2. Histogram of RTT reduction percentages

2006

"... In PAGE 5: ... They found that 51% of the time, improved latency could be obtained via the overlay. This can be compared to the intra-North-America (NA-NA) row in Table 2 in the present study, where an overlay gives improved latency 63% of the time. They also found a single-hop indirection to be sufficient.... In PAGE 5: ...7 cities in the U.S., and can be compared with the 110 node, intra-North-America (NA-NA) case herein, where we found the overall latency improvement to be approximately 21%, although the improvement varies significantly across different continent pairs. See Table 2 below. Savage et al.... In PAGE 5: ... We obtained the comparable value of 9% for the case of intra-North America nodes, though significantly disparate results for other continent pairs. Again, please refer to Section 4 and Table 2... In PAGE 8: ... We divide the data set into buckets based on the continent pair, and the percentage of improvement in the latency of the fastest indirect path (which might be slower than the direct path) as compared to the direct path. Table 2 shows the percentage of samples that fell in each of the buckets. The rows of the table sum to 100%.... In PAGE 8: ... The rows of the table sum to 100%. As an explanatory example for Table 2, consider the AS-AS row. The bucket < -10% shows the cases where the best indirect paths are at least 10% slower than the direct path.... In PAGE 8: ...2% of the paths saw large latency reductions of a factor of two or better from the alternate paths found by the overlay. Prior work in routing overlays has concentrated on intra North America paths, corresponding to the NA-NA row in Table 2. These prior results underestimate the gains in some parts of the world, and overestimate them in some others.... In PAGE 11: ... (Note that the data in Table 3 for the 90th percentile section does not include paths in higher buckets, but the following analysis does).
We then recomputed the histogram of RTT reduction (such as the one in Table 2) only for those paths. The distribution of the samples in this table is dramatically different compared to Table 2.... In PAGE 11: ... We then recomputed the histogram of RTT reduction (such as the one in Table 2) only for those paths. The distribution of the samples in this table is dramatically different compared to Table 2. The last three columns (at least 10% improvement) carry the most weight of each category.... In PAGE 11: ... The last three columns (at least 10% improvement) carry the most weight of each category. For easier comparison with Table 2, this data is presented graphically in Figure 1. The first bar-chart in Figure 1 shows the last three columns of Table 2 (all paths).... In PAGE 11: ... For easier comparison with Table 2, this data is presented graphically in Figure 1. The first bar-chart in Figure 1 shows the last three columns of Table 2 (all paths). The second bar-chart shows the same for the poor paths.... In PAGE 15: ... We implement the same pruning algorithm in NA with a slight modification: if both nodes are on the east coast or both on the west coast of North America, we pick candidates from nodes in these respective locations. We then re-compute the results in Table 2 and Figure 2 with random-path subsets of sizes 1, 2, 3 and 5. An interesting result is that a random policy with a set size of 3 provides availability gains comparable to the ideal case for paths in the categories NA-NA and EU-EU.... ..."
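The "fastest indirect path vs. direct path" comparison that underlies these buckets can be sketched as below. This is a hypothetical minimal version assuming a full RTT matrix between overlay nodes, not the paper's measurement code.

```python
def best_indirect_rtt(rtt, src, dst):
    """RTT of the best single-hop indirect path src -> via -> dst,
    minimized over all other overlay nodes as the intermediate hop."""
    return min(rtt[src][v] + rtt[v][dst]
               for v in range(len(rtt)) if v not in (src, dst))

def improvement_pct(rtt, src, dst):
    """Latency reduction of the best indirect path relative to the direct
    path; negative when every indirect path is slower (the '< -10%'
    style buckets in the table)."""
    direct = rtt[src][dst]
    return 100.0 * (direct - best_indirect_rtt(rtt, src, dst)) / direct
```

Bucketing `improvement_pct` per continent pair and normalizing each row to 100% reproduces the structure of the histogram the excerpt describes.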

Cited by 3

### Table 1: Denoising via wavelet soft thresholding

6.2 Thresholding for compression of one-dimensional signals. We also performed a model compression experiment, using the same one-dimensional signal as in the denoising experiments. We applied seven iterations of the cascade algorithm on this 512-point signal to get the wavelet coefficients k, using the same three types of wavelet and multiwavelet filter banks. For a fair comparison, we retained the same number of the largest coefficients for each transform, then inverted the cascade algorithm to reconstruct the signal. The results are shown in Table 2 and Figure 11.

"... In PAGE 18: ... Boundaries are handled by symmetric data extension for the critically sampled (approximation/deapproximation) and oversampled schemes, and by circular periodization for D4. Results of a typical experiment are shown in Table 1 and Figure 10. In all experiments both types of GHM filter banks performed better than D4.... ..."
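Soft thresholding itself is standard: each wavelet coefficient is shrunk toward zero by the threshold t, and coefficients with magnitude below t are zeroed. A minimal NumPy sketch of just this step (illustrative; not the authors' GHM/D4 filter-bank pipeline):

```python
import numpy as np

def soft_threshold(coeffs, t):
    """Wavelet-domain soft thresholding: shrink every coefficient toward
    zero by t; coefficients with |c| < t become exactly zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)
```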

### Table 1: Reconstruction errors for generated data at true k and m (n = 1000). s is the standard deviation of the data. Seg is the error of the k-segmentation and (k; h) is the error of the (k; h) segmentation; no dimensionality reduction takes place in these two methods.

2006

"... In PAGE 6: ... We chose h so that the number of parameters in our model and the data model of (k; h)-segmentation are as close as possible: h = ⌈m(d + k)/d⌉. Table 1 shows the reconstruction errors of the three algorithms on the generated data, rounded to the nearest integer. We also show the error of plain k-segmentation and (k; h)-segmentation, in which the reconstruction error is measured as the difference between a data point and the segment mean.... ..."
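The reconstruction error used for plain k-segmentation (deviation of each point from its segment mean, summed over segments) can be sketched as follows. This is an illustrative reimplementation, not the paper's code; the squared-error choice and the boundary convention (segment start indices) are assumptions.

```python
import numpy as np

def segmentation_error(X, boundaries):
    """Reconstruction error of a segmentation of the sequence X:
    squared distance of each point from the mean of its segment,
    summed over all segments. `boundaries` lists the start index of
    each segment; the first entry must be 0."""
    err = 0.0
    bounds = list(boundaries) + [len(X)]
    for s, e in zip(bounds[:-1], bounds[1:]):
        seg = X[s:e]
        err += ((seg - seg.mean(axis=0)) ** 2).sum()
    return err
```

Placing a boundary at every true change point drives the error toward the noise floor, which is why the table reports errors at the true k and m.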

Cited by 4