### Table 1: Descriptive statistics and test results for eight types of insurance policies. Estimation sample (N = 800).

"... In PAGE 10: ... Table 1 presents sample characteristics, hit rates, and test statistics for eight different types of insurance policies that are sold by the insurance company for the estimation sample. The first row of this table reports the fraction of the households that own the particular type of insurance, i.... In PAGE 10: ... The fifth row presents the standardized test statistic for predictive performance, presented in (2), while the last row presents the test statistic for predictor dependence, which asymptotically has a N(0, 1) distribution; see equation (6) in Pesaran and Timmermann (1992). From the last two rows of Table 1 it is clear that the two tests arrive at different conclusions about the insurance types for which the Probit model yields good predictions for the estimation sample. The test for predictive performance indicates that the Probit model has predictive performance for insurance type 5 and, depending on the desired level of significance, also for insurance type 6.... In PAGE 10: ... The question that remains is how well the model actually performs in out-of-sample prediction, which is what prediction models are generally used for. Table 2 presents the sample statistics and test results for the validation sample, as Table 1 does for the estimation sample. The results of the test for predictive performance and the test for predictor dependence for the validation sample are presented in the last two rows of the table.... ..."
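The predictor-dependence statistic the snippet refers to can be sketched directly from the hit rate and the marginal frequencies. Below is a minimal, illustrative implementation of the Pesaran and Timmermann (1992) statistic for binary outcomes; the function name is mine, and the variance terms follow the standard textbook form of the test rather than the paper's own equations (2) and (6), which are not shown in the snippet.

```python
import numpy as np

def pesaran_timmermann(actual, predicted):
    """Standardized test of whether binary predictions are independent
    of outcomes; asymptotically N(0, 1) under the null of no skill."""
    y = np.asarray(actual, dtype=float)
    z = np.asarray(predicted, dtype=float)
    n = len(y)
    p_hat = np.mean(y == z)                    # observed hit rate
    py, pz = y.mean(), z.mean()                # marginal frequencies of 1s
    p_star = py * pz + (1 - py) * (1 - pz)     # expected hit rate under independence
    v_hat = p_star * (1 - p_star) / n
    v_star = ((2 * py - 1) ** 2 * pz * (1 - pz) / n
              + (2 * pz - 1) ** 2 * py * (1 - py) / n
              + 4 * py * pz * (1 - py) * (1 - pz) / n ** 2)
    return (p_hat - p_star) / np.sqrt(v_hat - v_star)
```

Values far above 1.96 reject independence at the 5% level, consistent with the N(0, 1) reference distribution mentioned in the snippet.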

### Table 3. Paired-model statistical analysis results on the difference between each pair of performance indexes (in percent) and the corresponding 95% confidence intervals. A (*) indicates that the difference is statistically significant at α = 0.05; no (*) indicates that the difference is not significant. We see that there is no significant difference between the Voronoi, Docstrum, and Caere algorithms. However, this group is significantly better than ScanSoft, which in turn is better than XY-cut.

2000

"... In PAGE 8: ....3. Statistical Analysis of Results We employed a paired model [11] to compare the performance index and testing time difference between each possible algorithm pair, then compute their confidence intervals. The analysis results for performance index and processing timing are reported in a matrix in Table 3 and Table 4, respectively. If we denote Tij as the value of the table cell at the ith row and jth column, then Tij = ai − aj, where ai is the performance index (algorithm timing) value of the algorithm on the ith row, and aj is the performance index (algorithm timing) value of the algorithm on the jth column.... In PAGE 8: ... Note that the normalized processing timing is used for the two commercial products. From Table 3, we can find that the performance indexes of Kise's algorithm, Caere's segmentation algorithm, and Docstrum are not statistically different, but they are statistically better than those of ScanSoft's segmentation algorithm and the X-Y cut algorithm; the performance index of ScanSoft's segmentation algorithm is statistically better than that of X-Y cut. From Table 4, we can find that all algorithms have statistically significantly different processing times from each other.... ..."

Cited by 11
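The paired-model comparison summarized above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the function name and the normal-approximation 95% interval are my assumptions (the paper cites a specific paired model, reference [11], whose details are not in the snippet).

```python
import numpy as np

def paired_diff_ci(a, b, z=1.96):
    """Mean paired difference a - b with an approximate 95% confidence
    interval (normal approximation); the pair is 'significant' when
    the interval excludes zero, as in the (*) convention of Table 3."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    m = d.mean()
    se = d.std(ddof=1) / np.sqrt(len(d))
    lo, hi = m - z * se, m + z * se
    return m, (lo, hi), not (lo <= 0.0 <= hi)
```

Filling every cell Tij of the matrix with such a difference, and starring the intervals that exclude zero, reproduces the layout the caption describes.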

### Table 3: Parameters that significantly affect perplexity for each smoothing algorithm, and insignificant parameters and their default values

1998

"... In PAGE 25: ... For each parameter, we tried three different training sets: 20,000 words from the WSJ corpus, one million words from the Brown corpus, and three million words from the WSJ corpus. We summarize the results of these experiments in Table 3; Chen (1996) gives more details. For each algorithm, we list the parameters we found to be significant (and thus search over in each later experiment); we also list the insignificant parameters and the value we set them to.... ..."

Cited by 370

### Table XX. Overview of the performance of the web-based models investigated in this paper (Ling: linguistic knowledge; Type: type of task; Base: comparison against baseline; BNC: comparison against BNC-based model; Lit: comparison against best model in the literature). The symbols indicate significance (△: significantly better; : not significantly different; ▽: significantly worse). The  indicates that the web-based model used interpolation or backoff.

2005

Cited by 12

### Table 2: Analysis of variance showing a significant main effect of grade level and a significant interaction effect.

1996

"... In PAGE 3: ...e done about it. But usually, sample variance reflects the combined influence of several factors. If you can tease these influences apart, you can get statistically significant results with no additional data, and a better understanding of the data, as well. To illustrate, Table 2 shows a two-way analysis of variance of the gender dataset. Two-way analysis of variance decomposes the sample variance into four parts: two represent the effects of the factors (called main effects), one represents the interaction between the factors (called the interaction effect), and one is due to random chance (called error).... In PAGE 3: ... Two-way analysis of variance decomposes the sample variance into four parts: two represent the effects of the factors (called main effects), one represents the interaction between the factors (called the interaction effect), and one is due to random chance (called error). The mean square column in Table 2 gives the relative magnitudes of these components of variance. (Mean squares are just summed, squared deviations divided by degrees of freedom, both listed in Table 2, i.... In PAGE 3: ... The mean square column in Table 2 gives the relative magnitudes of these components of variance. (Mean squares are just summed, squared deviations divided by degrees of freedom, both listed in Table 2, i.e.... In PAGE 3: ...overconfident), whereas girls' confidence starts lower (3.75), peaks in fifth grade, and then drops. In short, boys' confidence follows a different developmental pattern than girls'. The interaction effect in Table 2 picks up this difference. So, does gender have an effect on confidence? The t test says no, but the analysis of variance,... ..."

Cited by 2
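The decomposition described above can be made concrete for a balanced two-factor design. The sketch below is my own (generic factor names and made-up array shapes, not the paper's gender-by-grade data): it computes the summed squared deviations for each component and divides each by its degrees of freedom to obtain the mean squares.

```python
import numpy as np

def two_way_anova(data):
    """data: array of shape (levels_A, levels_B, replicates), balanced design.
    Returns mean squares for the two main effects, the interaction, and error."""
    a, b, k = data.shape
    grand = data.mean()
    mean_a = data.mean(axis=(1, 2))          # factor A level means
    mean_b = data.mean(axis=(0, 2))          # factor B level means
    mean_ab = data.mean(axis=2)              # cell means
    ss_a = b * k * ((mean_a - grand) ** 2).sum()
    ss_b = a * k * ((mean_b - grand) ** 2).sum()
    ss_ab = k * ((mean_ab - mean_a[:, None] - mean_b[None, :] + grand) ** 2).sum()
    ss_err = ((data - mean_ab[:, :, None]) ** 2).sum()
    # mean square = summed squared deviations / degrees of freedom
    return {
        "MS_A": ss_a / (a - 1),
        "MS_B": ss_b / (b - 1),
        "MS_AB": ss_ab / ((a - 1) * (b - 1)),
        "MS_error": ss_err / (a * b * (k - 1)),
    }
```

Comparing each mean square against MS_error is what yields the F tests for the main effects and the interaction effect the caption mentions.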

### Table 5-8: A comparison of the performance of word-dependent models in the Viterbi search and in the re-sorting pass. Performance is slightly better in the Viterbi search, though the differences are not very statistically significant (each result is significant to within 0.25%).

1997

"... In PAGE 9: ... 5-4 Summary of the performance of several different types of context-dependent models. For comparison purposes, the first two columns are the results from Table 5-3, and the last two columns are the results of the same models, after being interpolated with context-independent models. The numbers in parentheses are the percent reduction in word error rate as a result of interpolation.... In PAGE 39: ... The statistics are similar for the biphones and triphones, and are summarized in Table 5-1. Number of Contexts in Train109:

| Model Type | Total | 25 or More Times | 50 or More Times |
|---|---|---|---|
| Context-Independent | 60 | 60 | 60 |
| Word-Dependent | 8978 | 1665 | 699 |
| Biphone | 1589 | 860 | 633 |
| Triphone | 9239 | 1662 | 882 |

Table 5-1: The number of contexts represented in the training data for each type of context-dependent model, with cut-offs of either 25 or 50 training tokens. 5.... In PAGE 40: ...Table 5-2 summarizes the coverage of each type of context-dependent model on the Feb. 1989 10-speaker test set.... In PAGE 40: ...8 % 92.9 % Table 5-2: Test-Set Coverage of context-dependent models. The most significant result is that the biphone models enjoy a very high coverage of the test set, despite using fewer models than either the triphones or the word-dependent models.... In PAGE 40: ...4.4 Results Table 5-3 summarizes the performance achieved by each type of context-dependent model on the utterances in the 10-speaker 1989 test set. Each experiment was repeated 15 times, and the results given in the table are the averages of the results of each trial, accurate to within 0.... In PAGE 41: ... The technique of deleted interpolation, introduced in the next chapter, alleviates this problem by smoothing context-dependent models with their more general context-independent counterparts. See Table 5-4 for a summary of the performance of these models when they have first been interpolated with context-independent models. 5.... 
In PAGE 41: ...1.74 % (11.3 %) Table 5-4: Summary of the performance of several different types of context-dependent models. For comparison purposes, the first two columns are the results from Table 5-3, and the last two columns are the results of the same models, after being interpolated with context-independent models. The numbers in parentheses are the percent reduction in word error rate as a result of interpolation.... In PAGE 42: ... As before, examining the weights given by the deleted interpolation can provide us with some insights into the relative value of various types of context-dependent models. Table 5-5 summarizes the results of experiments where the triphone models were interpolated with context-independent and biphone models, in various combinations. TRI RB LB CI Word Error Rate X 13.... In PAGE 42: ...76 % X X X 12.07 % Table 5-5: Results of experiments in which triphone models were interpolated with left and right biphone models and context-independent models, in various combinations. In no case did the word error rate improve over the simple interpolation with the context-independent models only.... In PAGE 43: ... Their drawback is a lack of coverage. Table 5-6 is an attempt to quantify the notion that a model's true worth must somehow take into account its test-set coverage. The table gives, for each type of model, the percent decrease in word error rate from the baseline system, as well as this percentage divided by the test-set coverage for that type of model.... In PAGE 44: ...86 % 20.7 % Table 5-7: The results of various combinations of backoff strategies. The performance is essentially the same for all combinations, and does not represent an improvement over the increased coverage that can be obtained by decreasing the required number of tokens per model.... In PAGE 45: ...25%). 
Table 5-8 summarizes the results of a comparison between the performance of the word-dependent models in the Viterbi search with their performance in the re-sorting pass. In these experiments, although there is a slight edge in performance in the Viterbi search, the differences are not very significant compared to the uncertainty in the measurement of the word error rate.... ..."

Cited by 6

### Table II. Fictitious example of a contingency table for a χ² test comparing two models

2005

Cited by 12
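Since Table II itself is not reproduced here, the following is only a generic sketch of a Pearson χ² statistic on a contingency table comparing two models (e.g., counts of test items each model classifies right or wrong); the function name and table layout are my assumptions, not the paper's.

```python
import numpy as np

def chi2_statistic(table):
    """Pearson chi-square statistic for an r x c contingency table:
    sum over cells of (observed - expected)^2 / expected, where the
    expected counts come from the row and column marginals."""
    table = np.asarray(table, dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return ((table - expected) ** 2 / expected).sum()
```

For a 2x2 table the statistic is compared against a χ² distribution with one degree of freedom; a value above 3.84 is significant at the 5% level.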

### Table 4: Number of queries for which statistical and syntactic phrases yield significantly different performance.

1997

"... In PAGE 6: ...Table 4: Number of queries for which statistical and syntactic phrases yield significantly different performance. Table 4 shows the number of queries for which the performance of syntactic and statistical phrases is noticeably different. In terms of average precision, the performance of statistical and syntactic phrases differs significantly for only about half the queries: for 12 queries, syntactic phrases (using lTu weights) outperform statistical phrases by at least 5%, and for 12 queries, they are worse by at least 5%.... ..."

Cited by 53
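The per-query tallies described above reduce to thresholded counts of paired differences in average precision. A minimal sketch (the function name and the absolute-difference reading of "at least 5%" are my assumptions):

```python
def count_wins(ap_a, ap_b, threshold=0.05):
    """Count queries where system A beats system B (and vice versa)
    in per-query average precision by at least `threshold`."""
    a_wins = sum(1 for a, b in zip(ap_a, ap_b) if a - b >= threshold)
    b_wins = sum(1 for a, b in zip(ap_a, ap_b) if b - a >= threshold)
    return a_wins, b_wins
```

Applied to the syntactic and statistical runs, the two counts are the cells of Table 4; queries falling below the threshold in both directions are the "no noticeable difference" remainder.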