Some statistical issues in the comparison of speech recognition algorithms
In Proc. of ICASSP, 1989
Cited by 156 (1 self)
In the development of speech recognition algorithms, it is important to know whether any apparent difference in performance of algorithms is statistically significant, yet this issue is almost always overlooked. We present two simple tests for deciding whether the difference in error-rates between two algorithms tested on the same data set is statistically significant. The first (McNemar’s test) requires the errors made by an algorithm to be independent events and is most appropriate for isolated word algorithms. The second (a matched-pairs test) can be used even when errors are not independent events and is more appropriate for connected speech.
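As an illustration of the first test named in this abstract, here is a minimal sketch of an exact two-sided McNemar test in Python. It is not taken from the paper: the function name `mcnemar_exact` and the counts `n01` and `n10` are hypothetical, and the sketch assumes both algorithms were scored on the same items and that errors are independent events, as the abstract requires.

```python
from math import comb


def mcnemar_exact(n01: int, n10: int) -> float:
    """Exact two-sided McNemar p-value from the two discordant counts.

    n01: items algorithm A got wrong and algorithm B got right (assumed name).
    n10: items algorithm A got right and algorithm B got wrong (assumed name).
    Items on which both algorithms agree do not enter the test.
    """
    n = n01 + n10
    if n == 0:
        # No discordant items: the data give no evidence of a difference.
        return 1.0
    k = min(n01, n10)
    # Under the null hypothesis each discordant item is equally likely to
    # favour either algorithm, so the smaller count is Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail and cap at 1 for a two-sided p-value.
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    print(mcnemar_exact(2, 12))
```

Only the discordant items carry information here, so two algorithms with very different overall error rates are compared purely on the items where exactly one of them fails.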
Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming
Environmental Health Perspectives, 1996
Cited by 17 (6 self)
The machine learning program Progol was applied to the problem of forming the structure-activity relationship (SAR) for a set of compounds tested for carcinogenicity in rodent bioassays by the U.S. National Toxicology Program (NTP). Progol is the first inductive logic programming (ILP) algorithm to use a fully relational method for describing chemical structure in SARs, based on using atoms and their bond connectivities. Progol is well suited to forming SARs for carcinogenicity as it is designed to produce easily understandable rules (structural alerts) for sets of noncongeneric compounds. The Progol SAR method was tested by prediction of a set of compounds that have been widely predicted by other SAR methods (the compounds used in the NTP's first round of carcinogenesis predictions). For these compounds no method (human or machine) was significantly more accurate than Progol. Progol was the most accurate method that did not use data from biological tests on rodents (however, the difference in accuracy is not significant). The Progol predictions were based solely on chemical structure and the results of tests for Salmonella mutagenicity. Using the full NTP database, the prediction accuracy of Progol was estimated to be 63% (±3%) using 5-fold cross validation. A set of structural alerts for carcinogenesis was automatically generated and the chemical rationale for them investigated; these structural alerts are statistically independent of the Salmonella mutagenicity. Carcinogenicity is predicted for the compounds used in the NTP's second round of carcinogenesis predictions. The results for prediction of carcinogenesis, taken together with the previous successful applications of predicting mutagenicity in nitroaromatic compounds, and inhibition of angiogenesis by suramin analogues, show that Progol has a role to play in understanding the SARs of cancer-related compounds. Environ Health Perspect 104(Suppl 5):1031-1040 (1996)
Methodology for Reliable Schema Development and Evaluation of Manual Annotations
Proceedings of the Workshop on Knowledge Markup and Semantic Annotation at the Second International Conference on Knowledge Capture (K-CAP 2003), 2003
Cited by 7 (1 self)
The quality of manual annotations of linguistic data depends on the use of reliable coding schemas as well as on the ability of human annotators to handle them appropriately. As is well known from a wide range of previous experiences, annotations using highly complex coding schemas often lead to unacceptable annotation quality. Reducing complexity might make schemas easier to handle, but in this way valuable information needed for more sophisticated applications is excluded as well. In order to deal with this problem, we developed a systematic approach to schema development, which allows for developing coding schemas for fine-grained semantic annotations while systematically securing the quality of such annotations. For illustration, we present examples from two projects where text and speech data are annotated.
Large deformation diffeomorphism and momentum based hippocampal shape discrimination in dementia of the Alzheimer type
IEEE Trans. Med. Imag., 2007
Cited by 7 (0 self)
In large-deformation diffeomorphic metric mapping (LDDMM), the diffeomorphic matching of images is modeled as evolution in time, or a flow, of an associated smooth velocity vector field controlling the evolution. The initial momentum parameterizes the whole geodesic and encodes the shape and form of the target image. Thus, methods such as principal component analysis (PCA) of the initial momentum lead to analysis of anatomical shape and form in target images without being restricted to the small-deformation assumption in the analysis of linear displacements. We apply this approach to a study of dementia of the Alzheimer type (DAT). The left hippocampus in the DAT group shows significant shape abnormality while the right hippocampus shows a similar pattern of abnormality. Further, PCA of the initial momentum leads to correct classification of 12 out of 18 DAT subjects and 22 out of 26 control subjects.
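The classification counts quoted at the end of this abstract can be turned into rates with a few lines of arithmetic. The sketch below is illustrative only: the function name `classification_rates` is hypothetical, and treating DAT subjects as the positive class is an assumption made here, not something the abstract states.

```python
def classification_rates(tp: int, n_pos: int, tn: int, n_neg: int):
    """Return (sensitivity, specificity, accuracy) from correct counts.

    tp / n_pos: correctly classified positives out of all positives.
    tn / n_neg: correctly classified negatives out of all negatives.
    """
    sensitivity = tp / n_pos
    specificity = tn / n_neg
    accuracy = (tp + tn) / (n_pos + n_neg)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Counts from the abstract: 12 of 18 DAT subjects, 22 of 26 controls.
    # DAT is taken as the positive class (an assumption of this sketch).
    sens, spec, acc = classification_rates(12, 18, 22, 26)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Under that assumption the counts correspond to roughly 67% sensitivity, 85% specificity, and 77% overall accuracy (34 of 44 subjects).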
A rich feature vector for protein-protein interaction extraction from multiple corpora
In: EMNLP, 2009
Cited by 4 (0 self)
Because of the importance of protein-protein interaction (PPI) extraction from text, many corpora have been proposed with slightly differing definitions of proteins and PPI. Since no single corpus is large enough to saturate a machine learning system, it is necessary to learn from multiple different corpora. In this paper, we propose a solution to this challenge. We designed a rich feature vector, and we applied a support vector machine modified for corpus weighting (SVM-CW) to complete the task of multiple corpora PPI extraction. The rich feature vector, made from multiple useful kernels, is used to express the important information for PPI extraction, and the system with our feature vector was shown to be both faster and more accurate than the original kernel-based system, even when using just a single corpus. SVM-CW learns from one corpus, while using other corpora for support. SVM-CW is simple, but it is more effective than other methods that have been successfully applied to other NLP tasks earlier. With the feature vector and SVM-CW, our system achieved the best performance among all state-of-the-art PPI extraction systems reported so far.
Human Observer Responses to Progressively Compressed Images
In Proceedings of the 31st Asilomar Conference on Signals, Systems and Computers, 1997
Cited by 3 (3 self)
Mean squared error (MSE) and peak signal-to-noise ratio (PSNR) are the most common methods for measuring the quality of compressed images, despite the fact that their inadequacies have long been recognized. Quality for compressed still images is occasionally evaluated using human observers who provide subjective ratings of the images. Both SNR and subjective quality judgments, however, may be inappropriate for evaluating progressive compression methods which are to be used for fast browsing applications. In this paper, we present a novel experimental and statistical framework for comparing progressive coders. The comparisons use response time studies in which human observers view a series of progressive transmissions, and respond to questions about the images as they become recognizable. We describe the framework and apply it to the comparison of several well known progressive algorithms. 1. Introduction A progressive image compression algorithm represents an image in such a way that t...
Predicting High-Risk Cholesterol Levels
1994
Cited by 3 (2 self)
This paper has two purposes: to address these practical issues concerning cholesterol screening, and to report on and stimulate research in modern statistical techniques, especially new bootstrap methods, that proved to be useful for this set of problems. Our premise is that these methods are powerful tools for analyzing issues of considerable policy importance. Related literature is either concerned with short-term fluctuations in measured cholesterol levels or with "tracking" -- measuring the correlation between cholesterol levels drawn years apart. A number of factors contribute to short-term fluctuations. Laboratories that fail to standardize and calibrate their measurements adequately often measure cholesterol inaccurately (Laboratory Standardization Panel, 1988). Even if the laboratory is perfectly accurate, physiological phenomena can cause an individual's cholesterol level to fluctuate. Stress, acute illness, seasonal effects, posture, and aspects of the blood-drawing technique can cause significant though transient changes in cholesterol levels (Thomas et al., 1961; Gordon et al., 1987; Hegsted and Nicolosi, 1987). Longer-term variation and issues of predictability are explored in a number of studies of tracking. Our effort is close in spirit to the latter literature. The tracking literature, however, primarily addresses whether a high cholesterol level at a young age is likely to predict the risk of cardiac disease in the distant future, while our work uses different methods and focuses on shorter-term prediction of cholesterol levels. In particular, our methods are designed to provide information about the performance of alternative policies toward cholesterol screening. The policy question is: how frequently does one need to screen in order to prevent the co...
Learning Bayesian Networks for Solving Real-World Problems
1998
Cited by 3 (0 self)
Bayesian networks, which provide a compact graphical way to express complex probabilistic relationships among several random variables, are rapidly becoming the tool of choice for dealing with uncertainty in knowledge-based systems. However, approaches based on Bayesian networks have often been dismissed as unfit for many real-world applications since probabilistic inference is intractable for most problems of realistic size, and algorithms for learning Bayesian networks impose the unrealistic requirement of datasets being complete. In this thesis, I present practical solutions to these two problems, and demonstrate their effectiveness on several real-world problems. The solution proposed to the first problem is to learn selective Bayesian networks, i.e., ones that use only a subset of the given attributes to model a domain. The aim is to learn networks that are smaller, and henc...
Text Type Structure and Logical Document Structure
Proceedings of the ACL Workshop on Discourse Annotation
Cited by 3 (0 self)
Most research on automated categorization of documents has concentrated on the assignment of one or many categories to a whole text. However, new applications, e.g. in the area of the Semantic Web, require a richer and more fine-grained annotation of documents, such as detailed thematic information about the parts of a document. Hence we investigate the automatic categorization of text segments of scientific articles with XML markup into 16 topic types from a text type structure schema. A corpus of 47 linguistic articles was provided with XML markup on different annotation layers representing text type structure, logical document structure, and grammatical categories. Six different feature extraction strategies were applied to this corpus and combined in various parametrizations in different classifiers. The aim was to explore the contribution of each type of information, in particular the logical structure features, to the classification accuracy. The results suggest that some of the topic types of our hierarchy are successfully learnable, while the features from the logical structure layer had no particular impact on the results.
Improved confidence intervals for the difference between binomial proportions based on paired data
Statistics in Medicine 17, 1998
Cited by 2 (0 self)
Existing methods for setting confidence intervals for the difference θ between binomial proportions based on paired data perform inadequately. The asymptotic method can produce limits outside the range of validity. The ‘exact’ conditional method can yield an interval which is effectively only one-sided. Both these methods also have poor coverage properties. Better methods are described, based on the profile likelihood obtained by conditionally maximizing the proportion of discordant pairs. A refinement (methods 5 and 6) which aligns 1 − α with an aggregate of tail areas produces appropriate coverage properties. A computationally simpler method based on the score interval for the single proportion also performs well (method 10). © 1998 John Wiley & Sons, Ltd.
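To make concrete the abstract's remark that the asymptotic method can produce limits outside the range of validity, the following sketch computes the standard Wald interval for a paired difference of proportions. This is the simple asymptotic interval, not any of the improved methods the paper proposes; the function name `wald_paired_diff_ci` and the cell labels `a`, `b`, `c`, `d` are assumptions of this example.

```python
from math import sqrt
from statistics import NormalDist


def wald_paired_diff_ci(a: int, b: int, c: int, d: int, level: float = 0.95):
    """Asymptotic (Wald) interval for p1 - p2 from a paired 2x2 table.

    a: positive on both occasions     b: positive on the first only
    c: positive on the second only    d: positive on neither
    (cell labels are an assumption of this sketch)
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("table must contain at least one pair")
    # p1 - p2 = (a + b)/n - (a + c)/n, so only discordant cells matter.
    diff = (b - c) / n
    # Estimated variance of the difference; clamp guards against rounding.
    se = sqrt(max(0.0, (b + c) - (b - c) ** 2 / n)) / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


if __name__ == "__main__":
    # Small, lopsided table: the upper limit exceeds 1, which is not a
    # possible value for a difference between two proportions.
    print(wald_paired_diff_ci(0, 9, 0, 1))
```

With the table `(0, 9, 0, 1)` the point estimate is 0.9 and the upper limit comes out above 1, which is the kind of out-of-range behaviour the abstract describes.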

