Clause restructuring for statistical machine translation
- In ACL, 2005
Cited by 65 (2 self)
We describe a method for incorporating syntactic information in statistical machine translation systems. The first step of the method is to parse the source language string that is being translated. The second step is to apply a series of transformations to the parse tree, effectively reordering the surface string on the source language side of the translation system. The goal of this step is to recover an underlying word order that is closer to the target language word order than the original string. The reordering approach is applied as a pre-processing step in both the training and decoding phases of a phrase-based statistical MT system. We describe experiments on translation from German to English, showing an improvement from 25.2% Bleu score for a baseline system to 26.8% Bleu score for the system with reordering, a statistically significant improvement.
Analysis of Gene Expression Microarrays for Phenotype Classification
- Proc. Int. Conf. Intell. Syst. Mol. Biol, 2000
Cited by 37 (4 self)
Several microarray technologies that monitor the level of expression of a large number of genes have recently emerged. Given DNA-microarray data for a set of cells characterized by a given phenotype and for a set of control cells, an important problem is to identify "patterns" of gene expression that can be used to predict cell phenotype. The potential number of such patterns is exponential in the number of genes. In this paper, we propose a solution to this problem based on a supervised learning algorithm, which differs substantially from previous schemes. It couples a complex, non-linear similarity metric, which maximizes the probability of discovering discriminative gene expression patterns, and a pattern discovery algorithm called SPLASH. The latter discovers efficiently and deterministically all statistically significant gene expression patterns in the phenotype set. Statistical significance is evaluated based on the probability of a pattern to occur by chance in ...
Upper level set scan statistic for detecting arbitrarily shaped hotspots
- Environmental and Ecological Statistics, 2004
Cited by 27 (16 self)
There is a declared need for geoinformatic surveillance statistical science and software infrastructure for spatial and spatiotemporal hotspot detection. A hotspot means something unusual: an anomaly, aberration, outbreak, elevated cluster, critical resource area, etc. The declared need may be for monitoring, etiology, management, or early warning. The responsible factors may be natural, accidental, or intentional. This proof-of-concept paper suggests methods and tools for hotspot detection across geographic regions and across networks. The investigation proposes development of statistical methods and tools that have immediate potential for use in critical societal areas, such as public health and disease surveillance, ecosystem health, water resources and water services, transportation networks, persistent poverty typologies and trajectories, environmental justice, biosurveillance and biosecurity, among others. We introduce, for multidisciplinary use, an innovation of the circle-based spatial and spatiotemporal scan statistic that is popular in the health area. Our innovation employs the notion of an upper level set, and is accordingly called the upper level set scan statistic, pointing to a sophisticated analytical and computational system as the next generation of the present-day popular SaTScan. Success of surveillance rests on the capability to detect potential elevated clusters. But the clusters can be of any shape, and cannot be captured by circles alone. Relying only on circles is likely to give more false alarms and a greater false sense of security. What we need is the capability to detect arbitrarily shaped clusters. The proposed upper level set scan statistic innovation is expected to fill this need.
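To make the notion of an upper level set concrete, here is a minimal Python sketch. It rests on assumptions the abstract does not state: the study region is a small rectangular grid of rates, cells are adjacent when they share an edge, and candidate zones are taken to be the connected components of the set of cells whose rate is at least a given threshold. The function names and the toy data are hypothetical, and no scan statistic or significance test is computed.

```python
from collections import deque


def upper_level_set_components(rates, threshold):
    """Connected components (edge adjacency) of cells with rate >= threshold."""
    rows, cols = len(rates), len(rates[0])
    seen = set()
    components = []
    for r in range(rows):
        for c in range(cols):
            if rates[r][c] < threshold or (r, c) in seen:
                continue
            # Breadth-first search over neighbouring cells above the threshold.
            queue = deque([(r, c)])
            seen.add((r, c))
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and (ny, nx) not in seen and rates[ny][nx] >= threshold:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            components.append(sorted(component))
    return components


def candidate_zones(rates):
    """Collect distinct components over all observed rate levels, highest first."""
    levels = sorted({value for row in rates for value in row}, reverse=True)
    zones = []
    for level in levels:
        for component in upper_level_set_components(rates, level):
            if component not in zones:
                zones.append(component)
    return zones


# Hypothetical toy rates: the elevated cells form an L shape, not a circle.
rates = [
    [0.1, 0.9, 0.1],
    [0.1, 0.8, 0.7],
    [0.1, 0.1, 0.1],
]
zones = candidate_zones(rates)
```

In this toy grid the three elevated cells form a single L-shaped zone at threshold 0.7, which is the kind of non-circular cluster the abstract argues a circle-based scan would handle poorly.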
Multiple Indicators, partially ordered sets, and linear extensions: Multi-criterion ranking and prioritization
- 2004
Monte Carlo conditioning on a sufficient statistic
- 2000
Cited by 6 (3 self)
We derive general formulae for computation of conditional expectations of functions φ(X) given T = t, when (X, T) is a pair of random vectors such that T is sufficient compared to X for a parameter θ. The basic assumption is that there is a random vector U with a known distribution and functions χ, τ such that (X, T) under the parameter value θ has the same distribution as the pair (χ(U, θ), τ(U, θ)). The conditional expectations are then expressed in terms of ordinary expectations of functions of U, and are thus well suited for computation by Monte Carlo simulation. The clue is to equip the parameter space with a suitably chosen σ-finite measure. The problem of direct sampling from conditional distributions of X given T = t is also considered. It is shown in particular that when the model satisfies a certain pivotal condition, then this can be done by sampling χ(U, θ) where the value of θ is adjusted for each U in such a manner that τ(U, θ) = t is kept fixed. Several examples are given in order to demonstrate different cases which may occur in practice.
Horizon problems and extreme events in financial risk management
- Economic Policy Review, Federal Reserve Bank of ..., 1998
Cited by 5 (0 self)
There is no one “magic” relevant horizon for risk management. Instead, the relevant horizon will generally vary by asset class (for example, equity versus bonds), industry (banking versus insurance), position in the firm (trading
Using urn models for the design of clinical trials
- Indian Journal of Statistics
Cited by 3 (0 self)
SUMMARY. When conducting a clinical trial to compare K ≥ 2 treatments, it is essential to randomize patients to treatment in order to reduce experimental bias. Typically, an equal number of patients are assigned to each treatment group in order to increase the precision of inference concerning treatment effect. However, if the treatment difference is large and the endpoint potentially fatal, it seems unethical in an unmasked trial to sacrifice a large number of study patients for the purpose of maintaining balance in the number assigned to each treatment group. In this article, we survey those treatment randomization rules that are designed for use in each of the above two experimental situations, and have the empirical advantage of being well explained in terms of an urn model.
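As an illustration of how an urn can drive treatment assignment, the following Python sketch simulates a randomized play-the-winner rule for K = 2 treatments. This is one well-known urn design chosen here for illustration; the abstract does not say which rules the survey covers. The function name, the parameters `alpha` and `beta`, and the success probabilities are all hypothetical.

```python
import random


def randomized_play_the_winner(success_prob, n_patients, alpha=1, beta=1, seed=0):
    """Simulate a two-arm randomized play-the-winner urn (illustrative only).

    The urn starts with `alpha` balls per arm. Each patient is assigned by
    drawing a ball with replacement. A success adds `beta` balls for the
    assigned arm; a failure adds `beta` balls for the other arm.
    """
    rng = random.Random(seed)
    urn = [alpha, alpha]
    assignments = []
    for _ in range(n_patients):
        # Assignment probability is proportional to the urn composition.
        arm = 0 if rng.random() < urn[0] / (urn[0] + urn[1]) else 1
        assignments.append(arm)
        # Simulated binary response (assumed to be observed immediately).
        success = rng.random() < success_prob[arm]
        rewarded = arm if success else 1 - arm
        urn[rewarded] += beta
    return assignments, urn


# Hypothetical success probabilities for the two arms.
assignments, urn = randomized_play_the_winner([0.9, 0.2], n_patients=50)
```

Because the urn composition shifts toward the arm that is doing better, later patients are more likely to receive it, which is the ethical motivation raised in the abstract when the treatment difference is large. Real designs must also handle delayed responses, which this sketch ignores.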
Finite Sample Size Results for Robust Model Selection; Application to Neural Networks
- Technical Report, Dept. E.E., Technion, Publication # CC-117, 1995
Cited by 2 (1 self)
The problem of model selection in the face of finite sample size is considered within the framework of statistical decision theory. Focusing on the special case of regression, we introduce a model selection criterion which is shown to be robust in the sense that, with high confidence, even for a finite sample size it selects the best model. Our derivation is based on uniform convergence methods, augmented by results from the theory of function approximation, which permit us to make definite probabilistic statements about the finite sample behavior. These results stand in contrast to classical approaches, which can only guarantee the asymptotic optimality of the choice. The criterion is demonstrated for the problem of model selection in feedforward neural networks. Index Terms: statistical model selection, uniform strong law of large numbers, structural risk minimization, lower and upper bounds on approximation error, algorithmic complexity, neural networks.
Vanishing shortcoming and asymptotic relative efficiency
- Ann. Statist., 2000
Cited by 1 (1 self)
The shortcoming of a test is the difference between the maximal attainable power and the power of the test under consideration. Vanishing shortcoming, when the number of observations tends to infinity, is therefore an optimality property of a test. Other familiar optimality criteria are based on the asymptotic relative efficiency of the test. The relations between these optimality criteria are investigated. It turns out that vanishing shortcoming is seemingly slightly stronger than first order efficiency, but in regular cases there is equivalence. The results are in particular applied to tests for goodness-of-fit.
Preidentification Of Errors Distribution In Linear Dynamic Systems
Preidentification is understood here as a number of steps which should be (or can be) performed before actual identification is performed. In the paper we illustrate this general idea by considering a new approach to estimating the probability distribution of random errors acting on a linear dynamic system. The main emphasis is on indicating that it is possible to perform estimation before actual identification, i.e., when neither the number of lags nor the parameter estimates are known. Inference is based on estimating the characteristic function of the underlying distribution. The approach is applicable when the input signal is either the unit step function or a slowly varying signal. As a byproduct, a modification of the standard estimator of a characteristic function is proposed and investigated. The need for such a modification became clear to the author in the course of simulation studies. It turned out that the classical estimator, although consistent, has too wiggly behavior at the tail...
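For reference, the classical estimator of a characteristic function is the empirical characteristic function, the sample average of exp(i t X_j). The Python sketch below implements only that classical estimator; the modification proposed in the paper is not described in the abstract and is not reproduced here. The function name and the sample values are hypothetical.

```python
import cmath


def empirical_characteristic_function(sample, t):
    """Classical estimator: the average of exp(i * t * x) over the sample."""
    return sum(cmath.exp(1j * t * x) for x in sample) / len(sample)


# Hypothetical symmetric sample: the estimate is then real, equal to cos(t).
sample = [1.0, -1.0]
value_at_zero = empirical_characteristic_function(sample, 0.0)
value_at_two = empirical_characteristic_function(sample, 2.0)
```

The estimate always equals 1 at t = 0 and has modulus at most 1. For large |t| it keeps oscillating rather than decaying, which is consistent with the tail behavior the abstract complains about.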

