Results 1-10 of 39
Clause restructuring for statistical machine translation
In ACL, 2005
Abstract

Cited by 104 (2 self)
We describe a method for incorporating syntactic information in statistical machine translation systems. The first step of the method is to parse the source language string that is being translated. The second step is to apply a series of transformations to the parse tree, effectively reordering the surface string on the source language side of the translation system. The goal of this step is to recover an underlying word order that is closer to the target language word order than the original string. The reordering approach is applied as a preprocessing step in both the training and decoding phases of a phrase-based statistical MT system. We describe experiments on translation from German to English, showing an improvement from 25.2% BLEU score for a baseline system to 26.8% BLEU score for the system with reordering, a statistically significant improvement.
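The tree-transformation idea this abstract describes can be sketched with a toy example. The single rule below (moving a clause-final verb to the front of its verb phrase) is purely illustrative and is not one of the paper's actual German-specific transformations:

```python
# Illustrative sketch: reorder a toy parse tree so a clause-final verb
# moves before its object, mimicking the German-to-English verb
# repositioning the abstract describes. Trees are (label, children)
# tuples with plain strings as leaf words.

def reorder(tree):
    """Recursively reorder: in a VP whose last child is a V node,
    move that verb to the front of the VP."""
    if isinstance(tree, str):          # leaf word
        return tree
    label, children = tree
    children = [reorder(c) for c in children]
    if (label == "VP" and children
            and isinstance(children[-1], tuple)
            and children[-1][0] == "V"):
        children = [children[-1]] + children[:-1]
    return (label, children)

def yield_words(tree):
    """Read off the surface string of a tree."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [w for c in children for w in yield_words(c)]

# "ich habe das Buch gelesen", with the participle clause-final
source = ("S", [("NP", [("PRON", ["ich"])]),
                ("V", ["habe"]),
                ("VP", [("NP", [("DET", ["das"]), ("N", ["Buch"])]),
                        ("V", ["gelesen"])])])
reordered = reorder(source)
print(" ".join(yield_words(reordered)))  # ich habe gelesen das Buch
```

The reordered yield now matches the English verb-object order, which is what the preprocessing step aims for before phrase-based training and decoding.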
Analysis of Gene Expression Microarrays for Phenotype Classification
Proc. Int. Conf. Intell. Syst. Mol. Biol., 2000
Abstract

Cited by 53 (6 self)
Several microarray technologies that monitor the level of expression of a large number of genes have recently emerged. Given DNA microarray data for a set of cells characterized by a given phenotype and for a set of control cells, an important problem is to identify "patterns" of gene expression that can be used to predict cell phenotype. The potential number of such patterns is exponential in the number of genes. In this paper, we propose a solution to this problem based on a supervised learning algorithm, which differs substantially from previous schemes. It couples a complex, nonlinear similarity metric, which maximizes the probability of discovering discriminative gene expression patterns, with a pattern discovery algorithm called SPLASH. The latter efficiently and deterministically discovers all statistically significant gene expression patterns in the phenotype set. Statistical significance is evaluated based on the probability of a pattern to occur by chance in ...
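The significance computation the abstract alludes to can be illustrated with a simple binomial tail probability. This is a hedged sketch of the general idea only, not the SPLASH algorithm; the function name and the background rate `p` are illustrative assumptions:

```python
# Sketch of scoring a pattern's significance: the chance of a gene-
# expression pattern appearing in >= k of n phenotype samples, if its
# per-sample background probability under the null is p.
from math import comb

def pattern_pvalue(k, n, p):
    """P(pattern occurs in at least k of n samples by chance),
    i.e. the upper binomial tail at k."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# A pattern seen in 8 of 10 phenotype samples with background rate 0.1
pv = pattern_pvalue(8, 10, 0.1)
print(f"{pv:.2e}")
```

Patterns whose tail probability falls below a chosen threshold would be retained as statistically significant; the real algorithm does this exhaustively and deterministically over the pattern space.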
Optimal Inference in Regression Models with Nearly Integrated Regressors
2004
Abstract

Cited by 33 (2 self)
This paper considers the problem of conducting inference on the regression coefficient in a bivariate regression model with a highly persistent regressor. Gaussian power envelopes are obtained for a class of testing procedures satisfying a conditionality restriction. In addition, the paper proposes feasible testing procedures that attain these Gaussian power envelopes whether or not the innovations of the regression model are normally distributed.
Upper level set scan statistic for detecting arbitrarily shaped hotspots
Environmental and Ecological Statistics, 2004
Abstract

Cited by 30 (16 self)
There is a declared need for geoinformatic surveillance statistical science and software infrastructure for spatial and spatiotemporal hotspot detection. A hotspot means something unusual: an anomaly, aberration, outbreak, elevated cluster, critical resource area, etc. The declared need may be for monitoring, etiology, management, or early warning. The responsible factors may be natural, accidental, or intentional. This proof-of-concept paper suggests methods and tools for hotspot detection across geographic regions and across networks. The investigation proposes development of statistical methods and tools that have immediate potential for use in critical societal areas, such as public health and disease surveillance, ecosystem health, water resources and water services, transportation networks, persistent poverty typologies and trajectories, environmental justice, biosurveillance and biosecurity, among others. We introduce, for multidisciplinary use, an innovation of the health-area-popular circle-based spatial and spatiotemporal scan statistic. Our innovation employs the notion of an upper level set, and is accordingly called the upper level set scan statistic, pointing to a sophisticated analytical and computational system as the next generation of the presently popular SaTScan. Success of surveillance rests on the capability to detect elevated clusters. But the clusters can be of any shape and cannot be captured only by circles; restricting attention to circles is likely to give more false alarms and a false sense of security. What we need is the capability to detect arbitrarily shaped clusters. The proposed upper level set scan statistic innovation is expected to fill this need.
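The upper-level-set idea can be sketched directly: candidate hotspot zones are the connected components of cells whose rate exceeds a threshold, which is what lets clusters take arbitrary shapes rather than being confined to circular windows. This is a minimal illustration of the zone-generation step only, not the SaTScan-style likelihood machinery:

```python
# Sketch: candidate zones for one threshold are the connected
# components of the upper level set {cell : rate > threshold},
# using 4-neighbour adjacency on a grid of (row, col) cells.

def upper_level_set_zones(rates, threshold):
    """rates: dict mapping (row, col) -> observed rate.
    Returns a list of connected components (sets of cells)."""
    live = {c for c, r in rates.items() if r > threshold}
    zones, seen = [], set()
    for start in live:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:                      # depth-first flood fill
            cell = stack.pop()
            if cell in comp:
                continue
            comp.add(cell)
            r, c = cell
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in live and nb not in comp:
                    stack.append(nb)
        zones.append(comp)
        seen |= comp
    return zones

# An L-shaped elevated region that a circular window could not capture
rates = {(0, 0): 5.0, (1, 0): 4.0, (2, 0): 6.0, (2, 1): 5.5,
         (0, 2): 1.0, (1, 2): 0.5}
zones = upper_level_set_zones(rates, 3.0)
print(len(zones), sorted(len(z) for z in zones))  # 1 [4]
```

Sweeping the threshold over observed rate values yields the full family of upper-level-set zones, which a scan statistic would then score for significance.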
Multiple Indicators, partially ordered sets, and linear extensions: Multicriterion ranking and prioritization
2004
Optimality and computations for relative surprise inferences
Can. J. of Statist., 2006
Abstract

Cited by 7 (6 self)
Relative surprise inferences are based on how beliefs change from a priori to a posteriori. These inferences can be seen to be based on the posterior distribution of the integrated likelihood and, as such, are invariant under relabellings of the parameter of interest. In this paper we demonstrate that relative surprise inferences possess an optimality property. Further, computational techniques are developed for implementing these inferences that are applicable whenever we have algorithms to sample from the prior and posterior distributions.
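A hedged sketch of how such sampling-based computation can work, for a conjugate Beta-binomial toy model where the relative belief ratio reduces to the posterior-to-prior density ratio. The function names and the Monte Carlo tail estimate below are illustrative assumptions, not the paper's algorithm:

```python
# Sketch: the "surprise" of a parameter value psi0 is estimated as the
# posterior probability of seeing a posterior-to-prior density ratio
# no larger than the one at psi0, using only draws from the posterior.
import random
from math import lgamma, log

def log_beta_pdf(x, a, b):
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + (a - 1) * log(x) + (b - 1) * log(1 - x))

def relative_surprise(psi0, a, b, k, n, n_draws=20000, seed=0):
    rng = random.Random(seed)
    ap, bp = a + k, b + n - k                 # conjugate posterior
    def log_rb(x):                            # log density ratio
        return log_beta_pdf(x, ap, bp) - log_beta_pdf(x, a, b)
    rb0 = log_rb(psi0)
    draws = [rng.betavariate(ap, bp) for _ in range(n_draws)]
    return sum(log_rb(x) <= rb0 for x in draws) / n_draws

# 7 successes in 10 trials under a uniform prior: psi = 0.7 is
# unsurprising, psi = 0.1 is highly surprising.
unsurprising = relative_surprise(0.7, 1, 1, 7, 10)
surprising = relative_surprise(0.1, 1, 1, 7, 10)
print(unsurprising, surprising)
```

Only a posterior sampler and density evaluations are needed here, in line with the abstract's point that the techniques apply whenever prior and posterior sampling algorithms are available.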
Monte Carlo conditioning on a sufficient statistic
2000
Abstract

Cited by 6 (3 self)
We derive general formulae for computation of conditional expectations of functions φ(X) given T = t, when (X, T) is a pair of random vectors such that T is sufficient compared to X for a parameter θ. The basic assumption is that there is a random vector U with a known distribution and functions χ, τ such that (X, T) under the parameter value θ has the same distribution as the pair (χ(U, θ), τ(U, θ)). The conditional expectations are then expressed in terms of ordinary expectations of functions of U, and are thus well suited for computation by Monte Carlo simulation. The clue is to equip the parameter space with a suitably chosen σ-finite measure. The problem of direct sampling from conditional distributions of X given T = t is also considered. It is shown in particular that when the model satisfies a certain pivotal condition, then this can be done by sampling χ(U, θ), where the value of θ is adjusted for each U in such a manner that τ(U, θ) = t is kept fixed. Several examples are given in order to demonstrate different cases which may occur in practice.
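A concrete instance of such a pivotal construction (the exponential example is our illustration, not necessarily one taken from the paper): for X_1, ..., X_n i.i.d. exponential with rate theta, T = sum(X_i) is sufficient. Writing X = U/theta with U_i ~ Exp(1), adjusting theta = sum(U)/t for each draw keeps T = t fixed, so X = U * t / sum(U) is a draw from the conditional law of X given T = t:

```python
# Sketch of pivotal-condition conditional sampling for i.i.d.
# exponentials given their sum: draw U_i ~ Exp(1), then rescale so the
# sum is pinned at t. The rescaled vector has the conditional
# distribution of X given T = t (a scaled Dirichlet law).
import random

def sample_exponential_given_sum(n, t, rng):
    """Draw (X_1, ..., X_n) from the law of i.i.d. exponentials
    conditioned on sum(X_i) = t."""
    u = [rng.expovariate(1.0) for _ in range(n)]
    s = sum(u)
    return [ui * t / s for ui in u]

rng = random.Random(1)
x = sample_exponential_given_sum(5, 10.0, rng)
print(round(sum(x), 9))  # the sum is pinned at 10.0
```

Conditional expectations E[g(X) | T = t] can then be estimated by averaging g over repeated draws, exactly the Monte Carlo use case the abstract describes.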
Horizon problems and extreme events in financial risk management. Economic Policy Review, Federal Reserve Bank of
1998
Abstract

Cited by 6 (0 self)
There is no one “magic” relevant horizon for risk management. Instead, the relevant horizon will generally vary by asset class (for example, equity versus bonds), industry (banking versus insurance), position in the firm (trading
Using urn models for the design of clinical trials
 Indian Journal of Statistics
Abstract

Cited by 3 (0 self)
SUMMARY. When conducting a clinical trial to compare K ≥ 2 treatments, it is essential to randomize patients to treatment in order to reduce experimental bias. Typically, an equal number of patients is assigned to each treatment group in order to increase the precision of inference concerning the treatment effect. However, if the treatment difference is large and the endpoint potentially fatal, it seems unethical in an unmasked trial to sacrifice a large number of study patients for the purpose of maintaining balance in the number assigned to each treatment group. In this article, we survey those treatment randomization rules that are designed for use in each of the above two experimental situations, and that have the empirical advantage of being well explained in terms of an urn model.
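One classical urn rule of the kind such surveys cover is the randomized play-the-winner design for K = 2 treatments. The sketch below is illustrative: the initial urn composition and reinforcement size are assumptions, not values prescribed by the article:

```python
# Sketch of a randomized play-the-winner urn for two arms: draw a ball
# to assign each patient; on a success, add balls for the same arm, on
# a failure, add balls for the other arm, so allocation drifts toward
# the better-performing treatment.
import random

def play_the_winner(outcome_prob, n_patients, rng, add=1):
    urn = {"A": 1, "B": 1}                      # initial composition
    assigned = {"A": 0, "B": 0}
    for _ in range(n_patients):
        total = urn["A"] + urn["B"]
        arm = "A" if rng.random() < urn["A"] / total else "B"
        assigned[arm] += 1
        success = rng.random() < outcome_prob[arm]
        other = "B" if arm == "A" else "A"
        urn[arm if success else other] += add   # reinforce the winner
    return assigned

rng = random.Random(42)
# Treatment A succeeds far more often, so it gets assigned more often.
counts = play_the_winner({"A": 0.8, "B": 0.2}, 500, rng)
print(counts)
```

This exhibits the ethical trade-off the abstract raises: randomization is preserved, but fewer patients are exposed to the inferior arm than under equal allocation.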
Finite Sample Size Results for Robust Model Selection; Application to Neural Networks
Technical Report, Dept. E.E., Technion, Publication # CC117, 1995
Abstract

Cited by 2 (1 self)
The problem of model selection in the face of finite sample size is considered within the framework of statistical decision theory. Focusing on the special case of regression, we introduce a model selection criterion which is shown to be robust in the sense that, with high confidence, it selects the best model even for a finite sample size. Our derivation is based on uniform convergence methods, augmented by results from the theory of function approximation, which permit us to make definite probabilistic statements about the finite sample behavior. These results stand in contrast to classical approaches, which can only guarantee the asymptotic optimality of the choice. The criterion is demonstrated for the problem of model selection in feedforward neural networks.

Index Terms: Statistical model selection, uniform strong law of large numbers, structural risk minimization, lower and upper bounds on approximation error, algorithmic complexity, neural networks.

1 Introduction

A major pr...