Head-Driven Statistical Models for Natural Language Parsing
, 1999
Cited by 955 (16 self)
Mitch Marcus was a wonderful advisor. He gave consistently good advice, and allowed an ideal level of intellectual freedom in pursuing ideas and research topics. I would like to thank the members of my thesis committee Aravind Joshi, Mark Liberman, Fernando Pereira and Mark Steedman for the remarkable breadth and depth of their feedback. I had countless impromptu but influential discussions with Jason Eisner, Dan Melamed and Adwait Ratnaparkhi in the LINC lab. They also provided feedback on many drafts of papers and thesis chapters. Paola Merlo pushed me to think about many new angles of the research. Dimitrios Samaras gave invaluable feedback on many portions of the work. Thanks to James Brooks for his contribution to the work that comprises chapter 5 of this thesis. The community of faculty, students and visitors involved with the Institute for Research in Cognitive Science at Penn provided an intensely varied and stimulating environment. I would like to thank them collectively. Some deserve special mention for discussions that contributed quite directly to this research: Breck Baldwin, Srinivas Bangalore, Dan
The Psychophysiology of Real-Time Financial Risk Processing
 JOURNAL OF COGNITIVE NEUROSCIENCE
, 2001
Cited by 32 (7 self)
A longstanding controversy in economics and finance is whether financial markets are governed by rational forces or by emotional responses. We study the importance of emotion in the decision-making process of professional securities traders by measuring their physiological characteristics, e.g., skin conductance, blood volume pulse, etc., during live trading sessions while simultaneously capturing real-time prices from which market events can be detected. In a sample of 10 traders, we find statistically significant differences in mean electrodermal responses during transient market events relative to no-event control periods, and statistically significant mean changes in cardiovascular variables during periods of heightened market volatility relative to normal-volatility control periods. We also observe significant differences in these physiological responses across the 10 traders, which may be systematically related to the traders’ levels of experience.
The fuzzy correlation between code and performance predictability
 In Proceedings of the 37th International Symposium on Microarchitecture (MICRO
, 2004
Cited by 22 (1 self)
Recent studies have shown that most SPEC CPU2K benchmarks exhibit strong phase behavior, and the Cycles per Instruction (CPI) performance metric can be accurately predicted based on a program’s control-flow behavior, by simply observing the sequencing of the program counters, or extended instruction pointers (EIPs). One motivation of this paper is to see if server workloads also exhibit such phase behavior. In particular, can EIPs effectively predict CPI in server workloads? We propose using regression trees to measure the theoretical upper bound on the accuracy of predicting the CPI using EIPs, where accuracy is measured by the explained variance of CPI with EIPs. Our results show that for most server workloads and, surprisingly, even for CPU2K benchmarks, the accuracy of predicting CPI from EIPs varies widely. We classify the benchmarks into four quadrants based on their CPI variance and predictability of CPI using EIPs. Our results indicate that no single sampling technique can be broadly applied to a large class of applications. We propose a new methodology that selects the best-suited sampling technique to accurately capture the program behavior.
A tutorial on conformal prediction
 Journal of Machine Learning Research
, 2008
Cited by 18 (1 self)
Conformal prediction uses past experience to determine precise levels of confidence in new predictions. Given an error probability ε, together with a method that makes a prediction ŷ of a label y, it produces a set of labels, typically containing ŷ, that also contains y with probability 1 − ε. Conformal prediction can be applied to any method for producing ŷ: a nearest-neighbor method, a support-vector machine, ridge regression, etc. Conformal prediction is designed for an online setting in which labels are predicted successively, each one being revealed before the next is predicted. The most novel and valuable feature of conformal prediction is that if the successive examples are sampled independently from the same distribution, then the successive predictions will be right 1 − ε of the time, even though they are based on an accumulating data set rather than on independent data sets. In addition to the model under which successive examples are sampled independently, other online compression models can also use conformal prediction. The widely used Gaussian linear model is one of these. This tutorial presents a self-contained account of the theory of conformal prediction and works through several numerical examples. A more comprehensive treatment of the topic is provided in
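As an illustration of the set-valued prediction described in this abstract, the following is a minimal sketch rather than the tutorial's own code. It assumes one-dimensional features and a nearest-neighbor nonconformity score (distance to the nearest example with the same label divided by distance to the nearest example with a different label). The function names `nn_score` and `conformal_set` are hypothetical.

```python
import math


def nn_score(i, xs, ys):
    """Nonconformity of example i: nearest same-label distance over
    nearest other-label distance (an assumed, illustrative choice)."""
    same = min((abs(xs[i] - xs[j]) for j in range(len(xs))
                if j != i and ys[j] == ys[i]), default=math.inf)
    other = min((abs(xs[i] - xs[j]) for j in range(len(xs))
                 if ys[j] != ys[i]), default=math.inf)
    if same == math.inf or other == 0.0:
        return math.inf  # maximally strange: no same-label peer, or a clash
    return same / other


def conformal_set(train_x, train_y, new_x, labels, eps):
    """Return every candidate label whose p-value exceeds eps."""
    region = []
    for label in labels:
        # Provisionally attach the candidate label to the new example.
        xs = list(train_x) + [new_x]
        ys = list(train_y) + [label]
        scores = [nn_score(i, xs, ys) for i in range(len(xs))]
        # p-value: fraction of examples at least as nonconforming as the new one.
        p_value = sum(s >= scores[-1] for s in scores) / len(scores)
        if p_value > eps:
            region.append(label)
    return region
```

A smaller ε can only enlarge the returned set: a label is kept whenever its p-value exceeds ε, so demanding higher confidence admits more candidate labels.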
Wavelets, Ridgelets, and Curvelets for Poisson Noise Removal
Cited by 17 (1 self)
Abstract: In order to denoise Poisson count data, we introduce a variance stabilizing transform (VST) applied on a filtered discrete Poisson process, yielding a near-Gaussian process with asymptotic constant variance. This new transform, which can be deemed as an extension of the Anscombe transform to filtered data, is simple, fast, and efficient in (very) low-count situations. We combine this VST with the filter banks of wavelets, ridgelets and curvelets, leading to multiscale VSTs (MS-VSTs) and nonlinear decomposition schemes. By doing so, the noise-contaminated coefficients of these MS-VST-modified transforms are asymptotically normally distributed with known variances. A classical hypothesis-testing framework is adopted to detect the significant coefficients, and a sparsity-driven iterative scheme properly reconstructs the final estimate. A range of examples show the power of this MS-VST approach for recovering important structures of various morphologies in (very) low-count images. These results also demonstrate that the MS-VST approach is competitive relative to many existing denoising methods. Index Terms: Curvelets, filtered Poisson process, multiscale variance stabilizing transform, Poisson intensity estimation, ridgelets, wavelets.
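To make the idea of variance stabilization concrete, here is a small sketch of the classical Anscombe transform, A(x) = 2·sqrt(x + 3/8), which the abstract names as the starting point. It is not the MS-VST proposed in the paper. The Poisson sampler (Knuth's multiplication method), the seed, and all function names are assumptions chosen for illustration.

```python
import math
import random


def anscombe(x):
    """Classical Anscombe transform for a Poisson count x."""
    return 2.0 * math.sqrt(x + 3.0 / 8.0)


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def stabilized_variance(lam, n=20000, seed=0):
    """Return (raw variance, variance after the Anscombe transform)."""
    rng = random.Random(seed)
    counts = [poisson_sample(lam, rng) for _ in range(n)]
    return sample_variance(counts), sample_variance([anscombe(c) for c in counts])
```

For a moderately large intensity the raw variance tracks the intensity itself, while the transformed variance stays close to 1. The approximation degrades at very low counts, which is the regime the abstract targets.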
Information, Divergence and Risk for Binary Experiments
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2009
Cited by 17 (6 self)
We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC curves and statistical information. We do this by systematically studying integral and variational representations of these various objects and in so doing identify their primitives, all of which are related to cost-sensitive binary classification. As well as developing relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate regret bounds and generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint also illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates Maximum Mean Discrepancy to Fisher Linear Discriminants.
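As a point of reference for the generalised Pinsker inequalities mentioned in this abstract, the sketch below checks only the classical special case, KL(P‖Q) ≥ V(P, Q)²/2, on discrete distributions. It assumes the convention V(P, Q) = Σ|pᵢ − qᵢ| (conventions differ by a factor of 2 across the literature) and natural logarithms. The function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def variational_divergence(p, q):
    """L1 distance between two discrete distributions (assumed convention)."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def pinsker_gap(p, q):
    """KL minus the classical Pinsker lower bound; non-negative in theory."""
    return kl_divergence(p, q) - 0.5 * variational_divergence(p, q) ** 2
```

The gap is zero when the two distributions coincide and grows as they separate. The bound is loose for far-apart distributions, which leaves room for tighter inequalities.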
Network constrained clustering for gene microarray data
 Bioinformatics
, 2005
doi:10.1093/bioinformatics/bti655
Suboptimality of penalized empirical risk minimization in classification
 In Proceedings of the 20th annual conference on Computational Learning Theory (COLT). Lecture Notes in Computer Science 4539 142–156
, 2007
Cited by 11 (3 self)
Abstract. Let F be a set of M classification procedures with values in [−1, 1]. Given a loss function, we want to construct a procedure which mimics at the best possible rate the best procedure in F. This fastest rate is called optimal rate of aggregation. Considering a continuous scale of loss functions with various types of convexity, we prove that optimal rates of aggregation can be either ((log M)/n)^{1/2} or (log M)/n. We prove that, if all the M classifiers are binary, the (penalized) Empirical Risk Minimization procedures are suboptimal (even under the margin/low noise condition) when the loss function is somewhat more than convex, whereas, in that case, aggregation procedures with exponential weights achieve the optimal rate of aggregation.
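The contrast between selecting one classifier and aggregating with exponential weights can be sketched as follows. This is an illustrative assumption, not the paper's exact procedure: the weights are taken proportional to exp(−n · empirical risk / temperature), the loss is the hinge loss, and the `temperature` parameter and all function names are hypothetical.

```python
import math


def hinge_loss(y, score):
    """Hinge loss for a label y in {-1, +1} and a score in [-1, 1]."""
    return max(0.0, 1.0 - y * score)


def exponential_weights(risks, n, temperature=1.0):
    """Weights proportional to exp(-n * risk / temperature), computed stably."""
    best = min(risks)
    raw = [math.exp(-n * (r - best) / temperature) for r in risks]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(classifiers, data, temperature=1.0):
    """Return the weighted-average procedure and the weights it uses."""
    n = len(data)
    risks = [sum(hinge_loss(y, f(x)) for x, y in data) / n for f in classifiers]
    weights = exponential_weights(risks, n, temperature)

    def aggregated(x):
        return sum(w * f(x) for w, f in zip(weights, classifiers))

    return aggregated, weights
```

Empirical risk minimization would instead return the single classifier with the smallest risk. The aggregate is a convex combination, so it can lie strictly between the candidates.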
Non-Gaussian conditional linear AR(1) models
 Australian and New Zealand Journal of Statistics
, 2000
Cited by 11 (3 self)
Abstract: We give a general formulation of a non-Gaussian conditional linear AR(1) model subsuming most of the non-Gaussian AR(1) models that have appeared in the literature. We derive some general results giving properties for the stationary process mean, variance and correlation structure, and conditions for stationarity. These results highlight similarities and differences with the Gaussian AR(1) model, and unify many separate results appearing in the literature. Examples illustrate the wide range of properties that can appear under the conditional linear autoregressive assumption. These results are used in analysing three real data sets, illustrating general methods of estimation, model diagnostics and model selection. In particular, we show that the theoretical results can be used to develop diagnostics for deciding if a time series can be modelled by some linear autoregressive model, and for selecting among several candidate models.
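One simple member of this family can be simulated to see the stationary mean and correlation structure the abstract refers to. The sketch assumes the model X_t = a·X_{t−1} + e_t with i.i.d. exponential innovations of mean 1, so that E[X_t | X_{t−1}] is linear in X_{t−1}. This specific choice, the seed, and the function names are illustrative assumptions, not taken from the paper.

```python
import random


def simulate_ar1(a, n, burn_in=1000, seed=0):
    """Simulate X_t = a * X_{t-1} + e_t with Exp(1) innovations (assumed model)."""
    rng = random.Random(seed)
    x = 1.0 / (1.0 - a)  # start at the stationary mean
    path = []
    for t in range(burn_in + n):
        x = a * x + rng.expovariate(1.0)
        if t >= burn_in:
            path.append(x)
    return path


def sample_mean(xs):
    return sum(xs) / len(xs)


def lag1_autocorrelation(xs):
    """Sample autocorrelation at lag 1."""
    m = sample_mean(xs)
    num = sum((xs[t] - m) * (xs[t + 1] - m) for t in range(len(xs) - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den
```

For |a| < 1 this process has stationary mean 1/(1 − a) and lag-1 autocorrelation a, the same second-order structure as a Gaussian AR(1), even though its marginal distribution is skewed and strictly positive.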
Statistical region-based active contours with exponential family observations
 in ICASSP, 2006
Cited by 11 (2 self)
In this paper, we focus on statistical region-based active contour models where image features (e.g. intensity) are random variables whose distribution belongs to some parametric family (e.g. exponential) rather than confining ourselves to the special Gaussian case. In the framework developed in this paper, we consider the general case of region-based terms involving functions of parametric probability densities, for which the anti-log-likelihood function is a special case. Using shape derivative tools, our effort focuses on constructing a general expression for the derivative of the energy (with respect to a domain), and on deriving the corresponding evolution speed. More precisely, we first show by a counterexample that the estimator of the distribution parameters is crucial for the derived speed expression. On the one hand, when using the maximum likelihood (ML) estimator for these parameters, the evolution speed has a closed-form expression that depends simply on the probability density function. On the other hand, complicating additive terms appear when using other estimators, e.g. method of moments. We then proceed by stating a general result within the framework of multiparameter exponential family. This result is specialized to the case of the anti-log-likelihood score with the ML estimator and to the case of the relative entropy. Experimental