Results 21-30 of 133
A sum of squares approximation of nonnegative polynomials
SIAM J. Optim., 2006
Cited by 16 (5 self)
Abstract. We show that every real nonnegative polynomial f can be approximated as closely as desired (in the ℓ1-norm of its coefficient vector) by a sequence of polynomials {f_ɛ} that are sums of squares. The novelty is that each f_ɛ has a simple and explicit form in terms of f and ɛ. Key words. Real algebraic geometry; positive polynomials; sum of squares; semidefinite programming. AMS subject classifications. 12E05, 12Y05, 90C22
Characterizing the Generalization Performance of Model Selection Strategies
In ICML-97, 1997
Cited by 15 (4 self)
We investigate the structure of model selection problems via the bias/variance decomposition. In particular, we characterize the essential structure of a model selection task by the bias and variance profiles it generates over the sequence of hypothesis classes. This leads to a new understanding of complexity-penalization methods: First, the penalty terms in effect postulate a particular profile for the variances as a function of model complexity; if the postulated and true profiles do not match, then systematic underfitting or overfitting results, depending on whether the penalty terms are too large or too small. Second, it is usually best to penalize according to the true variances of the task, and therefore no fixed penalization strategy is optimal across all problems. We then use this bias/variance characterization to identify the notion of easy and hard model selection problems. In particular, we show that if the variance profile grows too rapidly in relation to the biases t...
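The effect of a mismatched penalty described in this abstract can be illustrated with a small experiment. The following is a minimal sketch, not the authors' method: the hypothesis classes (piecewise-constant fits with `k` equal-width bins), the sine target, the noise level, and the penalty form `penalty * k / n` are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical nested sequence of hypothesis classes: k equal-width bins on [0, 1).
KS = [1, 2, 4, 8, 16, 32, 64]

def make_data(n, seed=0):
    """Noisy samples of an assumed target, y = sin(2*pi*x) + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys

def fit_bins(xs, ys, k):
    """Least-squares piecewise-constant fit: the mean of y within each bin."""
    sums, counts = [0.0] * k, [0] * k
    for x, y in zip(xs, ys):
        b = min(int(x * k), k - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)  # fallback value for empty bins
    return [sums[b] / counts[b] if counts[b] else overall for b in range(k)]

def mse(means, xs, ys):
    """Mean squared error of a binned fit on the given sample."""
    k = len(means)
    return sum((y - means[min(int(x * k), k - 1)]) ** 2
               for x, y in zip(xs, ys)) / len(xs)

def select(xs, ys, ks, penalty):
    """Pick the complexity k minimizing training error + penalty * k / n."""
    n = len(xs)
    return min(ks, key=lambda k: mse(fit_bins(xs, ys, k), xs, ys) + penalty * k / n)

if __name__ == "__main__":
    xs, ys = make_data(200)
    for penalty in (0.0, 0.5, 1e6):
        print(penalty, select(xs, ys, KS, penalty))
```

With no penalty the rule picks the most complex class, since training error cannot increase along nested classes (overfitting); with a very large penalty it picks the simplest class (underfitting). Intermediate penalties land somewhere in between, and which value is best depends on the variance profile of the task.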
Continuous Stochastic Logic Characterizes Bisimulation of Continuous-time Markov Processes
J. of Logic and Alg. Progr., 2002
Cited by 15 (3 self)
In a recent paper, Baier, Haverkort, Hermanns and Katoen [BHHK00] analyzed a new way of model-checking formulas of a logic for continuous-time processes, called Continuous Stochastic Logic (henceforth CSL), against continuous-time Markov chains (henceforth CTMCs). One of the important results of that paper was the proof that if two CTMCs were bisimilar then they would satisfy exactly the same formulas of CSL. This raises the converse question: does satisfaction of the same collection of CSL formulas imply bisimilarity? In other words, given two CTMCs which are known to satisfy exactly the same formulas of CSL, does it have to be the case that they are bisimilar? We prove that the answer to the question just raised is "yes". In fact we prove a significant extension, namely that a subset of CSL suffices even for systems where the state space may be a continuum. Along the way we prove a result to the effect that the set of Zeno paths has measure zero provided that the transition rates are bounded.
Consistency of Minimizers and the SLLN for Stochastic Programs
1996
Cited by 15 (1 self)
A general strong law of large numbers for stochastic programs is established. It is shown that solutions and approximate solutions may not be consistent with the strong law in general, but consistency holds locally, or when the decision space is compact. An additional integrability condition implies the uniform consistency of approximate solutions. The results are applied in the context of linear recourse models. 1. Introduction. The paper examines relations between solutions of a stochastic optimization problem and the solutions of large sampled versions of the problem. We consider an abstract stochastic program of the form: minimize over $x \in X$ the expectation $E^{P(d\xi)}\bigl(f(x,\xi)\bigr)$, where $E^{P(d\xi)}$ is the expectation operator with respect to the probability measure $P$ over the space $\Xi$ of random elements. The decision space $X$ here is taken as a metric space. For a given sequence $\xi_1, \ldots, \xi_n$ of realizations of the random variable $\xi$ we form the deterministic problem: minimize over $x \in X$ 1...
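A sampled problem of the kind described in this abstract is commonly formed by replacing the expectation with an average over the realizations. The following is a minimal sketch under stated assumptions, not the paper's construction: the cost `f(x, xi) = (x - xi)**2`, the real line as decision space, and Gaussian samples are all hypothetical choices made so that the true minimizer (the mean) is known.

```python
import random

def cost(x, xi):
    """Hypothetical cost function f(x, xi); not taken from the paper."""
    return (x - xi) ** 2

def sample_average(x, samples):
    """Objective of the sampled problem: the average of f(x, xi_i) over the sample."""
    return sum(cost(x, xi) for xi in samples) / len(samples)

def saa_minimizer(samples):
    """Minimizer of the sampled problem; for the quadratic cost it is the sample mean."""
    return sum(samples) / len(samples)

def demo(seed=0, true_mean=2.0):
    """Solve sampled problems of growing size and return the minimizers by sample size."""
    rng = random.Random(seed)
    results = {}
    for n in (10, 1000, 100000):
        samples = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        results[n] = saa_minimizer(samples)
    return results

if __name__ == "__main__":
    for n, x_n in demo().items():
        print(n, x_n)
```

In this well-behaved example the sampled minimizers approach the true minimizer 2.0 as the sample grows. The abstract's point is that such consistency is not automatic in general and needs conditions such as compactness of the decision space.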
Lectures on Young Measure Theory and its Applications in Economics
Rend. Istit. Mat. Univ. Trieste, 1998
"... this paper we work with the following hypothesis: ..."
Conjoint Probabilistic Subband Modeling
Massachusetts Institute of Technology, 1997
Cited by 14 (0 self)
A new approach to high-order conditional probability density estimation is developed, based on a partitioning of conditioning space via decision trees. The technique is applied to image compression, image restoration, and texture synthesis, and the results are compared with those obtained by standard mixture density and linear regression models. By applying the technique to subband-domain processing, some evidence is provided to support the following statement: the appropriate tradeoff between spatial and spectral localization in linear preprocessing shifts towards greater spatial localization when subbands are processed in a way that exploits interdependence.
On Source Coding with Side-Information-Dependent Distortion Measures
IEEE Trans. Inform. Theory, 2000
Cited by 14 (3 self)
High-resolution bounds in lossy coding of a real memoryless source are considered when side information is present. Let $X$ be a "smooth" source and let $Y$ be the side information. First we treat the case when both the encoder and the decoder have access to $Y$ and we establish an asymptotically tight (high-resolution) formula for the conditional rate-distortion function $R_{X|Y}(D)$ for a class of locally quadratic distortion measures which may be functions of the side information. We then consider the case when only the decoder has access to the side information (i.e., the "Wyner-Ziv problem"). For side-information-dependent distortion measures, we give an explicit formula which tightly approximates the Wyner-Ziv rate-distortion function $R^{WZ}(D)$ for small $D$ under some assumptions on the joint distribution of $X$ and $Y$. These results demonstrate that for side-information-dependent distortion measures the rate loss $R^{WZ}(D) - R_{X|Y}(D)$ can be bounded away from zero in the limit of small $D$. This contrasts with the case of distortion measures which do not depend on the side information, where the rate loss vanishes as $D \to 0$.
Principal Curves: Learning, Design, And Applications
1999
Cited by 14 (3 self)
The subjects of this thesis are unsupervised learning in general, and principal curves in particular. Principal curves were originally defined by Hastie [Has84] and Hastie and Stuetzle [HaSt89] (hereafter HS) to formally capture the notion of a smooth curve passing through the "middle" of a $d$-dimensional probability distribution or data cloud. Based on the definition, HS also developed an algorithm for constructing principal curves of distributions and data sets. The field has been very active since Hastie and Stuetzle's groundbreaking work. Numerous alternative definitions and methods for estimating principal curves have been proposed, and principal curves were further analyzed and compared with other unsupervised learning techniques. Several applications in various areas including image analysis, feature extraction, and speech processing demonstrated that principal curves are not only of theoretical interest, but they also have a legitimate place in the family of practical unsupervised learning techniques. Although the concept of principal curves as considered by HS has several appealing characteristics, complete theoretical analysis of the model seems to be rather hard. This motivated us to redefine principal curves in a manner that allowed us to carry out extensive theoretical analysis while preserving the informal notion of principal curves. Our first contribution to the area is, hence, a new theoretical model that is analyzed by using tools of statistical learning theory. Our main result here is the first known consistency proof of a principal curve estimation scheme. The theoretical model proved to be too restrictive to be practical. However, it inspired the design of a new practical algorithm to estimate principal curves based on data. The polygonal line algorithm, which compares favorably with previous methods both in terms of performance and computational complexity, is our second contribution to the area of principal curves.
To complete the picture, in the last part of the thesis we consider an application of the polygonal line algorithm to handwritten character skeletonization.
Sequential PAC Learning
In Proceedings of COLT-95, 1995
Cited by 14 (5 self)
We consider the use of "on-line" stopping rules to reduce the number of training examples needed to PAC-learn. Rather than collect a large training sample that can be proved sufficient to eliminate all bad hypotheses a priori, the idea is instead to observe training examples one at a time and decide "on-line" whether to stop and return a hypothesis, or continue training. The primary benefit of this approach is that we can detect when a hypothesizer has actually "converged," and halt training before the standard fixed-sample-size bounds. This paper presents a series of such sequential learning procedures for: distribution-free PAC-learning, "mistake-bounded to PAC" conversion, and distribution-specific PAC-learning, respectively. We analyze the worst-case expected training sample size of these procedures, and show that this is often smaller than existing fixed-sample-size bounds, while providing the exact same worst-case PAC guarantees. We also provide lower bounds that show these r...