Results 1–10 of 38
An interior-point method for large-scale ℓ1-regularized logistic regression
 Journal of Machine Learning Research
, 2007
Abstract

Cited by 153 (6 self)
Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium-sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, which uses a preconditioned conjugate gradient method to compute the search step, can solve very large problems, with a million features and examples (e.g., the 20 Newsgroups data set), in a few minutes, on a PC. Using warm-start techniques, a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.
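A minimal sketch of the optimization problem this abstract describes. It uses a plain proximal-gradient (ISTA) loop rather than the paper's interior-point method, and the data, λ, and step size below are invented for illustration:

```python
import numpy as np

def l1_logreg(X, y, lam, step=0.1, iters=500):
    """Proximal-gradient (ISTA) iterations for
       minimize  (1/n) * sum_i log(1 + exp(-y_i * x_i^T w)) + lam * ||w||_1
    A simple stand-in for the interior-point solver in the paper."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        margins = y * (X @ w)
        # gradient of the averaged logistic loss
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n
        z = w - step * grad
        # soft-thresholding: the proximal operator of the l1 penalty
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 1.0]               # only 3 of 10 features matter
y = np.sign(X @ w_true + 0.1 * rng.normal(size=200))
w_hat = l1_logreg(X, y, lam=0.1)
```

The soft-thresholding step is what drives irrelevant coefficients exactly to zero, which is the feature-selection behaviour the abstract refers to.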
Using Mutation Analysis for Assessing and Comparing Testing Coverage Criteria
 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
, 2006
Abstract

Cited by 35 (5 self)
The empirical assessment of test techniques plays an important role in software testing research. One common practice is to seed faults in subject software, either manually or by using a program that generates all possible mutants based on a set of mutation operators. The latter allows the systematic, repeatable seeding of large numbers of faults, thus facilitating the statistical analysis of the fault detection effectiveness of test suites; however, we do not know whether empirical results obtained this way lead to valid, representative conclusions. Focusing on four common control and data flow criteria (Block, Decision, C-Use, and P-Use), this paper investigates this important issue based on a middle-sized industrial program with a comprehensive pool of test cases and known faults. Based on the data available thus far, the results are very consistent across the investigated criteria as they show that the use of mutation operators yields trustworthy results: generated mutants can be used to predict the detection effectiveness of real faults. Applying such a mutation analysis, we then investigate the relative cost and effectiveness of the above-mentioned criteria by revisiting fundamental questions regarding the relationships between fault detection, test suite size, and control/data flow coverage. Although such questions have been partially investigated in previous studies, we can use a large number of mutants, which helps decrease the impact of random variation in our analysis and allows us to use a different analysis approach. Our results are then compared with published studies, plausible reasons for the differences are provided, and the research leads us to suggest a way to tune the mutation analysis process to possible differences in fault detection probabilities in a specific environment.
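A toy illustration of the mutation-analysis workflow the abstract builds on: systematically seed faults into a small function by replacing relational operators, then score a test suite by the fraction of mutants it "kills". The function, the operator list, and the tests are invented for illustration, not the paper's tool chain:

```python
# Subject program under test (as source text, so it can be mutated).
SOURCE = (
    "def clamp(x, lo, hi):\n"
    "    if x < lo:\n"
    "        return lo\n"
    "    if x > hi:\n"
    "        return hi\n"
    "    return x\n"
)

# Relational-operator-replacement mutations: one mutant per occurrence.
MUTATIONS = [("<", "<="), (">", ">="), ("<", ">"), (">", "<")]

def make_mutants(source):
    mutants = []
    for old, new in MUTATIONS:
        idx = source.find(old)
        while idx != -1:
            mutants.append(source[:idx] + new + source[idx + len(old):])
            idx = source.find(old, idx + 1)
    return mutants

def suite_passes(source, tests):
    ns = {}
    exec(source, ns)                      # load the (possibly mutated) program
    return all(ns["clamp"](*args) == want for args, want in tests)

tests = [((5, 0, 10), 5), ((-1, 0, 10), 0), ((11, 0, 10), 10)]
mutants = make_mutants(SOURCE)
killed = sum(not suite_passes(m, tests) for m in mutants)
score = killed / len(mutants)             # mutation score of this suite
```

Note that the two surviving mutants here (`<` → `<=`, `>` → `>=`) behave identically to the original `clamp` for any `lo <= hi`, which illustrates the equivalent-mutant problem that mutation studies have to contend with.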
A tutorial on conformal prediction
 Journal of Machine Learning Research
, 2008
Abstract

Cited by 18 (1 self)
Conformal prediction uses past experience to determine precise levels of confidence in new predictions. Given an error probability ε, together with a method that makes a prediction ˆy of a label y, it produces a set of labels, typically containing ˆy, that also contains y with probability 1 − ε. Conformal prediction can be applied to any method for producing ˆy: a nearest-neighbor method, a support-vector machine, ridge regression, etc. Conformal prediction is designed for an online setting in which labels are predicted successively, each one being revealed before the next is predicted. The most novel and valuable feature of conformal prediction is that if the successive examples are sampled independently from the same distribution, then the successive predictions will be right 1 − ε of the time, even though they are based on an accumulating data set rather than on independent data sets. In addition to the model under which successive examples are sampled independently, other online compression models can also use conformal prediction. The widely used Gaussian linear model is one of these. This tutorial presents a self-contained account of the theory of conformal prediction and works through several numerical examples. A more comprehensive treatment of the topic is provided in
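A split-conformal sketch of the 1 − ε guarantee described above, using the simplest possible setup: a trivial mean predictor, absolute residuals as nonconformity scores, and synthetic Gaussian data (all invented here; the tutorial itself treats the full online setting):

```python
import numpy as np

rng = np.random.default_rng(1)
y_train = rng.normal(loc=5.0, size=100)
y_calib = rng.normal(loc=5.0, size=100)

y_hat = y_train.mean()                       # the underlying point predictor
scores = np.abs(y_calib - y_hat)             # nonconformity scores on held-out data

eps = 0.1
n = len(scores)
k = int(np.ceil((n + 1) * (1 - eps)))        # conformal quantile rank
q = np.sort(scores)[k - 1]
interval = (y_hat - q, y_hat + q)            # prediction set for the next label

# empirical check of the coverage guarantee on fresh exchangeable data
y_test = rng.normal(loc=5.0, size=2000)
coverage = np.mean((y_test >= interval[0]) & (y_test <= interval[1]))
```

Under exchangeability, the next label falls in `interval` with probability at least 1 − ε, so the empirical coverage should land near 0.9 here regardless of how crude the point predictor is.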
Principal component estimation of functional logistic regression: discussion of two different approaches
 Journal of Nonparametric Statistics
, 2004
Abstract

Cited by 11 (1 self)
Over the last few years, many methods have been developed for analyzing functional data with different objectives. The purpose of this paper is to predict a binary response variable in terms of a functional variable whose sample information is given by a set of curves measured without error. In order to solve this problem we formulate a functional logistic regression model and propose its estimation by approximating the sample paths in a finite-dimensional space generated by a basis. Then, the problem is reduced to a multiple logistic regression model with highly correlated covariates. In order to reduce dimension and to avoid multicollinearity, two different approaches to functional principal component analysis of the sample paths are proposed. Finally, a simulation study evaluating the estimation performance of the proposed principal component approaches is developed.
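The reduction described in the abstract can be sketched in a few lines: sample curves on a grid (standing in here for basis-expansion coefficients), keep the leading principal-component scores of the sample paths, and fit an ordinary logistic regression on those decorrelated scores. The curve model, noise level, and component count are all invented for this sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 50)
n = 300
amp = rng.normal(size=n)                                  # per-curve amplitude
curves = amp[:, None] * np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=(n, 50))
y = (amp > 0).astype(float)                               # binary response

# functional PCA via SVD of the centered sample paths
Xc = curves - curves.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                                    # first two PC scores

# plain logistic regression on the low-dimensional scores (gradient ascent)
Z = np.hstack([np.ones((n, 1)), scores])
w = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-np.clip(Z @ w, -30, 30)))
    w += 0.1 * Z.T @ (y - p) / n

p_fit = 1.0 / (1.0 + np.exp(-np.clip(Z @ w, -30, 30)))
accuracy = np.mean((p_fit > 0.5) == y)
```

Replacing fifty highly correlated grid covariates by two uncorrelated PC scores is exactly the multicollinearity remedy the abstract argues for.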
Model selection in electromagnetic source analysis with an application to VEF’s
 IEEE Transactions on Biomedical Engineering
, 2002
Abstract

Cited by 7 (4 self)
In electromagnetic source analysis it is necessary to determine how many sources are required to describe the EEG or MEG adequately. Model selection procedures (MSP’s, or goodness-of-fit procedures) give an estimate of the required number of sources. Existing and new MSP’s are evaluated in different source and noise settings: two sources which are close or distant, and noise which is uncorrelated or correlated. The commonly used MSP residual variance is seen to be ineffective, that is, it often selects too many sources. Alternatives like the adjusted Hotelling’s test, Bayes information criterion, and the Wald test on source amplitudes are seen to be effective. The adjusted Hotelling’s test is recommended if a conservative approach is taken, and MSP’s such as the Bayes information criterion or the Wald test on source amplitudes are recommended if a more liberal approach is desirable. The MSP’s are applied to empirical data (visual evoked fields).
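A small demonstration of why raw residual variance over-selects model order while a penalized criterion like BIC does not: nested fits of increasing order always shrink the residuals, but BIC's k·log(n) penalty stops the growth. Polynomial fits stand in here for the dipole-source models of the paper; the data truly have order 2:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-1.0, 1.0, 80)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.2 * rng.normal(size=x.size)

def fit_stats(order):
    coef = np.polyfit(x, y, order)
    resid = y - np.polyval(coef, x)
    n, k = x.size, order + 1
    rv = np.mean(resid**2)                      # residual variance
    bic = n * np.log(rv) + k * np.log(n)        # Gaussian-likelihood BIC
    return rv, bic

orders = list(range(1, 9))
stats = [fit_stats(o) for o in orders]
best_by_rv = orders[int(np.argmin([s[0] for s in stats]))]    # always the max order
best_by_bic = orders[int(np.argmin([s[1] for s in stats]))]   # near the true order
```

Minimizing residual variance picks the most complex model on offer, mirroring the paper's finding that the residual-variance MSP "often selects too many sources", while BIC recovers the true order.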
On Measures of Explained Variance in Nonrecursive Structural Equation Models
 BAYESIAN MEASURES IN MULTILEVEL MODELS
, 1998
Abstract

Cited by 7 (0 self)
Following up and correcting prior work by Teel, Bearden, and Sharma (1986) in this journal, a general approach to variance explained in latent dependent variables of nonrecursive linear structural equation models is given. A new method of its estimation, easily implemented in EQS or LISREL, is described and illustrated. Introduction: It is imperative in marketing and behavioral and management research to summarize the ability of the variables in a model to explain critical dependent variables. In single-equation regression, multivariate regression, and multiple-equation structural equation models, in which one or more dependent variables are predicted from a set of predictors, common practice is to summarize the predictability of a dependent variable with the squared multiple correlation R². Jain (1994) provides a good introduction to regression analysis in marketing, and the R² in this context. While alternative measures are available, this is a basic and convenient summary measu...
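For reference, the squared multiple correlation R² the abstract discusses is the proportion of a dependent variable's variance reproduced by its predictors. A minimal single-equation computation, with synthetic data standing in for a real model:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=200)

A = np.hstack([np.ones((200, 1)), X])             # design matrix with intercept
beta, *_ = np.linalg.lstsq(A, y, rcond=None)      # ordinary least squares
resid = y - A @ beta
r2 = 1.0 - np.sum(resid**2) / np.sum((y - y.mean())**2)   # R² = 1 - SSE/SST
```

With signal variance 2.5 and unit noise variance, the population R² is 2.5/3.5 ≈ 0.71, so the sample estimate should land nearby. Extending this idea consistently to latent dependent variables in nonrecursive models is exactly what the paper works out.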
Quantifying Persistence in ENSO
 J. Atmos. Sci
, 1999
Abstract

Cited by 6 (0 self)
The seasonal dependence of predictability in ENSO manifests itself in the so-called spring barrier found in the cyclostationary lag autocorrelations, or persistence. This work examines the statistics of persistence, with particular focus on the phase-of-year-dependent pattern found in ENSO data, the barrier. Simple time series of one sine wave produce a barrier if the frequency is a biennial cycle or one of its harmonics. Time series of two sine waves produce a barrier if one frequency is a biennial cycle or a harmonic thereof. They additionally produce a barrier if their frequencies sum to unity. Time series with continuous but narrow spectral peaks at barrier-producing frequencies produce barriers only if the phase angles vary slowly or coherently across the peaks. The shape of the barrier seen in these simple time series is used to construct a model persistence map, which is a combination of an idealized barrier and the persistence of a red-noise process. A nonlinear least ...
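The simplest barrier-producing case in the abstract — a single sine wave at the biennial frequency — can be checked directly: in the cyclostationary lag correlation, values 12 months apart are perfectly anti-correlated across years, for every start month. The small noise term below is only there to keep the month-wise variances nonzero and is an invented detail:

```python
import numpy as np

rng = np.random.default_rng(5)
months_per_year, years = 12, 40
t = np.arange(months_per_year * years)
# biennial cycle: period 24 months, i.e. sign flips every 12 months
x = np.sin(2 * np.pi * t / 24.0) + 0.05 * rng.normal(size=t.size)

def persistence(start_month, lag):
    """Correlation across years between month `start_month` and `lag` months later."""
    a = x[start_month::12]
    b = x[start_month + lag::12]
    m = min(len(a), len(b))
    return np.corrcoef(a[:m], b[:m])[0, 1]
```

For any start month with nonzero amplitude, the lag-12 persistence is near −1 and the lag-24 persistence near +1, which is the idealized barrier pattern the paper builds its model persistence map from.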
Explaining the Cost of European Space and Military Projects
, 1999
Abstract

Cited by 4 (1 self)
There has been much controversy in the literature on several issues underlying the construction of parametric software development cost models. For example, it has been argued whether (dis)economies of scale exist in software production, what functional form should be assumed between effort and product size, whether COCOMO factors were useful, and whether the COCOMO factors are independent. Answers to such questions should help software organizations define suitable data collection programs and well-specified cost models. The only way to address these issues and obtain a generalizable conclusion is to investigate them on a large number of consistent data sets. In this paper we use a data set collected by the European Space Agency to perform such an investigation. To ensure a certain degree of consistency in our data, we focus our analysis on a set of space and military projects that represent an important application domain and the largest subset in the database. These projects have be...
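The (dis)economies-of-scale question raised above is usually framed through the exponent b in Effort = a · Size^b: b > 1 means costs grow faster than size. A log-log least-squares fit recovers the exponent; the project data below are synthetic stand-ins (the exponent 1.12, cost driver 3.0, and noise level are invented), not the ESA data set:

```python
import numpy as np

rng = np.random.default_rng(6)
# synthetic projects: sizes log-uniform over 5..500 KLOC
size = np.exp(rng.uniform(np.log(5.0), np.log(500.0), size=60))
# effort in person-months, generated with mild diseconomies of scale (b = 1.12)
effort = 3.0 * size**1.12 * np.exp(0.2 * rng.normal(size=60))

# log Effort = log a + b * log Size  =>  ordinary least squares on the logs
b, log_a = np.polyfit(np.log(size), np.log(effort), 1)
diseconomies = b > 1.0
```

Whether the fitted b on a real, consistent multi-project data set lands above or below 1 is precisely the question the paper investigates on the ESA database.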
Case study of a functional genomics application for an FPGA-based coprocessor
 in Proceedings of Field Programmable Logic and Applications, 2003
, 2003
Abstract

Cited by 4 (2 self)
Note: This article has been accepted for publication in the journal Microprocessors and Microsystems
Measure A Power Amplifier’s Fifth-Order Interception Point
 RF Design
, 1999
Abstract

Cited by 2 (0 self)
Peer-to-peer streaming has emerged as a killer application in today’s Internet, delivering a large variety of live multimedia content to millions of users at any given time with low server cost. Though successfully deployed, the efficiency and optimality of the current peer-to-peer streaming protocols are still less than satisfactory. In this thesis, we investigate optimizing solutions to enhance the performance of the state-of-the-art mesh-based peer-to-peer streaming systems, utilizing both theoretical performance modeling and extensive real-world measurements. First, we model peer-to-peer streaming applications in both the single-overlay and multi-overlay scenarios, based on the solid foundation of optimization and game theories. Using these models, we design efficient and fully decentralized solutions to achieve performance optimization in peer-to-peer streaming. Then, based on a large volume of live measurements from a commercial large-scale peer-to-peer streaming application, we extensively study the real-world performance of peer-to-peer streaming over a long period of time. Highlights of our measurement study include the topological characterization of large-scale streaming meshes, the statistical characterization