The group Lasso for logistic regression
- Journal of the Royal Statistical Society, Series B
, 2008
Cited by 75 (4 self)
Summary. The group lasso is an extension of the lasso to do variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise orthogonal reparameterizations. We extend the group lasso to logistic regression models and present an efficient algorithm that is especially suitable for high-dimensional problems and that can also be applied to generalized linear models to solve the corresponding convex optimization problem. The group lasso estimator for logistic regression is shown to be statistically consistent even if the number of predictors is much larger than the sample size, provided the true underlying structure is sparse. We further use a two-stage procedure which aims for sparser models than the group lasso, leading to improved prediction performance in some cases. Moreover, owing to the two-stage nature, the estimates can be constructed to be hierarchical. The methods are applied to simulated and real data sets about splice site detection in DNA sequences.
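As a rough illustration of the kind of penalty this abstract describes, the sketch below evaluates a group-lasso-penalized logistic loss and applies the groupwise soft-thresholding step that sets a whole group of coefficients to zero at once. The function names, the sqrt(group size) weighting, and the `groups` layout are assumptions made for illustration; this is not the authors' algorithm.

```python
import math


def group_soft_threshold(beta_g, threshold):
    """Shrink one coefficient group; return all zeros if its norm <= threshold."""
    norm = math.sqrt(sum(b * b for b in beta_g))
    if norm <= threshold:
        return [0.0] * len(beta_g)
    scale = 1.0 - threshold / norm
    return [scale * b for b in beta_g]


def penalized_logistic_loss(X, y, beta, groups, lam):
    """Negative log-likelihood plus lam * sum_g sqrt(|g|) * ||beta_g||_2.

    X: list of feature rows, y: labels in {0, 1},
    groups: list of lists of column indices (hypothetical layout).
    """
    nll = 0.0
    for row, label in zip(X, y):
        eta = sum(x * b for x, b in zip(row, beta))
        # Numerically stable log(1 + exp(eta)) minus label * eta.
        nll += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - label * eta
    penalty = sum(
        math.sqrt(len(g)) * math.sqrt(sum(beta[j] ** 2 for j in g))
        for g in groups
    )
    return nll + lam * penalty
```

Because the penalty uses the unsquared Euclidean norm of each group, a group whose norm falls below the threshold is removed entirely, which is what gives selection at the group level rather than coefficient by coefficient.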
Beyond Greed and Grievance: Feasibility and Civil War
, 2006
Cited by 13 (6 self)
A key distinction among theories of civil war is between those that are built upon motivation and those that are built upon feasibility. We analyze a comprehensive global sample of civil wars for the period 1965-2004 and subject the results to a range of robustness tests. The data constitute a substantial advance on previous work. We find that variables that are close proxies for feasibility have powerful consequences for the risk of a civil war. Our results substantiate the 'feasibility hypothesis' that where civil war is feasible it will occur without reference to motivation.
Emergence of New Project Teams from Open Source Software Developer Networks: Impact of Prior Collaboration Ties
, 2006
Cited by 12 (0 self)
Software development has traditionally been regarded as an activity that can only be effectively conducted and managed within a firm setting. However, contrary to such assertions, the open source software development (OSSD) approach, in which software developers in digital social networks coordinate to voluntarily contribute programming code, has recently emerged as a promising alternative. Although many high profile cases of successful OSSD projects exist, the harsh reality is that the vast majority of OSS projects fail to take off and become abandoned. A commonly cited reason for the failure of OSS projects is the inability of the software project to bring together a critical mass of developers. This paper empirically examines the role of prior collaborative ties on how OSSD project teams are formed. Using software project data from real world OSSD projects, we find that the existence and the amount of prior collaborative relations in the developer network do increase the probability that an OSS project will attract more developers and that a developer’s prior relationships with a project initiator do increase the likelihood that a developer will join a project initiated by a past collaborator. We also explore the performance implications of early team formation behaviors.
Infinitely imbalanced logistic regression
- The Journal of Machine Learning Research
Cited by 5 (0 self)
In binary classification problems it is common for the two classes to be imbalanced: one case is very rare compared to the other. In this paper we consider the infinitely imbalanced case where one class has a finite sample size and the other class’s sample size grows without bound. For logistic regression, the infinitely imbalanced case often has a useful solution. Under mild conditions, the intercept diverges as expected, but the rest of the coefficient vector approaches a nontrivial and useful limit. That limit can be expressed in terms of exponential tilting and is the minimum of a convex objective function. The limiting form of logistic regression suggests a computational shortcut for fraud detection problems.
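The exponential-tilting characterization can be sketched in one dimension. The code below assumes, as one reading of the abstract, that the limiting slope minimizes the average of exp((x - xbar) * beta) over the large class, where xbar is the mean of the rare-class observations; at the minimum, the exponentially tilted mean of the large class equals xbar. The function names and the choice of Newton's method are illustrative assumptions.

```python
import math


def tilted_mean(xs0, beta):
    """Mean of the large-class sample xs0 under weights exp(beta * x)."""
    weights = [math.exp(beta * x) for x in xs0]
    return sum(w * x for w, x in zip(weights, xs0)) / sum(weights)


def limiting_slope(xs0, xbar, iters=100, tol=1e-12):
    """Newton's method on the convex objective sum(exp((x - xbar) * beta)).

    Assumes xbar lies strictly inside the range of xs0; otherwise the
    objective has no finite minimizer.
    """
    beta = 0.0
    deviations = [x - xbar for x in xs0]
    for _ in range(iters):
        weights = [math.exp(beta * d) for d in deviations]
        grad = sum(d * w for d, w in zip(deviations, weights))
        hess = sum(d * d * w for d, w in zip(deviations, weights))
        step = grad / hess
        beta -= step
        if abs(step) < tol:
            break
    return beta
```

For example, `limiting_slope([-1.0, 0.0, 1.0], 0.5)` returns a slope at which `tilted_mean` of the same sample is 0.5. Only the rare-class mean enters the computation, which is consistent with the computational shortcut the abstract mentions.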
Estimating Risk and Rate Levels, Ratios, and Differences in Case-Control Studies
, 2001
Cited by 4 (2 self)
Classic (or "cumulative") case-control sampling designs do not admit inferences about quantities of interest other than risk ratios, and then only by making the rare events assumption. Probabilities, risk differences, and other quantities cannot be computed without knowledge of the population incidence fraction. Similarly, density (or "risk set") case-control sampling designs do not allow inferences about quantities other than the rate ratio. Rates, rate differences, cumulative rates, risks, and other quantities cannot be estimated unless auxiliary information about the underlying cohort such as the number of controls in each full risk set is available. Most scholars who have considered the issue recommend reporting more than just risk and rate ratios, but auxiliary population information needed to do this is not usually available. We address this problem by developing methods that allow valid inferences about all relevant quantities of interest from either type of case-control study when completely ignorant of or only partially knowledgeable about relevant auxiliary population information.
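Why the population incidence fraction matters can be seen in the standard intercept correction for a logistic model fitted to case-control data. The sketch below is that textbook adjustment, not the method developed in the paper; it assumes the incidence fraction `tau` is known, which is exactly the auxiliary information the abstract says is usually unavailable. All names are hypothetical.

```python
import math


def corrected_intercept(b0, sample_case_fraction, tau):
    """Adjust a case-control logit intercept to the population scale.

    b0: intercept estimated from the case-control sample
    sample_case_fraction: share of cases in the sample (ybar)
    tau: population incidence fraction (assumed known here)
    """
    ybar = sample_case_fraction
    return b0 - math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))


def risk(intercept, slope, x):
    """Predicted probability from a one-covariate logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
```

With a balanced sample (half cases) and a population incidence of 1%, an estimated intercept of 0 is shifted down by log(99), so the implied baseline risk becomes 1% rather than 50%. Slopes, and therefore odds ratios, are unaffected; only levels and differences depend on `tau`.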
Multinomial probit and multinomial logit: a comparison of choice models for voting research
, 2004
Communication (and Coordination?) in a Modern, Complex Organization. Working paper N
, 2008
Cited by 2 (0 self)
This is a descriptive study of the structure of communications in a modern organization. We analyze a dataset with millions of electronic mail messages, calendar meetings and teleconferences for many thousands of employees of a single, multidivisional firm during a three-month period in calendar 2006. The basic question we explore asks, what is the role of observable (to us) boundaries between individuals in structuring communications inside the firm? We measure three general types of boundaries: organizational boundaries (strategic business unit and function memberships), spatial boundaries (office locations and inter-office distances), and social categories (gender, tenure within the firm). In dyad-level models of the probability that pairs of individuals communicate, we find very large effects of formal organization structure and spatial collocation on the rate of communication. Homophily effects based on sociodemographic categories are much weaker. In individual-level regressions of engagement in category-spanning communication patterns, we find that women, mid- to high-level executives, and members of the executive management, sales and marketing functions are most likely to participate in cross-group communications. In effect, these individuals bridge the lacunae between distant groups in the company’s social structure.
Factors Leading to Integration Failures in Global Feature-Oriented Development: An Empirical Analysis
- Presented at the 33rd International Conference on Software Engineering
, 2011
Cited by 2 (0 self)
Feature-driven software development is a novel approach that has grown in popularity over the past decade. Researchers and practitioners alike have argued that numerous benefits could be garnered from adopting a feature-driven development approach. However, those persuasive arguments have not been matched with supporting empirical evidence. Moreover, developing software systems around features involves new technical and organizational elements that could have significant implications for outcomes such as software quality. This paper presents an empirical analysis of a large-scale project that implemented 1195 features in a software system. We examined the impact that technical attributes of product features, attributes of the feature teams and cross-feature interactions have on software integration failures. Our results show that technical factors such as the nature of component dependencies and organizational factors such as the geographic dispersion of the feature teams and the role of the feature owners had complementary impacts, suggesting that each plays an independent and important role in software quality. Furthermore, our analyses revealed that cross-feature interactions, measured as the number of architectural dependencies between two product features, are a major driver of integration failures. The research and practical implications of our results are discussed.

