Results 11-20 of 596
A tutorial on MM algorithms
Amer. Statist., 2004
Cited by 65 (3 self)
Most problems in frequentist statistics involve optimization of a function such as a likelihood or a sum of squares. EM algorithms are among the most effective algorithms for maximum likelihood estimation because they consistently drive the likelihood uphill by maximizing a simple surrogate function for the loglikelihood. Iterative optimization of a surrogate function as exemplified by an EM algorithm does not necessarily require missing data. Indeed, every EM algorithm is a special case of the more general class of MM optimization algorithms, which typically exploit convexity rather than missing data in majorizing or minorizing an objective function. In our opinion, MM algorithms deserve to be part of the standard toolkit of professional statisticians. The current article explains the principle behind MM algorithms, suggests some methods for constructing them, and discusses some of their attractive features. We include numerous examples throughout the article to illustrate the concepts described. In addition to surveying previous work on MM algorithms, this article introduces some new material on constrained optimization and standard error estimation. Key words and phrases: constrained optimization, EM algorithm, majorization, minorization, Newton-Raphson.
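To make the majorization idea in this abstract concrete, here is a minimal sketch (not code from the article) of an MM iteration for a problem with no missing data: minimizing the sum of absolute deviations, whose minimizer is a sample median. It assumes the standard quadratic majorizer |r| <= r^2 / (2|r_k|) + |r_k| / 2, which touches |r| at r = r_k; the function name `mm_median` and the guard `eps` are hypothetical choices for illustration.

```python
def mm_median(xs, theta=None, iters=200, eps=1e-9):
    """Minimize sum(|x - theta|) by repeatedly minimizing a quadratic majorizer."""
    if theta is None:
        theta = sum(xs) / len(xs)  # start from the mean (an arbitrary choice)
    for _ in range(iters):
        # Majorize each |x - theta| at the current iterate; the surrogate is a
        # weighted sum of squares with weights 1 / |x - theta_k|.
        # eps guards against division by zero when theta hits a data point.
        weights = [1.0 / max(abs(x - theta), eps) for x in xs]
        # Minimizing the surrogate has a closed form: a weighted mean.
        new_theta = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_theta - theta) < 1e-12:
            break
        theta = new_theta
    return theta


def abs_loss(xs, theta):
    """The objective being driven downhill."""
    return sum(abs(x - theta) for x in xs)
```

Each update can only lower (or keep) `abs_loss`, which is the descent property the abstract attributes to MM and EM algorithms; the outlier in a sample such as `[1, 2, 3, 4, 100]` pulls the starting mean far from the median, and the iterations move it back.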
Statistical relational learning for link prediction
In Proceedings of the Workshop on Learning Statistical Models from Relational Data at IJCAI-2003, 2003
Cited by 63 (5 self)
Link prediction is a complex, inherently relational task. Be it in the domain of scientific citations, social networks or hypertext links, the underlying data are extremely noisy and the characteristics useful for prediction are not readily available in a “flat” file format, but rather involve complex relationships among objects. In this paper, we propose the application of our methodology for Statistical Relational Learning to building link prediction models. We propose an integrated approach to building regression models from data stored in relational databases in which potential predictors are generated by structured search of the space of queries to the database, and then tested for inclusion in a logistic regression. We present experimental results for the task of predicting citations made in scientific literature using relational data taken from CiteSeer. These data include the citation graph, authorship and publication venues of papers, as well as their word content.
Tree induction vs. logistic regression: A learning-curve analysis
CEDER WORKING PAPER #IS0102, STERN SCHOOL OF BUSINESS, 2001
Cited by 62 (16 self)
Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classification. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership probabilities. We use a learning-curve analysis to examine the relationship of these measures to the size of the training set. The results of the study show several remarkable things. (1) Contrary to prior observations, logistic regression does not generally outperform tree induction. (2) More specifically, and not surprisingly, logistic regression is better for smaller training sets and tree induction for larger data sets. Importantly, this often holds for training sets drawn from the same domain (i.e., the learning curves cross), so conclusions about induction-algorithm superiority on a given domain must be based on an analysis of the learning curves. (3) Contrary to conventional wisdom, tree induction is effective at producing probability-based rankings, although apparently comparatively less so for a given training-set size than at making classifications. Finally, (4) the domains on which tree induction and logistic regression are ultimately preferable can be characterized surprisingly well by a simple measure of signal-to-noise ratio.
Predictive Model Selection
Journal of the Royal Statistical Society, Ser. B, 1995
Cited by 61 (4 self)
this article we propose three criteria that can be used to address model selection. These emphasize observables rather than parameters and are based on a certain Bayesian predictive density. They have a unifying basis that is simple and interpretable, are free of asymptotic definitions, and allow the incorporation of prior information. Moreover, two of these criteria are readily calibrated.
Using Coupling Measurement for Impact Analysis in Object-Oriented Systems
1999
Cited by 51 (5 self)
Many coupling measures have been proposed in the context of object-oriented (OO) systems. In addition, several studies have highlighted the complexity of using dependency analysis in OO software to perform impact analysis. The question is then: can we use simple decision models based on coupling measurement to support impact analysis in OO systems? The main advantages of such an approach are its simplicity and complete automation. To investigate this question, we perform here a thorough analysis on a commercial C++ system where change data has been collected over several years. We identify the coupling dimensions that seem to be significantly related to ripple effects and use them to rank classes according to their probability of containing ripple effects. We then assess the expected effectiveness of such decision models. Keywords: coupling, metrics, measurement, impact analysis, object-oriented. 1 Introduction: A claimed benefit of object-oriented (OO) modeling approaches is that they ...
SURVIVAL OF BUSINESSES USING COLLABORATIVE RELATIONSHIPS TO COMMERCIALIZE COMPLEX GOODS
1996
Cited by 46 (14 self)
Authors with many theoretical and managerial perspectives argue that businesses commercializing technologically complex goods benefit when they collaborate closely with other businesses. Collaboration is viewed as a means for businesses to overcome competency limitations and to achieve the close configuration of components required for complex goods. We predict that collaborative relationships often assist businesses to produce complex goods, but that the relationships might also cause problems for the collaborating businesses. We find that firms using development-oriented and marketing-oriented collaborative relationships in the hospital software systems industry are less likely to shut down than businesses that follow independent approaches when the environment changes gradually, but businesses using collaborative relationships are sometimes susceptible to being acquired by other firms. Following a sudden environmental shock, businesses with collaborative relationships for activities central to the shock became more likely to shut down, while businesses with collaborative relationships for activities outside the focus of the shock became more likely to survive. The study critically evaluates and tests the widely stated but little-tested argument that interfirm collaboration is usually beneficial. The results address the issue of whether organizational choices affect comparative business performance. This paper investigates the survival of businesses that use collaborative relationships with other firms to commercialize complex goods. A growing literature has identified many benefits of interfirm collaboration. Several recent studies argue that businesses that collaborate closely with other organizations in order to develop and market complex goods will be more successful than businesses that operate independently (Jorde and
The Prediction of Faulty Classes Using Object-Oriented Design Metrics
1999
Cited by 45 (2 self)
Contemporary evidence suggests that most field faults in software applications are found in a small percentage of the software's components. This means that if these faulty software components can be detected early in the development project's life cycle, mitigating actions can be taken, such as a redesign. For object-oriented applications, prediction models using design metrics can be used to identify faulty classes early on. In this paper we report on a study that used object-oriented design metrics to construct such prediction models. The study used data collected from one version of a commercial Java application for constructing a prediction model. The model was then validated on a subsequent release of the same application. Our results indicate that the prediction model has a high accuracy. Furthermore, we found that an export coupling metric had the strongest association with fault-proneness, indicating a structural feature that may be symptomatic of a class with a high probability of latent faults.
Inferring probability of relevance using the method of logistic regression
In Proceedings of ACM SIGIR’94, 1994
Cited by 41 (1 self)
This research evaluates a model for probabilistic text and document retrieval; the model utilizes the technique of logistic regression to obtain equations which rank documents by probability of relevance as a function of document and query properties. Since the model infers probability of relevance from statistical clues present in the texts of documents and queries, we call it logistic inference. By transforming the distribution of each statistical clue into its standardized distribution (one with mean μ = 0 and standard deviation σ = 1), the method allows one to apply logistic coefficients derived from a training collection to other document collections, with little loss of predictive power. The model is applied to three well-known information retrieval test collections, and the results are compared directly to the particular vector space model of retrieval which uses term-frequency/inverse-document-frequency (tf-idf) weighting and the cosine similarity measure. In the comparison, the logistic inference method performs significantly better than (in two collections) or equally well as (in the third collection) the tf-idf/cosine vector space model. The differences in performances of the two models were subjected to statistical tests to see if the differences are statistically significant or could have occurred by chance.
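The standardize-then-score step described in this abstract can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the function names, the clue values, and the coefficients are hypothetical, and the population standard deviation is an assumed choice.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Rescale one statistical clue to mean 0 and standard deviation 1
    within a single collection (population standard deviation assumed)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def relevance_probability(z_clues, intercept, coefs):
    """Logistic model: probability of relevance from standardized clues.
    The intercept and coefs are hypothetical; in practice they would be
    fitted on a training collection."""
    log_odds = intercept + sum(c * z for c, z in zip(coefs, z_clues))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical raw clue values for three documents in a new collection.
z = standardize([2.0, 4.0, 6.0])
scores = [relevance_probability([zi], intercept=0.0, coefs=[1.5]) for zi in z]
```

Because every clue is rescaled within its own collection, the same coefficients see inputs on a comparable scale in each collection, which is the property the abstract relies on when transferring coefficients between collections.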
Spatial Contextual Classification and Prediction Models for Mining Geospatial Data
IEEE Transactions on Multimedia, 2002
Cited by 39 (12 self)
Modeling spatial context (e.g., autocorrelation) is a key challenge in classification problems that arise in geospatial domains. Markov Random Fields (MRFs) are a popular model for incorporating spatial context into image segmentation and land-use classification problems. The spatial autoregression model (SAR), which is an extension of the classical regression model for incorporating spatial dependence, is popular for prediction and classification of spatial data in regional economics, natural resources, and ecological studies. There is little literature comparing these alternative approaches to facilitate the exchange of ideas (e.g., solution procedures). We argue that the SAR model makes more restrictive assumptions about the distribution of feature values and class boundaries than MRF. The relationship between SAR and MRF is analogous to the relationship between regression and Bayesian classifiers. This paper provides comparisons between the two models using a probabilistic and an experimental framework.
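For readers unfamiliar with how SAR extends classical regression, the usual textbook form is sketched below. The notation is assumed rather than quoted from the paper: y is the response vector, X the feature matrix, β the coefficients, ε the noise term, W a neighborhood (contiguity) matrix, and ρ the strength of spatial dependence. The `align` environment assumes the amsmath package.

```latex
\begin{align}
  \text{classical regression:} \quad & y = X\beta + \varepsilon \\
  \text{spatial autoregression (SAR):} \quad & y = \rho W y + X\beta + \varepsilon
\end{align}
```

Setting ρ = 0 removes the neighbor term and recovers the classical model, which is the sense in which SAR is an extension of it.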