Network-based marketing: Identifying likely adopters via consumer networks
- Statistical Science
Cited by 48 (10 self)
Network-based marketing refers to a collection of marketing techniques that take advantage of links between consumers to increase sales. We concentrate on the consumer networks formed using direct interactions (e.g., communications) between consumers. We survey the diverse literature on such marketing with an emphasis on the statistical methods used and the data to which these methods have been applied. We also provide a discussion of challenges and opportunities for this burgeoning research topic. Our survey highlights a gap in the literature. Because of inadequate data, prior studies have not been able to provide direct, statistical support for the hypothesis that network linkage can directly affect product/service adoption. Using a new data set that represents the adoption of a new telecommunications service, we show very strong support for the hypothesis. Specifically, we show three main results: (1) “Network neighbors” (those consumers linked to a prior customer) adopt the service at a rate 3–5 times greater than baseline groups selected by the best practices of the firm’s marketing team. In addition, analyzing the network allows the firm to acquire new customers who otherwise would have fallen through the cracks, because they would not have been identified based on traditional attributes. (2) Statistical models, built with a very large amount of geographic, demographic and prior purchase data, are significantly and substantially improved by including network information. (3) More detailed network information allows the ranking of the network neighbors so as to permit the selection of small sets of individuals with very high probabilities of adoption. Key words and phrases: Viral marketing, word of mouth, targeted marketing, network analysis, classification, statistical relational learning.
Collective classification in network data
, 2008
Cited by 45 (17 self)
Numerous real-world applications produce networked data such as web data (hypertext documents connected via hyperlinks) and communication networks (people connected via communication links). A recent focus in machine learning research has been to extend traditional machine learning classification techniques to classify nodes in such data. In this report, we attempt to provide a brief introduction to this area of research and how it has progressed during the past decade. We introduce four of the most widely used inference algorithms for classifying networked data and empirically compare them on both synthetic and real-world data.
STATISTICAL MODELS AND ANALYSIS TECHNIQUES FOR LEARNING IN RELATIONAL DATA
, 2006
Cited by 9 (0 self)
Many data sets routinely captured by organizations are relational in nature, from marketing and sales transactions to scientific observations and medical records. Relational data record characteristics of heterogeneous objects and persistent relationships among those objects (e.g., citation graphs, the World Wide Web, genomic structures). These data offer unique opportunities to improve model accuracy, and thereby decision-making, if machine learning techniques can effectively exploit the relational information.
This work focuses on how to learn accurate statistical models of complex, relational data sets and develops two novel probabilistic models to represent, learn, and reason about statistical dependencies in these data. Relational dependency networks are the first relational model capable of learning general autocorrelation dependencies, an important class of statistical dependencies that are ubiquitous in relational data. Latent group models are the first relational model to generalize about the properties of underlying group structures to improve inference accuracy and efficiency. Not only do these two models offer performance gains over current relational models, but they also offer efficiency gains which will make relational modeling feasible for large, relational datasets where current methods are computationally intensive, if not intractable.
We also formulate a novel analysis framework to analyze relational model performance and ascribe errors to model learning and inference procedures. Within this framework, we explore the effects of data characteristics and representation choices on inference accuracy and investigate the mechanisms behind model performance. In particular, we show that the inference process in relational models can be a significant source of error and that relative model performance varies significantly across different types of relational data.
NetKit-SRL: A Toolkit for Network Learning and Inference -- and its use for classification of networked data
- PROC. ANN. CONF. NORTH AM. ASSOC. COMPUTATIONAL SOCIAL AND ORGANIZATIONAL SCIENCE (NAACSOS)
, 2005
Cited by 3 (0 self)
This paper describes NetKit-SRL, or NetKit for short, a toolkit for learning from and classifying networked data. The toolkit is open-source and publicly available. It is modular and built for ease of plug-and-play, such that it is easy to add new modules and have them interact with other existing modules. Currently available NetKit modules are focused on "batch" within-network learning and classification: given a partially labeled network, where all nodes and edges are already known to exist, estimate the class membership probability of the unlabeled nodes in the network. NetKit has been used in various network domains such as websites, citation graphs, movies and social networks.
Evaluating Statistical Tests for Within-Network Classifiers of Relational Data
Cited by 3 (1 self)
Recently a number of modeling techniques have been developed for data mining and machine learning in relational and network domains where the instances are not independent and identically distributed (i.i.d.). These methods specifically exploit the statistical dependencies among instances in order to improve classification accuracy. However, there has been little focus on how these same dependencies affect our ability to draw accurate conclusions about the performance of the models. More specifically, the complex link structure and attribute dependencies in network data violate the assumptions of many conventional statistical tests and make it difficult to use these tests to assess the models in an unbiased manner. In this work, we examine the task of within-network classification and the question of whether two algorithms will learn models which will result in significantly different levels of performance. We show that the commonly-used form of evaluation (paired t-test on overlapping network samples) can result in an unacceptable level of Type I error. Furthermore, we show that Type I error increases as (1) the correlation among instances increases and (2) the size of the evaluation set increases (i.e., the proportion of labeled nodes in the network decreases). We propose a method for network cross-validation that, combined with paired t-tests, produces more acceptable levels of Type I error while still providing reasonable levels of statistical power (i.e., Type II error).
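For readers unfamiliar with the test named in this abstract, the following is a minimal sketch of the paired t statistic computed over per-trial accuracies of two classifiers. The function name and the sample accuracies are hypothetical, and the sketch shows only the conventional statistic; it does not implement the network cross-validation procedure the paper proposes. The abstract's point is that this statistic assumes independent trials, which overlapping network samples violate.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two equal-length lists of per-trial scores.

    Hypothetical helper for illustration: each position is assumed to hold
    the accuracy of classifier A and classifier B on the same trial.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 trials")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample std / sqrt(n)).
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


# Made-up accuracies from four trials, for illustration only.
t_value = paired_t_statistic([0.80, 0.82, 0.78, 0.81], [0.75, 0.79, 0.76, 0.77])
```

The resulting value would be compared against a t distribution with n - 1 degrees of freedom; when trials share nodes, that reference distribution is too narrow, which is one way to read the inflated Type I error the abstract reports.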
A brief survey of machine learning methods for classification in networked data and an application to suspicion scoring
, 2006
A shrinkage approach for modeling non-stationary relational autocorrelation
- In ICDM ’08: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining. IEEE Computer Society
Cited by 2 (0 self)
Recent research has shown that collective classification in relational data often exhibits significant performance gains over conventional approaches that classify instances individually. This is primarily due to the presence of autocorrelation in relational datasets, which means that the class labels of related entities are correlated and inferences about one instance can be used to improve inferences about linked instances. Statistical relational learning techniques exploit relational autocorrelation by modeling global autocorrelation dependencies under the assumption that the level of autocorrelation is stationary throughout the dataset. To date, there has been no work examining the appropriateness of this stationarity assumption. In this paper, we examine two real-world datasets and show that there is significant variance in the autocorrelation dependencies throughout the relational data graphs. To account for this, we develop a technique for modeling non-stationary autocorrelation in relational data. We compare to two baseline techniques which model either the local or the global autocorrelation dependencies in isolation and show that a shrinkage model results in significantly improved model accuracy.
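The general idea of shrinkage between a local and a global estimate can be illustrated with a short sketch. This is a generic pseudo-count shrinkage estimator, not the model from the paper: the function name, the `strength` parameter, and the choice of weighting by local sample size are all assumptions made for illustration.

```python
def shrunken_estimate(local_positive, local_total, global_rate, strength=10.0):
    """Blend a local class-rate estimate with a global one.

    Assumed setup (not taken from the paper): `local_positive` of
    `local_total` linked neighbors in some region share the class, and
    `global_rate` is the same rate measured over the whole graph.
    `strength` is a hypothetical pseudo-count; larger values pull the
    result more strongly toward the global rate.
    """
    weight = local_total / (local_total + strength)
    local_rate = local_positive / local_total if local_total else global_rate
    return weight * local_rate + (1.0 - weight) * global_rate


# A region with 10 observed links, 9 of them same-class, global rate 0.5.
blended = shrunken_estimate(9, 10, 0.5)
```

With few local observations the estimate stays near the global rate, and with many it approaches the local rate, which is the trade-off between the two baselines the abstract describes.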
Classification in networked data
, 2006
Cited by 1 (0 self)
This paper is about classifying entities that are interlinked with entities for which the class is known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked data, and a case-study of its application to networked data used in prior machine learning research. NetKit is based on a node-centric framework in which classifiers comprise a local classifier, a relational classifier, and a collective inference procedure. Various existing node-centric relational learning algorithms can be instantiated with appropriate choices for these components, and new combinations of components realize new algorithms. The case study focuses on univariate network classification, for which the only information used is the structure of class linkage in the network (i.e., only links and some class labels). To our knowledge, no work previously has evaluated systematically the power of class-linkage alone for classification in machine learning benchmark data sets. The results demonstrate that very simple network-classification models perform quite well, well enough that they should be used regularly as baseline classifiers for studies of learning with networked data. The simplest method (which performs remarkably well) highlights the close correspondence between several existing methods introduced for different purposes, namely Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The case study also shows that there are two sets of techniques that are preferable in different situations, namely when few versus many labels are known initially. We also demonstrate that link selection plays an important role similar to traditional feature selection.
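The three-part framework described in this abstract (local classifier, relational classifier, collective inference) can be sketched in a few lines for the univariate case, where only links and some known labels are used. This is a minimal illustration in the spirit of a weighted-vote relational-neighbor classifier with a simple iterative update; it is not NetKit's code or API, and the function name, argument layout, and fixed iteration count are assumptions.

```python
from collections import defaultdict


def relational_neighbor_scores(edges, known, n_iter=50):
    """Estimate P(class = 1) per node from links and a few known labels.

    edges: iterable of (u, v, weight) undirected links with positive weights
    known: dict mapping node -> 0 or 1 for the initially labeled nodes
    """
    neighbors = defaultdict(list)
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    # "Local classifier": unlabeled nodes start at the class prior.
    prior = sum(known.values()) / len(known)
    scores = {node: float(known.get(node, prior)) for node in neighbors}

    # "Collective inference": repeat the relational update for all nodes.
    for _ in range(n_iter):
        updated = dict(scores)
        for node, nbrs in neighbors.items():
            if node in known:
                continue  # known labels stay fixed
            total = sum(w for _, w in nbrs)
            # "Relational classifier": weighted mean of neighbor scores.
            updated[node] = sum(w * scores[v] for v, w in nbrs) / total
        scores = updated
    return scores


# Chain a - b - c - d with a known positive and d known negative.
chain = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0)]
result = relational_neighbor_scores(chain, {"a": 1, "d": 0})
```

On this small chain the scores for `b` and `c` settle near 2/3 and 1/3, reflecting their distance from the two labeled endpoints. Swapping the prior initialization or the update rule for other choices is the kind of component substitution the abstract describes.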
Model Learning from Published Aggregated Data
- Learning Structure and Schemas from Documents, M. Biba and F. Xhafa (Eds.)
, 2011
Cited by 1 (1 self)
In many application domains, particularly in healthcare, access to individual data points is limited, while data aggregated in the form of means and standard deviations are widely available. This limitation is a result of many factors, including privacy laws that prevent clinicians and scientists from freely sharing individual patient data, inability to share proprietary business data, and inadequate data collection methods. Consequently, it prevents the use of traditional machine learning methods for model construction. The problem is especially important if a study involves comparisons of multiple datasets, each derived from different open-access publications in which data are represented in aggregated form. This chapter describes the problem of machine learning of models from aggregated data as compared to traditional learning from individual examples. It presents a method of rule induction from such data as well as an application of this method to constructing predictive models for diagnosing liver complications of the metabolic syndrome, one of the most common chronic diseases in humans. Other possible applications of the method are also discussed.
Relational Learning for Customer Relationship Management
Customer modeling is a critical component of customer relationship management (CRM). Successful customer modeling requires a holistic view and the consolidation of all customer information available to the business, which is typically stored in a relational database. With this understanding, customer modeling in CRM can be viewed as a special case of the relational learning problem, a recent extension of the traditional machine learning problem that aims to model the relational interdependencies within a database containing multiple interlinked tables.

