Learning systems of concepts with an infinite relational model
In Proceedings of the 21st National Conference on Artificial Intelligence, 2006
Cited by 138 (18 self)
Relationships between concepts account for a large proportion of semantic knowledge. We present a nonparametric Bayesian model that discovers systems of related concepts. Given data involving several sets of entities, our model discovers the kinds of entities in each set and the relations between kinds that are possible or likely. We apply our approach to four problems: clustering objects and features, learning ontologies, discovering kinship systems, and discovering structure in political data. Philosophers, psychologists and computer scientists have proposed that semantic knowledge is best understood as a system of relations. Two questions immediately arise: how can these systems be represented, and how are these representations acquired? Researchers who start with the …
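The kind of scoring such a model needs can be sketched for a single binary relation. The Python below is a minimal sketch that assumes a Beta-Bernoulli block likelihood (one link probability per pair of kinds, integrated out under a Beta(alpha, beta) prior); the abstract does not spell out the likelihood, and `block_log_likelihood` and its defaults are illustrative names. It only scores a fixed partition `z`; a full model would add a prior over partitions and search or sample over `z`.

```python
import math
from collections import defaultdict


def log_beta(a, b):
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def block_log_likelihood(R, z, alpha=1.0, beta=1.0):
    """Marginal log-likelihood of a binary relation R (n x n list of 0/1)
    given kind assignments z; each block's link probability is integrated
    out under a Beta(alpha, beta) prior (illustrative sketch)."""
    ones = defaultdict(int)   # observed links per (kind_i, kind_j) block
    zeros = defaultdict(int)  # observed non-links per block
    n = len(R)
    for i in range(n):
        for j in range(n):
            block = (z[i], z[j])
            if R[i][j]:
                ones[block] += 1
            else:
                zeros[block] += 1
    total = 0.0
    for block in set(ones) | set(zeros):
        total += log_beta(ones[block] + alpha, zeros[block] + beta) - log_beta(alpha, beta)
    return total


# Two clean groups: {0, 1} link to each other, {2, 3} link to each other.
R = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
good = block_log_likelihood(R, [0, 0, 1, 1])
bad = block_log_likelihood(R, [0, 1, 0, 1])
print(good > bad)  # True: the partition matching the link structure scores higher
```

A partition that lines up with the link structure earns a higher marginal likelihood, which is what lets a model of this kind prefer one system of kinds over another.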
Learning from labeled and unlabeled data on a directed graph
In Proceedings of the 22nd International Conference on Machine Learning (ICML)
Cited by 103 (8 self)
We propose a general framework for learning from labeled and unlabeled data on a directed graph, in which the structure of the graph, including the directionality of the edges, is considered. The time complexity of the algorithm derived from this framework is nearly linear due to recently developed numerical techniques. In the absence of labeled instances, this framework can be used as a spectral clustering method for directed graphs, which generalizes the spectral clustering approach for undirected graphs. We have applied our framework to real-world web classification problems and obtained encouraging results.
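A minimal sketch of semi-supervised label spreading on a directed graph follows. It assumes a teleporting random walk and the symmetrized operator Theta shown in the docstring; the abstract gives no formulas, so this construction, the parameters `eta` and `alpha`, and all function names are assumptions. Dense O(n²) loops keep it short, so it does not reproduce the nearly linear running time the abstract reports.

```python
import math


def transition_matrix(adj, eta=0.99):
    """Row-stochastic random-walk matrix with teleportation (eta = prob. of following a link).
    adj[i][j] = 1 means a directed edge i -> j; dangling nodes jump uniformly."""
    n = len(adj)
    P = []
    for i in range(n):
        out = sum(adj[i])
        P.append([eta * (adj[i][j] / out if out else 1.0 / n) + (1 - eta) / n
                  for j in range(n)])
    return P


def stationary(P, iters=500):
    """Stationary distribution pi of P by power iteration (pi <- pi P)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def propagate(adj, y, alpha=0.9, eta=0.99, iters=300):
    """Spread labels y (+1, -1, or 0 if unlabeled) using the symmetric operator
    Theta = (Pi^1/2 P Pi^-1/2 + Pi^-1/2 P^T Pi^1/2) / 2."""
    P = transition_matrix(adj, eta)
    s = [math.sqrt(p) for p in stationary(P)]
    n = len(adj)
    theta = [[(s[i] * P[i][j] / s[j] + s[j] * P[j][i] / s[i]) / 2 for j in range(n)]
             for i in range(n)]
    f = list(y)
    for _ in range(iters):
        # Fixed-point iteration whose limit is f = (1 - alpha) (I - alpha Theta)^-1 y
        f = [alpha * sum(theta[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


# Two directed 3-cycles (0->1->2->0 and 3->4->5->3) joined by a single edge 2->3.
adj = [[0] * 6 for _ in range(6)]
for i, j in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]:
    adj[i][j] = 1
f = propagate(adj, [1, 0, 0, -1, 0, 0])  # node 0 labeled +1, node 3 labeled -1
print([round(v, 3) for v in f])
```

The sign of each entry of `f` gives the predicted class of the corresponding node; edge direction enters through both `P` and its transpose, so the operator stays symmetric.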
Overview of record linkage and current research directions
Bureau of the Census, 2006
Cited by 85 (1 self)
This paper provides background on record linkage methods that can be used to combine data from a variety of sources, such as person lists and business lists. It also describes some areas of current research.
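The abstract names no specific method. As an assumed illustration, the sketch below scores a record pair with the classical Fellegi-Sunter agreement/disagreement weights; the field names, m/u probabilities, and thresholds are hypothetical, and exact string equality stands in for the approximate comparators a production linker would use.

```python
import math


def match_weight(rec_a, rec_b, m, u):
    """Sum of log2 likelihood ratios over compared fields.
    m[f] = P(field f agrees | true match), u[f] = P(field f agrees | non-match)."""
    total = 0.0
    for field in m:
        if rec_a.get(field) == rec_b.get(field):
            total += math.log2(m[field] / u[field])              # agreement weight
        else:
            total += math.log2((1 - m[field]) / (1 - u[field]))  # disagreement weight
    return total


def classify(weight, upper, lower):
    """Link above `upper`, non-link below `lower`, clerical review in between."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "review"


# Hypothetical m/u probabilities and records; "first" differs only by spelling.
m = {"last": 0.95, "first": 0.9, "zip": 0.85}
u = {"last": 0.01, "first": 0.05, "zip": 0.1}
a = {"last": "Smith", "first": "Ann", "zip": "12345"}
b = {"last": "Smith", "first": "Anne", "zip": "12345"}
w = match_weight(a, b, m, u)
print(round(w, 2), classify(w, upper=6.0, lower=0.0))  # 6.41 link
```

Here the spelling variant Ann/Anne counts as a full disagreement; an approximate string comparator would give it partial credit.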
Network-based marketing: Identifying likely adopters via consumer networks
Statistical Science
Cited by 68 (11 self)
Network-based marketing refers to a collection of marketing techniques that take advantage of links between consumers to increase sales. We concentrate on the consumer networks formed using direct interactions (e.g., communications) between consumers. We survey the diverse literature on such marketing with an emphasis on the statistical methods used and the data to which these methods have been applied. We also provide a discussion of challenges and opportunities for this burgeoning research topic. Our survey highlights a gap in the literature. Because of inadequate data, prior studies have not been able to provide direct, statistical support for the hypothesis that network linkage can directly affect product/service adoption. Using a new data set that represents the adoption of a new telecommunications service, we show very strong support for the hypothesis. Specifically, we show three main results: (1) “Network neighbors” (those consumers linked to a prior customer) adopt the service at a rate 3–5 times greater than baseline groups selected by the best practices of the firm’s marketing team. In addition, analyzing the network allows the firm to acquire new customers who otherwise would have fallen through the cracks, because they would not have been identified based on traditional attributes. (2) Statistical models, built with a very large amount of geographic, demographic and prior purchase data, are significantly and substantially improved by including network information. (3) More detailed network information allows the ranking of the network neighbors so as to permit the selection of small sets of individuals with very high probabilities of adoption. Key words and phrases: Viral marketing, word of mouth, targeted marketing, network analysis, classification, statistical relational learning.
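The “network neighbor” idea from result (1) can be sketched as: take the links between consumers, mark everyone directly connected to an existing customer, and compare that group's adoption rate with a baseline segment. All data and function names here are hypothetical, and the simple rate ratio is only a stand-in for the paper's analysis.

```python
def network_neighbors(edges, customers):
    """Non-customers directly linked (e.g. by a call) to at least one existing customer."""
    neighbors = set()
    for a, b in edges:
        if a in customers and b not in customers:
            neighbors.add(b)
        if b in customers and a not in customers:
            neighbors.add(a)
    return neighbors


def adoption_rate(group, adopters):
    """Fraction of a targeted group that later adopted (0.0 for an empty group)."""
    return len(group & adopters) / len(group) if group else 0.0


# Hypothetical toy data: who communicated with whom, prior customers, later adopters.
edges = [("c1", "p1"), ("c1", "p2"), ("c2", "p3"), ("p4", "p5")]
customers = {"c1", "c2"}
adopters = {"p1", "p3", "p5"}
nn = network_neighbors(edges, customers)   # {"p1", "p2", "p3"}
baseline = {"p4", "p5", "p6", "p7"}        # e.g. a segment chosen by traditional attributes
lift = adoption_rate(nn, adopters) / adoption_rate(baseline, adopters)
print(sorted(nn), lift)
```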
Leveraging relational autocorrelation with latent group models
In MRDM '05: Proceedings of the 4th International Workshop on Multi-Relational Mining, ACM
Cited by 58 (19 self)
The presence of autocorrelation provides strong motivation for using relational techniques for learning and inference. Autocorrelation is a statistical dependency between the values of the same variable on related entities and is a nearly ubiquitous characteristic of relational data sets. Recent research has explored the use of collective inference techniques to exploit this phenomenon. These techniques achieve significant performance gains by modeling observed correlations among class labels of related instances, but the models fail to capture a frequent cause of autocorrelation: the presence of underlying groups that influence the attributes on a set of entities. We propose a latent group model (LGM) for relational data, which discovers and exploits the hidden structures responsible for the observed autocorrelation among class labels. Modeling the latent group structure improves model performance, increases inference efficiency, and enhances our understanding of the datasets. We evaluate performance on three relational classification tasks and show that LGM outperforms models that ignore latent group structure when there is little known information with which to seed inference.
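To make “autocorrelation” concrete, the sketch below computes one common measure of it: the Pearson correlation of a node attribute across linked pairs, counting each edge in both directions. This is an illustrative assumption, not necessarily the statistic used in the paper, and it divides by zero if the attribute is constant over the linked nodes.

```python
import math


def relational_autocorrelation(edges, label):
    """Pearson correlation of a numeric attribute across linked pairs.
    Each undirected edge contributes both (u, v) and (v, u), so the measure is symmetric."""
    xs, ys = [], []
    for u, v in edges:
        xs += [label[u], label[v]]
        ys += [label[v], label[u]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


label = {0: 1, 1: 1, 2: 0, 3: 0}
print(relational_autocorrelation([(0, 1), (2, 3)], label))  # 1.0: linked nodes share labels
print(relational_autocorrelation([(0, 2), (1, 3)], label))  # -1.0: linked nodes always differ
```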
Link Mining: A Survey
SIGKDD Explorations Special Issue on Link Mining, 2005
Cited by 47 (0 self)
Many datasets of interest today are best described as a linked collection of interrelated objects. These may represent homogeneous networks, in which there is a single object type and link type, or richer, heterogeneous networks, in which there may be multiple object and link types (and possibly other semantic information). Examples of homogeneous networks include single-mode social networks, such as people connected by friendship links, or the WWW, a collection of linked web pages. Examples of heterogeneous networks include those in medical domains describing patients, diseases, treatments and contacts, or in bibliographic domains describing publications, authors, and venues. Link mining refers to data mining techniques that explicitly consider these links when building predictive or descriptive models of the linked data. Commonly addressed link mining tasks include object ranking, group detection, collective classification, link prediction and subgraph discovery. While network analysis has been studied in depth in particular areas such as social network analysis, hypertext mining, and web analysis, only recently has there been a cross-fertilization of ideas among these different communities. This is an exciting, rapidly expanding area. In this article, we review some of the common emerging themes.
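As a concrete instance of one task from this list, link prediction, the sketch below ranks unlinked node pairs by their number of common neighbors. This is a simple baseline chosen for illustration, not a method the survey prescribes; it enumerates all node pairs, so it only suits small graphs.

```python
from itertools import combinations


def common_neighbor_scores(edges):
    """Score each unlinked node pair by its number of shared neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores


# Friendship links: A and D share three friends (B, C, F), so (A, D) ranks first.
edges = [("A", "B"), ("A", "C"), ("A", "F"), ("D", "B"), ("D", "C"), ("D", "F")]
scores = common_neighbor_scores(edges)
print(max(scores, key=scores.get))  # ('A', 'D')
```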
Higher-Order Web Link Analysis Using Multilinear Algebra
IEEE International Conference on Data Mining, 2005
Cited by 45 (16 self)
Linear algebra is a powerful and proven tool in web search. Techniques, such as the PageRank algorithm of Brin and Page and the HITS algorithm of Kleinberg, score web pages based on the principal eigenvector (or singular vector) of a particular nonnegative matrix that captures the hyperlink structure of the web graph. We propose and test a new methodology that uses multilinear algebra to elicit more information from a higher-order representation of the hyperlink graph. We start by labeling the edges in our graph with the anchor text of the hyperlinks so that the associated linear algebra representation is a sparse, three-way tensor. The first two dimensions of the tensor represent the web pages while the third dimension adds the anchor text. We then use the rank-1 factors of a multilinear PARAFAC tensor decomposition, which are akin to singular vectors of the SVD, to automatically identify topics in the collection along with the associated authoritative web pages.
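The tensor idea can be sketched with a sparse dictionary keyed by (source page, target page, anchor term). As a simplification swapped in for a full PARAFAC decomposition, the code below extracts a single rank-1 term by alternating power iterations, a higher-order analogue of the HITS hub/authority iteration; a rank-R PARAFAC would fit R such terms jointly. Function names and the iteration count are assumptions.

```python
import math


def normalize(v):
    """Scale v to unit Euclidean norm (leave an all-zero vector unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm else v


def rank1_tensor(entries, shape, iters=50):
    """Rank-1 approximation lam * x o y o z of a sparse 3-way tensor
    given as {(i, j, k): value}, via alternating power iterations."""
    I, J, K = shape
    x, y, z = normalize([1.0] * I), normalize([1.0] * J), normalize([1.0] * K)
    for _ in range(iters):
        nx = [0.0] * I
        for (i, j, k), t in entries.items():
            nx[i] += t * y[j] * z[k]   # contract over target pages and terms
        x = normalize(nx)
        ny = [0.0] * J
        for (i, j, k), t in entries.items():
            ny[j] += t * x[i] * z[k]   # contract over source pages and terms
        y = normalize(ny)
        nz = [0.0] * K
        for (i, j, k), t in entries.items():
            nz[k] += t * x[i] * y[j]   # contract over both page modes
        z = normalize(nz)
    lam = sum(t * x[i] * y[j] * z[k] for (i, j, k), t in entries.items())
    return lam, x, y, z


# Pages 0-1 link to pages 1-2 with anchor term 0; entries form an exact rank-1 tensor.
entries = {(0, 1, 0): 1.0, (0, 2, 0): 1.0, (1, 1, 0): 2.0, (1, 2, 0): 2.0}
lam, hubs, auths, terms = rank1_tensor(entries, shape=(3, 3, 2))
print(round(lam, 4))  # 3.1623, i.e. sqrt(10)
```

In this reading, `hubs` scores source pages, `auths` scores target pages, and `terms` weights the anchor text that defines the topic.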
Infinite hidden relational models
In Proceedings of the 22nd International Conference on Uncertainty in Artificial Intelligence (UAI), 2006
Cited by 45 (17 self)
Relational learning analyzes the probabilistic constraints between the attributes of entities and relationships. We extend the expressiveness of relational models by introducing for each entity (or object) an infinite-dimensional latent variable as part of a Dirichlet process (DP) mixture model. We discuss inference in the model, which is based on a DP Gibbs sampler, i.e., the Chinese restaurant process. We extend the Chinese restaurant process to be applicable to relational modeling. We discuss how information is propagated in the network of latent variables, reducing the necessity for extensive structural learning. In the context of a recommendation engine, our approach realizes a principled solution for recommendations based on features of items, features of users and relational information. Our approach is evaluated in three applications: a recommendation system based on the MovieLens data set, the prediction of gene function using relational information and a medical recommendation system.
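The Chinese restaurant process that the sampler builds on can be sketched in a few lines: an entity joins an existing cluster with probability proportional to its size, or opens a new one with probability proportional to the concentration parameter `alpha`. This covers only the prior; in a Gibbs sampler such as the one described above, each of these terms would be multiplied by the likelihood of the entity's attributes and relations, which the sketch omits. Function names are illustrative.

```python
import random


def crp_probabilities(counts, alpha):
    """Probability that the next entity joins each existing cluster, or a new one (last entry).
    counts[k] = number of entities already in cluster k."""
    n = sum(counts)
    probs = [c / (n + alpha) for c in counts]
    probs.append(alpha / (n + alpha))
    return probs


def sample_crp(num_entities, alpha, rng):
    """Draw cluster assignments for entities one at a time from a CRP prior."""
    counts, assignment = [], []
    for _ in range(num_entities):
        probs = crp_probabilities(counts, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(0)  # open a new cluster ("table")
        counts[k] += 1
        assignment.append(k)
    return assignment, counts


print(crp_probabilities([2, 1], alpha=1.0))  # [0.5, 0.25, 0.25]
z, sizes = sample_crp(10, alpha=1.0, rng=random.Random(0))
print(z, sizes)
```

Because the number of clusters grows with the data instead of being fixed in advance, this prior is what makes the latent variable effectively infinite-dimensional.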
Blog: Relational modeling with unknown objects
ICML 2004 Workshop on Statistical Relational Learning and Its Connections, 2004
Cited by 30 (1 self)
In many real-world probabilistic reasoning problems, one of the questions we want to answer is: how many objects are out there? Examples of such problems range from multi-target tracking to extracting information from text documents. However, most probabilistic modeling formalisms, even first-order ones, assume a fixed, known set of objects. We introduce a language called Blog for specifying probability distributions over relational structures that include varying sets of objects. In this paper we present Blog informally, by means of example models for multi-target tracking and citation matching. We discuss some attractive features of Blog models and some avenues of future work.
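Blog models are written in Blog's own declarative syntax, which is not reproduced here. To illustrate the underlying idea of a possible world whose number of objects is itself random, the sketch below is a plain Python generative sampler for a toy multi-target tracking problem; the Poisson priors, detection model, and all parameter values are hypothetical, and inference (asking how many aircraft exist given the blips) is not shown.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's method for a Poisson draw (fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_world(rng, mean_objects=3.0, detect_prob=0.8, mean_false_alarms=1.0):
    """One possible world for a toy tracking problem with an unknown number of aircraft.
    Returns the number of aircraft and a list of (position, source) blips,
    where source is an aircraft index or None for a false alarm."""
    num_aircraft = sample_poisson(mean_objects, rng)      # how many objects exist?
    positions = [rng.uniform(0.0, 100.0) for _ in range(num_aircraft)]
    blips = []
    for a, pos in enumerate(positions):
        if rng.random() < detect_prob:                    # an aircraft may go undetected
            blips.append((rng.gauss(pos, 1.0), a))
    for _ in range(sample_poisson(mean_false_alarms, rng)):  # spurious detections
        blips.append((rng.uniform(0.0, 100.0), None))
    return num_aircraft, blips


n, blips = sample_world(random.Random(42))
print(n, len(blips))
```

Different worlds drawn from the same model contain different numbers of aircraft, which is exactly the situation a fixed-object formalism cannot express.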
Using Ghost Edges for Classification in Sparsely Labeled Networks
Cited by 30 (10 self)
We address the problem of classification in partially labeled networks (a.k.a. within-network classification) where observed class labels are sparse. Techniques for statistical relational learning have been shown to perform well on network classification tasks by exploiting dependencies between class labels of neighboring nodes. However, relational classifiers can fail when unlabeled nodes have too few labeled neighbors to support learning (during the training phase) and/or inference (during the testing phase). This situation arises in real-world problems when observed labels are sparse. In this paper, we propose a novel approach to within-network classification that combines aspects of statistical relational learning and semi-supervised learning to improve classification performance in sparse networks. Our approach works by adding “ghost edges” to a network, which enable the flow of information from labeled to unlabeled nodes. Through experiments on real-world data sets, we demonstrate that our approach performs well across a range of conditions where existing approaches, such as collective classification and semi-supervised learning, fail. On all tasks, our approach improves area under the ROC curve (AUC) by up to 15 points over existing approaches. Furthermore, we demonstrate that our approach runs in time proportional to L · E, where L is the number of labeled nodes and E is the number of edges.
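The abstract says only that ghost edges carry information from labeled to unlabeled nodes and that runtime is proportional to L · E. The sketch below assumes ghost-edge weights come from a random walk with restart (RWR) started at each labeled node, and uses a simple weighted vote in place of the paper's classifiers; `restart`, `iters`, and the function names are hypothetical. Each RWR iteration touches every edge once, so the total work is O(L · E · iters).

```python
def random_walk_with_restart(adj, source, restart=0.15, iters=50):
    """Proximity of every node to `source` on an undirected graph given as adjacency lists.
    Mass on nodes without neighbors is simply dropped in this sketch."""
    r = {v: 0.0 for v in adj}
    r[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[source] += restart
        for u in adj:
            if adj[u]:
                share = (1 - restart) * r[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
        r = nxt
    return r


def ghost_edge_predict(adj, labels):
    """Hypothetical ghost-edge classifier: link each unlabeled node to every labeled node
    with weight = RWR proximity, then predict the class with the largest total weight."""
    votes = {v: {} for v in adj if v not in labels}
    for src, cls in labels.items():  # one RWR per labeled node
        prox = random_walk_with_restart(adj, src)
        for v in votes:
            votes[v][cls] = votes[v].get(cls, 0.0) + prox[v]
    return {v: max(w, key=w.get) for v, w in votes.items()}


# Path graph 0-1-2-3-4 with only the two end nodes labeled.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
pred = ghost_edge_predict(adj, {0: "A", 4: "B"})
print(pred[1], pred[3])  # A B
```

Nodes 1 and 3 have no labeled neighbor of the opposite class, yet each receives a confident vote from the nearer labeled endpoint through its ghost edges.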