Results 1 - 10 of 68
Learning systems of concepts with an infinite relational model
- In Proceedings of the 21st National Conference on Artificial Intelligence, 2006
Cited by 86 (14 self)
Relationships between concepts account for a large proportion of semantic knowledge. We present a nonparametric Bayesian model that discovers systems of related concepts. Given data involving several sets of entities, our model discovers the kinds of entities in each set and the relations between kinds that are possible or likely. We apply our approach to four problems: clustering objects and features, learning ontologies, discovering kinship systems, and discovering structure in political data. Philosophers, psychologists and computer scientists have proposed that semantic knowledge is best understood as a system of relations. Two questions immediately arise: how can these systems be represented, and how are these representations acquired? Researchers who start with the
Learning from labeled and unlabeled data on a directed graph
- in: Proceedings of the 22nd International Conference on Machine Learning (ICML
Cited by 75 (8 self)
We propose a general framework for learning from labeled and unlabeled data on a directed graph that takes into account the structure of the graph, including the directionality of the edges. The time complexity of the algorithm derived from this framework is nearly linear due to recently developed numerical techniques. In the absence of labeled instances, this framework can be used as a spectral clustering method for directed graphs, which generalizes the spectral clustering approach for undirected graphs. We have applied our framework to real-world web classification problems and obtained encouraging results.
Overview of record linkage and current research directions
- BUREAU OF THE CENSUS, 2006
Cited by 55 (1 self)
This paper provides background on record linkage methods that can be used to combine data from a variety of sources, such as person lists and business lists. It also describes some areas of current research.
Network-based marketing: Identifying likely adopters via consumer networks
- Statistical Science
Cited by 48 (10 self)
Network-based marketing refers to a collection of marketing techniques that take advantage of links between consumers to increase sales. We concentrate on the consumer networks formed using direct interactions (e.g., communications) between consumers. We survey the diverse literature on such marketing with an emphasis on the statistical methods used and the data to which these methods have been applied. We also provide a discussion of challenges and opportunities for this burgeoning research topic. Our survey highlights a gap in the literature. Because of inadequate data, prior studies have not been able to provide direct, statistical support for the hypothesis that network linkage can directly affect product/service adoption. Using a new data set that represents the adoption of a new telecommunications service, we show very strong support for the hypothesis. Specifically, we show three main results: (1) “Network neighbors” (those consumers linked to a prior customer) adopt the service at a rate 3–5 times greater than baseline groups selected by the best practices of the firm’s marketing team. In addition, analyzing the network allows the firm to acquire new customers who otherwise would have fallen through the cracks, because they would not have been identified based on traditional attributes. (2) Statistical models, built with a very large amount of geographic, demographic and prior purchase data, are significantly and substantially improved by including network information. (3) More detailed network information allows the ranking of the network neighbors so as to permit the selection of small sets of individuals with very high probabilities of adoption. Key words and phrases: Viral marketing, word of mouth, targeted marketing, network analysis, classification, statistical relational learning.
Leveraging relational autocorrelation with latent group models
- In MRDM '05: Proceedings of the 4th international workshop on Multi-relational mining. ACM
Cited by 43 (14 self)
The presence of autocorrelation provides strong motivation for using relational techniques for learning and inference. Autocorrelation is a statistical dependency between the values of the same variable on related entities and is a nearly ubiquitous characteristic of relational data sets. Recent research has explored the use of collective inference techniques to exploit this phenomenon. These techniques achieve significant performance gains by modeling observed correlations among class labels of related instances, but the models fail to capture a frequent cause of autocorrelation: the presence of underlying groups that influence the attributes on a set of entities. We propose a latent group model (LGM) for relational data, which discovers and exploits the hidden structures responsible for the observed autocorrelation among class labels. Modeling the latent group structure improves model performance, increases inference efficiency, and enhances our understanding of the datasets. We evaluate performance on three relational classification tasks and show that LGM outperforms models that ignore latent group structure when there is little known information with which to seed inference.
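The notion of relational autocorrelation in this abstract can be illustrated with a small sketch. This is not the latent group model from the paper; it only measures, on assumed toy data, how strongly a binary class label correlates across linked pairs. The function name, the graph, and the labels are all hypothetical.

```python
from math import sqrt

def relational_autocorrelation(edges, labels):
    """Pearson correlation of a numeric label across linked pairs.

    Each undirected edge (u, v) contributes both (u, v) and (v, u), so the
    measure is symmetric. Returns 0.0 if there are no edges or no variance.
    """
    xs, ys = [], []
    for u, v in edges:
        xs += [labels[u], labels[v]]
        ys += [labels[v], labels[u]]
    n = len(xs)
    if n == 0:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: two tightly linked groups joined by one cross edge.
labels = {"a": 1, "b": 1, "c": 1, "d": 0, "e": 0, "f": 0}
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
score = relational_autocorrelation(edges, labels)
```

A value of `score` near 1 indicates that linked entities tend to share labels, which is the situation in which hidden group structure is a plausible explanation.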
Higher-Order Web Link Analysis Using Multilinear Algebra
- IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2005
Cited by 37 (16 self)
Linear algebra is a powerful and proven tool in web search. Techniques, such as the PageRank algorithm of Brin and Page and the HITS algorithm of Kleinberg, score web pages based on the principal eigenvector (or singular vector) of a particular non-negative matrix that captures the hyperlink structure of the web graph. We propose and test a new methodology that uses multilinear algebra to elicit more information from a higher-order representation of the hyperlink graph. We start by labeling the edges in our graph with the anchor text of the hyperlinks so that the associated linear algebra representation is a sparse, three-way tensor. The first two dimensions of the tensor represent the web pages while the third dimension adds the anchor text. We then use the rank-1 factors of a multilinear PARAFAC tensor decomposition, which are akin to singular vectors of the SVD, to automatically identify topics in the collection along with the associated authoritative web pages.
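As background for the eigenvector-based scoring this abstract mentions, here is a minimal sketch of PageRank computed by power iteration. It shows only the matrix baseline, not the PARAFAC tensor method the paper proposes. The damping factor of 0.85, the iteration count, and the toy link graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank scores by power iteration.

    links maps each page to the list of pages it links to. Pages with no
    outgoing links spread their score uniformly over all pages.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same "teleport" share first.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * score[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:
                share = damping * score[p] / n
                for q in pages:
                    new[q] += share
        score = new
    return score

# Hypothetical four-page web graph.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
best_page = max(ranks, key=ranks.get)
```

The scores form a probability distribution over pages. In this toy graph, page `c` collects links from three of the four pages and ends up with the highest score.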
Link Mining: A Survey
- SigKDD Explorations Special Issue on Link Mining, 2005
Cited by 31 (0 self)
Many datasets of interest today are best described as a linked collection of interrelated objects. These may represent homogeneous networks, in which there is a single object type and link type, or richer, heterogeneous networks, in which there may be multiple object and link types (and possibly other semantic information). Examples of homogeneous networks include single-mode social networks, such as people connected by friendship links, or the WWW, a collection of linked web pages. Examples of heterogeneous networks include those in medical domains describing patients, diseases, treatments and contacts, or in bibliographic domains describing publications, authors, and venues. Link mining refers to data mining techniques that explicitly consider these links when building predictive or descriptive models of the linked data. Commonly addressed link mining tasks include object ranking, group detection, collective classification, link prediction and subgraph discovery. While network analysis has been studied in depth in particular areas such as social network analysis, hypertext mining, and web analysis, only recently has there been a cross-fertilization of ideas among these different communities. This is an exciting, rapidly expanding area. In this article, we review some of the common emerging themes.
Infinite hidden relational models
- In Proceedings of the 22nd International Conference on Uncertainty in Artificial Intelligence (UAI, 2006
Cited by 28 (14 self)
Relational learning analyzes the probabilistic constraints between the attributes of entities and relationships. We extend the expressiveness of relational models by introducing for each entity (or object) an infinite-dimensional latent variable as part of a Dirichlet process (DP) mixture model. We discuss inference in the model, which is based on a DP Gibbs sampler, i.e., the Chinese restaurant process. We extend the Chinese restaurant process to be applicable to relational modeling. We discuss how information is propagated in the network of latent variables, reducing the necessity for extensive structural learning. In the context of a recommendation engine, our approach realizes a principled solution for recommendations based on features of items, features of users and relational information. Our approach is evaluated in three applications: a recommendation system based on the MovieLens data set, the prediction of gene function using relational information and a medical recommendation system.
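For readers unfamiliar with the Chinese restaurant process named in this abstract, the following is a minimal sketch of drawing a partition from its prior. It is not the paper's relational extension or its Gibbs sampler, which would also condition on data. The function name, the concentration value, and the seed are assumptions.

```python
import random

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Sample a partition of n_customers from a CRP with concentration alpha > 0.

    With i customers already seated, the next customer joins table k with
    probability counts[k] / (i + alpha) and opens a new table with
    probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    assignments = []  # table index chosen by each customer
    counts = []       # number of customers at each table
    for i in range(n_customers):
        r = rng.uniform(0.0, i + alpha)
        table = len(counts)  # default: open a new table
        cumulative = 0.0
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                table = k
                break
        if table == len(counts):
            counts.append(0)
        counts[table] += 1
        assignments.append(table)
    return assignments, counts

assignments, counts = chinese_restaurant_process(50, alpha=1.0)
```

The number of tables is not fixed in advance and grows slowly with the number of customers, which is what lets a DP mixture model use an unbounded number of latent classes.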
Blog: Relational modeling with unknown objects
- ICML 2004 Workshop on Statistical Relational Learning and Its Connections, 2004
Cited by 25 (1 self)
In many real-world probabilistic reasoning problems, one of the questions we want to answer is: how many objects are out there? Examples of such problems range from multi-target tracking to extracting information from text documents. However, most probabilistic modeling formalisms, even first-order ones, assume a fixed, known set of objects. We introduce a language called Blog for specifying probability distributions over relational structures that include varying sets of objects. In this paper we present Blog informally, by means of example models for multi-target tracking and citation matching. We discuss some attractive features of Blog models and some avenues of future work.
Logical Bayesian Networks and their relation to other probabilistic logical models
- In Proceedings of 15th International Conference on Inductive Logic Programming (ILP-05), volume 3625 of Lecture Notes in Artificial Intelligence, 2005
Cited by 20 (5 self)
We review Logical Bayesian Networks, a language for probabilistic logical modelling, and discuss its relation to Probabilistic Relational Models and Bayesian Logic Programs.
1 Probabilistic Logical Models
Probabilistic logical models are models combining aspects of probability theory with aspects of Logic Programming, first-order logic or relational languages. Recently a variety of languages to describe such models has been introduced. For some languages techniques exist to learn such models from data. Two examples are Probabilistic Relational Models (PRMs) [4] and Bayesian Logic Programs (BLPs) [5]. These two languages are probably the most popular and well-known in the Relational Data Mining community. We introduce a new language, Logical Bayesian Networks (LBNs) [2], that is strongly related to PRMs and BLPs yet solves some of their problems with respect to knowledge representation (related to expressiveness and intuitiveness). PRMs, BLPs and LBNs all follow the principle of Knowledge Based Model Construction: they offer a language that can be used to specify general probabilistic logical knowledge and they provide a methodology to construct a propositional model based on this knowledge when given a specific

