Why Collective Inference Improves Relational Classification
- In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2004
Cited by 79 (18 self)
Procedures for collective inference make simultaneous statistical judgments about the same variables for a set of related data instances. For example, collective inference could be used to simultaneously classify a set of hyperlinked documents or infer the legitimacy of a set of related financial transactions. Several recent studies indicate that collective inference can significantly reduce classification error when compared with traditional inference techniques. We investigate the underlying mechanisms for this error reduction by reviewing past work on collective inference and characterizing different types of statistical models used for making inference in relational data. We show important differences among these models, and we characterize the necessary and sufficient conditions for reduced classification error based on experiments with real and simulated data.
Relational dependency networks
- Journal of Machine Learning Research
, 2007
Cited by 39 (11 self)
Recent work on graphical models for relational data has demonstrated significant improvements in classification and inference when models represent the dependencies among instances. Despite its use in conventional statistical models, the assumption of instance independence is contradicted by most relational datasets. For example, in citation data there are dependencies among the topics of a paper’s references, and in genomic data there are dependencies among the functions of interacting proteins. In this paper, we present relational dependency networks (RDNs), graphical models that are capable of expressing and reasoning with such dependencies in a relational setting. We discuss RDNs in the context of relational Bayes networks and relational Markov networks and outline the relative strengths of RDNs—namely, the ability to represent cyclic dependencies, simple methods for parameter estimation, and efficient structure learning techniques. The strengths of RDNs are due to the use of pseudolikelihood learning techniques, which estimate an efficient approximation of the full joint distribution. We present learned RDNs for a number of real-world datasets and evaluate the models in a prediction context, showing that RDNs identify and exploit cyclic relational dependencies to achieve significant performance gains over conventional conditional models. In addition, we use synthetic data to explore model performance under various relational data characteristics, showing that RDN learning and inference techniques are accurate over a wide range of conditions.
Data Mining in Social Networks
- In National Academy of Sciences Symposium on Dynamic Social Network Modeling and Analysis
, 2002
Cited by 23 (1 self)
Several techniques for learning statistical models have been developed recently by researchers in machine learning and data mining. All of these techniques must address a similar set of representational and algorithmic choices and must face a set of statistical challenges unique to learning from relational data.
STATISTICAL MODELS AND ANALYSIS TECHNIQUES FOR LEARNING IN RELATIONAL DATA
, 2006
Cited by 9 (0 self)
Many data sets routinely captured by organizations are relational in nature - from marketing and sales transactions, to scientific observations and medical records. Relational data record characteristics of heterogeneous objects and persistent relationships among those objects (e.g., citation graphs, the World Wide Web, genomic structures). These data offer unique opportunities to improve model accuracy, and thereby decision-making, if machine learning techniques can effectively exploit the relational information.
This work focuses on how to learn accurate statistical models of complex, relational data sets and develops two novel probabilistic models to represent, learn, and reason about statistical dependencies in these data. Relational dependency networks are the first relational model capable of learning general autocorrelation dependencies, an important class of statistical dependencies that are ubiquitous in relational data. Latent group models are the first relational model to generalize about the properties of underlying group structures to improve inference accuracy and efficiency. Not only do these two models offer performance gains over current relational models, but they also offer efficiency gains which will make relational modeling feasible for large, relational datasets where current methods are computationally intensive, if not intractable.
We also formulate a novel analysis framework to analyze relational model performance and ascribe errors to model learning and inference procedures. Within this framework, we explore the effects of data characteristics and representation choices on inference accuracy and investigate the mechanisms behind model performance. In particular, we show that the inference process in relational models can be a significant source of error and that relative model performance varies significantly across different types of relational data.
Algorithms for Storytelling
- In Proc. KDD’06
, 2006
Cited by 8 (6 self)
We formulate a new data mining problem called storytelling as a generalization of redescription mining. In traditional redescription mining, we are given a set of objects and a collection of subsets defined over these objects. The goal is to view the set system as a vocabulary and identify two expressions in this vocabulary that induce the same set of objects. Storytelling, on the other hand, aims to explicitly relate object sets that are disjoint (and hence, maximally dissimilar) by finding a chain of (approximate) redescriptions between the sets. This problem finds applications in bioinformatics, for instance, where the biologist is trying to relate a set of genes expressed in one experiment to another set, implicated in a different pathway. We outline an efficient storytelling implementation that embeds the CARTwheels redescription mining algorithm in an A* search procedure, using the former to supply next move operators on search branches to the latter. This approach is practical and effective for mining large datasets and, at the same time, exploits the structure of partitions imposed by the given vocabulary. Three application case studies are presented: a study of word overlaps in large English dictionaries, exploring connections between genesets in a bioinformatics dataset, and relating publications in the PubMed index of abstracts.
NetIntel: A Database for Manipulation of Rich Social Network Data
, 2005
Cited by 5 (1 self)
There is a pressing need to automatically collect data on social systems as rich network data, analyze such systems to find hidden relations and groups, prune the datasets to locate regions of interest, locate key actors, characterize the structure, locate points of vulnerability, and simulate change in a system as it evolves naturally or in response to strategic interventions over time or under certain impacts, including modification of data. To meet this challenge, we need to develop a new mechanism for storing and manipulating social structure data. Social structure data will be stored in a relational database capable of manipulating large quantities of data. The database is structured to preserve the character and integrity of the data in an extensible manner, and is extended with a number of functions specifically designed for manipulating graph-based
COLAB: A Laboratory Environment for Studying Analyst Sensemaking and Collaboration
, 2005
Cited by 3 (2 self)
COLAB is a laboratory for studying tools that facilitate collaboration and sensemaking among groups of human analysts as they build interpretations of unfolding situations based on accruing intelligence data. The laboratory has three components. The Hats Simulator provides a challenging problem domain involving thousands to millions of agents engaged in individual and collective behaviors, a small portion of which are terrorist.
Autocorrelation and relational learning: Challenges and opportunities
- ICML Statistical Relational Learning Workshop
, 2004
Cited by 3 (1 self)
Autocorrelation, a common characteristic of many datasets, refers to correlation between values of the same variable on related objects. It violates the critical assumption of instance independence that underlies most conventional models. In this paper, we provide an overview of research on autocorrelation in a number of fields with an emphasis on implications for relational learning, and outline a number of challenges and opportunities for model learning and inference.
Social Structure Simulation and Inference using Artificial Intelligence Techniques
, 2005
Cited by 2 (1 self)
The study of complex social and technological systems, such as organizations, requires a sophisticated approach that accounts for the underlying psychological and sociological principles, communication patterns and the technologies within these systems. Social Network Analysis and link analysis have since inception operated on the cutting edge, bringing together mathematical analysis of social structures and qualitative reasoning and interpretation. As available computing power grew, social network-based models have become not only an analysis tool, but also a methodology for building new theories of social behaviour and organizational evolution, frequently through the creation of simulation models. This work examines the past approaches of creating Social Network-based semantically consistent and interpretable models of social structure and social
The Hats Information Fusion Challenge Problem
We describe the Hats Simulator as an information fusion challenge problem. Hats is a virtual world in which many agents engage in individual and collective activities. Most agents are benign, some intend harm. Agent activities are planned by a generative planner. Playing against the simulator, the goal of the analyst is to identify and arrest harmful agents before they carry out their plans. The simulator provides both scalar and categorical information. Information fusion tasks in the Hats domain include assessing information value, choosing information collection strategies, tracking individuals and resources, identifying events, hypothesizing group membership, ascribing suspicion, and identifying plans. After each game, the analyst is assessed a set of scores including the cost of acquiring information, the cost of falsely accusing benign agents, and the cost of failing to detect harmful agents. The simulator is implemented and currently manages hundreds of thousands of agents.

