Graph regularized transductive classification on heterogeneous information networks
- In ECML PKDD, 2010
Cited by 5 (3 self)
A heterogeneous information network is a network composed of multiple types of objects and links. Recently, it has been recognized that strongly-typed heterogeneous information networks are prevalent in the real world. Sometimes, label information is available for some objects. Learning from such labeled and unlabeled data via transductive classification can lead to good knowledge extraction of the hidden network structure. However, although classification on homogeneous networks has been studied for decades, classification on heterogeneous networks has not been explored until recently. In this paper, we consider the transductive classification problem on heterogeneous networked data which share a common topic. Only some objects in the given network are labeled, and we aim to predict labels for all types of the remaining objects. A novel graph-based regularization framework, GNetMine, is proposed to model the link structure in information networks with an arbitrary network schema and an arbitrary number of object/link types. Specifically, we explicitly respect the type differences by preserving consistency over each relation graph, corresponding to each type of link, separately. Efficient computational schemes are then introduced to solve the corresponding optimization problem. Experiments on the DBLP data set show that our algorithm significantly improves the classification accuracy over existing state-of-the-art methods.
Mining Knowledge from Databases: An Information Network Analysis Approach
Cited by 1 (1 self)
Most people consider a database to be merely a data repository that supports data storage and retrieval. Actually, a database contains rich, inter-related, multi-typed data and information, forming one or a set of gigantic, interconnected, heterogeneous information networks. Much knowledge can be derived from such information networks if we systematically develop an effective and scalable database-oriented information network analysis technology. In this tutorial, we introduce database-oriented information network analysis methods and demonstrate how information networks can be used to improve data quality and consistency, facilitate data integration, and generate interesting knowledge. This tutorial presents an organized picture of how to turn a database into one or a set of organized heterogeneous information networks, how information networks can be used for data cleaning, data consolidation, and data quality improvement, how to discover various kinds of knowledge from information networks, how to perform OLAP in information networks, and how to transform database data into knowledge by information network analysis. Moreover, we present interesting case studies on real datasets, including DBLP and Flickr, and show how interesting and organized knowledge can be generated from database-oriented information networks.
Survey on Social Tagging Techniques
Cited by 1 (0 self)
Social tagging on online portals has become a trend. It has emerged as one of the best ways of associating metadata with web objects. With the increase in the kinds of web objects becoming available, collaborative tagging of such objects is also developing along new dimensions. This popularity has led to a vast literature on social tagging. In this survey paper, we summarize the different techniques employed to study various aspects of tagging. Broadly, we discuss properties of tag streams, tagging models, tag semantics, generating recommendations using tags, visualizations of tags, applications of tags, and problems associated with tagging usage. We cover topics such as why people tag, what influences the choice of tags, how to model the tagging process, kinds of tags, the different power laws observed in the tagging domain, how tags are created, how to choose the right tags for recommendation, etc. We conclude with thoughts on future work in the area.
Learning to Combine Discriminative Classifiers – Confidence based
Much research in data mining and machine learning has led to numerous practical applications. Spam filtering, fraud detection, and user query-intent analysis have relied heavily on machine-learned classifiers, resulting in improvements in robust classification accuracy. Combining multiple classifiers (a.k.a. Ensemble Learning) is well studied and is known to improve the effectiveness of a classifier. To address two key challenges in Ensemble Learning – (1) learning weights of individual classifiers and (2) the combination rule of their weighted responses, this paper proposes a novel Ensemble classifier, EnLR, that computes weights of responses from discriminative classifiers and combines their weighted responses to produce a single response for a test instance. The combination rule is based on aggregating weighted responses,
Learning Search Tasks in Queries and Web Pages via
As the Internet grows explosively, search engines play an increasingly important role in helping users access online information effectively. Recently, it has been recognized that a query is often triggered by a search task that the user wants to accomplish. Similarly, many web pages are specifically designed to help accomplish a certain task. Therefore, learning the hidden tasks behind queries and web pages can help search engines return the most useful web pages to users by task matching. For instance, the search task that triggers the query “thinkpad T410 broken” is to maintain a computer, and it is desirable for a search engine to return the Lenovo troubleshooting page at the top of the list. However, existing search engine technologies mainly focus on topic detection or relevance ranking, which cannot predict the task that triggers a query or the task a web page can accomplish.
Graph-based Classification on Heterogeneous Information Networks
A heterogeneous information network is a network composed of multiple types of objects and links. Recently, it has been recognized that strongly-typed heterogeneous information networks are prevalent in the real world. Sometimes, label information is available for part of the objects. Learning from such labeled and unlabeled data via transductive classification can lead to good knowledge extraction of the hidden network structure. However, although classification on homogeneous networks has been studied for decades, classification on heterogeneous networks has not been explored until recently. In this paper, we consider the transductive classification problem on heterogeneous networked data which share a common topic. Only part of the objects in the given network are labeled, and we aim to predict labels for all types of the remaining objects. A novel graph-based regularization framework, GNetClass, is proposed to model the link structure in information networks with an arbitrary network schema and number of object/link types. Specifically, we explicitly respect the type differences by preserving consistency over each relation graph, corresponding to each type of link, separately. Efficient computational schemes are then introduced to solve the corresponding optimization problem. Experiments on the DBLP data set show that our algorithm significantly improves the classification accuracy over existing state-of-the-art methods.
INFORMATION NETWORKS BY
A heterogeneous information network is a network composed of multiple types of objects and links. Recently, it has been recognized that strongly-typed heterogeneous information networks are prevalent in the real world. Sometimes, label information is available for some objects. Learning from such labeled and unlabeled data via transductive classification can lead to good knowledge extraction of the hidden network structure. However, although classification on homogeneous networks has been studied for decades, classification on heterogeneous networks has not been explored until recently. In this paper, we consider the transductive classification problem on heterogeneous networked data which share a common topic. Only some objects in the given network are labeled, and we aim to predict labels for all types of the remaining objects. A novel graph-based regularization framework, GNetMine, is proposed to model the link structure in information networks with arbitrary network schema and arbitrary number of object/link types. Specifically, we explicitly respect the type differences by preserving consistency over each relation graph corresponding to each type of links separately. Efficient computational schemes are then introduced to solve the corresponding optimization problem. Experiments on the DBLP data set show that our algorithm significantly improves the classification accuracy over existing state-of-the-art methods.
To my family for all their love.
Acknowledgments: First and foremost, I deeply thank my advisor Prof. Jiawei Han, whose guidance and support helped me during the whole procedure of writing this thesis. Moreover, I would like to express my gratitude to all professors and colleagues in my research group
UNED at MediaEval 2011: Can Delicious help us to improve automatic video tagging?
In this paper we present the second participation of the NLP&IR group at UNED in the MediaEval Genre Tagging Task. This categorization task was carried out applying an Information Retrieval (IR) approach considering the video collection’s textual data and query expansion techniques. The results show that the combination of social tags and language models is useful to perform query expansion.

