Probabilistic Models for Discovering E-Communities, 2006
Cited by 21 (6 self)
The increasing amount of communication between individuals in e-formats (e.g. email, instant messaging and the Web) has motivated computational research in social network analysis (SNA). Previous work in SNA has emphasized the social network (SN) topology measured by communication frequencies while ignoring the semantic information in SNs. In this paper, we propose two generative Bayesian models for semantic community discovery in SNs, combining probabilistic modeling with community detection in SNs. To simulate the generative models, an EnF-Gibbs sampling algorithm is proposed to address the efficiency and performance problems of traditional methods. Experimental studies on the Enron email corpus show that our approach successfully detects the communities of individuals and, in addition, provides semantic topic descriptions of these communities.
Web Page Classification: Features and Algorithms, 2007
Cited by 16 (0 self)
Classification of web page content is essential to many tasks in web information retrieval such as maintaining web directories and focused crawling. The uncontrolled nature of web content presents additional challenges to web page classification as compared to traditional text classification, but the interconnected nature of hypertext also provides features that can assist the process. As we review work in web page classification, we note the importance of these web-specific features and algorithms, describe state-of-the-art practices, and track the underlying assumptions behind the use of information from neighboring pages.
Evaluating Similarity Measures for Emergent Semantics of Social Tagging
Cited by 14 (4 self)
Social bookmarking systems and their emergent information structures, known as folksonomies, are increasingly important data sources for Semantic Web applications. A key question for harvesting semantics from these systems is how to extend and adapt traditional notions of similarity to folksonomies, and which measures are best suited for applications such as navigation support, semantic search, and ontology learning. Here we build an evaluation framework to compare various general folksonomy-based similarity measures derived from established information-theoretic, statistical, and practical measures. Our framework deals generally and symmetrically with users, tags, and resources. For evaluation purposes we focus on similarity among tags and resources, considering different ways to aggregate annotations across users. After comparing how tag similarity measures predict user-created tag relations, we provide an external grounding by user-validated semantic proxies based on WordNet and the Open Directory. We also investigate the issue of scalability. We find that mutual information with distributional micro-aggregation across users yields the highest accuracy, but is not scalable; per-user projection with collaborative aggregation provides the best scalable approach via incremental computations. The results are consistent across resource and tag similarity.
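As a rough illustration of what a mutual-information similarity between two tags can look like, the sketch below treats each tag as the set of resources it annotates once annotations are pooled across users, and computes the mutual information of the two binary events "resource carries tag a" and "resource carries tag b". This is a simplified stand-in, not the distributional micro-aggregation evaluated in the abstract; the function name `tag_mutual_information` and its arguments are assumptions made for this example.

```python
import math


def tag_mutual_information(resources_a, resources_b, n_resources):
    """Mutual information (in bits) between 'resource has tag a' and
    'resource has tag b', given the resource sets of both tags and the
    total number of resources (assumed to cover both sets)."""
    a, b = set(resources_a), set(resources_b)
    both = len(a & b)
    # Contingency table over all resources: (has a?, has b?) -> count.
    counts = {
        (1, 1): both,
        (1, 0): len(a) - both,
        (0, 1): len(b) - both,
        (0, 0): n_resources - len(a | b),
    }
    p_a = {1: len(a) / n_resources, 0: 1 - len(a) / n_resources}
    p_b = {1: len(b) / n_resources, 0: 1 - len(b) / n_resources}
    mi = 0.0
    for (x, y), count in counts.items():
        if count == 0:
            continue  # the term 0 * log(0) is taken to be 0
        p_xy = count / n_resources
        mi += p_xy * math.log2(p_xy / (p_a[x] * p_b[y]))
    return mi


# Hypothetical toy data: four resources in total.
identical = tag_mutual_information({1, 2}, {1, 2}, 4)
independent = tag_mutual_information({1, 2}, {1, 3}, 4)
```

Two tags applied to exactly the same resources score high, while tags whose co-occurrence matches what chance alone would produce score zero. A full table like this over every tag pair is also a plausible reason why such measures are hard to scale.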
Emerging semantic communities in peer web search
In P2PIR ’06: Proceedings of the international workshop on Information retrieval in peer-to-peer networks, 2006
Cited by 11 (2 self)
Peer network systems are becoming an increasingly important development in Web search technology. Many studies show that peer search systems perform better when a query is sent to a group of peers semantically similar to the query. This suggests that semantic communities should form so that a query can quickly propagate to many appropriate peers. For the network to be functional, its dynamic communication topology must match the semantic clustering of peers. We introduce two criteria to evaluate a peer search network based on the concept of semantic locality: first, the “small-world” topology of the network; second, the quality of a peer’s neighbors over time, monitored through topical semantic similarity by checking whether a peer chooses semantically appropriate neighbors to route its queries. We present several simulation experiments conducted with different peer search algorithms on our peer Web search system, 6S. The results suggest that 6S, despite its use of an unstructured overlay network, can effectively foster the spontaneous formation of semantic communities through local peer interactions alone.
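Small-world topology is commonly characterized by high clustering combined with short path lengths. The sketch below computes only the first of these, the average local clustering coefficient, for an undirected neighbor graph. It is a generic illustration under that assumption, not the evaluation code used for 6S; the adjacency-dictionary format and the function name are hypothetical.

```python
def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph.

    adj maps each node to the set of its neighbors; every edge must be
    listed in both directions and node labels must be comparable.
    """
    if not adj:
        return 0.0
    total = 0.0
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        # Count edges that connect two neighbors of this node.
        links = sum(
            1 for u in neighbors for v in neighbors if u < v and v in adj[u]
        )
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Hypothetical toy peer graphs.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

A peer network whose coefficient is far above that of a random graph with the same number of nodes and edges, while typical path lengths stay short, would be consistent with the small-world criterion.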
Expressive and Flexible Access to Web-Extracted Data: A Keyword-based Structured Query Language
In SIGMOD ’10: Proceedings of International Conference on Management of Data, 2010
Cited by 8 (1 self)
Automated extraction of structured data from Web sources often leads to large heterogeneous knowledge bases (KB), with data and schema items numbering in the hundreds of thousands or millions. Formulating information needs with conventional structured query languages is difficult due to the sheer size of schema information available to the user. We address this challenge by proposing a new query language that blends keyword search with structured query processing over large information graphs with rich semantics.
GiveALink: Mining a Semantic Network of Bookmarks for Web Search and Recommendation
In Proc. KDD Workshop on Link Discovery: Issues, Approaches and Applications, 2005
Cited by 7 (2 self)
GiveALink is a public site where users donate their bookmarks to the Web community. Bookmarks are analyzed to build a new generation of Web mining techniques and new ways to search, recommend, surf, personalize and visualize the Web. We present a semantic similarity measure for URLs that takes advantage both of the hierarchical structure of the bookmark files of individual users, and of collaborative filtering across users. We analyze the social bookmark network induced by the similarity measure. A search and recommendation system is built from a number of ranking algorithms based on prestige, generality, and novelty measures extracted from the similarity data.
Bookmark hierarchies and collaborative recommendation
In Proc. AAAI Conf., 2006. Forthcoming
Cited by 6 (1 self)
GiveALink.org is a social bookmarking site where users may donate and view their personal bookmark files online securely. The bookmarks are analyzed to build a new generation of intelligent information retrieval techniques to recommend, search, and personalize the Web. GiveALink does not use tags, content, or links in the submitted Web pages. Instead we present a semantic similarity measure for URLs that takes advantage both of the hierarchical structure in the bookmark files of individual users, and of collaborative filtering across users. In addition, we build a recommendation and search engine from ranking algorithms based on popularity and novelty measures extracted from the similarity-induced network. Search results can be personalized using the bookmarks submitted by a user. We evaluate a subset of the proposed ranking measures by conducting a study with human subjects.
Automated Discovery and Analysis of Social Networks from Threaded Discussions
Paper presented at the International Network of Social Network Analysts, St. Pete Beach, 2008
Cited by 6 (4 self)
To gain greater insight into the operation of online social networks, we applied Natural Language Processing (NLP) techniques to text-based communication to identify and describe underlying social structures in online communities. This paper presents our approach and preliminary evaluation for content-based, automated discovery of social networks. Our research question is: What syntactic and semantic features of postings in threaded discussions help uncover explicit and implicit ties between network members, and which provide a reliable estimate of the strengths of interpersonal ties among the network members? To evaluate our automated procedures, we compare the results from the NLP processes with social networks built from basic who-to-whom data, and a sample of hand-coded data derived from a close reading of the text. For our test case, and as part of ongoing research on networked learning, we used the archive of threaded discussions collected over eight iterations of an online graduate class. We first associate personal names and nicknames mentioned in the postings with class participants. Next we analyze the context in which each name occurs in the postings to determine whether or not there is an interpersonal tie between a sender of the posting and a person mentioned in it. Because information exchange is a key factor in the operation and success of a learning community, we estimate and assign weights to the ties by measuring the amount of information exchanged between each pair of the nodes; information in this case is operationalized as counts of important concept terms in the postings as derived through the NLP analysis. Finally, we compare the resulting network(s) against those derived from other means, including basic who-to-whom data derived from posting sequences (e.g., whose postings follow whose). In this comparison we evaluate what is gained in understanding network processes by our more elaborate analysis.
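The two kinds of network compared in this abstract can be sketched in a few lines. The example below is a deliberately simplified illustration. It treats every mention of a participant's name as a tie, whereas the abstract describes analyzing the context of each mention. It also takes a ready-made list of concept terms instead of deriving them through NLP analysis. The function names, the input format and the sample posts are all assumptions.

```python
from collections import Counter


def mention_ties(posts, participants, concept_terms):
    """Weighted ties from name mentions.

    posts is a list of (sender, text) pairs. A tie sender -> person is
    created when the person's name appears in the text, and its weight
    grows by the number of concept terms found in that posting.
    """
    concepts = {term.lower() for term in concept_terms}
    ties = Counter()
    for sender, text in posts:
        tokens = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
        weight = sum(1 for w in tokens if w in concepts)
        for person in participants:
            if person != sender and person.lower() in tokens:
                ties[(sender, person)] += weight
    return ties


def reply_ties(posts):
    """Who-to-whom baseline: link each poster to the previous poster."""
    ties = Counter()
    for (prev_sender, _), (sender, _) in zip(posts, posts[1:]):
        if sender != prev_sender:
            ties[(sender, prev_sender)] += 1
    return ties


# Hypothetical thread from an online class.
posts = [
    ("ann", "I agree with Bob about ontology and metadata."),
    ("bob", "Thanks Ann, metadata matters."),
    ("cat", "Hello all"),
]
named = mention_ties(posts, ["ann", "bob", "cat"], ["ontology", "metadata"])
sequence = reply_ties(posts)
```

On this toy thread the sequence-based network links `cat` to `bob` merely because one posting follows the other, while the mention-based network contains no such tie. Differences of this kind are what the comparison described in the abstract is designed to surface.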
Efficient Assembly of Social Semantic Networks
Cited by 5 (4 self)
Social bookmarks allow Web users to actively annotate individual Web resources. Researchers are exploring the use of these annotations to create implicit links between online resources. We define an implicit link as a relationship between two online resources established by the Web community. An individual may create or reinforce a relationship between two resources by applying a common tag or organizing them in a common folder. This has led to the exploration of techniques for building networks of resources, categories, and people using the social annotations. Before these techniques can move from the lab to the real world, a major obstacle remains: building and maintaining these potentially large networks efficiently. Methods for assembling and indexing these large networks will allow researchers to run more rigorous assessments of their proposed techniques. Toward this goal we explore an approach from the sparse matrix literature and apply it to our system, GiveALink.org. We also investigate distributing the assembly, allowing us to grow the network with the body of resources, annotations, and users. Dividing the network is effective for assembling a global network where the implicit links are dependent on global properties. Additionally, we explore alternative implicit link measures that remove global dependencies and thus allow for the global network to be assembled incrementally, as each participant makes independent contributions. Finally we evaluate three scalable similarity measures, two of which require a revision of the data model underlying our social annotations.
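To make the idea of incremental assembly concrete, the sketch below keeps a sparse map of implicit link weights and updates it one contribution at a time. The raw co-occurrence count used as the link weight is an assumption chosen because it has no global dependency; it is not one of the measures evaluated in the abstract, and the class and method names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


class ImplicitLinkNetwork:
    """Sparse, incrementally assembled network of implicit links."""

    def __init__(self):
        # Only pairs that have been linked at least once are stored.
        self.weights = defaultdict(int)

    def add_contribution(self, groups):
        """Add one participant's annotations.

        groups is an iterable of resource collections, each standing
        for one folder or one tag of that participant. Every pair of
        resources in the same group is linked or reinforced.
        """
        for group in groups:
            for a, b in combinations(sorted(set(group)), 2):
                self.weights[(a, b)] += 1

    def weight(self, a, b):
        """Link weight between two resources, 0 if never linked."""
        key = (a, b) if a <= b else (b, a)
        return self.weights.get(key, 0)


# Hypothetical contributions from two participants.
net = ImplicitLinkNetwork()
net.add_contribution([["u1", "u2", "u3"]])
net.add_contribution([["u2", "u1"]])
```

Because each update touches only the pairs inside the new contribution, nothing has to be recomputed globally when another participant joins. A measure that normalizes by corpus-wide statistics would not have this property.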

