Results 1–10 of 89
Knowledge sharing and Yahoo Answers: Everyone knows something
Proceedings of WWW’08, 2008
Abstract

Cited by 171 (4 self)
Yahoo Answers (YA) is a large and diverse question-answer forum, acting not only as a medium for sharing technical knowledge, but as a place where one can seek advice, gather opinions, and satisfy one’s curiosity about a countless number of things. In this paper, we seek to understand YA’s knowledge sharing activity. We analyze the forum categories and cluster them according to content characteristics and patterns of interaction among the users. While interactions in some categories resemble expertise sharing forums, others incorporate discussion, everyday advice, and support. With such a diversity of categories in which one can participate, we find that some users focus narrowly on specific topics, while others participate across categories. This not only allows us to map related categories, but to characterize the entropy of the users’ interests. We find that lower entropy correlates with receiving higher answer ratings, but only for categories where factual expertise is primarily sought after. We combine both user attributes and answer characteristics to predict, within a given category, whether a particular answer will be chosen as the best answer by the asker.
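The entropy measure this abstract refers to can be made concrete. A minimal Python sketch (the users and category counts are invented for illustration), computing Shannon entropy over a user’s distribution of answers across categories:

```python
from math import log2

def interest_entropy(category_counts):
    # Shannon entropy (in bits) of a user's answer distribution across
    # categories: 0 for a single-category user, higher for broad interests.
    total = sum(category_counts.values())
    probs = [c / total for c in category_counts.values() if c > 0]
    return 0.0 - sum(p * log2(p) for p in probs)

focused = {"Programming": 40}                      # hypothetical users
broad = {"Programming": 10, "Pets": 10,
         "Travel": 10, "Philosophy": 10}
print(interest_entropy(focused))  # 0.0
print(interest_entropy(broad))    # 2.0 (uniform over four categories)
```

A low value marks a narrowly focused user; a uniform spread over k categories gives the maximum, log2(k).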
Analyzing (social media) networks with NodeXL
Abstract

Cited by 114 (24 self)
We present NodeXL, an extendible toolkit for network overview, discovery and exploration implemented as an add-in to the Microsoft Excel 2007 spreadsheet software. We demonstrate NodeXL data analysis and visualization features with a social media data sample drawn from an enterprise intranet social network. A sequence of NodeXL operations from data import to computation of network statistics and refinement of network visualization through sorting, filtering, and clustering functions is described. These operations reveal sociologically relevant differences in the patterns of interconnection among employee participants in the social media space. The tool and method can be broadly applied.
Efficient semi-streaming algorithms for local triangle counting in massive graphs
 in KDD’08, 2008
Abstract

Cited by 70 (4 self)
In this paper we study the problem of local triangle counting in large graphs. Namely, given a large graph G = (V, E) we want to estimate as accurately as possible the number of triangles incident to every node v ∈ V in the graph. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first paper that addresses the problem of local triangle counting with a focus on the efficiency issues arising in massive graphs. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help to detect the presence of spamming activity in large-scale Web graphs, as well as to provide useful features to assess content quality in social networks. For computing the local number of triangles we propose two approximation algorithms, which are based on the idea of min-wise independent permutations (Broder et al. 1998). Our algorithms operate in a semi-streaming fashion, using O(V) space in main memory and performing O(log V) sequential scans over the edges of the graph. The first algorithm we describe in this paper also uses O(E) space in external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results in massive graphs demonstrating the practical efficiency of our approach.
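As a point of reference for the local counting task described in this abstract, here is a minimal exact, in-memory sketch in Python. It is an illustration of what is being estimated, not the paper’s min-wise-permutation streaming algorithm:

```python
# Exact local triangle counting on a small graph. For each node v, the
# number of incident triangles is half the sum over neighbors u of
# |N(v) ∩ N(u)|, since each triangle {v, u, w} is seen via both u and w.
def local_triangles(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return {v: sum(len(adj[v] & adj[u]) for u in adj[v]) // 2 for v in adj}

# A 4-clique: every node is incident to 3 triangles.
print(local_triangles([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]))
# {0: 3, 1: 3, 2: 3, 3: 3}
```

Dividing each node’s count by the number of neighbor pairs gives the local clustering coefficient the abstract mentions.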
Facts or friends?: distinguishing informational and conversational questions in social Q&A sites
In CHI, 2009
Abstract

Cited by 56 (3 self)
Tens of thousands of questions are asked and answered every day on social question and answer (Q&A) Web sites such as Yahoo Answers. While these sites generate an enormous volume of searchable data, the problem of determining which questions and answers are of archival quality has grown. One major component of this problem is the prevalence of conversational questions, identified both by Q&A sites and academic literature as questions that are intended simply to start discussion. For example, a conversational question such as “do you believe in evolution?” might successfully engage users in discussion, but probably will not yield a useful web page for users searching for information about evolution. Using data from three popular Q&A sites, we confirm that humans can reliably distinguish between these conversational questions and other informational questions, and present evidence that conversational questions typically have much lower potential archival value than informational questions. Further, we explore the use of machine learning techniques to automatically classify questions as conversational or informational, learning in the process about categorical, linguistic, and social differences between different question types. Our algorithms approach human performance, attaining 89.7% classification accuracy in our experiments. Author Keywords: Q&A, online community, machine learning.
How Opinions are Received by Online Communities: A Case Study on Amazon.com Helpfulness Votes
Abstract

Cited by 55 (6 self)
There are many online settings in which users publicly express opinions. A number of these offer mechanisms for other users to evaluate these opinions; a canonical example is Amazon.com, where reviews come with annotations like “26 of 32 people found the following review helpful.” Opinion evaluation appears in many offline settings as well, including market research and political campaigns. Reasoning about the evaluation of an opinion is fundamentally different from reasoning about the opinion itself: rather than asking, “What did Y think of X?”, we are asking, “What did Z think of Y’s opinion of X?” Here we develop a framework for analyzing and modeling opinion evaluation, using a large-scale collection of Amazon book reviews as a dataset. We find that the perceived helpfulness of a review depends not just on its content but also, in subtle ways, on how the expressed evaluation relates to other evaluations of the same product. As part of our approach, we develop novel methods that take advantage of the phenomenon of review “plagiarism” to control for the effects of text in opinion evaluation, and we provide a simple and natural mathematical model consistent with our findings. Our analysis also allows us to distinguish among the predictions of competing theories from sociology and social psychology, and to discover unexpected differences in the collective opinion-evaluation behavior of user populations from different countries.
Questions in, knowledge in?: a study of naver’s question answering community
In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’09, 2009
Abstract

Cited by 49 (2 self)
Large general-purpose community question-answering sites are becoming popular as a new venue for generating knowledge and helping users in their information needs. In this paper we analyze the characteristics of knowledge generation and user participation behavior in the largest question-answering online community in South Korea, Naver Knowledge-iN. We collected and analyzed over 2.6 million question/answer pairs from fifteen categories between 2002 and 2007, and have interviewed twenty-six users to gain insights into their motivations, roles, usage and expertise. We find altruism, learning, and competency are frequent motivations for top answerers to participate, but that participation is often highly intermittent. Using a simple measure of user performance, we find that higher levels of participation correlate with better performance. We also observe that users are motivated in part through a point system to build a comprehensive knowledge database. These and other insights have significant implications for future knowledge-generating online communities.
ManyNets: An interface for multiple network analysis and visualization
In Proceedings of the 2008 Conference on Human Factors in Computing Systems (CHI)
Abstract

Cited by 28 (9 self)
Traditional network analysis tools support analysts in studying a single network. ManyNets offers these analysts a powerful new approach that enables them to work on multiple networks simultaneously. Several thousand networks can be presented as rows in a tabular visualization, and then inspected, sorted and filtered according to their attributes. The networks to be displayed can be obtained by subdivision of larger networks. Examples of meaningful subdivisions used by analysts include ego networks, community extraction, and time-based slices. Cell visualizations and interactive column overviews allow analysts to assess the distribution of attributes within particular sets of networks. Details, such as traditional node-link diagrams, are available on demand. We describe a case study analyzing a social network geared towards film recommendations by means of decomposition. A small usability study provides feedback on the use of the interface on a set of tasks drawn from the case study.
Making sense of strangers' expertise from signals in digital artifacts
Proc. CHI ’09, 2009
Abstract

Cited by 23 (1 self)
Contemporary work increasingly involves interacting with strangers in technology-mediated environments. In this context, we come to rely on digital artifacts to infer characteristics of other people. This paper reports the results of a study conducted in a global company that used expertise search as a vehicle for exploring how people interpret a range of information available in online profiles in evaluating whom to interact with for expertise. Using signaling theory as a conceptual framework, we describe how certain ‘signals’ in various social software are hard to fake, and are thus more reliable indicators of expertise. Multilevel regression analysis revealed that participation in social software, social connection information, and self-described expertise in the corporate directory were significantly helpful in the decision to contact someone for expertise. Qualitative analysis provided further insights regarding the interpretations people form of others’ expertise from digital artifacts. We conclude with suggestions on differentiating various types of information available within online profiles and implications for the design of expertise locator/recommender systems. Author Keywords: Signaling, expertise search, social software, social networks
Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning
Abstract

Cited by 22 (4 self)
The number of triangles is a computationally expensive graph statistic which is frequently used in complex network analysis (e.g., transitivity ratio), in various random graph models (e.g., exponential random graph model) and in important real-world applications such as spam detection, uncovering the hidden thematic structures in the Web and link recommendation. Counting triangles in graphs with millions and billions of edges requires algorithms which run fast, use a small amount of space, provide accurate estimates of the number of triangles and preferably are parallelizable. In this paper we present an efficient triangle counting approximation algorithm which can be adapted to the semi-streaming model [23]. The key idea of our algorithm is to combine the sampling algorithm of [51,52] and the partitioning of the set of vertices into a high-degree and a low-degree subset respectively as in [5], treating each set appropriately. From a mathematical perspective, we show a simplified proof of [52] which uses the powerful Kim–Vu concentration inequality [31] based on the Hajnal–Szemerédi theorem [25]. Furthermore, we improve bounds of existing triple sampling techniques based on a theorem of Ahlswede and Katona [3]. We obtain a running time of O(m + m^{3/2} log n / (t ε^2)) and a (1 ± ε) approximation of the number of triangles with high probability.
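To illustrate the general idea of treating high- and low-degree vertices differently, here is a minimal Python sketch of the classic degree-ordered exact counting. It is not the paper’s sampling algorithm, only the underlying intuition of charging work to low-degree endpoints:

```python
from collections import defaultdict

# Exact triangle counting with degree ordering: orient every edge from its
# lower-degree endpoint to its higher-degree endpoint, then count common
# out-neighbors per oriented edge. Each triangle is charged to its
# lowest-ranked vertex, which keeps the work per edge proportional to the
# smaller degree -- the same intuition as splitting vertices into
# high-degree and low-degree sets.
def count_triangles(edges):
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    rank = lambda v: (deg[v], v)       # break degree ties by vertex id
    out = {v: set() for v in deg}      # edges oriented low rank -> high rank
    for u, v in edges:
        a, b = sorted((u, v), key=rank)
        out[a].add(b)
    return sum(len(out[u] & out[v]) for u in out for v in out[u])

print(count_triangles([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]))  # 4
```

Each triangle is found exactly once, via its two lowest-ranked vertices.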
Triangle sparsifiers
 Journal of Graph Algorithms and Applications
Abstract

Cited by 17 (5 self)
In this work, we introduce the notion of triangle sparsifiers, i.e., sparse graphs which are approximately the same as the original graph with respect to the triangle count. This results in a practical triangle counting method with strong theoretical guarantees. For instance, for unweighted graphs we show a randomized algorithm for approximately counting the number of triangles in a graph G, which proceeds as follows: keep each edge independently with probability p, enumerate the triangles in the sparsified graph G′ and return the number of triangles found in G′ multiplied by p^{-3}. We prove that under mild assumptions on G and p our algorithm returns a good approximation for the number of triangles with high probability. Specifically, we show that if p ≥ max(polylog(n) Δ / t, polylog(n) / t^{1/3}), where n, t, Δ, and T denote the number of vertices in G, the number of triangles in G, the maximum number of triangles an edge of G is contained in, and our triangle count estimate, respectively, then T is strongly concentrated around t: Pr[|T − t| ≥ εt] ≤ n^{−K}. We illustrate the efficiency of our algorithm on various large real-world datasets where we obtain significant speedups. Finally, we investigate cut and spectral sparsifiers with respect to triangle counting and show that they are not optimal.
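The sampling scheme in this abstract is simple enough to sketch directly. A minimal Python illustration (the example graph and helper names are mine):

```python
import random

def exact_triangles(edges):
    # Exact count: each edge's endpoints share one common neighbor per
    # triangle containing that edge; every triangle has three edges.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3

def sparsified_estimate(edges, p):
    # Keep each edge independently with probability p, count triangles in
    # the sparsified graph, and rescale by p^-3 (a triangle survives the
    # sampling with probability p^3).
    kept = [e for e in edges if random.random() < p]
    return exact_triangles(kept) / p ** 3

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]  # a 4-clique
print(exact_triangles(edges))           # 4
print(sparsified_estimate(edges, 1.0))  # 4.0 (p = 1 keeps every edge)
```

The estimator is unbiased for any p; the abstract’s lower bound on p is what guarantees the estimate concentrates around the true count.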