Evaluating Similarity Measures for Emergent Semantics of Social Tagging
Cited by 14 (4 self)
Social bookmarking systems and their emergent information structures, known as folksonomies, are increasingly important data sources for Semantic Web applications. A key question for harvesting semantics from these systems is how to extend and adapt traditional notions of similarity to folksonomies, and which measures are best suited for applications such as navigation support, semantic search, and ontology learning. Here we build an evaluation framework to compare various general folksonomy-based similarity measures derived from established information-theoretic, statistical, and practical measures. Our framework deals generally and symmetrically with users, tags, and resources. For evaluation purposes we focus on similarity among tags and resources, considering different ways to aggregate annotations across users. After comparing how tag similarity measures predict user-created tag relations, we provide an external grounding by user-validated semantic proxies based on WordNet and the Open Directory. We also investigate the issue of scalability. We find that mutual information with distributional micro-aggregation across users yields the highest accuracy, but is not scalable; per-user projection with collaborative aggregation provides the best scalable approach via incremental computations. The results are consistent across resource and tag similarity.
Learning Social Tag Relevance by Neighbor Voting
Cited by 8 (4 self)
Social image analysis and retrieval is important for helping people organize and access the increasing amount of user-tagged multimedia. Since user tagging is known to be uncontrolled, ambiguous, and overly personalized, a fundamental problem is how to interpret the relevance of a user-contributed tag with respect to the visual content the tag is describing. Intuitively, if different persons label visually similar images using the same tags, these tags are likely to reflect objective aspects of the visual content. Starting from this intuition, we propose in this paper a neighbor voting algorithm which accurately and efficiently learns tag relevance by accumulating votes from visual neighbors. Under a set of well-defined and realistic assumptions, we prove that our algorithm is a good tag relevance measurement for both image ranking and tag ranking. Three experiments on 3.5 million Flickr photos demonstrate the general applicability of our algorithm in both social image retrieval and image tag suggestion. Our tag relevance learning algorithm substantially improves upon baselines for all the experiments. The results suggest that the proposed algorithm is promising for real-world applications. Index Terms: Social tagging, tag relevance learning, neighbor voting, multimedia indexing and retrieval.
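The neighbor voting idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the feature vectors, the Euclidean distance, the helper names `k_nearest` and `tag_relevance`, and the chance-level prior that is subtracted from the vote count are all assumptions made for illustration.

```python
import math
from collections import Counter


def k_nearest(query_vec, vectors, k):
    """Return the ids of the k vectors closest to query_vec (Euclidean distance)."""
    ranked = sorted((math.dist(query_vec, vec), img_id) for img_id, vec in vectors.items())
    return [img_id for _, img_id in ranked[:k]]


def tag_relevance(image_id, features, tags, k):
    """Score each tag of an image by neighbor votes minus the votes expected by chance.

    features: dict image id -> feature tuple (hypothetical visual features)
    tags:     dict image id -> set of user-contributed tags
    """
    others = {i: f for i, f in features.items() if i != image_id}
    neighbors = k_nearest(features[image_id], others, k)
    # Each visual neighbor casts one vote for every tag it carries.
    votes = Counter(t for n in neighbors for t in tags[n])
    n_images = len(features)
    scores = {}
    for tag in tags[image_id]:
        # Assumed prior: votes a tag would collect from k randomly chosen images.
        tag_freq = sum(1 for i in tags if tag in tags[i])
        prior = k * tag_freq / n_images
        scores[tag] = votes[tag] - prior
    return scores


features = {"a": (0, 0), "b": (0.1, 0), "c": (0, 0.1), "d": (5, 5), "e": (5.1, 5)}
tags = {
    "a": {"beach", "me"},
    "b": {"beach"},
    "c": {"beach", "sea"},
    "d": {"city"},
    "e": {"city", "me"},
}
scores = tag_relevance("a", features, tags, k=2)
```

In this toy data the tag "beach" is shared by both visual neighbors of image "a" and scores above the personal tag "me", which no neighbor uses.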
Mining City Landmarks from Blogs by Graph Modeling
Cited by 6 (1 self)
Recent years have witnessed great prosperity in community-contributed multimedia. Discovering and summarizing knowledge from these data enables us to make better sense of the world. In this paper, we report our work on mining famous city landmarks from blogs for personalized tourist suggestions. Our main contribution is a graph modeling framework to discover city landmarks by mining blog photo correlations with community supervision. This modeling fuses context, content, and community information in a style that simulates static (PageRank) and dynamic (HITS) ranking models to highlight representative data from the consensus of blog users. As a preliminary step, we identify the geographical locations of page contents to harvest city sight photos from Web blogs, based on which we structure these photos into a Scene-View hierarchy within each city. Our graph modeling consists of two phases: First, within a
Individual and social behavior in tagging systems
In 20th ACM Conference on Hypertext and Hypermedia, 2009
Cited by 6 (1 self)
In tagging systems users can annotate items of interest with free-form terms. A good understanding of the usage characteristics of such systems is necessary to improve the design of current and next generation tagging systems. To this end, this work explores three aspects of user behavior in CiteULike and Connotea, two systems that include tagging features to support online personalized management of scientific publications. First, this study characterizes the degree to which users re-tag previously published items and reuse tags: 10 to 20% of the daily activity can be characterized as re-tagging and about 75% of the activity as tag reuse. Second, we use the pairwise similarity between users' activity to characterize the interest sharing in these systems. We present the interest sharing distribution across the systems, show that this metric encodes information about existing usage patterns, and attempt to correlate interest sharing levels to indicators of collaboration such as co-membership in discussion groups and semantic similarity of tag vocabularies. Finally, we show that interest sharing leads to an implicit structure that exhibits a natural segmentation. Throughout the paper we discuss the potential impact of our findings on the design of mechanisms that support tagging systems.
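A pairwise similarity between users' activity can be sketched as follows. The abstract does not state which similarity the authors use, so the Jaccard index over each user's set of tagged items is an assumption here, as are the names `jaccard` and `interest_sharing` and the toy data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def interest_sharing(user_items):
    """Map each unordered user pair to the similarity of their tagged-item sets."""
    return {
        (u, v): jaccard(user_items[u], user_items[v])
        for u, v in combinations(sorted(user_items), 2)
    }


# Hypothetical activity: user id -> set of items the user has tagged.
user_items = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p2", "p3", "p4"},
    "u3": {"p9"},
}
sharing = interest_sharing(user_items)
```

The same function applies to tag vocabularies instead of item sets if the goal is to compare how similarly two users tag, rather than what they tag.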
Adapting K-Nearest Neighbor for Tag Recommendation in Folksonomies
Cited by 6 (2 self)
Folksonomies, otherwise known as Collaborative Tagging Systems, enable Internet users to share, annotate and search for online resources with user selected labels called tags. Tag recommendation, the suggestion of an ordered set of tags during the annotation process, reduces the user effort from a keyboard entry to a mouse click. By simplifying the annotation process tagging is promoted, noise in the data is reduced through the elimination of discrepancies that result in redundant tags, and ambiguous tags may be avoided. Tag recommenders can suggest tags that maximize utility, offer tags the user may not have previously considered or steer users toward adopting a core vocabulary. In sum, tag recommendation promotes a denser dataset that is useful in its own right or can be exploited by a myriad of data mining techniques for additional functionality. While there exists a long history of recommendation algorithms, the data structure of a Folksonomy is distinct from those found in traditional recommendation problems. We first explore two data reduction techniques, p-core processing and Hebbian deflation, then demonstrate how to adapt K-Nearest Neighbor for use with Folksonomies by incorporating user, resource and tag information into the algorithm. We further investigate multiple techniques for user modeling required to compute the similarity among users. Additionally we demonstrate that tag boosting, the promoting of tags previously applied by a user to a resource, improves the coverage and accuracy of K-Nearest Neighbor. These techniques are evaluated through extensive experimentation using data collected from two real Collaborative Tagging Web sites. Finally the modified K-Nearest Neighbor algorithm is compared with alternative techniques based on popularity and link analysis. 
We find that K-Nearest Neighbor modified for use with Folksonomies generates excellent recommendations, scales well with large datasets, and is applicable to both narrow and broadly focused Folksonomies.
Human-competitive tagging using automatic keyphrase extraction
Cited by 3 (0 self)
This paper connects two research areas: automatic tagging on the web and statistical keyphrase extraction. First, we analyze the quality of tags in a collaboratively created folksonomy using traditional evaluation techniques. Next, we demonstrate how documents can be tagged automatically with a state-of-the-art keyphrase extraction algorithm, and further improve performance in this new domain using a new algorithm, “Maui”, that utilizes semantic information extracted from Wikipedia. Maui outperforms existing approaches and extracts tags that are competitive with those assigned by the best performing human taggers.
From Web 1.0 to Web 2.0 and Back – How did your Grandma Use to Tag?
Cited by 3 (1 self)
We consider the applicability of terms extracted from anchortext as a source of Web page descriptions in the form of tags. With a relatively simple and easy-to-use method, we show that anchortext significantly overlaps with tags obtained from the popular tagging portal del.icio.us. Considering the size and diversity of the user community potentially involved in social tagging, this observation is rather surprising. Furthermore, we show by an evaluation using human-created relevance assessments the general suitability of the anchortext tag generation in terms of user-perceived precision values. The awareness of this easy-to-obtain source of tags could trigger the rise of new tagging portals pushed by this automatic bootstrapping process or be applied in already existing portals to increase the number of tags per page by merely looking at the anchortext which exists anyway.
Camera brand congruence in the flickr social graph
In WSDM ’09: Proceedings of the Second ACM International Conference on Web Search and Data Mining, 2009
Cited by 2 (0 self)
Given that my friends on Flickr use cameras of brand X, am I more likely to also use a camera of brand X? Given that one of these friends changes her brand, am I likely to do the same? These are the kinds of questions addressed in this work. Direct applications involve personalized advertising in social networks. For our study we crawled a complete connected component of the Flickr friendship graph with a total of 67M edges and 3.9M users. Camera brands and models were assigned to users and time slots according to the model-specific metadata pertaining to their images taken during these time slots. Similarly, we used, where provided in a user’s profile, information about a user’s geographic location and the groups joined on Flickr. Our main findings are the following. First, a pair of friends on Flickr has a significantly higher probability of being congruent, i.e., using the same brand, compared to two random users (27% vs. 19%). Second, the degree of congruence goes up for pairs of friends (i) in the same country (29%), (ii) who both only have very few friends (30%), and (iii) with a very high cliqueness (38%; the formal definition of cliqueness is given in Section 4.3 of the paper). Third, given that a user changes her camera model between March-May 2007 and March-May 2008, high-cliqueness friends are more likely than random users to do the same (54% vs. 48%). Fourth, users using high-end cameras are far more loyal to their brand than users using point-and-shoot cameras, with a probability of staying with the same brand of 60% vs. 33%, given that a new camera is bought. Fifth, these “expert” users’ brand congruence reaches 66% (!) for high-cliqueness friends. To the best of our knowledge this is the first time that the phenomenon of brand congruence is studied for hundreds of thousands of users and over a period of two years.
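The comparison between friend pairs and random pairs can be made concrete with a short sketch. It only illustrates the definition of congruence given in the abstract (two users share a brand); the function names, the toy graph, and the choice to enumerate all user pairs as the random baseline (rather than sampling, as a study on millions of users would need to) are assumptions.

```python
from itertools import combinations


def congruence_rate(pairs, brand):
    """Fraction of user pairs whose two members use the same camera brand."""
    # Only pairs where both users have a known brand can be judged.
    known = [(u, v) for u, v in pairs if u in brand and v in brand]
    if not known:
        return 0.0
    return sum(brand[u] == brand[v] for u, v in known) / len(known)


def baseline_rate(brand):
    """Congruence over all unordered user pairs, standing in for 'two random users'."""
    return congruence_rate(combinations(sorted(brand), 2), brand)


# Hypothetical data: user id -> brand, plus a small friendship edge list.
brand = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
friends = [("a", "b"), ("c", "d"), ("a", "c")]

friend_rate = congruence_rate(friends, brand)
random_rate = baseline_rate(brand)
```

Here two of the three friend pairs are congruent, against two of the six possible user pairs, which mirrors the direction of the effect reported in the abstract without reproducing its numbers.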
Time-Sensitive Language Modelling for Online Term Recurrence Prediction
Cited by 2 (1 self)
We address the problem of online term recurrence prediction: for a stream of terms, at each time point predict what term is going to recur next in the stream given the term occurrence history so far. It has many applications, for example, in Web search and social tagging. In this paper, we propose a time-sensitive language modelling approach to this problem that effectively combines term frequency and term recency information, and describe how this approach can be implemented efficiently by an online learning algorithm. Our experiments on a real-world Web query log dataset show significant improvements over standard language modelling.
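One simple way to combine term frequency and recency in an online fashion is an exponentially decayed count. The sketch below is not the model proposed in the paper, whose details the abstract does not give; the class `DecayedTermModel`, the `half_life` parameter, and the decay form are assumptions chosen to show the general idea.

```python
import math
from collections import defaultdict


class DecayedTermModel:
    """Online term scores where each past occurrence loses weight over time."""

    def __init__(self, half_life=10.0):
        # An occurrence counts half as much after `half_life` time units.
        self.decay = math.log(2) / half_life
        self.weight = defaultdict(float)
        self.last_seen = {}

    def _current(self, term, now):
        """Stored weight of a term, decayed from its last update to `now`."""
        age = now - self.last_seen[term]
        return self.weight[term] * math.exp(-self.decay * age)

    def observe(self, term, now):
        """Update in O(1): decay the old weight, then add the new occurrence."""
        if term in self.last_seen:
            self.weight[term] = self._current(term, now)
        self.weight[term] += 1.0
        self.last_seen[term] = now

    def predict(self, now):
        """Return the term with the highest decayed weight, or None if empty."""
        if not self.last_seen:
            return None
        return max(self.last_seen, key=lambda t: self._current(t, now))


# Recency wins: "a" was frequent long ago, "b" was seen just now.
stale = DecayedTermModel(half_life=2.0)
for t in (0, 1, 2):
    stale.observe("a", t)
stale.observe("b", 10)

# Frequency wins: both terms are recent, but "a" occurred three times.
fresh = DecayedTermModel(half_life=2.0)
for t in (0, 1, 2):
    fresh.observe("a", t)
fresh.observe("b", 3)
```

A plain frequency count corresponds to an infinite half-life, and a pure "most recent term" rule to a very short one, so the single parameter interpolates between the two signals.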
Compressing Tags to Find Interesting Media Groups
Cited by 2 (0 self)
On photo sharing websites like Flickr and Zooomr, users are offered the possibility to assign tags to their uploaded pictures. Using these tags to find interesting groups of semantically related pictures in the result set of a given query is a problem with obvious applications. We analyse this problem from a Minimum Description Length (MDL) perspective and develop an algorithm that finds the most interesting groups. The method is based on Krimp, which finds small sets of patterns that characterise the data using compression. These patterns are sets of tags, often assigned together to photos. The better a database compresses, the more structure it contains and thus the more homogeneous it is. Following this observation we devise a compression-based measure. Our experiments on Flickr data show that the most interesting and homogeneous groups are found. We show extensive examples and compare to clusterings on the Flickr website.

