Eddi: Interactive Topic-based Browsing of Social Status Streams
Cited by 9 (6 self)
Twitter streams are on overload: active users receive hundreds of items per day, and existing interfaces force us to march through a chronologically-ordered morass to find tweets of interest. We present an approach to organizing a user's own feed into coherently clustered trending topics for more directed exploration. Our Twitter client, called Eddi, groups tweets in a user’s feed into topics mentioned explicitly or implicitly, which users can then browse for items of interest. To implement this topic clustering, we have developed a novel algorithm for discovering topics in short status updates powered by linguistic syntactic transformation and callouts to a search engine. An algorithm evaluation reveals that search engine callouts outperform other approaches when they employ simple syntactic transformation and backoff strategies. Active Twitter users evaluated Eddi and found it to be a more efficient and enjoyable way to browse an overwhelming status update feed than the standard chronological interface. ACM Classification: H5.2. Information interfaces and presentation (e.g., HCI): User interfaces.
Facetedpedia: Dynamic Generation of Query-Dependent Faceted Interfaces for Wikipedia
Cited by 6 (1 self)
This paper proposes Facetedpedia, a faceted retrieval system for information discovery and exploration in Wikipedia. Given the set of Wikipedia articles resulting from a keyword query, Facetedpedia generates a faceted interface for navigating the result articles. Compared with other faceted retrieval systems, Facetedpedia is fully automatic and dynamic in both facet generation and hierarchy construction, and the facets are based on the rich semantic information from Wikipedia. The essence of our approach is to build upon the collaborative vocabulary in Wikipedia, more specifically the intensive internal structures (hyperlinks) and folksonomy (category system). Given the sheer size and complexity of this corpus, the space of possible choices of faceted interfaces is prohibitively large. We propose metrics for ranking individual facet hierarchies by user’s navigational cost, and metrics for ranking interfaces (each with
Identifying Content for Planned Events Across Social Media Sites
Cited by 2 (1 self)
User-contributed Web data contains rich and diverse information about a variety of events in the physical world, such as shows, festivals, conferences and more. This information ranges from known event features (e.g., title, time, location) posted on event aggregation platforms (e.g., Last.fm events, EventBrite, Facebook events) to discussions and reactions related to events shared on different social media sites (e.g., Twitter, YouTube, Flickr). In this paper, we focus on the challenge of automatically identifying user-contributed content for events that are planned and, therefore, known in advance, across different social media sites. We mine event aggregation platforms to extract event features, which are often noisy or missing. We use these features to develop query formulation strategies for retrieving content associated with an event on different social media sites. Further, we explore ways in which event content identified on one social media site can be used to retrieve additional relevant event content on other social media sites. We apply our strategies to a large set of user-contributed events, and analyze their effectiveness in retrieving relevant event content from Twitter, YouTube, and Flickr.
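The query formulation step described above can be illustrated with a minimal sketch. It assumes an event is a dictionary with optional `title`, `venue`, and `city` keys; the field names, the quoting scheme, and the function `formulate_queries` are hypothetical and are not taken from the paper. The sketch only shows how noisy or missing features might be skipped when building queries.

```python
def formulate_queries(event):
    """Build search queries from event features, most specific first.

    Hypothetical helper: field names and quoting are assumptions,
    not the strategies evaluated in the paper.
    """
    title = (event.get("title") or "").strip()
    if not title:
        # Without a title there is nothing reliable to search for.
        return []
    queries = []
    for key in ("venue", "city"):
        value = (event.get(key) or "").strip()
        if value:  # skip missing or empty features
            queries.append(f'"{title}" "{value}"')
    # Fall back to the title alone as the broadest query.
    queries.append(f'"{title}"')
    return queries
```

Placing the more specific queries first lets a caller stop early when they already return enough content, and fall back to the title-only query otherwise.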
Reinventing the web browser for the semantic web
- in WIRSS’09: Proc. of the WIRSS Workshop at the IEEE/WIC/ACM International Conference on Web Intelligence. IEEE Computer Society
Cited by 1 (1 self)
The paper extends the traditional browser concept with a Semantic Web tailored faceted browser, thus providing integrated end-user grade support for both legacy Web and Semantic Web content. The new browser provides interactive exploratory search and navigation capabilities as well as user adaptation and personalization. We describe the operation, usage scenarios and dependencies of the browser. Keywords: personalization; faceted exploration; semantic web.
NLP Support for Faceted Navigation in Scholarly Collections
Cited by 1 (0 self)
Hierarchical faceted metadata is a proven and popular approach to organizing information for navigation of information collections. More recently, digital libraries have begun to adopt faceted navigation for collections of scholarly holdings. A key impediment to further adoption is the need for the creation of subject-oriented faceted metadata. The Castanet algorithm was developed for the purpose of (semi-)automated creation of such structures. This paper describes the application of Castanet to journal title content, and presents an evaluation suggesting its efficacy. This is followed by a discussion of areas for future work.
A Rank-Rewrite Framework for Summarizing XML Documents
Cited by 1 (1 self)
With XML becoming a standard for data representation and exchange, we can expect to see large scale repositories and warehouses of XML data. In order for users to understand and explore these large collections, a summarized, bird’s eye view of the available data is a necessity. In this paper, we are interested in semantic XML document summaries which present the “important” information available in an XML document to the user. In the best case, such a summary is a concise replacement for the original document itself. At the other extreme, it should at least help the user make an informed choice as to the relevance of the document to his needs. In this paper, we address the three main issues which arise in producing such meaningful and concise summaries: i) which tags or text units are important and should be included in the summary, ii) how the selected tags and text can be presented in a concise and coherent manner, and iii) how to generate a semantic summary for different memory budgets. We conduct user studies with different real-life datasets and show that our methods are useful and effective in practice.
Generating Concise and Readable Summaries of XML Documents
, 2009
XML has become the de facto standard for data representation and exchange, resulting in large scale repositories and warehouses of XML data. In order for users to understand and explore these large collections, a summarized, bird’s eye view of the available data is a necessity. In this paper, we are interested in semantic XML document summaries which present the “important” information available in an XML document to the user. In the best case, such a summary is a concise replacement for the original document itself. At the other extreme, it should at least help the user make an informed choice as to the relevance of the document to his needs. In this paper, we address the two main issues which arise in producing such meaningful and concise summaries: i) which tags or text units are important and should be included in the summary, and ii) how to generate summaries of different sizes. We conduct user studies with different real-life datasets and show that our
STC+ and NM-STC: Two Novel Online Results Clustering Methods for Web Searching
Results clustering in Web Searching is useful for providing users with overviews of the results and thus allowing them to restrict their focus to the desired parts. However, the task of deriving single-word or multiple-word names for the clusters (usually referred to as cluster labeling) is difficult, because they have to be syntactically correct and predictive. Moreover, efficiency is an important requirement since results clustering is an online task. Suffix Tree Clustering (STC) is a clustering technique where search results (mainly snippets) can be clustered fast (in linear time), incrementally, and each cluster is labeled with a phrase. In this paper we introduce: (a) a variation of the STC, called STC+, with a scoring formula that favors phrases that occur in document titles and differs in the way base clusters are merged, and (b) a novel algorithm called NM-STC that results in hierarchically organized clusters. The comparative user evaluation showed that both STC+ and NM-STC are significantly more preferred than STC, and that NM-STC is about two times faster than STC and STC+.
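The idea of labeling each cluster with a shared phrase can be sketched as follows. This is a simplified illustration, not STC, STC+, or NM-STC: it enumerates word n-grams directly instead of building a suffix tree (so it is not linear time), it omits the base-cluster merging step, and the scoring function is an assumption chosen only for the example. The names `base_clusters`, `score`, and `top_labels` are hypothetical.

```python
from collections import defaultdict

def base_clusters(snippets, max_len=3, min_docs=2):
    """Map each word n-gram (1..max_len words) to the snippets containing it.

    Only phrases shared by at least min_docs snippets are kept.
    """
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                phrase_docs[tuple(words[i:i + n])].add(doc_id)
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}

def score(phrase, docs):
    # Assumed scoring: more snippets and longer phrases rank higher.
    return len(docs) * len(phrase)

def top_labels(snippets, k=3):
    """Return the k best (label, snippet ids) pairs, ties broken alphabetically."""
    clusters = base_clusters(snippets)
    ranked = sorted(clusters.items(), key=lambda kv: (-score(*kv), kv[0]))
    return [(" ".join(p), sorted(d)) for p, d in ranked[:k]]
```

For example, given the snippets "suffix tree clustering of results", "fast suffix tree construction", and "web search engines", the top label is "suffix tree", covering the first two snippets.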
Exploratory Web Searching with Dynamic Taxonomies and Results Clustering
This paper proposes exploiting both explicit and mined metadata for enriching Web searching with exploration services. On-line results clustering is useful for providing users with overviews of the results and thus allowing them to restrict their focus to the desired parts. On the other hand, the various metadata that are available to a WSE (Web Search Engine), e.g. domain/language/date/filetype, are commonly exploited only through the advanced (form-based) search facilities that some WSEs offer (and users rarely use). We propose an approach that combines both kinds of metadata by adopting the interaction paradigm of dynamic taxonomies and faceted exploration. This combination results in an effective, flexible and efficient exploration experience.
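The faceted interaction described here, showing counts per metadata value and letting the user restrict the result set, can be sketched in a few lines. This is a minimal illustration under the assumption that each result is a dictionary of metadata fields; the functions `facet_counts` and `restrict` and the sample field names are hypothetical, not the paper's implementation.

```python
from collections import Counter

def facet_counts(results, facets):
    """Count how many results carry each value of each facet."""
    return {f: Counter(r[f] for r in results if f in r) for f in facets}

def restrict(results, facet, value):
    """Keep only the results whose facet has the chosen value (a 'zoom' step)."""
    return [r for r in results if r.get(facet) == value]

# Hypothetical search results with explicit metadata.
results = [
    {"url": "a.org/x", "filetype": "pdf", "language": "en"},
    {"url": "b.gr/y", "filetype": "html", "language": "el"},
    {"url": "c.org/z", "filetype": "pdf", "language": "en"},
]
counts = facet_counts(results, ["filetype", "language"])
pdf_only = restrict(results, "filetype", "pdf")
```

After each restriction the counts would be recomputed on the narrowed set, so the user only sees facet values that still lead to non-empty results.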
Disambiguation of Keyword Search Results on Highly Heterogeneous Structured Data
, 2010
Wikipedia infoboxes are an example of a seemingly structured, yet extraordinarily heterogeneous dataset, where any given record has only a tiny fraction of all possible fields. Such data cannot be queried using traditional means without a massive a priori integration effort, since even for a simple request the result values span many record types and fields. On the other hand, the solutions based on keyword search are too imprecise to exactly capture the user’s intent. To address these limitations, we propose a system, referred to herein as WikiAnalytics, that utilizes a novel search paradigm in order to derive tables of precise and complete results from Wikipedia infobox records. The user starts with a keyword search query that finds a superset of the result records, and then browses clusters of records deciding which are and are not relevant. WikiAnalytics uses three categories of clustering features based on record types, fields, and values that matched the query keywords, respectively. Since the system cannot predict which combination of features will be important to the user, it efficiently generates all possible clusters of records by all sets of features. We utilize a novel data structure, universal navigational lattice (UNL), that compactly encodes all possible clusters. WikiAnalytics provides a dynamic and intuitive interface that lets the user explore the UNL and construct homogeneous structured tables,

