Results 1 - 5 of 5
Expressive and Flexible Access to Web-Extracted Data: A Keyword-based Structured Query Language
- In SIGMOD ’10: Proceedings of International Conference on Management of Data, 2010
Cited by 8 (1 self)
Automated extraction of structured data from Web sources often leads to large heterogeneous knowledge bases (KB), with data and schema items numbering in the hundreds of thousands or millions. Formulating information needs with conventional structured query languages is difficult due to the sheer size of schema information available to the user. We address this challenge by proposing a new query language that blends keyword search with structured query processing over large information graphs with rich semantics.
Individual and social behavior in tagging systems
- In 20th ACM Conference on Hypertext and Hypermedia, 2009
Cited by 6 (1 self)
In tagging systems, users can annotate items of interest with free-form terms. A good understanding of the usage characteristics of such systems is necessary to improve the design of current and next-generation tagging systems. To this end, this work explores three aspects of user behavior in CiteULike and Connotea, two systems that include tagging features to support online personalized management of scientific publications. First, this study characterizes the degree to which users re-tag previously published items and reuse tags: 10 to 20% of the daily activity can be characterized as re-tagging and about 75% of the activity as tag reuse. Second, we use the pairwise similarity between users’ activity to characterize the interest sharing in these systems. We present the interest sharing distribution across the systems, show that this metric encodes information about existing usage patterns, and attempt to correlate interest sharing levels to indicators of collaboration such as co-membership in discussion groups and semantic similarity of tag vocabularies. Finally, we show that interest sharing leads to an implicit structure that exhibits a natural segmentation. Throughout the paper we discuss the potential impact of our findings on the design of mechanisms that support tagging systems.
Evaluation of sentence selection for speech summarization
- In Proceedings of RANLP workshop on Crossing Barriers in Text Summarization Research, 2005
Cited by 3 (0 self)
In the last several years, a number of papers have addressed the area of automatic speech summarization. Many of them have applied evaluation metrics adapted from those used in speech recognition research, rather than from those used in text summarization. We consider whether ASR-inspired evaluation metrics produce different results than those taken from text summarization, and why. We evaluate various standard summarizers as well as our own systems on a subset of the SWITCHBOARD spoken dialogue dataset with both kinds of metrics. We find a statistically significant departure between the two classes in their relative rank of
QUICK: Queries Using Inferred Concepts from
We present QUICK, an entity-based text search engine that blends keyword search with structured query processing over rich knowledge bases (KB) with massive schemas. We introduce a new formalism for
Beyond Bag-of-Words: A New Distance Metric for Keywords Extraction and Clustering
Bag-of-Words (BoW) is a widely used model in a variety of tasks in Natural Language Processing (NLP). However, this model does not consider any relations between the words in the bag, which causes problems in some NLP tasks. In this project, I proposed a framework for calculating pair-wise word relations within a bag, using both the deterministic WordNet database and stochastic context information. The final relation matrix can be viewed as both a state transition matrix and an inner product matrix, which is helpful for both the keywords abstraction and clustering tasks commonly seen in meta-search engines.

