Results 1 - 10 of 447

Agglomerative Clustering of a Search Engine Query Log

by Doug Beeferman, Adam Berger - In Proceedings of the sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000
"... This paper introduces a technique for mining a collection of user transactions with an Internet search engine to discover clusters of similar queries and similar URLs. The information we exploit is "clickthrough data": each record consists of a user's query to a search engine along wi ..."
Cited by 330 (0 self)

Measuring semantic similarity between words using web search engines

by Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka - in Proceedings of WWW, 2007
"... Measuring the semantic similarity between words is an important component in various semantic web-related applications such as community mining, relation extraction and automatic meta data extraction. Despite the usefulness of semantic similarity measures in these applications, accurately measuring ..."
Cited by 138 (7 self)
search engine. We define various similarity scores for two given words P and Q, using the page counts for the queries P, Q and P AND Q. Moreover, we propose a novel approach to compute semantic similarity using automatically extracted lexical-syntactic patterns from text snippets. These different

Can social bookmarking improve web search

by Paul Heymann, Georgia Koutrika, Hector Garcia-Molina - in Proceedings of the International Conference on Web Search and Web Data Mining (WSDM'08), ACM
"... Social bookmarking is a recent phenomenon which has the potential to give us a great deal of data about pages on the web. One major question is whether that data can be used to augment systems like web search. To answer this question, over the past year we have gathered what we believe to be the lar ..."
Cited by 139 (5 self)

Sindice.com: A document-oriented lookup index for open linked data

by Eyal Oren, Renaud Delbru, Michele Catasta, Richard Cyganiak, Giovanni Tummarello - International Journal of Metadata, Semantics and Ontologies
"... Developers of Semantic Web applications face a challenge with respect to the decentralised publication model: how and where to find statements about encountered resources. The “linked data” approach mandates that resource URIs should be de-referenced to return resource metadata. But for data discove ..."
Cited by 130 (12 self)
discovery linkage itself is not enough, and crawling and indexing of data is necessary. Existing Semantic Web search engines are focused on database-like functionality, compromising on index size, query performance and live updates. We present Sindice, a lookup index over resources crawled on the Semantic

Clustering Technique on Search Engine Dataset using Data Mining Tool

by Md. Ezaz Ahmed
"... Unlabeled document collections are becoming increasingly common and mining such databases becomes a major challenge. It is a major issue to retrieve good websites from the larger collections of websites. As the number of available Web pages grows, it is become more difficult for users find ..."
websites in groups. This paper addresses the applications of data mining tool Weka by applying k means clustering to find clusters from huge data sets and find the attributes that govern optimization of search engines. Keywords—Dataset; Websites; Data mining; Weka; k-means I.

AUTOMATIC GENERATION OF A LARGE SCALE SEMANTIC SEARCH EVALUATION DATA-SET

by Uladzimir Kharkevich, 2009
"... compare the performance of information retrieval techniques in various settings, the data-sets which model these settings need to be generated. Although there are already available collections, such as those used in TREC conference series, which are used for evaluation of various retrieval tasks, ..."
, there is a lack of collections which are specially developed for evaluation of the effectiveness of semantically enhanced text retrieval techniques. In this paper, we propose an approach for the automatic generation of such data-sets, by using search engines query logs and data from human-edited web

YARS2: A federated repository for querying graph structured data from the Web

by Andreas Harth, Aidan Hogan, Stefan Decker - In ISWC, 2007
"... We present the architecture of an end-to-end semantic search engine that uses a graph data model to enable interactive query answering over structured and interlinked data collected from many disparate sources on the Web. In particular, we study distributed indexing methods for graph-str ..."
Cited by 112 (11 self)

The Sindice-2011 dataset for entity-oriented search in the web of data

by Stéphane Campinas, Diego Ceccarelli, Thomas E. Perry, Renaud Delbru, Krisztian Balog, Giovanni Tummarello - In Balog et al
"... The task of entity retrieval becomes increasingly prevalent as more and more (semi-) structured information about objects is available on the Web in the form of documents embedding metadata (RDF, RDFa, Microformats, and others). However, research and development in that direction is dependent on (1) ..."
Cited by 7 (2 self)
) the availability of a representative corpus of entities that are found on the Web, and (2) the availability of an entity-oriented search infrastructure for experimenting with new retrieval models. In this paper, we introduce the Sindice-2011 data collection which is derived from data collected by the Sindice

ALIAS: Author Disambiguation in Microsoft Academic Search Engine Dataset

by Michael Pitts, Swapna Savvana, Senjuti Basu Roy, Vani M, Dhineshkumar Prasath
"... We present a system called ALIAS, that is designed to search for duplicate authors from Microsoft Academic Search Engine dataset. Author-ambiguity is a prevalent problem in this dataset, as many authors publish under several variations of their own name, or different authors share similar or same n ..."

Towards Efficient Data Search and Subsetting of Large-scale Atmospheric Datasets

by Sangmi Lee Pallickara, Shrideep Pallickara, Milija Zupanski
"... Discovering the correct dataset in an efficient way is critical for effective simulations in atmospheric sciences. Compared to text-based web documents, many of the large scientific datasets contain binary or numerically encoded data that is hard to discover through the popular search eng ..."
engines. In the atmospheric sciences, there has been a significant growth in public data hosting. However, the ability to index and search has been limited by the metadata provided by the data host. We have developed an infrastructure – Atmospheric Data Discovery System (ADDS) – that provides