Results 1 - 10 of 447
Agglomerative Clustering of a Search Engine Query Log
- In Proceedings of the sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000
"... This paper introduces a technique for mining a collection of user transactions with an Internet search engine to discover clusters of similar queries and similar URLs. The information we exploit is "clickthrough data": each record consists of a user's query to a search engine along wi ..."
Cited by 330 (0 self)
Measuring semantic similarity between words using web search engines
- in Proceedings of WWW, 2007
"... Measuring the semantic similarity between words is an important component in various semantic web-related applications such as community mining, relation extraction and automatic meta data extraction. Despite the usefulness of semantic similarity measures in these applications, accurately measuring ..."
Cited by 138 (7 self)
search engine. We define various similarity scores for two given words P and Q, using the page counts for the queries P, Q and P AND Q. Moreover, we propose a novel approach to compute semantic similarity using automatically extracted lexical-syntactic patterns from text snippets. These different
Can social bookmarking improve web search
- in Proceedings of the International Conference on Web Search and Web Data Mining (WSDM'08), ACM
"... Social bookmarking is a recent phenomenon which has the potential to give us a great deal of data about pages on the web. One major question is whether that data can be used to augment systems like web search. To answer this question, over the past year we have gathered what we believe to be the lar ..."
Cited by 139 (5 self)
Sindice.com: A document-oriented lookup index for open linked data
- International Journal of Metadata, Semantics and Ontologies
"... Developers of Semantic Web applications face a challenge with respect to the decentralised publication model: how and where to find statements about encountered resources. The “linked data” approach mandates that resource URIs should be de-referenced to return resource metadata. But for data discove ..."
Cited by 130 (12 self)
discovery linkage itself is not enough, and crawling and indexing of data is necessary. Existing Semantic Web search engines are focused on database-like functionality, compromising on index size, query performance and live updates. We present Sindice, a lookup index over resources crawled on the Semantic
Clustering Technique on Search Engine Dataset using Data Mining Tool
"... Abstract- Unlabeled document collections are becoming increasingly common and mining such databases becomes a major challenge. It is a major issue to retrieve good websites from the larger collections of websites. As the number of available Web pages grows, it is become more difficult for users find ..."
websites in groups. This paper addresses the applications of data mining tool Weka by applying k-means clustering to find clusters from huge data sets and find the attributes that govern optimization of search engines. Keywords—Dataset; Websites; Data mining; Weka; k-means
AUTOMATIC GENERATION OF A LARGE SCALE SEMANTIC SEARCH EVALUATION DATA-SET
- 2009
"... compare the performance of information retrieval techniques in various settings, the data-sets which model these settings need to be generated. Although there are already available collections, such as those used in TREC conference series, which are used for evaluation of various retrieval tasks, ..."
, there is a lack of collections which are specially developed for evaluation of the effectiveness of semantically enhanced text retrieval techniques. In this paper, we propose an approach for the automatic generation of such data-sets, by using search engines query logs and data from human-edited web
YARS2: A federated repository for querying graph structured data from the Web
- In ISWC, 2007
"... Abstract. We present the architecture of an end-to-end semantic search engine that uses a graph data model to enable interactive query answering over structured and interlinked data collected from many disparate sources on the Web. In particular, we study distributed indexing methods for graph-str ..."
Cited by 112 (11 self)
The Sindice-2011 dataset for entity-oriented search in the web of data
- In Balog et al
"... The task of entity retrieval becomes increasingly prevalent as more and more (semi-) structured information about objects is available on the Web in the form of documents embedding metadata (RDF, RDFa, Microformats, and others). However, research and development in that direction is dependent on (1) ..."
Cited by 7 (2 self)
) the availability of a representative corpus of entities that are found on the Web, and (2) the availability of an entity-oriented search infrastructure for experimenting with new retrieval models. In this paper, we introduce the Sindice-2011 data collection which is derived from data collected by the Sindice
ALIAS: Author Disambiguation in Microsoft Academic Search Engine Dataset
"... We present a system called ALIAS, that is designed to search for duplicate authors from Microsoft Academic Search Engine dataset. Author-ambiguity is a prevalent problem in this dataset, as many authors publish under several variations of their own name, or different authors share similar or same n ..."
Towards Efficient Data Search and Subsetting of Large-scale Atmospheric Datasets
"... Abstract — Discovering the correct dataset in an efficient way is critical for effective simulations in atmospheric sciences. Compared to text-based web documents, many of the large scientific datasets contain binary or numerically encoded data that is hard to discover through the popular search eng ..."
engines. In the atmospheric sciences, there has been a significant growth in public data hosting. However, the ability to index and search has been limited by the metadata provided by the data host. We have developed an infrastructure – Atmospheric Data Discovery System (ADDS) – that provides