Building efficient and effective metasearch engines
- ACM Computing Surveys, 2002
Cited by 107 (9 self)
Frequently a user's information needs are stored in the databases of multiple search engines. It is inconvenient and inefficient for an ordinary user to invoke multiple search engines and identify useful documents from the returned results. To support unified access to multiple search engines, a metasearch engine can be constructed. When a metasearch engine receives a query from a user, it invokes the underlying search engines to retrieve useful information for the user. Metasearch engines have other benefits as a search tool, such as increasing the search coverage of the Web and improving the scalability of the search. In this article, we survey techniques that have been proposed to tackle several underlying challenges in building a good metasearch engine. Among the main challenges, the database selection problem is to identify search engines that are likely to return useful documents for a given query. The document selection problem is to determine which documents to retrieve from each identified search engine. The result merging problem is to combine the documents returned from multiple search engines. We will also point out some problems that need further research.
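The three problems named in this abstract (database selection, document selection, result merging) can be pictured with a minimal sketch. Everything below is an illustrative assumption rather than a technique from the survey: the `Engine` class, the term-coverage selection heuristic, the fixed per-engine cap, and the keep-the-best-score merge are all hypothetical stand-ins chosen for brevity.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    """A hypothetical component search engine with a tiny in-memory index."""
    name: str
    docs: dict  # doc_id -> text

    def search(self, query, limit):
        """Return up to `limit` (doc_id, score) pairs; score = matched query terms."""
        terms = query.lower().split()
        hits = []
        for doc_id, text in self.docs.items():
            words = set(text.lower().split())
            score = sum(1 for t in terms if t in words)
            if score > 0:
                hits.append((doc_id, score))
        hits.sort(key=lambda h: (-h[1], h[0]))
        return hits[:limit]


def select_engines(engines, query, top_k):
    """Database selection (assumed heuristic): rank engines by query-term coverage."""
    terms = set(query.lower().split())

    def coverage(engine):
        vocab = set()
        for text in engine.docs.values():
            vocab.update(text.lower().split())
        return len(terms & vocab)

    ranked = sorted(engines, key=lambda e: (-coverage(e), e.name))
    return [e for e in ranked[:top_k] if coverage(e) > 0]


def metasearch(engines, query, top_k=2, per_engine=2):
    """Run selection, per-engine retrieval, and merging for one query."""
    merged = {}
    for engine in select_engines(engines, query, top_k):
        # Document selection: here simply a fixed cap per selected engine.
        for doc_id, score in engine.search(query, per_engine):
            # Result merging: keep the best score seen for each document.
            merged[doc_id] = max(score, merged.get(doc_id, 0))
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))
```

In this sketch, an engine whose vocabulary shares no term with the query is never invoked, which is the efficiency argument for database selection in miniature.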
An interactive clustering-based approach to integrating source query interfaces on the deep web
- In SIGMOD, 2004
Cited by 73 (14 self)
An increasing number of data sources are now becoming available on the Web, but often their contents are only accessible through query interfaces. For a domain of interest, there often exist many such sources with varied coverage or querying capabilities. As an important step to the integration of these sources, we consider the integration of their query interfaces. More specifically, we focus on the crucial step of the integration: accurately matching the interfaces. While the integration of query interfaces has received more attention recently, current approaches are not sufficiently general: (a) they all model interfaces with flat schemas; (b) most of them only consider 1:1 mappings of fields over the interfaces; (c) they all perform the integration in a blackbox-like fashion and the whole process has to be restarted from scratch if anything goes wrong; and (d) they often require laborious parameter tuning. In this paper, we propose an interactive, clustering-based approach to matching query interfaces. The hierarchical nature of interfaces is captured with ordered trees. Varied types of complex mappings of fields are examined and several approaches are proposed to effectively identify these mappings. We put the human integrator back in the loop and propose several novel approaches to the interactive learning of parameters and the resolution of uncertain mappings. Extensive experiments are conducted and results show that our approach is highly effective.
On the Automatic Extraction of Data from the Hidden Web (Stephen W. Liddle)
- In Proceedings of the International Workshop on Data Semantics in Web Information Systems (DASWIS-2001), 2001
Cited by 14 (4 self)
An increasing amount of Web data is accessible only by filling out HTML forms to query an underlying data source. While this is most welcome from a user perspective (queries are easy and precise) and from a data management perspective (static pages need not be maintained; databases can be accessed directly), automated agents have greater difficulty accessing data behind forms. In this paper we present a method for automatically filling in forms to retrieve the associated dynamically generated pages. Using our approach, automated agents can begin to systematically access portions of the "hidden Web."
Reasoning Methods for Personalization on the Semantic Web
- Annals of Mathematics, Computing & Teleinformatics, 2004
Cited by 7 (4 self)
The Semantic Web vision of a next generation Web, in which machines are enabled to understand the meaning of information in order to better interoperate and better support humans in carrying out their tasks, is very appealing and fosters the imagination of smarter applications that can retrieve, process and present information in enhanced ways. In this vision, particular attention should be devoted to personalization: by bringing the user's needs into the center of interaction processes, personalized Web systems overcome the one-size-fits-all paradigm and provide individually optimized access to Web data and information. In this paper, we provide an overview of recent trends for establishing personalization on the Semantic Web: based on a discussion of reasoning with rule and query languages for the Semantic Web, we outline an architecture for service-based personalization, and show results in personalizing Web applications.
A Highly Scalable and Effective Method for Metasearch
Cited by 7 (2 self)
... This paper proposes a highly scalable and accurate database selection method. This method has several novel features. First, the metadata for representing the contents of all search engines are organized into a single integrated representative. Such a representative yields both computation efficiency and storage efficiency. Second, the new selection method is based on a theory for ranking search engines optimally. Experimental results indicate that this new method is very effective. An operational prototype system has been built based on the proposed approach.
FOCUS: Five Rules for Writing a Great WebQuest
- Learning & Leading with Technology, 2001
Cited by 4 (0 self)
Since it was first developed in 1995 by Bernie Dodge with Tom March, the WebQuest model has been incorporated into hundreds of education courses and staff development efforts around the globe (Dodge, 1995). A WebQuest, according to
Modeling and Extracting Deep-Web Query Interfaces
- In Advances in Information and Intelligent Systems, 2009
Cited by 2 (0 self)
Interface modeling & extraction is a fundamental step in building a uniform query interface to a multitude of databases on the Web. Existing solutions are limited in that they assume interfaces are flat and thus ignore the inherent structure of interfaces, which then seriously hampers the effectiveness of interface integration. To address this limitation, in this chapter, we model an interface with a hierarchical schema (e.g., an ordered tree of attributes). We describe ExQ, a novel schema extraction system with two distinct features. First, ExQ discovers the structure of an interface based on its visual representation via spatial clustering. Second, ExQ annotates the discovered schema with labels from the interface by imitating the human-annotation process. ExQ has been extensively evaluated with real-world query interfaces in five different domains and the results show that ExQ achieves an accuracy above 90% in both the structure discovery and schema annotation tasks.
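To make the contrast between a flat schema and a hierarchical one concrete, here is a small sketch of an ordered tree of attributes. It is not ExQ's data model: the `SchemaNode` class, the helper functions, and the flight-search example are hypothetical and only illustrate what a flat view of an interface discards.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaNode:
    """A node in an ordered schema tree; a node without children is a field."""
    label: str
    children: list = field(default_factory=list)  # list order mirrors form order

    def is_field(self):
        return not self.children


def flatten(node):
    """Return leaf labels in order: all that a flat schema would retain."""
    if node.is_field():
        return [node.label]
    labels = []
    for child in node.children:
        labels.extend(flatten(child))
    return labels


def field_paths(node, prefix=()):
    """Return each field together with the chain of group labels enclosing it."""
    path = prefix + (node.label,)
    if node.is_field():
        return [path]
    paths = []
    for child in node.children:
        paths.extend(field_paths(child, path))
    return paths


# Hypothetical interface: two groups that reuse the same field labels.
interface = SchemaNode("Flight search", [
    SchemaNode("Departure", [SchemaNode("City"), SchemaNode("Date")]),
    SchemaNode("Return", [SchemaNode("City"), SchemaNode("Date")]),
])
```

Here `flatten(interface)` yields two indistinguishable `City` fields, while `field_paths(interface)` keeps the `Departure` and `Return` groups that tell them apart.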
Creating Customized Metasearch Engines on Demand Using SE-LEGO (Extended Abstract)
- In Proceedings of the Fourth International Conference on Web-Age Information Management (WAIM'03), Demo paper, 2003
Cited by 1 (1 self)
Frequently, the documents needed by a user are available only via multiple search engines. For example, research papers about a particular subject may be found through the search engines of related digital libraries and journals. It is inconvenient for the user to search these search engines separately. An effective way to address this problem is to employ a metasearch engine, which is a system that provides unified access to multiple existing search systems. When a metasearch engine receives a user query, it passes the query to its underlying search engines. The results returned by the search engines are then combined by the metasearch engine to form a single ranked list for presentation to the user [2]. Building customized metasearch engines is important to many people and organizations. For example, a researcher may use a particular set of search engines for finding papers on a particular subject. A customized metasearch engine based on these search engines will provide
Probe, Cluster, and Discover: Focused Extraction of QA-Pagelets from the Deep Web
- 2004
Cited by 1 (0 self)
In this paper, we introduce the concept of a QA-Pagelet to refer to the content region in a dynamic page that contains query matches. We present THOR, a scalable and efficient mining system for discovering and extracting QA-Pagelets from the Deep Web. A unique feature of THOR is its two-phase extraction framework. In the first phase, pages from a deep web site are grouped into distinct clusters of structurally similar pages. In the second phase, pages from each page cluster are examined through a subtree filtering algorithm that exploits the structural and content similarity at the subtree level to identify the QA-Pagelets.
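The two-phase idea can be sketched in a heavily simplified form. This is not THOR's algorithm: the sketch assumes pages are already parsed into `(tag, content)` tuples, replaces similarity-based clustering with exact matching of tag-path sets, and replaces subtree filtering with a check for tag paths whose text changes from page to page. All function names are hypothetical.

```python
from collections import defaultdict


def leaf_paths(node, prefix=""):
    """Yield (tag_path, text) for every text leaf of a (tag, content) tree."""
    tag, content = node
    path = f"{prefix}/{tag}"
    if isinstance(content, str):
        yield path, content
    else:
        for child in content:
            yield from leaf_paths(child, path)


def cluster_by_structure(pages):
    """Phase 1 (simplified): group pages whose sets of tag paths are identical."""
    clusters = defaultdict(list)
    for name, tree in pages.items():
        signature = frozenset(path for path, _ in leaf_paths(tree))
        clusters[signature].append(name)
    return sorted(sorted(names) for names in clusters.values())


def varying_paths(pages, cluster):
    """Phase 2 (simplified): keep tag paths whose text differs between pages."""
    seen = defaultdict(set)
    for name in cluster:
        per_page = defaultdict(list)
        for path, text in leaf_paths(pages[name]):
            per_page[path].append(text)
        for path, values in per_page.items():
            seen[path].add(tuple(values))
    return sorted(path for path, variants in seen.items() if len(variants) > 1)


# Hypothetical pages returned by one site for three different queries.
pages = {
    "r1": ("html", [("h1", "Shop"), ("div", [("p", "red shoes")])]),
    "r2": ("html", [("h1", "Shop"), ("div", [("p", "blue hats")])]),
    "err": ("html", [("h1", "Shop"), ("span", "no matches")]),
}
```

In this toy data the two result pages land in one cluster and the error page in another; within the result cluster, only the `/html/div/p` region changes with the query, so it is the candidate region, while the static `h1` header is filtered out.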

