Improving Web Search Efficiency via a Locality Based Static Pruning Method
- In Proceedings of the 14th International Conference on World Wide Web, 2005
Cited by 18 (3 self)
The unarguably fast and continuous growth in the volume of indexed (and indexable) documents on the Web poses a great challenge for search engines, not only in search effectiveness but also in time and space efficiency. In this paper we present an index pruning technique targeted at search engines that addresses the latter issue without neglecting the former. To this effect, we adopt a new pruning strategy capable of greatly reducing the size of search engine indices. Experiments using a real search engine show that our technique can reduce the indices' storage costs by up to 60% over traditional lossless compression methods, while keeping the loss in retrieval precision to a minimum. Compared with the size of the indices with no compression at all, the compression rate is higher than 88%, i.e., the result is less than one eighth of the original size. More importantly, our results indicate that, due to the reduction in storage overhead, query processing time can be reduced to nearly 65% of the original time with no loss in average precision. The new method yields significant improvements over the best known static pruning method for search engine indices. In addition, since our technique is orthogonal to the underlying search algorithms, it can be adopted by virtually any search engine.
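The abstract does not describe how the locality-based method chooses which postings to drop, so the sketch below shows only the general idea of static index pruning that it builds on: entries are removed from the inverted index offline, before any query arrives. It is a generic score-threshold scheme, not the paper's algorithm; the index layout (term mapped to `(doc_id, score)` postings) and the `epsilon` parameter are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical inverted index: term -> list of (doc_id, impact score) postings.
Index = Dict[str, List[Tuple[int, float]]]

def prune_index(index: Index, epsilon: float) -> Index:
    """Generic term-based static pruning (NOT the paper's locality-based method):
    for each term, drop postings scoring below epsilon * that term's best score."""
    pruned: Index = {}
    for term, postings in index.items():
        if not postings:
            pruned[term] = []
            continue
        cutoff = epsilon * max(score for _, score in postings)
        pruned[term] = [(d, s) for d, s in postings if s >= cutoff]
    return pruned

def postings_count(index: Index) -> int:
    """Total number of postings, a rough proxy for index size."""
    return sum(len(p) for p in index.values())

if __name__ == "__main__":
    index = {
        "search": [(1, 0.9), (2, 0.1), (3, 0.5)],
        "engine": [(1, 0.2), (4, 0.8)],
    }
    pruned = prune_index(index, epsilon=0.5)
    print(pruned)
    print(postings_count(index), "->", postings_count(pruned))
```

Raising `epsilon` shrinks the index further at the cost of discarding more low-scoring postings, which is the size/precision trade-off the abstract reports on.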
Legal Information Retrieval and Application on E-Rulemaking
- In Proceedings of the 10th International Conference on Artificial Intelligence and Law (ICAIL 2005)
Cited by 3 (1 self)
The complexity and diversity of government regulations make understanding the regulations a non-trivial task. One of the issues is the existence of multiple sources of regulations and interpretive guides; the latter are often independent of governing bodies. This work aims to develop an information infrastructure for legal information retrieval with applications on electronic-rulemaking. The pilot study focuses on the accessibility regulations from the US Federal government and European organizations. A shallow parser is developed to consolidate different regulations into a unified XML format, which is well suited for handling semi-structured data such as legal documents. Handcrafted rules and a text mining tool are developed to extract the important features, such as concepts, measurements, definitions and so on, and to incorporate them into the corpus.
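The abstract mentions a shallow parser that consolidates regulations into a unified XML format but does not give its grammar. The sketch below is a hypothetical illustration of that kind of consolidation. It assumes section headings of the form `§ <number> <title>` and treats every other non-empty line as a paragraph of the current section. The element names (`regulation`, `section`, `para`) and the sample text are invented for the example.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Assumed heading pattern: section symbol, dotted section number, then a title.
SECTION_RE = re.compile(r"^§\s*(\d+(?:\.\d+)*)\s+(.*)$")

def regulation_to_xml(source: str, lines: List[str]) -> ET.Element:
    """Shallow-parse plain-text regulation lines into a flat XML tree.
    Each heading opens a <section>; later lines become its <para> children.
    Lines that appear before the first heading are ignored in this sketch."""
    root = ET.Element("regulation", attrib={"source": source})
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = SECTION_RE.match(line)
        if match:
            current = ET.SubElement(
                root, "section", attrib={"id": match.group(1), "title": match.group(2)}
            )
        elif current is not None:
            ET.SubElement(current, "para").text = line
    return root

if __name__ == "__main__":
    text = [
        "§ 1.1 Scope.",
        "(a) This part applies to public websites.",
        "§ 1.2 Definitions.",
        "(a) Text equivalent means a text description of non-text content.",
        "(b) Keyboard access means operation without a pointing device.",
    ]
    print(ET.tostring(regulation_to_xml("demo", text), encoding="unicode"))
```

Once regulations from different sources share one schema, features such as definitions or measurements can be attached as additional child elements.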
Relatedness analysis approach for regulation comparison and e-rulemaking applications
- In Proceedings of the 8th Annual International Digital Government Research Conference (dg.o), 2005
Cited by 2 (0 self)
The process of e-rulemaking with participation from the public involves the non-trivial task of sorting through and organizing a massive volume of electronically submitted comments. This research proposes to use available Information and Communication Technology (ICT) to help describe the relationship of public comments to policy drafts and deliberations. Building on previous work on regulatory management and comparison, a relatedness analysis tool has been prototyped and applied to compare drafted regulations with the associated public comments. An example using a drafted regulation on rights-of-way access and the comments received by the Access Board illustrates the prototyped analysis tool. The drafted regulation and public comments are compared using not only a traditional term match but also a combination of feature matches, and not only content comparison but also structural analysis. This comparison framework helps reviewers assess comments with respect to the provisions in the draft. Example results are shown to illustrate the uses and limitations of ICT in supporting policy making.
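The "traditional term match" component of such a comparison can be sketched as cosine similarity between bag-of-words vectors of a comment and each draft provision. This is an assumption-laden simplification: the abstract says the tool also combines other feature matches and structural analysis, and none of that is modeled here. The tokenizer and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def term_vector(text: str) -> Counter:
    """Lowercased word counts (a stand-in for richer feature extraction)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def rank_provisions(comment: str, provisions: List[str]) -> List[Tuple[int, float]]:
    """Return (provision index, similarity) pairs, most related first."""
    c = term_vector(comment)
    scores = [(i, cosine(c, term_vector(p))) for i, p in enumerate(provisions)]
    return sorted(scores, key=lambda x: x[1], reverse=True)

if __name__ == "__main__":
    provisions = [
        "Curb ramps shall have a detectable warning surface.",
        "Pedestrian signals shall include audible cues.",
    ]
    comment = "The audible pedestrian signals are too quiet."
    print(rank_provisions(comment, provisions))
```

Ranking provisions per comment is one way to route each comment to the part of the draft it addresses; a real system would add stemming, stop-word removal, and the structural features the abstract mentions.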
A Hypergraph Model for Computing Page Reputation on Web Collections
In this work we propose a representation of the web as a directed hypergraph, instead of a graph, in which links can connect not only pairs of pages but also pairs of disjoint sets of pages. In our model, the web hypergraph is derived from the web graph by dividing the set of pages into non-overlapping blocks and using the links between pages of distinct blocks to create hyperarcs. Each hyperarc connects a block of pages to a single page and is created with the goal of providing more reliable information for link analysis methods. We used the hypergraph structure to compute the reputation of web pages by experimenting with hypergraph versions of two previously proposed link analysis methods, PageRank and Indegree. Our experiments indicate that the hypergraph versions of PageRank and Indegree produce better results than their original graph versions.
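A hypergraph version of Indegree follows directly from the construction described above: links inside a block are discarded, and all links from one block to a page collapse into a single hyperarc, so a page's score is the number of distinct other blocks pointing at it. The sketch below assumes blocks are given as a page-to-block mapping (for example, one block per site); how the paper actually forms blocks is not stated in the abstract.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def hyperarc_indegree(
    links: Iterable[Tuple[str, str]], block_of: Dict[str, str]
) -> Dict[str, int]:
    """Indegree over hyperarcs (block -> page) instead of plain links.

    Links between pages of the same block are ignored; several links from
    one block to the same page count once, as a single hyperarc."""
    sources: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in links:
        if block_of[src] != block_of[dst]:
            sources[dst].add(block_of[src])
    return {page: len(sources[page]) for page in block_of}

if __name__ == "__main__":
    # Assumed blocking: pages grouped by site a, b, c.
    block_of = {"a1": "a", "a2": "a", "b1": "b", "c1": "c"}
    links = [("a1", "b1"), ("a2", "b1"), ("c1", "b1"), ("a1", "a2"), ("b1", "c1")]
    print(hyperarc_indegree(links, block_of))
```

Here `b1` has three incoming links but a hypergraph indegree of 2, because the two links from site `a` form one hyperarc; this damping of repeated links from a single source is the kind of more reliable evidence the abstract refers to.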
Using Association Rules to Discover Search Engines Related Queries
- In LA-WEB, 2003
This work presents a method for automatically generating suggestions of queries related to those submitted to Web search engines. The method extracts information from the log of previously submitted queries using algorithms for mining association rules. Experiments were performed on a log containing more than 2.3 million queries submitted to a commercial search engine; for common queries extracted from a real log, 90.5% of the top 5 suggestions presented were correct.
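A minimal sketch of association-rule mining over a query log, assuming each user session is treated as one transaction (a set of queries); the abstract does not say how transactions are formed. For a rule q1 → q2, support is the number of sessions containing both queries and confidence is support divided by the number of sessions containing q1. The function names, thresholds, and sample sessions are illustrative.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Set, Tuple

Rules = Dict[str, List[Tuple[str, float]]]

def mine_query_rules(
    sessions: List[Set[str]], min_support: int, min_confidence: float
) -> Rules:
    """Mine pairwise rules q1 -> q2 from sessions treated as transactions."""
    single: Counter = Counter()
    pair: Counter = Counter()
    for s in sessions:
        single.update(s)
        # Ordered pairs, so both q1 -> q2 and q2 -> q1 are counted.
        pair.update(permutations(sorted(s), 2))
    rules: Rules = {}
    for (q1, q2), sup in pair.items():
        if sup < min_support:
            continue
        conf = sup / single[q1]
        if conf >= min_confidence:
            rules.setdefault(q1, []).append((q2, conf))
    for q in rules:
        # Highest confidence first; ties broken alphabetically.
        rules[q].sort(key=lambda r: (-r[1], r[0]))
    return rules

def suggest(rules: Rules, query: str, k: int = 5) -> List[str]:
    """Top-k related queries for `query` (empty if no rule fires)."""
    return [q for q, _ in rules.get(query, [])[:k]]

if __name__ == "__main__":
    sessions = [{"mp3", "music"}, {"mp3", "music", "lyrics"}, {"mp3", "winamp"}, {"music", "lyrics"}]
    rules = mine_query_rules(sessions, min_support=2, min_confidence=0.5)
    print(suggest(rules, "music"))
```

Restricting rules to query pairs keeps the example short; full association-rule miners such as Apriori also handle larger itemsets.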
A Probabilistic Approach for Automatically Filling Form-Based Web Interfaces
In this paper we present a proposal for the implementation and evaluation of a novel method that automatically uses data-rich text to fill form-based input interfaces. Our solution takes a text as input, extracts implicit data values from it, and fills the appropriate fields. For this task, we rely on knowledge obtained from the values of previous submissions for each field, which are freely obtained from the usage of the interfaces. Our approach, called iForm, exploits features related to the content and the style of these values, which are combined through a Bayesian framework. Through extensive experimentation, we show that our approach is feasible and effective, and that it works well even when only a few previous submissions to the input interface are available.
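To make the idea of combining content and style evidence concrete, here is a hypothetical naive Bayes sketch, not iForm itself: each field is modeled from its previously submitted values using word frequencies (content) and a coarse token shape (style), both Laplace-smoothed, and a text segment is assigned to the field with the highest log-likelihood under a uniform prior. The shape categories, the `FieldModel` class, and the car-form example are assumptions; the abstract does not specify iForm's features or how it segments the input text.

```python
import math
import re
from collections import Counter
from typing import Dict, List

def tokens(value: str) -> List[str]:
    return re.findall(r"\w+", value.lower())

def shape(token: str) -> str:
    """Coarse style feature with three possible values."""
    if token.isdigit():
        return "num"
    return "alpha" if token.isalpha() else "mixed"

class FieldModel:
    """Content + style statistics learned from one field's past values."""

    def __init__(self, past_values: List[str]):
        toks = [t for v in past_values for t in tokens(v)]
        self.words = Counter(toks)
        self.shapes = Counter(shape(t) for t in toks)
        self.total = len(toks)

    def log_likelihood(self, segment: str, vocab_size: int) -> float:
        score = 0.0
        for t in tokens(segment):
            # Content evidence, Laplace-smoothed over the shared vocabulary.
            score += math.log((self.words[t] + 1) / (self.total + vocab_size))
            # Style evidence, Laplace-smoothed over the 3 shape values.
            score += math.log((self.shapes[shape(t)] + 1) / (self.total + 3))
        return score

def best_field(segment: str, history: Dict[str, List[str]]) -> str:
    """Pick the field whose past values best explain `segment` (uniform prior)."""
    models = {f: FieldModel(vals) for f, vals in history.items()}
    vocab = {t for vals in history.values() for v in vals for t in tokens(v)}
    vocab |= set(tokens(segment))
    return max(models, key=lambda f: models[f].log_likelihood(segment, len(vocab)))

if __name__ == "__main__":
    history = {
        "make": ["Honda", "Toyota", "Ford"],
        "year": ["2004", "1999", "2010"],
        "model": ["Civic", "Corolla", "Focus"],
    }
    print(best_field("Toyota", history), best_field("2007", history))
```

The style feature is what lets an unseen value such as "2007" still land in the year field: its content term is equally unlikely everywhere, but its numeric shape matches only that field's history.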
An Improved Page Rank Algorithm based on Optimized Normalization Technique
Page ranking is an important component of information retrieval systems. It is used to measure the importance and behavior of web pages. We review two approaches to ranking: the HITS concept and the PageRank method. Both approaches use the link structure of the Web to find the importance of web pages. The PageRank algorithm calculates the rank of individual web pages, while Hypertext Induced Topic Search (HITS) depends on the hubs and authorities framework. A fast and efficient page ranking mechanism for web retrieval remains a challenge. This paper proposes a new PageRank algorithm that uses a normalization technique based on the mean value of page ranks. The proposed scheme reduces the time complexity of the traditional PageRank algorithm by reducing the number of iterations needed to reach a convergence point. ...the quality of the web page. Thus, Web structure mining focuses on the hyperlink structure of the Web. In-links to a page and out-links from it can give an idea of the context of the page. PageRank does not rank web sites as a whole; it calculates the rank of each individual web page.
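The abstract does not define its mean-value normalization, so the sketch below implements only the traditional PageRank baseline it aims to speed up: power iteration with damping factor `d`, stopping when the L1 change between iterations drops below `tol`, and returning the iteration count so convergence speed can be measured. Treating dangling pages as linking uniformly to all pages is a common convention assumed here, and every link target is assumed to appear as a key of `out_links`.

```python
from typing import Dict, List, Tuple

def pagerank(
    out_links: Dict[str, List[str]],
    d: float = 0.85,
    tol: float = 1e-8,
    max_iter: int = 100,
) -> Tuple[Dict[str, float], int]:
    """Classic power-iteration PageRank. Returns (ranks, iterations used).

    Ranks sum to 1; dangling pages spread their rank uniformly."""
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for it in range(1, max_iter + 1):
        dangling = sum(rank[p] for p in pages if not out_links[p])
        # Teleportation share plus redistributed dangling rank.
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        for p in pages:
            targets = out_links[p]
            for q in targets:
                new[q] += d * rank[p] / len(targets)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            return rank, it
    return rank, max_iter

if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    ranks, iterations = pagerank(graph)
    print({p: round(r, 4) for p, r in ranks.items()}, iterations)
```

The returned iteration count is the quantity a faster-converging variant would try to lower; comparing it before and after a normalization step is one way to test such a claim.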

