Results 1 - 10 of 21
Distributed query processing using partitioned inverted files
- In Proc. of the 9th String Processing and Information Retrieval Symposium (SPIRE)
, 2001
Cited by 35 (4 self)
In this paper, we study query processing in a distributed text database. The novelty is a real distributed architecture implementation that offers concurrent query service. The distributed system adopts a network of workstations model and the client-server paradigm. The document collection is indexed with an inverted file. We adopt two distinct strategies of index partitioning in the distributed system, namely local index partitioning and global index partitioning. In both strategies, documents are ranked using the vector space model along with a document filtering technique for fast ranking. We evaluate and compare the impact of the two index partitioning strategies on query processing performance. Experimental results on retrieval efficiency show that, within our framework, the global index partitioning outperforms the local index partitioning.
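As a rough illustration of the two partitioning strategies named in this abstract, the Python sketch below splits a toy inverted file across servers either by document (local) or by term (global). The function names, the round-robin assignment, and the toy collection are assumptions made for illustration; none of them come from the paper.

```python
from collections import defaultdict


def build_inverted_file(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def local_partition(docs, n_servers):
    """Local partitioning: each server indexes only its own slice of documents."""
    slices = [dict() for _ in range(n_servers)]
    for i, doc_id in enumerate(sorted(docs)):
        slices[i % n_servers][doc_id] = docs[doc_id]  # round-robin (assumption)
    return [build_inverted_file(s) for s in slices]


def global_partition(docs, n_servers):
    """Global partitioning: one index over all documents, split by term."""
    full = build_inverted_file(docs)
    parts = [dict() for _ in range(n_servers)]
    for i, term in enumerate(sorted(full)):
        parts[i % n_servers][term] = full[term]  # round-robin (assumption)
    return parts


def servers_holding(term, partitions):
    """Indices of the servers whose partition has a posting list for the term."""
    return [i for i, part in enumerate(partitions) if term in part]


# Hypothetical toy collection.
docs = {
    1: "distributed query processing",
    2: "inverted file index",
    3: "query ranking with inverted file",
    4: "distributed index partitioning",
}
local = local_partition(docs, 2)
glob = global_partition(docs, 2)
```

In this toy setup, a term whose documents are spread over several servers has to be looked up on each of them under local partitioning, whereas under global partitioning its whole posting list sits on a single server.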
Automatic Learning of User Profiles - Towards the Personalisation of Agent Services
, 1998
Cited by 22 (3 self)
This paper describes experimental work conducted to investigate user profiling within a framework for personal agents. In particular, investigations were aimed at discovering whether user interests could be automatically classified through the use of several heuristics. The results highlighted the need for minimal user feedback, and the need to consider the implications for the role of machine learning in user profiling. This paper focuses upon the role of user profiling within the context of personal agents. We consider personal agents to be software capable of operating autonomously in order to provide timely and relevant information for an individual.
Discovering Unexpected Information from Your Competitors' Web Sites
- In Proceedings of ACM SIG KDD-2001
, 2001
Cited by 21 (1 self)
Ever since the beginning of the Web, finding useful information from the Web has been an important problem. Existing approaches include keyword-based search, wrapper-based information extraction, Web query and user preferences. These approaches essentially find information that matches the user's explicit specifications. This paper argues that this is insufficient. There is another type of information that is also of great interest, i.e., unexpected information, which is unanticipated by the user. Finding unexpected information is useful in many applications. For example, it is useful for a company to find unexpected information about its competitors, e.g., unexpected services and products that its competitors offer. With this information, the company can learn from its competitors and/or design countermeasures to improve its competitiveness. Since the number of pages of a typical commercial site is very large and there are also many relevant sites (competitors), it is very difficult for a human user to view each page to discover the unexpected information. Automated assistance is needed. In this paper, we propose a number of methods to help the user find various types of unexpected information from his/her competitors' Web sites. Experiment results show that these techniques are very useful in practice and also efficient. Keywords: Information interestingness, Web comparison, Web mining.
Telcordia LSI Engine: Implementation and Scalability Issues
- In Proceedings of the Eleventh International Workshop on Research Issues in Data Engineering
, 2001
Cited by 14 (0 self)
Latent Semantic Indexing (LSI), a vector space-based approach to information retrieval, has been proven to be an effective tool in correlating and retrieving relevant documents. While much work has been published on LSI, most of it addresses the algorithmic or theoretical basis of the model. Little, if any, presents implementation issues in practice. In this paper, we describe a production-level implementation of LSI. The system integrates components including document collection and preprocessing, singular value decomposition (SVD), multilingual processing, and a tree-based access method for similarity querying. We discuss implementation issues encountered during the development of the system. In particular, we address scalability issues in the query engine and various components of the system, and present lessons learned.
Set-based vector model: An efficient approach for correlation-based ranking
- ACM Transactions on Information Systems
, 2005
Cited by 10 (2 self)
This work presents a new approach for ranking documents in the vector space model. The novelty lies in two fronts. First, patterns of term co-occurrence are taken into account and are processed efficiently. Second, term weights are generated using a data mining technique called association rules. This leads to a new ranking mechanism called the set-based vector model. The components of our model are no longer index terms but index termsets, where a termset is a set of index terms. Termsets capture the intuition that semantically related terms appear close to each other in a document. They can be efficiently obtained by limiting the computation to small passages of text. Once termsets have been computed, the ranking is calculated as a function of the termset frequency in the document and its scarcity in the document collection. Experimental results show that the set-based vector model improves average precision for all collections and query types evaluated, while keeping computational costs small. For the 2 gigabyte TREC-8 collection, the set-based vector model leads to a gain in average precision figures of 14.7% and 16.4% for disjunctive and conjunctive queries, respectively, with respect to the standard vector space model. These gains increase to 24.9% and 30.0%, respectively, when proximity information is taken into account. Query processing times are larger but, on average, still comparable to those obtained
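The idea of scoring a document by termset frequency and termset scarcity can be sketched in Python as follows. This is only a toy under stated assumptions: the fixed-size word windows used as passages, the tf-idf-style product `freq * log(1 + N / df)`, and all function names are hypothetical, since the abstract does not give the paper's actual weighting or its association-rule machinery.

```python
import math
from itertools import combinations


def passages(text, size=4):
    """Split a document into consecutive passages of `size` words (assumed unit)."""
    words = text.lower().split()
    return [set(words[i:i + size]) for i in range(0, len(words), size)]


def termset_frequency(termset, text):
    """Number of passages of the document that contain every term of the termset."""
    return sum(1 for p in passages(text) if termset <= p)


def score(query_terms, doc_id, docs):
    """Sum, over all non-empty termsets of the query, frequency times scarcity."""
    n_docs = len(docs)
    total = 0.0
    for k in range(1, len(query_terms) + 1):
        for combo in combinations(sorted(query_terms), k):
            termset = set(combo)
            freq = termset_frequency(termset, docs[doc_id])
            if freq == 0:
                continue
            # Number of documents in which the termset occurs at least once.
            df = sum(1 for t in docs.values() if termset_frequency(termset, t) > 0)
            total += freq * math.log(1 + n_docs / df)  # assumed weighting
    return total


# Hypothetical toy collection.
docs = {
    "d1": "vector space model ranking for text retrieval",
    "d2": "vector graphics and color space conversion in image editing tools",
    "d3": "association rules in data mining",
}
query = {"vector", "space"}
ranking = sorted(docs, key=lambda d: score(query, d, docs), reverse=True)
```

Both `d1` and `d2` contain the two query terms, but only `d1` has them in the same passage, so the two-term termset contributes to the score of `d1` alone.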
An approach for combining content-based and collaborative filters
- In Proceedings of the Sixth international workshop on Information retrieval with Asian languages (ACL-2003)
, 2003
Cited by 6 (1 self)
In this work, we apply a clustering technique to integrate the contents of items into the item-based collaborative filtering framework. The group rating information that is obtained from the clustering result provides a way to introduce content information into collaborative recommendation and solves the cold start problem. Extensive experiments have been conducted on MovieLens data to analyze the characteristics of our technique. The results show that our approach contributes to the improvement of prediction quality of the item-based collaborative filtering, especially for the cold start problem.
Legal Information Retrieval and Application on E-Rulemaking
- In Proceedings of the 10th International Conference on Artificial Intelligence and Law (ICAIL 2005)
Cited by 3 (1 self)
The complexity and diversity of government regulations make understanding the regulations a non-trivial task. One of the issues is the existence of multiple sources of regulations and interpretive guides; the latter are often independent of governing bodies. This work aims to develop an information infrastructure for legal information retrieval with applications on electronic-rulemaking. The pilot study focuses on the accessibility regulations from the US Federal government and European organizations. A shallow parser is developed to consolidate different regulations into a unified XML format, which is well suited for handling semi-structured data such as legal documents. Handcrafted rules and a text mining tool are developed to extract the important features, such as concepts, measurements, definitions and so on, and to incorporate them into the corpus.
Identifying Facts for TCBR
- In Weber, R and Branting, LK (eds) Proceedings of the Textual Case-Based Reasoning Workshop
, 2005
Cited by 3 (1 self)
This paper explores a method to algorithmically distinguish case-specific facts from potentially reusable or adaptable elements of cases in a textual case-based reasoning (TCBR) system. In the legal domain, documents often contain case-specific facts mixed with case-neutral details of law, precedent, conclusions the attorneys reach by applying their interpretation of the law to the case facts, and other aspects of argumentation that attorneys could potentially apply to similar situations. The automated distinction of these two categories, namely facts and other elements, has the potential to improve quality of automated textual case acquisition. The goal is ultimately to distinguish case problem from solution. To separate fact from other elements, we use an information gain (IG) algorithm to identify words that serve as efficient markers of one or the other. We demonstrate that this technique can successfully distinguish case-specific fact paragraphs from others, and propose future work to overcome some of the limitations of this pilot project.
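For readers unfamiliar with information gain, the Python sketch below computes the standard entropy-based IG of a word with respect to paragraph labels. It is a generic textbook formulation, not the paper's implementation; the labels `"fact"` and `"other"`, the whitespace tokenization, and the example paragraphs are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(word, paragraphs):
    """Reduction in label entropy from knowing whether `word` occurs in a paragraph.

    `paragraphs` is a list of (text, label) pairs.
    """
    labels = [label for _, label in paragraphs]
    with_word = [label for text, label in paragraphs if word in text.lower().split()]
    without_word = [label for text, label in paragraphs if word not in text.lower().split()]
    n = len(paragraphs)
    conditional = (
        (len(with_word) / n) * entropy(with_word)
        + (len(without_word) / n) * entropy(without_word)
    )
    return entropy(labels) - conditional


# Hypothetical labeled paragraphs.
paragraphs = [
    ("the defendant signed the lease in march", "fact"),
    ("the defendant failed to pay rent", "fact"),
    ("the court held that the statute applies", "other"),
    ("under the statute a lease may be terminated", "other"),
]
ig_defendant = information_gain("defendant", paragraphs)
ig_the = information_gain("the", paragraphs)
```

On this toy data, a word that appears only in one class separates the labels perfectly, while a word that appears in every paragraph carries no information about the label.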
XML Multimedia Retrieval
Cited by 3 (0 self)
Multimedia XML documents can be viewed as a tree, whose nodes correspond to XML elements, and where multimedia objects are referenced in attributes as external entities. This paper investigates the use of textual XML elements for retrieving multimedia objects.
Maitre. Enhancement of textual images classification using segmented visual contents for image search engine
- Multimedia Tools and Applications
, 2005

