Results 1 - 10 of 82
Automating the Construction of Internet Portals with Machine Learning
- Information Retrieval, 2000
Cited by 141 (3 self)
Domain-specific internet portals are growing in popularity because they gather content from the Web and organize it for easy access, retrieval and search. For example, www.campsearch.com allows complex queries by age, location, cost and specialty over summer camps. This functionality is not possible with general, Web-wide search engines. Unfortunately these portals are difficult and time-consuming to maintain. This paper advocates the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific Internet portals. We describe new research in reinforcement learning, information extraction and text classification that enables efficient spidering, the identification of informative text segments, and the population of topic hierarchies. Using these techniques, we have built a demonstration system: a portal for computer science research papers. It already contains over 50,000 papers and is publicly available at www.cora.justresearch.com. These techniques are ...
Learning Object Identification Rules for Information Integration
- Information Systems, 2001
Cited by 77 (8 self)
When integrating information from multiple websites, the same data objects can exist in inconsistent text formats across sites, making it difficult to identify matching objects using exact text match. We have developed an object identification system called Active Atlas, which compares the objects' shared attributes in order to identify matching objects. Certain attributes are more important for deciding if a mapping should exist between two objects. Previous methods of object identification have required manual construction of object identification rules or mapping rules for determining the mappings between objects. This manual process is time-consuming and error-prone.
Indexing and retrieval of scientific literature
- Proceedings of the 8th International Conference on Information and Knowledge Management, 1999
Cited by 68 (14 self)
The web has greatly improved access to scientific literature. However, scientific articles on the web are largely disorganized, with research articles being spread across archive sites, institution sites, journal sites, and researcher homepages. No index covers all of the available literature, and the major web search engines typically do not index the content of PostScript/PDF documents at all. This paper discusses the creation of digital libraries of scientific literature on the web, including the efficient location of articles, full-text indexing of the articles, autonomous citation indexing, information extraction, display of query-sensitive summaries and citation context, hubs and authorities computation, similar document detection, user profiling, distributed error correction, graph analysis, and detection of overlapping documents. The software for the system is available at no cost for non-commercial use.
A Machine Learning Approach to Building Domain-Specific Search Engines
- In Proceedings of the 16th International Joint Conference on Artificial Intelligence, 1999
Cited by 68 (3 self)
Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with general, Web-wide search engines. Unfortunately, they are also difficult and time-consuming to maintain. This paper proposes the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific search engines. We describe new research in reinforcement learning, text classification and information extraction that enables efficient spidering, populates topic hierarchies, and identifies informative text segments. Using these techniques, we have built a demonstration system: a search engine for computer science research papers available at www.cora.justresearch.com.
Building Domain-Specific Search Engines with Machine Learning Techniques
- 1999
Cited by 58 (6 self)
Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with the general, Web-wide search engines. For example, www.campsearch.com allows complex queries by age group, size, location and cost over summer camps. Unfortunately, these domain-specific search engines are difficult and time-consuming to maintain. This paper proposes the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific search engines. We describe new research in reinforcement learning, text classification and information extraction that automates efficient spidering, populating topic hierarchies, and identifying informative text segments. Using these techniques, we have built a demonstration system: a search engine for computer science research papers. It already contains over 33,000 papers and is publicly available at www.cora.jprc.com.
Capturing Knowledge of User Preferences: Ontologies in Recommender Systems
- In Proceedings of the First International Conference on Knowledge Capture (K-CAP 2001), Oct 2001
Cited by 54 (7 self)
Tools for filtering the World Wide Web exist, but they are hampered by the difficulty of capturing user preferences in such a dynamic environment. We explore the acquisition of user profiles by unobtrusive monitoring of browsing behaviour and application of supervised machine-learning techniques coupled with an ontological representation to extract user preferences. A multi-class approach to paper classification is used, allowing the paper topic taxonomy to be utilised during profile construction. The Quickstep recommender system is presented and two empirical studies evaluate it in a real work setting, measuring the effectiveness of using a hierarchical topic ontology compared with an extendable flat list.
Exploiting Hierarchical Domain Structure to Compute Similarity
- ACM Transactions on Information Systems, 2003
Ontological user profiling in recommender systems
- ACM Transactions on Information Systems, 2004
Cited by 45 (1 self)
We explore a novel ontological approach to user profiling within recommender systems, working on the problem of recommending on-line academic research papers. Our two experimental systems, Quickstep and Foxtrot, create user profiles from unobtrusively monitored behaviour and relevance feedback, representing the profiles in terms of a research paper topic ontology. A novel profile visualization approach is taken to acquire profile feedback. Research papers are classified using ontological classes and collaborative recommendation algorithms used to recommend papers seen by similar people on their current topics of interest. Two small-scale experiments, with 24 subjects over 3 months, and a large-scale experiment, with 260 subjects over an academic year, are conducted to evaluate different aspects of our approach. Ontological inference is shown to improve user profiling, external ontological knowledge used to successfully bootstrap a recommender system and profile visualization employed to improve profiling accuracy. The overall performance of our ontological recommender systems is also presented and favourably compared to other systems in the literature.
VisFlowConnect: NetFlow Visualizations of Link Relationships for Security Situational Awareness
- 2004
Cited by 44 (9 self)
We present a visualization design to enhance the ability of an administrator to detect and investigate anomalous traffic between a local network and external domains. Central to the design is a parallel axes view which displays NetFlow records as links between two machines or domains while employing a variety of visual cues to assist the user. We describe several filtering options that can be employed to hide uninteresting or innocuous traffic such that the user can focus his or her attention on the more unusual network flows.

