CiteSeerX
Results 1 - 10 of 2,195

Data Corpus: 1 CD

by Lu Wei; advisor: Dr. Min-yen Kan
"... Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purposes can often help to interpret or provide crucial information about the web page. I have developed a functionality-based categorization of JavaScript ..."
Abstract
classification performance. I perform experiments on the standard WT10G web page corpus, and show that my techniques eliminate over 50% of errors over a standard text classification baseline.

Supervised Categorization of JavaScript™ using Program Analysis Features

by Wei Lu, Min-yen Kan
"... Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purpose can often help to interpret or provide crucial information about the web page. We have developed a functionalitybased categorization of JavaScript, ..."
Abstract - Cited by 2 (0 self) - Add to MetaCart
classification performance. We perform experiments on the standard WT10G web page corpus, and show that our techniques eliminate over 50 % of errors over a standard text classification baseline.

Ontology Development 101: A Guide to Creating Your First Ontology

by Natalya F. Noy, Deborah L. McGuinness, 2001
"... In recent years the development of ontologies—explicit formal specifications of the terms in the domain and relations among them (Gruber 1993)—has been moving from the realm of Artificial-Intelligence laboratories to the desktops of domain experts. Ontologies have become common on the World-Wide Web ..."
Abstract - Cited by 830 (5 self)
for encoding knowledge on Web pages to make it understandable to electronic agents searching for information. The Defense Advanced Research Projects Agency (DARPA), in conjunction with the W3C, is developing DARPA Agent Markup Language (DAML) by extending RDF with more expressive constructs aimed

A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood

by Stéphane Guindon, Olivier Gascuel, 2003
"... The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The ..."
Abstract - Cited by 2182 (27 self)
of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page:

Inferring Web Communities from Link Topology

by David Gibson, Jon Kleinberg, Prabhakar Raghavan, 1998
"... The World Wide Web grows through a decentralized, almost anarchic process, and this has resulted in a large hyperlinked corpus without the kind of logical organization that can be built into more traditionally-created hypermedia. To extract meaningful structure under such circumstances, we develop a ..."
Abstract - Cited by 415 (4 self)

Open information extraction from the web

by Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, Oren Etzioni - In IJCAI, 2007
"... Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (e.g., extract the location and time of seminars from a set of announcements). Shifting to a new domain requires the user to name the target relations and to ma ..."
Abstract - Cited by 373 (39 self)
page corpus that compare TEXTRUNNER with KNOWITALL, a state-of-the-art Web IE system. TEXTRUNNER achieves an error reduction of 33% on a comparable set of extractions. Furthermore, in the amount of time it takes KNOWITALL to perform extraction for a handful of pre-specified relations, TEXTRUNNER

Syskill & Webert: Identifying interesting web sites

by Michael Pazzani, Jack Muramatsu, Daniel Billsus - In Proc. 13th Natl. Conf. on Artificial Intelligence, 1998
"... We describe Syskill & Webert, a software agent that learns to rate pages on the World Wide Web (WWW), deciding what pages might interest a user. The user rates explored pages on a three point scale, and Syskill & Webert learns a user profile by analyzing the information on a page. The user p ..."
Abstract - Cited by 353 (5 self)

Learning dictionaries for information extraction by multi-level bootstrapping

by Ellen Riloff, Rosie Jones - in AAAI’99/IAAI’99 – Proceedings of the 16th National Conference on Artificial Intelligence & 11th Innovative Applications of Artificial Intelligence Conference
"... Information extraction systems usually require two dictionaries: a semantic lexicon and a dictionary of extraction patterns for the domain. We present a multilevel bootstrapping algorithm that generates both the semantic lexicon and extraction patterns simultaneously. As input, our technique require ..."
Abstract - Cited by 378 (21 self)
pages and a corpus of terrorism news articles. The algorithm produced high-quality dictionaries for several semantic categories.

Personalizing search via automated analysis of interests and activities

by Jaime Teevan, 2005
"... We formulate and study search algorithms that consider a user’s prior interactions with a wide variety of content to personalize that user’s current Web search. Rather than relying on the unrealistic assumption that people will precisely specify their intent when searching, we pursue techniques that ..."
Abstract - Cited by 303 (29 self)
that leverage implicit information about the user’s interests. This information is used to re-rank Web search results within a relevance feedback framework. We explore rich models of user interests, built from both search-related information, such as previously issued queries and previously visited Web pages

Focused crawling using context graphs

by M. Diligenti, F. M. Coetzee, S. Lawrence, C. L. Giles, M. Gori - In 26th International Conference on Very Large Databases, VLDB 2000, 2000
"... Maintaining currency of search engine indices by exhaustive crawling is rapidly becoming impossible due to the increasing size and dynamic content of the web. Focused crawlers aim to search only the subset of the web related to a specific category, and offer a potential solution to th ..."
Abstract - Cited by 255 (11 self)
we present a focused crawling algorithm that builds a model for the context within which topically relevant pages occur on the web. This context model can capture typical link hierarchies within which valuable pages occur, as well as model content on documents that frequently co-occur with relevant

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University