Results 1 - 10 of 890,161

Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

by Peter Turney, 2002
"... This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A ..."
Cited by 741 (5 self)

Wrapper Induction for Information Extraction

by Nicholas Kushmerick, 1997
"... The Internet presents numerous sources of useful information: telephone directories, product catalogs, stock quotes, weather forecasts, etc. Recently, many systems have been built that automatically gather and manipulate such information on a user's behalf. However, these resources are usually formatted for use by people (e.g., the relevant content is embedded in HTML pages), so extracting their content is difficult. Wrappers are often used for this purpose. A wrapper is a procedure for extracting a particular resource's content. Unfortunately, hand-coding wrappers is tedious. We ..."
Cited by 612 (30 self)

Unsupervised Named-Entity Extraction from the Web: An Experimental Study

by Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates - Artificial Intelligence, 2005
"... The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, and scalable manner. The paper presents an overview of KNOWITALL’s novel architecture and design princip ..."
Cited by 364 (39 self)

Extracting patterns and relations from the world wide web

by Sergey Brin - In WebDB Workshop at 6th International Conference on Extending Database Technology, EDBT’98, 1998
"... The World Wide Web is a vast resource for information. At the same time it is extremely distributed. A particular type of data such as restaurant lists may be scattered across thousands of independent information sources in many different formats. In this paper, we consider the problem of extracting a relation for such a data type from all of these sources automatically. We present a technique which exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample. To test our technique we use it to extract a relation of (author, title) pairs ..."
Cited by 462 (1 self)

DBpedia: A Nucleus for a Web of Open Data

by Christian Bizer, Georgi Kobilarov, Jens Lehmann, Zachary Ives - Proc. 6th Int’l Semantic Web Conf, 2007
"... DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. We describe the extraction of the DBpedia datasets, and how the resulting information is published on the Web for human- and machine-consumption. We describe some emerging applications from the DBpedia community and show how website authors can facilitate DBpedia content within their sites. Finally, we present the current ..."
Cited by 619 (36 self)

Topic-Sensitive PageRank

by Taher Haveliwala, 2002
"... In the original PageRank algorithm for improving the ranking of search-query results, a single PageRank vector is computed, using the link structure of the Web, to capture the relative "importance" of Web pages, independent of any particular search query. To yield more accurate search resu ..."
Cited by 535 (10 self)

Rank Aggregation Methods for the Web

by Cynthia Dwork, Ravi Kumar, Moni Naor, D. Sivakumar, 2010
"... We consider the problem of combining ranking results from various sources. In the context of the Web, the main applications include building meta-search engines, combining ranking functions, selecting documents based on multiple criteria, and improving search precision through word associations. Wed ..."
Cited by 473 (6 self)

Extracting Relations from Large Plain-Text Collections

by Eugene Agichtein, Luis Gravano, 2000
"... Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering precise queries or for running data mining tasks. We explore a technique for extracting such tables from document collections that requires only a handful of training examples from users. These examples are used to generate extraction patterns, that in turn result in new tuples being extracted from the document collection. We build on this idea and present our Snowball system. Snowball ..."
Cited by 480 (25 self)

Generating Representative Web Workloads for Network and Server Performance Evaluation

by Paul Barford, Mark Crovella, 1997
"... One role for workload generation is as a means for understanding how servers and networks respond to variation in load. This enables management and capacity planning based on current and projected usage. This paper applies a number of observations of Web server usage to create a realistic Web worklo ... references; 5) temporal locality of reference; and 6) idle periods of individual users. This paper reviews the essential elements required in the generation of a representative Web workload. It also addresses the technical challenges to satisfying this large set of simultaneous constraints on the properties ..."
Cited by 933 (11 self)

A taxonomy of web search

by Andrei Broder - SIGIR Forum, 2002
"... Classic IR (information retrieval) is inherently predicated on users searching for information, the so-called "information need". But the need behind a web search is often not informational -- it might be navigational (give me the url of the site I want to reach) or transactional (show me s ..."
Cited by 639 (4 self)