Graph mining: Laws, generators, and algorithms
- ACM COMPUTING SURVEYS
, 2006
Cited by 49 (7 self)
How does the Web look? How could we tell an abnormal social network from a normal one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology and many more. Indeed, any M : N relation in database terminology can be represented as a graph. A lot of these questions boil down to the following: "How can we generate synthetic but realistic graphs?" To answer this, we must first understand what patterns are common in real-world graphs and can thus be considered a mark of normality/realism. This survey gives an overview of the incredible variety of work that has been done on these problems. One of our main contributions is the integration of points of view from physics, mathematics, sociology, and computer science. Further, we briefly describe recent advances on some related and interesting graph problems.
Collection Synthesis
, 2002
Cited by 19 (2 self)
The invention of the hyperlink and the HTTP transmission protocol caused an amazing new structure to appear on the Internet -- the World Wide Web. With the Web, there came spiders, robots, and Web crawlers, which go from one link to the next checking Web health, ferreting out information and resources, and imposing organization on the huge collection of information (and dross) residing on the net. This paper reports on the use of one such crawler to synthesize document collections on various topics in science, mathematics, engineering and technology. Such collections could be part of a digital library.
Focused Crawls, Tunneling, and Digital Libraries
- In Proceedings of the European Conference on Digital Libraries (ECDL
, 2002
Cited by 16 (1 self)
Crawling the Web to build collections of documents related to pre-specified topics became an active area of research during the late 1990s after crawler technology was developed for the benefit of search engines. Now, Web crawling is being seriously considered as an important strategy for building large scale digital libraries. This paper considers some of the crawl technologies that might be exploited for collection building. For example, to make such collection-building crawls more effective, focused crawling was developed, in which the goal was to make a "best-first" crawl of the Web. We are using powerful crawler software to implement a focused crawl but use tunneling to overcome some of the limitations of a pure best-first approach. Tunneling has been described by others as not only prioritizing links from pages according to the page's relevance score, but also estimating the value of each link and prioritizing on that as well. We add to this mix by devising a tunneling focused crawling strategy which evaluates the current crawl direction on the fly to determine when to terminate a tunneling activity. Results indicate that a combination of focused crawling and tunneling could be an effective tool for building digital libraries.
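To make the idea of a best-first crawl with tunneling concrete, here is a minimal Python sketch. It is not the authors' crawler: the function name `focused_crawl`, the in-memory `graph` and `scores` dictionaries (stand-ins for page fetching and a relevance classifier), the `threshold`, and the fixed `max_tunnel` cutoff are all assumptions made for illustration. In particular, the abstract describes evaluating the crawl direction on the fly, whereas this sketch simply stops after a fixed number of consecutive off-topic pages.

```python
import heapq


def focused_crawl(graph, scores, seeds, threshold=0.5, max_tunnel=2):
    """Best-first crawl over an in-memory link graph with a simple tunneling cutoff.

    graph:      dict mapping page -> list of linked pages (hypothetical stand-in for fetching)
    scores:     dict mapping page -> relevance in [0, 1] (hypothetical stand-in for a classifier)
    max_tunnel: how many consecutive off-topic pages a path may pass through (assumed rule)
    """
    # Heap entries: (negated score of the linking page, page, off-topic run length on this path).
    frontier = [(-1.0, seed, 0) for seed in seeds]
    heapq.heapify(frontier)
    visited = set()
    collected = []

    while frontier:
        _, page, off_topic_run = heapq.heappop(frontier)
        if page in visited:
            continue
        visited.add(page)

        score = scores.get(page, 0.0)
        if score >= threshold:
            collected.append(page)   # on-topic page: keep it and reset the tunnel counter
            off_topic_run = 0
        else:
            off_topic_run += 1       # off-topic page: we are tunneling
            if off_topic_run > max_tunnel:
                continue             # give up on this direction; do not expand its links

        # Links inherit the priority of the page they were found on (best-first ordering).
        for link in graph.get(page, []):
            if link not in visited:
                heapq.heappush(frontier, (-score, link, off_topic_run))

    return collected
```

With `max_tunnel=0` the sketch never expands an off-topic page, which approximates a strict focused crawl; larger values let the crawl pass through a few irrelevant pages to reach relevant ones beyond them.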
Recognizing ontology-applicable multiple-record Web documents
- In Proceedings of the 20th International Conference on Conceptual Modeling (ER2001
, 2001
Cited by 15 (9 self)
Automatically recognizing which Web documents are “of interest” for some specified application is non-trivial. As a step toward solving this problem, we propose a technique for recognizing which multiple-record Web documents apply to an ontologically specified application. Given the values and kinds of values recognized by an ontological specification in an unstructured Web document, we apply three heuristics: (1) a density heuristic that measures the percent of the document that appears to apply to an application ontology, (2) an expected-value heuristic that compares the number and kind of values found in a document to the number and kind expected by the application ontology, and (3) a grouping heuristic that considers whether the values of the document appear to be grouped as application-ontology records. Then, based on machine-learned rules over these heuristic measurements, we determine whether a Web document is applicable for a given ontology. Our experimental results show that we have been able to achieve over 90% for both recall and precision, with an F-measure of about 95%.
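For readers unfamiliar with the F-measure quoted above, the following short Python sketch shows how it relates to precision and recall. It assumes the common balanced form (the harmonic mean of the two); the abstract does not state which weighting the authors used, and the function name `f_measure` is chosen here for illustration.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall (assumed form)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)
```

Because the harmonic mean always lies between its two inputs, an F-measure of about 95% is consistent with precision and recall that are both above 90%.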
The Shape of the Web and Its Implications for Searching the Web
, 2000
Cited by 9 (0 self)
With the rapid growth of the number of web pages, designing a search engine that can retrieve high quality information in response to a user query is a challenging task. Automated search engines that rely on keyword matching usually return too many low quality matches and they take a long time to run. It is argued in the literature that link-following search methods can substantially increase the search quality, provided that these methods use an accurate assumption about useful patterns in the hyperlink topology of the web. Recent work in the field has focused on detecting identifiable patterns in the web graph and exploiting this information to improve the performance of search algorithms. We survey relevant work in this area and comment on the implications of these patterns for other areas such as advertisement and marketing.
Hyperlink Analysis: Techniques and Applications
, 2002
Cited by 2 (2 self)
Content and Link Structure Analysis for Searching the Web
- In Computational Web Intelligence: Intelligent Technology for Web Applications
, 2004
Cited by 1 (1 self)
Finding relevant pages in response to a user query is a challenging task. Automated search engines that rely on keyword matching usually return too many low quality matches. Link analysis methods can substantially improve the search quality when they are combined with content analysis. This chapter surveys the mainstream work in this area.
MirrorSEEk System Architecture
, 2001
Separation of user-system interaction from computation is a key issue in the ambient intelligence vision: electronic devices cannot disappear into our surroundings and provide the required transparency, ubiquity and intelligence for ambient intelligence unless we can separate the user-system interaction from the computing processes. To make this separation possible, a network for communication and a format to describe information in a structured way are needed. Internet and XML technology provide just that.

