Extracting Large-Scale Knowledge Bases From the Web
Proceedings of the 25th VLDB Conference, 1999
Cited by 97 (2 self)
The subject of this paper is the creation of knowledge bases by enumerating and organizing all web occurrences of certain subgraphs. We focus on subgraphs that are signatures of web phenomena such as tightly-focused topic communities, webrings, taxonomy trees, keiretsus, etc. For instance, the signature of a webring is a central page with bidirectional links to a number of other pages. We develop novel algorithms for such enumeration problems. A key technical contribution is the development of a model for the evolution of the web graph, based on experimental observations derived from a snapshot of the web. We argue that our algorithms run efficiently in this model, and use the model to explain some statistical phenomena on the web that emerged during our experiments. Finally, we describe the design and implementation of Campfire, a knowledge base of over one hundred thousand web communities.
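The webring signature mentioned above (a central page with bidirectional links to a number of other pages) can be checked directly on an edge list. The following is a minimal in-memory sketch, not the enumeration algorithm from the paper; the function name `webring_centers` and the `min_spokes` threshold are hypothetical choices made for illustration.

```python
from collections import defaultdict


def webring_centers(edges, min_spokes):
    """Return pages that have at least `min_spokes` bidirectional links.

    `edges` is an iterable of (source, target) hyperlinks. Two pages are
    bidirectionally linked when both (a, b) and (b, a) are present.
    The result maps each qualifying centre page to its set of "spoke" pages.
    """
    out_links = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # ignore self-links
            out_links[src].add(dst)

    centers = {}
    for page, targets in out_links.items():
        # Keep only the neighbours that also link back to this page.
        spokes = {t for t in targets if page in out_links.get(t, ())}
        if len(spokes) >= min_spokes:
            centers[page] = spokes
    return centers
```

For example, on a toy graph where `hub` links to and from `a`, `b`, and `c` (plus a one-way link from `a` to `b`), `webring_centers(links, min_spokes=3)` reports only `hub`. A web-scale crawl would need the paper's own enumeration algorithms; this sketch simply holds the whole graph in memory.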
Creating Customized Authority Lists
Proceedings of the 17th International Conference on Machine Learning, 1999
Cited by 18 (0 self)
The proliferation of hypertext and the popularity of Kleinberg's HITS algorithm have brought about increased interest in link analysis. While HITS and its older relatives from the bibliometrics literature provide a method for finding authoritative sources on a particular topic, they do not allow individual users to inject their own opinions about which sources are authoritative. This paper presents a technique that incorporates user feedback by adjusting the measure of authority to match an individual's internal notion of which sources are important. By "lifting" the authority of a few user-specified sources, the eigenvectors of the entire link matrix are realigned, resulting in a computationally cheap method that is much richer than simple "spreading activation." We present experimental results based on a database of about one million references collected as part of the Cora on-line index of the computer science literature. KEYWORDS: Bibliometrics, feedback learning, authorities...
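For reference, plain HITS can be sketched as a power iteration over the link graph: authority and hub scores converge towards the principal eigenvectors of A^T A and A A^T, where A is the adjacency matrix. This is a minimal sketch of standard HITS only, assuming a small in-memory edge list and a fixed iteration count; it does not implement the paper's user-feedback "lifting" of selected sources.

```python
import math


def hits(edges, iterations=50):
    """Standard HITS: return (hub, authority) score dictionaries.

    A page's authority is the sum of the hub scores of pages linking to it;
    its hub score is the sum of the authority scores of pages it links to.
    Both vectors are L2-normalised after every update.
    """
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: collect hub scores along incoming links.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}

        # Hub update: collect authority scores along outgoing links.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth
```

On the edge list `[("a", "c"), ("b", "c"), ("a", "d")]`, page `c` (pointed to by two hubs) ends up with a higher authority score than `d`, and `a` (which points to both) gets the higher hub score.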
Fitting the Jigsaw of Citation: Information Visualization in Domain Analysis
Journal of the American Society for Information Science and Technology, 2001
Cited by 13 (3 self)
Introduction
When we first encounter a scientific discipline or subject domain, we often need a good vantage point and as many signposts as possible to guide us through the field. More experienced researchers and domain experts, on the other hand, need effective ways to track the development of their own fields and to extract crucial signs of the dynamics of a scientific discipline (Bush, 1945). The World Wide Web (Web) has revolutionized the way we search for information. On today's Web we can easily access a vast amount of information on almost any subject. However, a profound challenge for many of us in the modern information society is to transcend the sheer volume of scientific literature and access scientific knowledge at a higher level. Meta-knowledge, the knowledge of how particular knowledge structures have been perceived, should become an integral part of the scientific discipline involved, and it should be presented with simplicity and clarity for scholarly communication as well as public understanding. Domain visualization is an exciting field of study that addresses these ...
Learning to Create Customized Authority Lists
Proceedings of the 17th International Conference on Machine Learning (ICML 2000), 2000
Cited by 12 (0 self)
The proliferation of hypertext and the popularity of Kleinberg's HITS algorithm have brought about increased interest in link analysis. While HITS and its older relatives from bibliometrics provide a method for finding authoritative sources on a particular topic, they do not allow individual users to inject their own opinions about which sources are authoritative. This paper presents a technique for learning a user's internal model of authority. We present experimental results based on the Cora on-line index, a database of approximately one million on-line computer science literature references.
1. Introduction
Bibliometrics (White & McCain, 1989; Small, 1973) involves studying the structure that emerges from sets of linked documents. Traditionally, these links have taken the form of citations among journal articles, although Kleinberg (1997) and others (e.g., Brin & Page, 1998) have found that bibliometric techniques adapt well to sets of hyperlinked documents. Bibliometric techniques ...
Support vector machines for dyadic data
Neural Computation
Cited by 12 (3 self)
We describe a new technique for the analysis of dyadic data, where two sets of objects (“row” and “column” objects) are characterized by a matrix of numerical values that describe their mutual relationships. The new technique, called the “Potential Support Vector Machine” (P-SVM), is a large-margin method for constructing classifiers and regression functions for the “column” objects. Unlike standard support vector machine approaches, the P-SVM minimizes a scale-invariant capacity measure and requires a new set of constraints. As a result, the P-SVM typically yields a sparse expansion of the classification and regression functions in terms of the “row” rather than the “column” objects, and it can handle data and kernel matrices that are neither positive definite nor square. We then describe two complementary regularization schemes. The first scheme improves generalization performance for classification and regression tasks; the second leads to the selection of a small, informative set of “row” objects that act as “support” objects and can be applied to feature selection. Benchmarks for classification, regression, and feature selection tasks are performed on toy data as well as on several real-world data sets. The results show that the new method is at least competitive with, and often better than, the benchmarked standard methods, for standard vectorial as well as for true dyadic data sets. In addition, a theoretical justification is provided for the new approach.
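The sparse expansion over row objects can be illustrated with the prediction step alone. This sketch assumes the decision for column object j has the form sign(Σ_i α_i K_ij + b), where K is the row-by-column data matrix; the coefficients `alpha` and the offset `b` are taken as given (the P-SVM training procedure and its constraints are not shown), and `psvm_predict` is a hypothetical name.

```python
def psvm_predict(K, alpha, b=0.0):
    """Classify every column object of a (rows x columns) data matrix K.

    Assumes the decision function f(j) = sign(sum_i alpha[i] * K[i][j] + b),
    i.e. an expansion over row objects. K may be rectangular and need not be
    positive definite. Only row objects with non-zero alpha (the "support"
    row objects) contribute to the score.
    """
    support = [(i, a) for i, a in enumerate(alpha) if a != 0.0]
    n_cols = len(K[0])
    labels = []
    for j in range(n_cols):
        score = sum(a * K[i][j] for i, a in support) + b
        # A score of exactly zero is mapped to +1 by convention.
        labels.append(1 if score >= 0 else -1)
    return labels
```

Because only rows with non-zero α enter the sum, a sparse α means most row objects are never touched at prediction time, and nothing in this step requires K to be square or positive definite.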
An Agenda for Digital Journals: The Socio-Technical Infrastructure of Knowledge Dissemination
1993
Cited by 5 (4 self)
The problems of information overload from the growth of scholarly literature, and the need to use information technology to manage them, were identified by major writers and scientists over fifty years ago. Yet the main form of scholarly communication, the journal, is still circulated in paper form as it has been for over three hundred years. The economic arguments for using computer and communication technology to overcome these problems through a new form of scientific communication, the electronic or digital journal, were vigorously presented in the 1970s. Experimental trials of digital journals with the technologies of the 1970s and 1980s have not been successful. In the 1990s, the continuing value of current journal systems is again being questioned in terms of soaring library costs, the burden of the current refereeing system and the diminishing returns of journal publication brought about by information overload. This paper presents a fundamental examination of the prerequisites...
The social semantics of livejournal foaf: Structure and change from 2004 to 2005
Proceedings of the 1st Workshop on Semantic Network Analysis at the ISWC 2005 Conference, pages 69–80, 2005
Cited by 4 (0 self)
Social Network Analysis methods hold substantial promise for the analysis of Semantic Web metadata, but considerable work remains to be done to reconcile the methods and findings of Social Network Analysis with the data and inference methods of the Semantic Web. The present study develops a Social Network Analysis of the foaf:knows and foaf:interests relations in a sample of LiveJournal user profiles. The analysis demonstrates that although both types of metadata show significant and generally stable structural regularities, the two are largely uncorrelated with each other. There are also large local variations in the clusters obtained, which reduce their reliability for inference. Hence, while information useful for semantic inference over user profiles can be obtained in this way, the distributional nature of user profile data needs closer study.
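One very simple way to ask whether two relations over the same users line up is to compare their tie sets. The sketch below derives a shared-interest tie between any two users with at least one common foaf:interests value and measures its Jaccard overlap with the undirected foaf:knows ties. This is an illustrative assumption, not the analysis used in the study; the input structures and the name `knows_interest_overlap` are hypothetical.

```python
from itertools import combinations


def knows_interest_overlap(knows, interests):
    """Jaccard overlap between two undirected user-user relations.

    knows:     iterable of (user, user) foaf:knows pairs (direction ignored)
    interests: dict mapping user -> set of foaf:interests values
    A shared-interest tie links two users with at least one common interest.
    """
    knows_ties = {frozenset(p) for p in knows if p[0] != p[1]}
    interest_ties = {
        frozenset((u, v))
        for u, v in combinations(sorted(interests), 2)
        if interests[u] & interests[v]
    }
    union = knows_ties | interest_ties
    if not union:
        return 0.0
    return len(knows_ties & interest_ties) / len(union)
```

A value near 0 means the two relations rarely link the same pairs of users; a value near 1 means they mostly coincide.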
Method for the Subjective Assessment of the Quality of Television Pictures
2003
Cited by 1 (1 self)
Researchers in any academic discipline build on each other's and their own previous work. Definitions, topics and concepts are shared. It is necessary to continuously follow up on interesting lines of inquiry. It is also necessary to identify, examine, and trace the intellectual linkages among works in a given academic field, as a basis for assessing the current state of the field and guiding its future development. Over the past 80 years, the way we count and analyze these intellectual linkages has changed dramatically, from the early manual transcription and statistical computation of citation data to computer-based creation and manipulation of citation data. Most citation and cocitation analyses rely on commercial citation databases such as the Social Sciences Citation Index. This paper introduces an alternative approach to conducting author cocitation analysis (ACA) without relying on commercial citation databases, based on a custom bibliographic database and cocitation matrix generation systems developed specifically to work with it. The alternative approach overcomes several weaknesses of ACA research based on commercial online data. This guide to an alternative approach to ACA will encourage other researchers to explore the intellectual structures of various MIS fields and to guide their future development, as well as to reveal their reference disciplines.
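The core of a cocitation matrix generator is a pairwise count over each citing paper's reference list. The sketch below is an assumption about how such a system might work, not the paper's implementation: for every pair of study authors it counts the citing papers whose references include both, and returns a symmetric matrix. The input format (one list of cited author names per citing paper) and the name `author_cocitation_matrix` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def author_cocitation_matrix(reference_lists, authors):
    """Count how often each pair of study authors is cited together.

    reference_lists: one list of cited author names per citing paper
    authors:         the author set chosen for the ACA study
    Two authors are co-cited once per citing paper that cites both,
    no matter how many of their works that paper cites.
    """
    study = set(authors)
    counts = Counter()
    for refs in reference_lists:
        cited = sorted(set(refs) & study)
        for a, b in combinations(cited, 2):
            counts[(a, b)] += 1

    order = sorted(study)
    index = {name: k for k, name in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for (a, b), n in counts.items():
        # Cocitation is symmetric, so fill both cells.
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = n
    return order, matrix
```

The diagonal is left at zero here; ACA studies differ in how they treat it (for example, as missing or filled with a value derived from the off-diagonal entries), so that choice is left to the analyst.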
Modeling the invisible college
Journal of the American Society for Information Science and Technology, 2006
Cited by 1 (1 self)
This paper addresses the invisible college concept with the intent of developing a consensus regarding its definition. Emphasis is placed on the term as it was defined and used in Derek de Solla Price’s (1963; 1986) work and reviewed on the basis of its thematic progress in past research over the years. Special attention is given to Lievrouw’s (1990) article concerning the structure versus social process problem to show that both conditions are essential to the invisible college and both may be reconciled. A new definition of the invisible college is also introduced, including a proposed research model. With this model, researchers are encouraged to study the invisible college by focusing on three critical components – the subject specialty, the scientists as social actors, and the Information Use Environment (IUE).
Inter-document similarity in web searches
2004
Cited by 1 (1 self)

