Results 1 - 10 of 22
The Case of the Fake Picasso: Preventing History Forgery with Secure Provenance
Cited by 17 (3 self)
As increasing amounts of valuable information are produced and persist digitally, the ability to determine the origin of data becomes important. In science, medicine, commerce, and government, data provenance tracking is essential for rights protection, regulatory compliance, management of intelligence and medical data, and authentication of information as it flows through workplace tasks. In this paper, we show how to provide strong integrity and confidentiality assurances for data provenance information. We describe our provenance-aware system prototype that implements provenance tracking of data writes at the application layer, which makes it extremely easy to deploy. We present empirical results that show that, for typical real-life workloads, the runtime overhead of our approach to recording provenance with confidentiality and integrity guarantees ranges from 1% to 13%.
Web Spam, Propaganda and Trust
, 2005
Cited by 14 (2 self)
Web spamming, the practice of introducing artificial text and links into web pages to affect the results of searches, has been recognized as a major problem for search engines. It is also a serious problem for users because they are not aware of it and they tend to confuse trusting the search engine with trusting the results of a search. In this paper, we first analyze the influence that web spam has on the evolution of the search engines and we identify the strong relationship of spamming methods to propagandistic techniques in society. Our analysis provides a foundation for understanding why spamming works and offers new insight on how to address it. In particular, it suggests that one could use anti-propagandistic techniques on the web to recognize spam. The second part of the paper demonstrates such a technique, called backwards propagation of distrust. In society, recognition of an untrustworthy message (in the opinion of a particular person or other social entity) is a reason for questioning the entities that recommend the message. Entities that are found to strongly support untrustworthy messages become untrustworthy themselves. So, social distrust is propagated backwards for a number of steps. Our algorithm simulates this social behavior on the web graph. In our algorithm, starting from an untrustworthy (according to the end user) site s, we examine its trust neighborhood, that is, the neighborhood of sites that link to s in a few steps. Evaluating the member sites of the neighborhood, we identify biconnected components (BCCs) with a high percentage of untrustworthy sites. BCCs are formed when there are multiple paths to reach s, thus indicating a concerted effort to promote s. This is not the case when starting from a trustworthy site. Our tool explores thousands o...
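The first step described in this abstract, collecting the trust neighborhood of a site s, can be illustrated with a backward breadth-first search. This is only a minimal sketch under stated assumptions, not the authors' tool: the `backlinks` mapping, the function names, and the `max_steps` cutoff are hypothetical, and the biconnected-component analysis mentioned in the abstract is left out.

```python
from collections import deque


def trust_neighborhood(backlinks, start, max_steps):
    """Collect sites that reach `start` by following at most `max_steps` links.

    `backlinks` is a hypothetical mapping: site -> iterable of sites linking to it.
    Returns a dict mapping each discovered site to its distance from `start`.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        site = queue.popleft()
        if distance[site] == max_steps:
            continue  # do not expand beyond the step limit
        for source in backlinks.get(site, ()):
            if source not in distance:
                distance[source] = distance[site] + 1
                queue.append(source)
    return distance


def untrustworthy_fraction(sites, is_untrustworthy):
    """Fraction of `sites` that the user-supplied predicate marks as untrustworthy."""
    sites = list(sites)
    if not sites:
        return 0.0
    return sum(1 for site in sites if is_untrustworthy(site)) / len(sites)
```

The step limit keeps the search local to s, matching the "few steps" wording of the abstract. The predicate is supplied by the caller because the abstract ties trust judgments to the end user rather than to a global authority.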
A Survey of Data Provenance Techniques
, 2005
Cited by 14 (0 self)
Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. The provenance of data products generated by complex transformations such as workflows is of considerable value to scientists. From it, one can ascertain the quality of the data based on its ancestral data and derivations, trace back sources of errors, allow automated re-enactment of derivations to update the data, and provide attribution of data sources. Provenance is also essential to the business domain, where it can be used to drill down to the source of data in a data warehouse, track the creation of intellectual property, and provide an audit trail for regulatory purposes. In this paper we create a taxonomy of data provenance techniques and apply the classification to current research efforts in the field. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. Our synthesis can help those building scientific and business metadata-management systems to understand existing provenance system designs. The survey culminates with an identification of open research problems in the field.
Efficient, automatic web resource harvesting
- In RECOMB
, 2006
Cited by 6 (4 self)
There are two problems associated with conventional web crawling techniques: a crawler cannot know if all resources at a non-trivial web site have been discovered and crawled (“the counting problem”) and the human-readable format of the resources is not always suitable for machine processing (“the representation problem”). We introduce an approach that solves these two problems by building support for both the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and MPEG-21 Digital Item Declaration Language (DIDL) into the web server itself. We present the Apache module “mod_oai”, which can be used to address the counting problem by listing all valid URIs at a web server and efficiently discovering updates and additions on subsequent crawls. Our experiments indicated comparable performance for initial crawls, and dramatic increases in update speed. mod_oai can also be used to address the representation problem by providing “preservation ready” versions of web resources aggregated with their respective forensic metadata in MPEG-21 DIDL format.
Trust and Epistemic Communities in Biodiversity Data Sharing
- In JCDL ’02: Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries
, 2002
Cited by 6 (1 self)
Trust is a key element of knowledge work: what we know depends largely on others. This paper discusses the concepts of communities of practice and epistemic cultures, and their implications for the design of digital libraries that support data sharing, with particular reference to practices of trust and credibility. It uses an empirical study of a biodiversity digital library of data from a variety of sources to illustrate implications for digital library design and operation. It concludes that diversity and uncomfortable boundary areas typify not only digital library user groups but also the design and operation of digital libraries.
Preventing History Forgery with Secure Provenance
Cited by 5 (1 self)
As increasing amounts of valuable information are produced and persist digitally, the ability to determine the origin of data becomes important. In science, medicine, commerce, and government, data provenance tracking is essential for rights protection, regulatory compliance, management of intelligence and medical data, and authentication of information as it flows through workplace tasks. While significant research has been conducted in this area, the associated security and privacy issues have not been explored, leaving provenance information vulnerable to illicit alteration as it passes through untrusted environments. In this paper, we show how to provide strong integrity and confidentiality assurances for data provenance information at the kernel, file system, or application layer. We describe Sprov, our provenance-aware system prototype that implements provenance tracking of data writes at the application layer, which makes Sprov extremely easy to deploy. We present empirical results that show that, for real-life workloads, the runtime overhead of Sprov for recording provenance with confidentiality and integrity guarantees ranges from 1% to 13% when all file modifications are recorded, and from 12% to 16% when all file reads and modifications are tracked.
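The abstract does not say how Sprov protects its records, so the following is not the paper's scheme. It is a generic sketch of one common way to make an append-only history tamper-evident: each record's tag is a keyed hash over the previous tag and the new entry. The function names, the record layout, and the single shared key are all assumptions made for illustration, and the sketch covers integrity only, not confidentiality.

```python
import hashlib
import hmac


def append_record(chain, key, entry):
    """Append `entry` to `chain` with a tag that also covers the previous tag.

    `chain` is a list of (entry, tag) tuples; `key` is a secret bytes value.
    Both the layout and the single shared key are illustrative assumptions.
    """
    previous_tag = chain[-1][1] if chain else b""
    tag = hmac.new(key, previous_tag + entry.encode("utf-8"), hashlib.sha256).digest()
    chain.append((entry, tag))


def verify_chain(chain, key):
    """Recompute every tag in order; return False if any record was altered."""
    previous_tag = b""
    for entry, tag in chain:
        expected = hmac.new(
            key, previous_tag + entry.encode("utf-8"), hashlib.sha256
        ).digest()
        if not hmac.compare_digest(tag, expected):
            return False
        previous_tag = tag
    return True
```

Because each tag depends on the one before it, changing or removing an earlier record invalidates the check for everything that follows, which is the property a forged history would have to defeat.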
Repository Synchronization in the OAI Framework
- In Proceedings of JCDL ’03
, 2003
Cited by 4 (4 self)
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) began as an alternative to distributed searching of scholarly eprint repositories. The model embraced by the OAI-PMH is that of metadata harvesting, where value-added services (by a "service provider") are constructed on cached copies of the metadata extracted from the repositories of the harvester's choosing. While this model dispenses with the well known problems of distributed searching, it introduces the problem of synchronization. Stated simply, this problem arises when the service provider's copy of the metadata does not match the metadata currently at the constituent repositories. We define some metrics for describing the synchronization problem in the OAI-PMH. Based on these metrics, we study the synchronization problem of the OAI-PMH framework and propose several approaches for harvesters to implement better synchronization. In particular, if a repository knows its update frequency, it can publish it in an OAI-PMH Identify response using an optional About container that borrows from RDF Site Syndication (RSS) Format.
Using Propagation Of Distrust to find Untrustworthy Web Neighborhoods
Cited by 3 (3 self)
Web spamming, the practice of introducing artificial text and links into web pages to affect the results of searches, has been recognized as a major problem for search engines. But it is mainly a serious problem for web users because they tend to confuse trusting the search engine with trusting the results of a search. In this paper, we propose “backwards propagation of distrust” as an approach to finding untrustworthy spamming sites. Our approach is inspired by the social behavior associated with distrust. In society, recognition of an untrustworthy entity (person, institution, idea, etc.) is a reason for questioning the trustworthiness of those who recommended this entity. People who are found to strongly support untrustworthy entities become untrustworthy themselves. So, in society distrust is propagated backwards. Our algorithm simulates this social behavior on the web graph with considerable success. Moreover, by respecting the user’s perception of trust through the web graph, our algorithm makes it possible to resolve the moral question of who should be making the decision of weeding out untrustworthy spammers in favor of the user, not the search engine or some higher authority. Our approach can lead to browser-level, or personalized server-side, web filters that work in synergy with the powerful search engines to deliver personalized, trusted web results.
On the evolution of search engine rankings
- In Proceedings of the 2009 WEBIST Conference
Cited by 3 (3 self)
Since the early days of the web, users have been relying on search engines to get informed and make decisions. When the web was relatively small, web directories were built and maintained using human experts to screen and categorize pages according to their characteristics. By the mid 1990s, however, it was apparent that the human expert model of categorizing web pages does not scale. The first search engines appeared and they have been evolving ever since, taking over the role that web directories used to play. But what need makes a search engine evolve? Beyond the financial objectives, there is a need for quality in search results. Users interact with search engines through search query results. Search engines know that the quality of their ranking will determine how successful they are. If users perceive the results as valuable and reliable, they will use the search engine again. Otherwise, it is easy for them to switch to another search engine. Search results, however, are not simply based on well-designed scientific principles, but they are influenced by web spammers. Web spamming, the practice of introducing artificial text and links into web pages to affect the results of web searches, has been recognized as a major search engine problem. It is also a serious problem for users because they are not aware of it and they tend to confuse trusting the search engine with trusting the results of a search. In this paper, we analyze the influence that web spam has on the evolution of the search engines and we identify the strong relationship of spamming methods on the web to propagandistic techniques in society. Our analysis provides a foundation for understanding why spamming works and offers new insight on how to address it. In particular, it suggests that one could use social anti-propagandistic techniques to recognize web spam.
Contents
, 2002
of Michigan University Library and was responsible for developing one of the leading comprehensive digital library programs. 8 ISBN 1-887334-93-9

