Inferring private information using social network data
, 2008
Cited by 16 (1 self)
Online social networks, such as Facebook, are increasingly utilized by many users. These networks allow people to publish details about themselves and connect to their friends. Some of the information revealed inside these networks is private and it is possible that corporations could use learning algorithms on the released data to predict undisclosed private information. In this paper, we propose an effective, scalable inference attack for released social networking data to infer undisclosed private information about individuals. We then explore the effectiveness of possible sanitization techniques that can be used to combat such an inference attack.
Preventing Private Information Inference Attacks on Social Networks
, 2009
Cited by 7 (1 self)
On-line social networks, such as Facebook, are increasingly utilized by many people. These networks allow users to publish details about themselves and connect to their friends. Some of the information revealed inside these networks is meant to be private. Yet it is possible that corporations could use learning algorithms on released data to predict undisclosed private information. In this paper, we explore how to launch inference attacks using released social networking data to predict undisclosed private information about individuals. We then devise three possible sanitization techniques that could be used in various situations. Then, we explore the effectiveness of these techniques by implementing them on a dataset obtained from the Dallas/Fort Worth, Texas network of the Facebook social networking application and attempting to use methods of collective inference to discover sensitive attributes of the data set. We show that we can decrease the effectiveness of both local and relational classification algorithms by using the sanitization methods we described. Further, we discover a problem domain where collective inference degrades the performance of classification algorithms for determining private attributes.
Is it a Bug or an Enhancement? A Text-based Approach to Classify Change Requests
Cited by 5 (0 self)
Bug tracking systems are valuable assets for managing maintenance activities. They are widely used in open-source projects as well as in the software industry. They collect many different kinds of issues: requests for defect fixing, enhancements, refactoring/restructuring activities and organizational issues. These different kinds of issues are simply labeled as “bug” for lack of better classification support or of knowledge about the possible kinds. This paper investigates whether the text of the issues posted in bug tracking systems is enough to classify them into corrective maintenance and other kinds of activities. We show that alternating decision trees, naive Bayes classifiers, and logistic regression can be used to accurately distinguish bugs from other kinds of issues. Results from empirical studies performed on issues for Mozilla, Eclipse, and JBoss indicate that issues can be classified with between 77% and 82% correct decisions.
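As a rough illustration of text-based issue classification, the following is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing, written with the Python standard library only. It is not the authors' implementation: the abstract does not describe their tokenization or feature extraction, so the `tokenize` helper, the class name `NaiveBayesTextClassifier`, the labels `"bug"` and `"other"`, and the toy issue texts are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens (an assumed scheme)."""
    return [word for word in text.lower().split() if word.isalpha()]


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # number of training texts per label
        self.word_counts = defaultdict(Counter)  # per-label word frequencies
        self.vocabulary = set()                  # all words seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)
        return self

    def predict(self, text):
        total_texts = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        words = [w for w in tokenize(text) if w in self.vocabulary]
        best_label, best_score = None, -math.inf
        for label, n_texts in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_texts / total_texts)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy issues, invented for illustration only.
train_texts = [
    "crash when opening file",
    "null pointer exception on save",
    "application crash on startup",
    "add support for dark theme",
    "please add export option",
    "request new keyboard shortcut",
]
train_labels = ["bug", "bug", "bug", "other", "other", "other"]

classifier = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(classifier.predict("crash on save"))
print(classifier.predict("add dark theme option"))
```

A real study would need a far larger labeled corpus and held-out evaluation; the sketch only shows how word frequencies in issue text can drive the bug versus non-bug decision.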
Combating information overload in non-visual web access using context
- In IUI
, 2007
Cited by 4 (4 self)
Web sites are designed for a graphical mode of interaction. Sighted users can visually segment Web pages and quickly identify relevant information. In contrast, visually-disabled individuals have to use screen-readers to browse the Web. Screen-readers process pages sequentially and read through everything, making Web browsing time-consuming and strenuous. The use of shortcut keys and searching offers some improvements, but the problem remains. In this paper, we address this problem using the notion of context. When a user follows a link, we capture the context of the link, and use it to identify relevant information on the next page. The content of this page is rearranged, so that the relevant information is read out first. We conducted a series of experiments to compare the performance of our prototype system with the state-of-the-art screen-reader, JAWS. Our results show that the use of context can potentially save browsing time as well as improve the browsing experience of blind people.
ACM Classification: H5.2 [Information Interfaces and Presentation]: User
Who’s Who in the World Wide Web: Approaches to Name Disambiguation
, 2007
Cited by 3 (0 self)
Personal names are fundamental in our civilization. Names serve to refer to individuals, but in contrast to, e.g., social insurance numbers, which are unique for each citizen, names are not treated that strictly. Names do not identify persons unambiguously: there are people who share the same name. Furthermore, the fact that names are not treated as carefully as numbers often leads to misspellings and confusion.
The increasing importance of the Web in our lives means that people are more frequently confronted with names of things, places, and persons. On the one hand, the Web provides access to more information sources and thereby to more information; on the other hand, the search for relevant information is getting more difficult. A particular problem occurs in (digital) libraries. They are expected to catalog publications in a convenient way that facilitates and supports literature research. It is necessary to distinguish authors that are referred to by their names. The problem of homonymous authors arises, i.e., there may be several authors sharing a name. Conversely, authors may publish under different names or name variations, deliberately or unintentionally. Digital libraries spend much effort on the disambiguation of author names.
This work reports the results of a literature review focusing on disambiguation of homonymous authors and proposes a different perception of the data that is processed during name disambiguation. Different pieces of information are integrated into a graph-based approach. An implementation of some ideas of the presented approach serves as a proof of concept and gives further insights into the nature of name disambiguation.
LocalSavvy: Aggregating local points of view about news issues
- In the WWW’08 Workshop on Location and the Web
, 2008
Cited by 2 (0 self)
The web has become an important medium for news delivery and consumption. Fresh content about a variety of topics, events, and places is constantly being created and published on the web by news agencies around the world. As intuitively understood by readers, and studied in journalism, news articles produced by different social groups present different attitudes towards and interpretations of the same news issues. In this paper, we propose a new paradigm for aggregating news articles according to the local news sources associated with the stakeholders of the news issues. This new paradigm provides users the capability to aggregate and browse various local points of view about the news issues in which they are interested. We implement this paradigm in a system called LocalSavvy. LocalSavvy analyzes the news articles provided by users, using knowledge about locations automatically acquired from the web. Based on the analysis of the news issue, the system finds and aggregates local news articles published by official and unofficial news sources associated with the stakeholders. Moreover, opinions from those local social groups are extracted from the retrieved results, presented in the summaries and highlighted in the news web pages. We evaluate LocalSavvy with a user study. The quantitative and qualitative analysis shows that news articles aggregated by LocalSavvy present relevant and distinct local opinions, which can be clearly perceived by the subjects.
Methods in Biomedical Text Mining
, 2008
Methods to improve text mining of molecular biology interactions are needed to capture a richer information space and qualify the quality of extraction. Simple interaction models fail to describe contextual and confidence information that would help with more fine-grained analyses. Herein a method is presented to streamline curation of text-mined data and a way to improve text mining of biomedical terms that can be adapted to other domains using different machine learning techniques. These advances can be integrated into more powerful text-mining systems to meet user demand and to further promote the adoption of text-mining tools. Additionally, three studies on the nature of biomedical publications are presented: their novelty hinges on the fact that each asks questions that had not been posed before. They cover the phenomena of retraction, ways to improve the impact of research, and the writing style used in biomedical literature. Retraction is a hot topic in recent times

