There seems to be a lot of research on anonymisation methods for graph-structured data to protect privacy in online social networks (Zhou et al., 2008), but where can I find similar methods for anonymising RDF, and for privacy in the Semantic Web in general?

It seems to me that privacy preservation in the Linked Data model is even more complex, given the following (I'm new to this, so please correct me):

  1. RDF and OWL are more sensitive to the information loss that typically occurs with clustering methods. Masking nodes and edges that carry sensitive attributes, and modifying the graph structure, significantly compromises the certainty of inferencing and DL reasoning.
  2. Federated SPARQL queries and the Linked Data model offer adversarial attackers even more leverage (and convenience) in using their background knowledge to target individuals.
  3. Some features of the RDF model, such as directionality (in graph terms), are not covered by current anonymisation methods.
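To make point 1 concrete, here is a toy sketch (plain Python, hypothetical people and relations): once an edge in the middle of a chain is masked, any fact that was inferable through it, e.g. via a transitive property, is lost too.

```python
def transitive_closure(edges):
    """Return all (a, b) pairs reachable via the directed edges."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Before anonymisation: Alice -> Bob -> Carol lets us infer Alice -> Carol.
full = transitive_closure({("Alice", "Bob"), ("Bob", "Carol")})
assert ("Alice", "Carol") in full

# Masking the "sensitive" Bob -> Carol edge also destroys the derived fact.
masked = transitive_closure({("Alice", "Bob")})
assert ("Alice", "Carol") not in masked
```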

For someone who wants to publish anonymised personal data as Linked Data - if only to create FOAF Persons and attributes (on behalf of my friends :) - where do I turn?

asked 28 Jan '13, 04:16 by edan

edited 30 Jan '13, 15:06 by Signified ♦
Good question. My gut says that anonymisation on the infrastructural level of RDF/RDFS/OWL would be difficult, and that it would be up to domain experts to make informed decisions. My initial impression was that, as a crude analogy, it would be like asking how to anonymise JSON ... i.e., it depends on the JSON.

On the other hand, there's an interesting parallel between "entity disambiguation" or "sameAs mining" (which resolve identity by looking for potentially complex or approximate "keys" ... e.g., birthday and full name, or username and site, etc.) and de-anonymisation techniques. So maybe you could take that research and invert it. :)

/2 cents


answered 28 Jan '13, 15:08 by Signified ♦

edited 30 Jan '13, 14:57

Wouldn't this be as easy as making your RDF data not public? And if you have a SPARQL endpoint, you would also make that private.

Otherwise it doesn't make much sense to use Linked Data, which is all about collaboration and sharing data, if you're not willing to share.


answered 29 Jan '13, 10:01 by Luca Matteis

edited 29 Jan '13, 10:59

That's a bit of an oversimplification. There are tons of data (for example, in government organizations) that could be of great interest if published, yet may contain privacy-sensitive details of individuals. One goal of anonymisation is to be able to 'filter' such datasets, so that the "safe-to-publish" part can be extracted and made public.

(29 Jan '13, 14:19) Jeen Broekstra ♦

I'm not sure about that. I would separate the privacy-sensitive triples and make them private. So you could imagine a system where you have a public.rdf and a private.rdf dataset. Triples offer this kind of flexibility, so why not take advantage of it? You could protect your private.rdf dataset with standard HTTP authentication, making it extremely easy to implement.

(29 Jan '13, 14:39) Luca Matteis

@lmatteis, that's fine, but how do you decide which triples to make private? :) That is not as simple as it may seem at first glance...

(29 Jan '13, 15:53) Jeen Broekstra ♦

@Jeen What do you mean by saying it's not so simple?

(29 Jan '13, 16:44) Tomasz Plusk...

I don't claim to be an expert, but what I mean is that it is often more involved than just hiding the phone-number and lastname properties - deciding what to show and what to hide is a bit of a balancing act. Is leaving birthdays in ok if we leave out names? Can links to external documents be left in if those documents potentially mention individuals? Apart from the obvious fields (name, birthday, email), which combinations of properties make individuals potentially identifiable? You can of course just filter everything out, but that way you will likely end up with an uninteresting dataset.

(29 Jan '13, 17:05) Jeen Broekstra ♦
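The "which combinations of properties make individuals identifiable?" question Jeen raises is essentially a k-anonymity check: for a candidate set of properties, count how many records share each value combination, and flag groups of size 1. A rough sketch with hypothetical records and property names:

```python
from collections import Counter

def min_group_size(records, props):
    """Smallest number of records sharing one combination of values
    for the given properties (k in k-anonymity terms)."""
    counts = Counter(tuple(r[p] for p in props) for r in records)
    return min(counts.values())

records = [
    {"city": "Galway", "birthyear": 1980},
    {"city": "Galway", "birthyear": 1980},
    {"city": "Galway", "birthyear": 1975},
]

# City alone is harmless here: everyone shares it (k = 3).
assert min_group_size(records, ["city"]) == 3

# But city combined with birthyear singles someone out (k = 1).
assert min_group_size(records, ["city", "birthyear"]) == 1
```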

If by separation of triples you mean removing certain nodes or edges, this would distort the graph and therefore break the chain of inferences required to answer queries. Grouping triples (aggregating classes and/or properties) is a viable alternative, but it will introduce some level of uncertainty into results, e.g. "Bob" (:knows OR :worksFor) ("Jim" OR "Sally"). Apparently graph anonymization is much more complex than tabular data anonymization, where you can remove certain values, columns, or rows to achieve k-anonymity. Zhou et al. (2008) explain why in their paper (section 2.1).

(29 Jan '13, 20:35) edan
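The grouping edan mentions can be sketched as predicate generalisation: map specific predicates up to a coarser one, trading precision for privacy (the :knows / :worksFor / :relatedTo vocabulary here is hypothetical):

```python
# Hypothetical generalisation map: both relations collapse to :relatedTo.
GENERALISE = {"ex:knows": "ex:relatedTo", "ex:worksFor": "ex:relatedTo"}

def generalise(triples, mapping):
    """Replace each predicate by its generalised form, if one is defined."""
    return [(s, mapping.get(p, p), o) for (s, p, o) in triples]

triples = [
    ("ex:bob", "ex:knows",    "ex:jim"),
    ("ex:bob", "ex:worksFor", "ex:sally"),
]

out = generalise(triples, GENERALISE)
# A query can now only learn that Bob is *somehow* related to Jim and
# Sally, not whether the relation was :knows or :worksFor -- exactly
# the uncertainty described above.
assert all(p == "ex:relatedTo" for (_, p, _) in out)
```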

Maybe I am oversimplifying this, but if you want to anonymize your data, you most likely need to know what data you want to anonymize. This is true not only in a triple system, but on any other medium where you have data (even on a piece of paper). So of course you need to identify what is considered privacy-sensitive in your system. All I'm trying to say is that it doesn't seem to be different from anonymizing any other type of data. You simply identify it, and you anonymize it using the exact same methods you would use with any other type of data.

(30 Jan '13, 03:48) Luca Matteis