[liberationtech] CORRECTION: European privacy regulators' excellent paper on Anonymisation Techniques

Sun Apr 20 02:30:12 PDT 2014

On 17/04/14 09:09, Shava Nerad wrote:
>
> Do they have teeth to enforce that, Caspar?  The political will, do 
> you think?
>

Until/unless the new GDPR arrives, enforcement depends on both the teeth 
and the guts of DPAs under 28 national laws. Any of 500m data subjects 
can file a complaint, citing appropriate chunks of the Opinion. Such a 
complaint might be about refusal of rights to access (and, in the new 
GDPR, delete) clickstream data (pseudonymous by Cookie|IP tuple), and 
whether such data is treated as personal under the terms of privacy 
statements, especially if transferred to the US under any legal ground.

Another good target is any type of social network graph data, nominally 
de-identified but retaining social structure. Such data is almost 
impossible to anonymize without vast data reduction.
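
To illustrate why retained social structure is a problem, here is a 
minimal sketch in Python. It is only an illustration under stated 
assumptions: the toy edge list, the node labels and the choice of 
"own degree plus neighbours' degrees" as a structural signature are all 
invented for this example and are not taken from the Opinion. Any node 
whose signature is unique in the graph can be picked out by someone who 
knows that much about the person's connections, even though the names 
have been replaced by random labels.

```python
from collections import Counter, defaultdict

def structural_signatures(edges):
    """Map each node to (own degree, sorted degrees of its neighbours)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {
        node: (len(adj), tuple(sorted(len(neighbours[n]) for n in adj)))
        for node, adj in neighbours.items()
    }

def uniquely_identifiable(edges):
    """Nodes whose structural signature occurs exactly once in the graph."""
    sigs = structural_signatures(edges)
    counts = Counter(sigs.values())
    return {node for node, sig in sigs.items() if counts[sig] == 1}

# Toy "de-identified" graph: labels are random, structure is retained.
edges = [("n1", "n2"), ("n1", "n3"), ("n1", "n4"), ("n4", "n5")]
exposed = uniquely_identifiable(edges)  # n1, n4 and n5 stand out; n2 and n3 do not
```

In this toy graph only n2 and n3 are indistinguishable from each other. 
Making every node blend in would mean adding or deleting edges until 
little of the original structure is left.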

In case anyone here is interested in jumping down the European rabbit 
hole, here are a few background notes of work in progress - comments 
welcome.

The story begins in 1995 when, as the price for allowing the DP Directive 
(EC95/46) to proceed, the UK engineered shoving what was previously an 
Article into a Recital (which Member States need not transpose), defining 
anonymisation. Effectively the UK said "it's our country and we will 
define pseudonymous as anonymous if we want to". But if you ask a 
member of the public whether they can vote anonymously in the UK, they 
usually change their minds when told Parliament keeps a copy of all 
voting slips and ballots, and that MI5 could join them up again if they 
wanted to (and did in the 50s). That's the principle at stake.

In the new DP Regulation, the UK wants (in Council) to take away any data 
breach notification to the individual for pseudonymous data and, worse, 
LIBE defined pseudonymous to include "identity escrow", aka Trusted 
Third Party (with Amendments ALDE promoted and US/UK influenced). LIBE 
were also bamboozled into nullifying access and deletion rights to 
pseudonymous data in a different misguided (or lethally deceptive) 
Amendment.

The mere creation or retention of personal data engages rights to 
privacy and Data Protection, irrespective of how it is subsequently 
used, and this may be disproportionate, as spectacularly found by the 
CJEU last week. You can't have a single market with the two Member 
States with the largest Internet sectors (UK, IE) arbitraging the vast 
loophole of "pseudonymous=anonymous=unregulated".

This is the climax of a 20-year struggle over the term anonymity (which 
is why the excellent WP29 Opinion 216 is so timely and welcome). The UK 
is basically trying to get any privacy-promiscuous pseudonymity project 
off the ground, with an OpenData/BigData tag, in a race before the 
sausage machine of GDPR negotiation resumes. Exit from the EU (and 
probably CoE 108) would be the only way to continue the pretence after 
the Regulation, or - as the UK is flailingly trying to do - provide so 
many exemptions for "pseudonymisation" that it legitimises the UK's 
20-year out-on-a-limb position.

Comp.sci has developed a battery of techniques in the last 20 years to 
distribute privacy risk and still do useful calculations. However, one 
of the main conclusions of the Opinion was that no single metric or 
prescription exists. True anonymisation remains an art which requires 
PhD comp.sci expertise applied case-by-case.

The BigData hoopla of the last several years is essentially a propaganda 
code-word for the idea that pseudonymous processing should be 
de-regulated as "anonymous".

In 2011 the ICO held a workshop at Wellcome attended by UK stats research 
bods, and Prof. Paul Ohm (flown in by them because of his breakthrough 
paper describing both the NetFlix de-anonymisation and Differential 
Privacy) ripped into their bogus pseudonymity=anonymity concept as 
incompatible with Rec.26. They ignored that.

In 2012 they issued a Code of Practice. At the launch event I pointed 
out in Q&A that it contained: (my emphasis)

  * p.7 /We draw a distinction between anonymisation techniques used to
    produce aggregated information, for example, and those -- *such as
    pseudonymisation* -- that *produce anonymised* data but on an
    individual-level basis./

  * p.21 /the possibility of linking several anonymised datasets to the
    same individual can be a precursor to identification. This does not
    mean though, that *effective anonymisation through
    pseudonymisation* becomes impossible/

  * p.42 /Using a *trusted third party to anonymise* data/ (section)
      o [not re: pseudonymity per se, but reversible anonymity is an
        oxymoron]

  * p.51 /Appendix 2 -- Some key anonymisation techniques/
      o /*Pseudonymisation*/ (section)

So the entire CoP is based on the false premise "pseudonymous = type of 
anonymous", which is flatly contradicted by Recital 26 (the one defining 
anonymity stringently), but on the face of it compatible with UK law, 
because the UK never transposed Rec.26. For the last 19 years, whenever 
you read "anonymous" in a UK policy document, the UK had two fingers 
crossed behind its back - that pseudonymous data counted as "anonymous" 
(and therefore unregulated).

There is also a sociology-of-science explanation for this confusion, 
about the difference in outlook between a statistical and a comp.sci 
privacy researcher. Pseudonymisation is defined as "/formal 
anonymisation/" as a term of art in the statistical scientific 
literature (and elsewhere). It isn't used in this sense in the computer 
science of privacy (indeed it's a solecism).

Statistical researchers' *definition* of "anonymity" exempts 
identification by the researcher. It's a blind spot, perhaps a cultural 
assumption with its origins in statistics being at the heart of the 
state. Every statistical agency in the EU - including Eurostat - 
releases data whilst retaining the original data, but assesses the 
"anonymity" of their disclosures exempting their own knowledge.

It isn't therefore very useful to start with this terminology (never 
devised with privacy as the central concept) as the basis for a Code 
supposed to reflect EU DP. But for 15 years no butter has melted in the 
mouths of ICO officials when this point is put to them point blank. So 
is that "well done ICO", or "what a complete waste of time"?

In contrast, WP29 in their exemplary new Opinion on Anonymisation 
Techniques condense 15 yrs of comp.sci privacy research into three criteria:

Is it still possible to:
1. single out an individual?
2. link records relating to individuals?
3. infer information concerning individuals?
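
As a rough illustration of how these three questions apply to 
pseudonymised data, here is a minimal Python sketch. Everything in it is 
an assumption made for the example: the toy rows, the field names, the 
choice of (area, born) as quasi-identifiers, and the pseudonym scheme (a 
truncated SHA-256 of the cookie|IP tuple). It is not the method of the 
Opinion, just one naive way to show all three tests failing at once.

```python
import hashlib
from collections import Counter, defaultdict

def pseudonym(cookie, ip):
    # Hypothetical scheme: truncated SHA-256 of the "cookie|ip" tuple.
    return hashlib.sha256(f"{cookie}|{ip}".encode()).hexdigest()[:12]

# Release A: one row per pseudonym (toy data, invented for illustration).
release_a = [
    {"pid": pseudonym("c1", "10.0.0.1"), "area": "SW1", "born": 1970, "topic": "diabetes"},
    {"pid": pseudonym("c2", "10.0.0.2"), "area": "N7", "born": 1985, "topic": "asthma"},
    {"pid": pseudonym("c3", "10.0.0.3"), "area": "N7", "born": 1985, "topic": "asthma"},
]
# Release B: a different dataset keyed by the same pseudonym scheme.
release_b = [
    {"pid": pseudonym("c1", "10.0.0.1"), "employer": "Acme"},
    {"pid": pseudonym("c9", "10.0.0.9"), "employer": "Initech"},
]

QUASI = ("area", "born")  # assumed quasi-identifiers

def singled_out(rows, keys=QUASI):
    """Criterion 1: rows whose quasi-identifier combination is unique."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return [r for r in rows if counts[tuple(r[k] for k in keys)] == 1]

def linked(rows_a, rows_b):
    """Criterion 2: rows of two releases joined on the shared pseudonym."""
    by_pid = {r["pid"]: r for r in rows_b}
    return [{**a, **by_pid[a["pid"]]} for a in rows_a if a["pid"] in by_pid]

def inferable(rows, keys=QUASI, sensitive="topic"):
    """Criterion 3: groups of 2+ rows that all share one sensitive value."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[k] for k in keys)].append(r[sensitive])
    return {g: v[0] for g, v in groups.items() if len(v) > 1 and len(set(v)) == 1}

print(singled_out(release_a))         # the single SW1/1970 row
print(linked(release_a, release_b))   # that row, now with an employer attached
print(inferable(release_a))           # everyone in N7 born 1985 -> asthma
```

Note that the third test fails even for the two N7 rows that cannot be 
told apart from each other: not being singled out does not stop 
something being inferred about you.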

Computer science is the only discipline that has rigorously studied 
privacy exposure from the viewpoint of the individual human right to 
privacy and Data Protection. These three WP criteria include the risks 
statisticians implicitly exclude, which are the risks concomitant on 
them having the data in the first place, and comp.sci has developed 
techniques like secure multi-party computation and Private Information 
Retrieval which obviate knowledge by a central party.
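
To give a flavour of what "obviating knowledge by a central party" can 
look like, here is a toy Python sketch of additive secret sharing, one 
of the simplest building blocks of secure multi-party computation. It 
is a single-process simulation under loud assumptions: the modulus, the 
function names and the three example values are invented here, the 
parties are assumed to follow the protocol honestly, and nothing about 
networking or collusion is handled. It only shows that a sum can be 
computed without any one party ever holding another party's raw value.

```python
import secrets

MODULUS = 2**61 - 1  # assumed: any modulus larger than the true sum works

def share(value, n_parties):
    """Split value into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_values):
    """Simulate n parties computing the sum of their private inputs."""
    n = len(private_values)
    # distributed[i][j] is the share that party i sends to party j.
    distributed = [share(v, n) for v in private_values]
    # Each party j adds up the shares it received and publishes only that.
    partials = [sum(distributed[i][j] for i in range(n)) % MODULUS
                for j in range(n)]
    # The published partial sums combine to the total and nothing more.
    return sum(partials) % MODULUS

total = secure_sum([12, 30, 7])  # 49, with no party seeing 12, 30 or 7
```

Each individual share is uniformly random on its own, so a party that 
sees only the shares sent to it learns nothing about the inputs behind 
them. Contrast that with a statistical agency that holds all the 
original records and merely declines to look.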