[Bigbang-dev] Re: Data sharing allowance

Tue Oct 5 15:21:19 CEST 2021

Hi there!

My two cents on this very interesting and important conversation, which I'm following closely. As I understand it, GDPR recognises the 'right to be forgotten', which would apply in some way to the mailing lists as well as to the Datatracker.
However, the right to be forgotten is not absolute and is limited by considerations of public interest. If standard-making were recognised as public policy-making (and there would be good reasons to do so, though I'm not sure to what extent a judge would agree), this right might not apply to the mailing lists.

Hope this helps, at least a little bit...
Best,

Riccardo
________________________________
From: Bigbang-dev <bigbang-dev-bounces at data-activism.net> on behalf of Sebastian Benthall <sbenthall at gmail.com>
Sent: Tuesday, 5 October 2021 15:04
To: Colin Perkins <csp at csperkins.org>
Cc: bigbang-dev at data-activism.net <bigbang-dev at data-activism.net>; thomas.streinz at law.nyu.edu <thomas.streinz at law.nyu.edu>
Subject: Re: [Bigbang-dev] Data sharing allowance

That statement makes a good point that personal information in the DataTracker can be removed or modified at the data subject's request.

It would be interesting to know if people can make similar interventions to mailing list archives.

In any case that suggests that if we make any derivative data products available, we regularly update them from the sources (DataTracker) to bring in any recent changes, even for "historical" data.

On Sat, Oct 2, 2021 at 12:54 PM Colin Perkins <csp at csperkins.org> wrote:
There’s also https://datatracker.ietf.org/help/personal-information and the IETF asks participants to consent to the use of their personal data as part of meeting registration, etc.

If there are questions about the way the IETF handles personal data, then the IETF Executive Director, Jay Daley <exec-director at ietf.org>, should be able to help.

Colin

On 2 Oct 2021, at 02:00, Sebastian Benthall <sbenthall at gmail.com> wrote:

So there's a line about consent ...

"
Your consent to disclosure

By providing us with your Personal Data, you are consenting to our disclosure and use of it for the purposes as described in this Statement"

But there are no purposes explicit in the document except the "commitment to transparency", which includes being "public... by electronic means", which the IETF would understand to include data processing because literally what Internet protocols do is process electronic publications?

Or is "process" a more limited term here that somehow does not include everything done in the operations of, say, making email archives available online through multiple indexed user interfaces, but does for some reason include plotting word usage over time (for example)?

I think an interpretation of GDPR that disallows what we're doing with BigBang is clearly overbroad and will get pushback from much, much bigger fish in the ocean.

On Fri, Oct 1, 2021, 2:24 PM Stephen McQuistin <sm at smcquistin.uk> wrote:
It's worth noting that some of the organisations hosting the mailing lists have explicit policies around participants' contributions. The IETF, for example, has this: https://www.ietf.org/privacy-statement/.

Stephen

On 1 Oct 2021, at 19:44, Thomas Streinz <tfs253 at nyu.edu> wrote:

Thanks, Seb. I should have been clearer: the "making manifestly public" prong only helps with Article 9 - *but not with other provisions*. In terms of lawfulness of processing (Article 6), for example, there is a question whether one could rely on Article 6(1)(f) - legitimate interests by claiming that there is (global?) public interest in this (personal) data (contained in the emails) being publicly available or at least available to researchers. The problem with this prong is that it's ultimately a balancing exercise and there is a risk that a Court would say that the data protection rights of the data subjects outweigh the public interest in access to the emails they sent (this is one of many reasons why commercial actors so often rely on Article 6(1)(a) - consent). So, unfortunately, BigBang can't rest easy.

I'm also not quite sure (as in: genuinely uncertain) whether it's right to say that the authors of emails assumed that their input would be publicly available to (potentially) billions or mined by researchers in the way BigBang does? Doesn't it make a difference (normatively) that the community of Internet researchers was initially relatively small and close-knit and access to the public mailing lists only sought by insiders?

On Fri, Oct 1, 2021 at 1:19 PM Sebastian Benthall <sbenthall at gmail.com> wrote:
Thanks so much, Thomas. Let me join the others in welcoming your input on this.

My two cents are that we are totally fine with respect to the GDPR, because:

> For example, it's not clear whether (for purposes of escaping the additional requirements for sensitive data under Article 9) the data subjects in question made the personal data contained in their email "manifestly" public (that is: with the intention of further processing) - did the participants foresee the eventual creation of BigBang?

The answer to this question is "Yes". Not specifically BigBang, of course, but these are the people designing Internet protocols, who are the least naive people on the planet about what it means to put data in clear text on the Internet. Since "further processing" of this data includes being indexed by search engines, which has been going on long before BigBang, and has no doubt been used by the participants as they engage these materials, the data absolutely IS manifestly public. We can rest easy.

On Fri, Oct 1, 2021, 6:22 AM Thomas Streinz <tfs253 at nyu.edu> wrote:
Hi group,

I have been a lurker on this mailing list for quite a while and I'm glad that I may be able to provide some context on this issue that may be helpful. Let me also state at the outset that the following does *not* constitute legal advice and that I won't bill you 300 Euros for it either (indeed, I'm afraid, that number may be way too low to get actual legal advice that goes beyond reciting the relevant provisions of GDPR).

That said, I found this guidance from IAPP (the International Association of Privacy Professionals, which has evolved into quite an influential organization): https://iapp.org/news/a/publicly-available-data-under-gdpr-main-considerations/ Note how some of the guidance provided there is in tension with pervasive research practices, especially in data science fields ("when the data is part of official registers, such registers should be consulted on a need-to-know basis rather than copied in bulk just in case some data might be relevant").

My reading of this and the relevant provisions of GDPR suggests a ton of open questions, many of which indeed have not been resolved. For example, it's not clear whether (for purposes of escaping the additional requirements for sensitive data under Article 9) the data subjects in question made the personal data contained in their email "manifestly" public (that is: with the intention of further processing) - did the participants foresee the eventual creation of BigBang? It's also not clear to me how the requirements under Article 14 (need to inform data subjects) can be fulfilled in practice.

The scope of the research exception (Article 89) has been contested for a while and is a good example of the tensions in data protection law: researchers were worried that data protection law might make their work impossible; data protection activists were worried that too broad an exception would be exploited, including by commercial actors. The result is a terribly drafted provision. In my personal political opinion, I don't understand why Article 89 GDPR does not distinguish between public research in the public interest and private research in the private interest. I attach the leading commentary on Article 89, which unfortunately doesn't offer much useful guidance for our purposes. At least it references the relevant recitals at the beginning of GDPR, which are part of the political compromise and can be helpful to understand better what the lawmakers had in mind (this is, for example, where the advice to use pseudonymization may be coming from, because that idea is mentioned in the relevant recitals; I'm not convinced this actually solves the problem because even pseudonymized data remains personal data and it will often be easy to re-identify the individuals if one wants to). I'm wondering, however, if it might be feasible to make the datasets available only for research purposes and only to other researchers, to stay within the bounds of the research exception?
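
To make the re-identification concern concrete, here is a minimal sketch in Python. It is not BigBang code; the function names and the example addresses are hypothetical and only illustrate the general idea. It contrasts an unsalted hash of an email address, which anyone can reverse by hashing a list of guessed addresses, with a keyed pseudonym (HMAC-SHA256 from the standard library), which cannot be recomputed without the secret key.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def plain_hash(address: str) -> str:
    """Unsalted SHA-256 of a normalized address (the naive approach)."""
    return hashlib.sha256(address.strip().lower().encode("utf-8")).hexdigest()


def pseudonymize(address: str, secret_key: bytes) -> str:
    """Keyed pseudonym: stable for a given key, not recomputable without it."""
    normalized = address.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()[:16]


def reidentify(hashed: str, candidates: Iterable[str]) -> Optional[str]:
    """Dictionary attack on plain_hash: hash each guess and compare."""
    for guess in candidates:
        if plain_hash(guess) == hashed:
            return guess
    return None


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    known_participants = ["bob@example.org", "alice@example.org"]
    leaked = plain_hash("alice@example.org")
    print(reidentify(leaked, known_participants))  # the guess that matches
    print(pseudonymize("alice@example.org", b"keep-this-key-private"))
```

Even the keyed variant only raises the bar: whoever holds the key can still map pseudonyms back to people, and message content or posting patterns may identify an author regardless, so the data would presumably still count as personal data.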

Like Niels, I have been worried for a while that data protection law might eventually throw a wrench into the important work that this group is doing. I haven't been privy to the whole conversation so far. I assume that the issue is whether or not the datasets you have assembled can or should be shared, and if so, under what conditions?

Note that the exceptions for "public" archives don't apply because those provisions only refer to archives that are required by law (which is not the case for IETF mailing lists). As Niels suggests, under a functional analysis, this research should be treated the same as research scrutinizing public communications of parliamentarians. Unfortunately, I doubt that a European Court would see it that way.

Maybe we can discuss this at one of the next BigBang meetings, in case that's helpful. One literature that I haven't consulted this morning concerns the interplay between "open data" and data protection law, which may offer some cues as to what's legally possible and what's clearly off limits (e.g. this paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2695005).

Sorry this got so long. All best to all of you on this list (whether actively participating or just lurking) -- Thomas

PS: For browsing GDPR, I recommend: https://gdpr-info.eu/ (which also lists the relevant recitals under each article)

On Fri, Oct 1, 2021 at 5:49 AM Niels ten Oever <mail at nielstenoever.net> wrote:
Yeah, I was kind of afraid of this. I would definitely support spending some money on the legal advice.

The weird thing is that data protection officers at universities all deal with this very differently. I guess GDPR is also still a developing practice, so it would be good to get a specialist to look at it.

One part of this that the person did not reply to is that these mailing lists imho should be understood as public policy-making, and policy makers have fewer expectations of privacy. I think that argument can also be made because the openness of the mailing lists is explicitly used as a legitimacy strategy for the standard-setting institutions.

Best,

Niels

On 9/30/21 11:01 PM, Christoph Becker wrote:
> Hi all,
> you might have noticed that there has been discussion on how we should share the datasets we have collected of public mailing archives. Our data format is quite different from how they are presented on GNU Mailman or Listserv, which creates certain points of concern we should not neglect.
> I have been in contact with some people through the Prototype Fund and have obtained the following advice:
>
> """
> Since you are dealing with "fully or partially automated processing of personal data" (Art. 2 Para. 1 GDPR), you fall under the provisions of the GDPR. Where you got the data from should be irrelevant for this point. Since you have collected the data without the consent of the persons, Art. 14 GDPR (information obligation if the personal data was not collected from the person concerned) could also be of interest. There are exceptions for scientific purposes (Art. 89 GDPR), but here too you have to pay close attention. Note that hashing mail addresses does not necessarily make the data "less dangerous". It would be better to pseudonymize the whole thing.
> My tip would be not to pass on any data, to refer to the scientific aspect of the processing and to spend € 200-300 on legal advice.
> """
>
> Through the Prototype Fund we have the financial means to pay for legal advice.
> Please share your thoughts, comments, ideas.
>
> Best Wishes,
> Christoph
>
>
> --
> <><><><><><><><><><><><><><><><>
> Christoph Becker (he/him/his)
> PostDoc at the
> Institute for Biodiversity and Ecosystem Dynamics and
> Institute for Advanced Study
> University of Amsterdam
> P.O.Box 94248, NL - 1090 GE Amsterdam
> The Netherlands
> http://christovis.github.io/
>
> _______________________________________________
> Bigbang-dev mailing list
> Bigbang-dev at data-activism.net
> https://lists.ghserv.net/mailman/listinfo/bigbang-dev
>

--
Niels ten Oever, PhD
Postdoctoral Researcher - Media Studies Department - University of Amsterdam
Affiliated Faculty - Digital Democracy Institute - Simon Fraser University
Research Fellow - Centre for Internet and Human Rights - European University Viadrina
Associated Scholar - Centro de Tecnologia e Sociedade - Fundação Getúlio Vargas

W: https://nielstenoever.net
E: mail at nielstenoever.net
T: @nielstenoever
P/S/WA: +31629051853
PGP: 2458 0B70 5C4A FD8A 9488 643A 0ED8 3F3A 468A C8B3

Read my latest article on Internet infrastructure governance in Globalizations here: https://www.tandfonline.com/doi/full/10.1080/14747731.2021.1953221


--
Colin Perkins
https://csperkins.org/
