[Bigbang-dev] on privacy impacts of email list archiving
Priyanka Sinha
priyanka.sinha.iitg at gmail.com
Tue Oct 25 14:52:48 CEST 2022
Thank you Nick for your detailed explanation.
It is indeed interesting to note that there is an X-No-Archive flag in
emails. That certainly seems very relevant for this discussion.
My concern is this: if a mailing list, such as the IETF's, has
disregarded the X-No-Archive header, then why should we consider it a
proxy for an analytics privacy header? Maybe we should go back and see
why the IETF chose to ignore it in the first place. Maybe the reason
is recorded in some IETF meeting.
Suppose a mailing list has chosen to ignore the X-No-Archive header
and provides those emails through its REST API for analytics
purposes. Why should we be stricter and remove them from analytics?
If an individual or organization participates in a mailing list and
sends emails with the X-No-Archive header while knowing full well
that the list does not honor it and archives the messages anyway,
does that directly imply that the email authors are expressing an
explicit intent to have their content excluded from analytics?
For example, say participants from known at-risk minority groups or
regions, putting forth unpopular opinions, have turned on the
X-No-Archive header option in their email clients. However, due to
the mailing list's norms, the emails are archived anyway. Say those
opinions were a crucial part of the dialogue that happened and ended
up being actionable for the group. People manually going through the
list can still refer back to those emails. However, if we exclude
them from analytics, all downstream ML reasoning would be flawed and
would differ greatly from the collective consciousness of the group,
both when the discussion happened and later on, since the emails
remain available to be read.
If we still want to exclude emails that carry the X-No-Archive header
from processing, then along with modifying BigBang to skip these
messages, we might consider writing an IETF draft that proposes an
addendum to the X-No-Archive header RFC, stating that the header
explicitly implies that the author wants the message removed from
analytics, irrespective of whether the mailing list decides to ignore
the header.
Thanks and Regards,
-priyanka
---------- Forwarded message ---------
From: <bigbang-dev-request at data-activism.net>
Date: Tue, 25 Oct 2022 at 15:30
Subject: Bigbang-dev Digest, Vol 46, Issue 3
To: <bigbang-dev at data-activism.net>
Date: Mon, 24 Oct 2022 14:11:25 -0400
From: Nick Doty <ndoty at cdt.org>
To: bigbang-dev at data-activism.net
Subject: [Bigbang-dev] on privacy impacts of email list archiving
Message-ID:
<CA+tYtvHFLDU__O1FO9DSR3Bm8kxcAYH=3YOnvf94UY7W9aG2cg at mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Two relevant pieces:
There is an existing, informal header to request that an email or
newsgroup message not be archived, X-No-Archive:
https://en.wikipedia.org/wiki/X-No-Archive
It sounds like default Mailman installations will respect the header
and not archive messages with an XNA header. IETF specifically notes
that they ignore this header and publicly archive messages anyway. As
I believe headers are typically archived as well, it might be
interesting to measure how frequently this header is used, or what
kinds of messages are being sent with the no-archive header and are
being archived.
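
As an illustration of the measurement suggested above, here is a
minimal sketch that tallies X-No-Archive values across a set of
messages. It uses only the Python standard library and is not
BigBang's API; the function names (xna_value, count_xna) and the mbox
file name in the usage note are hypothetical.

```python
import email
from collections import Counter


def xna_value(msg):
    """Return the normalized X-No-Archive value, or None if the header is absent."""
    # Header lookup on email.message.Message objects is case-insensitive.
    value = msg.get("X-No-Archive")
    if value is None:
        return None
    return str(value).strip().lower()


def count_xna(messages):
    """Tally messages by X-No-Archive value; messages without the header count as 'absent'."""
    counts = Counter()
    for msg in messages:
        value = xna_value(msg)
        counts["absent" if value is None else value] += 1
    return counts


# Small in-memory sample (made-up messages) so the sketch runs without an archive file.
sample = [
    email.message_from_string(raw)
    for raw in (
        "From: a@example.org\nX-No-Archive: Yes\n\nhello\n",
        "From: b@example.org\nx-no-archive: yes\n\nhi\n",
        "From: c@example.org\n\nplain message\n",
    )
]
print(count_xna(sample))
```

For a real archive, the same function should accept the messages
yielded by the standard library's mailbox.mbox, for example
count_xna(mailbox.mbox("some-list.mbox")), where the file name is a
placeholder.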
There has been some controversy stemming from someone sending a
message to an IETF mailing list without realizing it was a publicly
archived mailing list, and then being upset that their message and
email address were publicly posted. That included some argument that
GDPR gave a right to have that information deleted.
I think this partly stems from IETF not giving an email sender warning
ahead of time that a message will be publicly archived. W3C has an
"archive approval system" in which you receive an automated email the
first time you send to a list, and are informed and have to click
approval before your message will be distributed and archived:
https://www.w3.org/Mail/FAQ.html#aa
IETF doesn't have that, and I think maybe they should implement
something like that to decrease the likelihood of this situation.
Or if there were a commonly used technology to inform senders about
the practices of a mailing list and to confirm public archiving
preference, that could be usefully applied to IETF, W3C and many other
mailing lists which are likely to encounter the same problems.
We had briefly discussed on some previous BigBang calls whether there
should be a header regarding opting out of research on mailing lists,
which is a proposal that I apologize that I didn't pursue further. But
I think opting out of archiving is likely the more relevant step -- it
would be good if opting out were more effective, and if email senders
better understood whether a message is going to be archived and could
say whether or not they want it to be archived. And we could consider
modifying BigBang to skip messages, or warn about messages, that have
an X-No-Archive header.
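
A rough sketch of what such a skip-or-warn step could look like is
below. This is not BigBang code: the names requests_no_archive and
filter_messages and the policy values are hypothetical, and treating
any value that starts with "yes" as an opt-out is an assumption about
how the informal header is used.

```python
import email
import warnings


def requests_no_archive(msg):
    """True if the message carries an X-No-Archive header whose value starts with 'yes'."""
    # Assumption: the informal convention is "X-No-Archive: Yes"; other values are ignored.
    value = msg.get("X-No-Archive", "")
    return str(value).strip().lower().startswith("yes")


def filter_messages(messages, policy="skip"):
    """Apply a policy to flagged messages.

    'skip' drops them, 'warn' keeps them but emits a warning,
    'keep' leaves everything untouched.
    """
    if policy not in ("skip", "warn", "keep"):
        raise ValueError(f"unknown policy: {policy!r}")
    kept = []
    for msg in messages:
        if requests_no_archive(msg):
            if policy == "skip":
                continue
            if policy == "warn":
                msg_id = msg.get("Message-ID", "<unknown>")
                warnings.warn(f"message {msg_id} carries an X-No-Archive header")
        kept.append(msg)
    return kept


# Made-up sample: one opt-out, one explicit "no", one without the header.
sample = [
    email.message_from_string(raw)
    for raw in (
        "Message-ID: <1@example.org>\nX-No-Archive: Yes\n\nplease do not archive\n",
        "Message-ID: <2@example.org>\nX-No-Archive: no\n\nfine to archive\n",
        "Message-ID: <3@example.org>\n\nno header at all\n",
    )
]
kept = filter_messages(sample, policy="skip")
print(len(kept), "of", len(sample), "messages kept")
```

Making the policy a parameter leaves the choice between skipping and
warning to whoever runs the analysis, which seems to fit the open
question in this thread.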
-Nick