[liberationtech] [sunlightlabs] need advice on using hashes for preserving PII's utility for disambiguation while protecting sensitive info

Thu Mar 20 14:46:30 PDT 2014

Arggh. Wrong link. Apologies to all and thanks to James McKinney. That's
what I get for having that many tabs open.

https://sunlightfoundation.com/blog/2014/03/20/a-little-math-could-make-identifiers-a-whole-lot-better/

On Thu, Mar 20, 2014 at 5:44 PM, James McKinney <james at opennorth.ca> wrote:

> Do you mean this post?
>
>
> https://sunlightfoundation.com/blog/2014/03/20/a-little-math-could-make-identifiers-a-whole-lot-better/
>
>
> On Mar 20, 2014, at 3:44 PM, Tom Lee <tlee at sunlightfoundation.com> wrote:
>
> Thanks again to everyone who helped me think through how government's
> approach to disclosing identifiers could be improved through checksums,
> tokenization and related techniques -- it was extremely helpful.  The
> resulting post is here:
>
>
> https://sunlightfoundation.com/blog/2013/07/25/the-sunlight-foundations-comments-on-the-faas-proposed-open-data-policy/
>
> I'd be grateful for any feedback -- or, especially, corrections -- that
> might occur to you.
>
>
> On Thu, Feb 6, 2014 at 3:49 PM, Tom Lee <tlee at sunlightfoundation.com> wrote:
>
>> We've been kicking around an idea at Sunlight that aims to use
>> cryptographic ideas to resolve some of the concerns around the publication
>> of personally identifiable information in government disclosures. I could use
>> some smart people to tell me what's dumb about it.
>>
>> We often face challenges related to disambiguating entities: is the John
>> Smith who gave political donation A the same John Smith that gave political
>> donation B? One obvious solution to this problem is to push to expand the
>> information that's collected and disclosed -- if we had John's driver's
>> license number (DLN), for instance, it'd be easy to disambiguate these
>> records. But that could introduce privacy concerns for John. One approach
>> to this problem (which I don't think government has tried) is employing a
>> one-way hash.
>>
>> Obviously the input key space for DLNs and most other personal ID numbers
>> is so small that reversing this with a dictionary attack would be trivial.
>> You can add a salt, but only on a per-entity basis (not a per-record basis)
>> if you want to preserve the capacity to disambiguate. That in turn calls
>> for a lookup table in which the input keys are stored, which kind of
>> defeats the point of using a hash (you might as well just assign random
>> output IDs for each input ID). I would worry about government's ability to
>> keep this lookup table secure, and I worry about the brittleness of such a
>> system.
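
To make the dictionary-attack concern above concrete, here is a minimal sketch. It assumes a hypothetical six-digit, all-numeric DLN format and plain unsalted SHA-256; neither detail comes from the thread, and real license formats vary by state. The names unsalted_hash and dictionary_attack are illustrative only.

```python
import hashlib
from typing import Optional


def unsalted_hash(dln: str) -> str:
    """Publish-side transform: a plain, unsalted SHA-256 of the ID."""
    return hashlib.sha256(dln.encode("ascii")).hexdigest()


def dictionary_attack(target_digest: str, digits: int = 6) -> Optional[str]:
    """Try every possible ID of the assumed format until one matches."""
    for n in range(10 ** digits):
        candidate = f"{n:0{digits}d}"  # zero-padded, e.g. "000042"
        if unsalted_hash(candidate) == target_digest:
            return candidate
    return None


# Hypothetical DLN, used only for this illustration.
published = unsalted_hash("042917")
recovered = dictionary_attack(published)
```

With only a million possible inputs, the loop covers the whole key space almost immediately on ordinary hardware, which is why the hash by itself protects nothing here.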
>>
>> Alternately, you can use a single system-wide secret (or set of secrets)
>> to transform inputs into reliable outputs. I think this is less brittle and
>> maybe easier to preserve as a secret, but this system might be too easily
>> reversible given the ability to observe its outputs and know the universe
>> of possible inputs. I'm unsure of the cryptographic options that might be
>> appropriate here.
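
One standard construction that matches the "single system-wide secret" idea is a keyed hash (HMAC). The sketch below is an assumption about how that could look, not something proposed in the thread: SYSTEM_SECRET and pseudonym are hypothetical names, and key storage and rotation are left out entirely.

```python
import hashlib
import hmac
import secrets

# Hypothetical system-wide secret; in practice it would be generated once
# and kept by the publishing agency, never released with the data.
SYSTEM_SECRET = secrets.token_bytes(32)


def pseudonym(dln: str, key: bytes = SYSTEM_SECRET) -> str:
    """Map an ID to a stable output token using HMAC-SHA256."""
    return hmac.new(key, dln.encode("utf-8"), hashlib.sha256).hexdigest()


# The same person always gets the same token, so records can be linked.
token_a = pseudonym("042917")
token_b = pseudonym("042917")
# A different ID gets an unrelated token.
token_c = pseudonym("042918")
```

Without the key, someone who knows the universe of possible inputs cannot recompute the tokens to test guesses, which is what defeats the simple dictionary attack. Anyone who does hold the key can still enumerate every ID, so the scheme is only as strong as the secrecy of that one value.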
>>
>> For all I know, the lack of implementations using this kind of one-way
>> transformation isn't about government sluggishness but rather about its
>> feasibility. I'd be very curious to hear folks' ideas on this score, though.
>>  My general hunch is that something must be possible -- even a few bits'
>> worth of disambiguating information would be hugely useful to us, and
>> presumably you're not leaking important amounts of information by, say,
>> sharing the last digit of a DLN. So there must be a spectrum of options.
>> But as is probably apparent, I don't think I've got a handle on how to
>> think about this problem rigorously.
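
The "few bits" end of that spectrum can be sketched as a truncated keyed hash, along with the arithmetic for how much it reveals. This is an illustration under assumptions: short_tag, candidates_per_tag and the population figure are hypothetical, and the thread does not say whether it would be acceptable in practice.

```python
import hashlib
import hmac
import math


def short_tag(dln: str, key: bytes, bits: int = 4) -> int:
    """Keep only the top `bits` bits (1 to 32) of an HMAC-SHA256 of the ID."""
    digest = hmac.new(key, dln.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - bits)


def candidates_per_tag(population: int, bits: int) -> float:
    """Average number of people who share any one tag value."""
    return population / 2 ** bits


# A single decimal digit, such as the last digit of a DLN, carries
# log2(10) bits, which is a little over 3.3.
bits_in_last_digit = math.log2(10)

# With a 4-bit tag and a hypothetical population of one million ID holders,
# each tag value is shared by this many people on average.
crowd = candidates_per_tag(1_000_000, 4)
```

Two records with different tags cannot be the same person, while matching tags only say "possibly the same". Two unrelated people collide with probability 1 in 2**bits, so the tag narrows the field without singling anyone out.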
>>
>> Tom
>>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "sunlightlabs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sunlightlabs+unsubscribe at googlegroups.com.
> To post to this group, send email to sunlightlabs at googlegroups.com.
> Visit this group at http://groups.google.com/group/sunlightlabs.
> For more options, visit https://groups.google.com/d/optout.
>