[liberationtech] Introduction --- Haystack

Sat Sep 11 21:09:54 PDT 2010

Hi Bram. Thank you for the feedback.

On 9/11/10 8:48 PM, Bram Cohen wrote:
> I strongly urge you to use a BSD-style license rather than GPLv3. Using GPLv3 
> would ensure that nothing you ever released could be useful to other censorship 
> resistance or anonymity tools, which I don't think is your intent.

The GPLv3 prevents other organizations from appropriating the code
without contributing back. Other tools could just as easily release
themselves under the GPL. People before you and me have argued this issue
since time immemorial, and we're not going to resolve it on-list. I
think we'll have to agree to disagree on this one.

> How efficient is your http-based obfuscation code? That is, how much larger is 
> the obfuscated traffic than the non-obfuscated traffic?

Expansion is horrible. In some early code, it's a factor of 5 or so.
That's one reason we want to keep performance very good in other areas
and another reason we're looking at HTTP optimization research. (I keep
wanting to say "ciphertext expansion", but this isn't ciphertext per se.
Can you think of a better term?).
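
To put a number on that kind of expansion, here is a minimal Python
sketch. The HTTP wrapping is a made-up stand-in (base64 in a POST body
sent to example.com), not Haystack's actual encoding, and the names
expansion_factor and wrap_as_http are hypothetical.

```python
import base64


def expansion_factor(original_len: int, obfuscated_len: int) -> float:
    """Ratio of bytes on the wire to bytes of real payload."""
    if original_len <= 0:
        raise ValueError("original_len must be positive")
    return obfuscated_len / original_len


def wrap_as_http(payload: bytes) -> bytes:
    # Hypothetical encoding for illustration only: the payload is
    # base64-encoded and placed in the body of a fake POST request.
    body = base64.b64encode(payload)
    headers = (
        b"POST /upload HTTP/1.1\r\n"
        b"Host: example.com\r\n"
        b"Content-Type: text/plain\r\n"
        b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n\r\n"
    )
    return headers + body


payload = b"x" * 100
wire = wrap_as_http(payload)
print(f"expansion factor: {expansion_factor(len(payload), len(wire)):.2f}")
```

Small payloads suffer most, because the fixed header cost is spread
over fewer real bytes.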

> How do you plan to block a malicious web site from grabbing info about the 
> client's IP address? For example, a flash applet can look up the local machine 
> IP and communicate that back.

This falls under "user education and software mitigation". All software
of this type has that problem. There's really very little we can do.
Something we've considered is distributing bootable media containing a
pre-secured web browsing environment that would solve this problem and
some other ones besides.

> Why do you have client authentication? 

It seemed like a good idea to verify that the client has a valid
signature created by us. I couldn't think of any downsides.

> Your plans for that sound rather DRMish, 
> which is both unlikely to work and sounds counter to the goal of having lots of 
> users. 

True. We made a decision long ago that it's better to limit the
information an attacker could learn about the network than to make the
software more convenient to use in some sense. We're certainly willing
to revise the trade-off given the right argument.

> If your goal is to limit information about the network which an attacker 
> can get from compromising one client, that should be done by having secrecy be 
> in the form of secret keys rather than secret code.

Yes, you're invoking Kerckhoffs's famous principle. But correct me if I'm
wrong: I don't think there's a good way to parametrize an obfuscation
*system* using a replaceable key while leaving all the other parameters
intact. You can arrange things such that you can't extract a meaningful
message without some kind of key (as in chaffing and winnowing) but
hiding the existence of the message, especially in a stream that's not
supposed to be statistically random to start with, is a lot harder.
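
For reference, chaffing and winnowing can be sketched in a few lines of
Python. This is a toy illustration, not anything from Haystack: the
packet layout (sequence number, data, tag) and the function names are
assumptions. Note what it does and does not achieve: without the key you
cannot tell wheat from chaff, but the packet stream itself is plainly
visible.

```python
import hashlib
import hmac
import os
import random


def mac(key: bytes, seq: int, data: bytes) -> bytes:
    """Authenticate one packet: HMAC-SHA256 over sequence number + data."""
    return hmac.new(key, seq.to_bytes(4, "big") + data, hashlib.sha256).digest()


def add_chaff(key: bytes, chunks: list) -> list:
    """Emit one real packet (wheat) and one fake packet (chaff) per chunk."""
    packets = []
    for seq, data in enumerate(chunks):
        packets.append((seq, data, mac(key, seq, data)))            # wheat
        packets.append((seq, os.urandom(len(data)), os.urandom(32)))  # chaff
    random.SystemRandom().shuffle(packets)
    return packets


def winnow(key: bytes, packets: list) -> bytes:
    """Keep only packets whose tag verifies, then reassemble in order."""
    wheat = [
        (seq, data)
        for seq, data, tag in packets
        if hmac.compare_digest(tag, mac(key, seq, data))
    ]
    return b"".join(data for _, data in sorted(wheat))


key = os.urandom(32)
message = b"meet at the usual place"
chunks = [message[i:i + 4] for i in range(0, len(message), 4)]
stream = add_chaff(key, chunks)
print(winnow(key, stream) == message)
```

An observer without the key sees twice as many packets as there are
chunks and cannot pick out the message, yet can still see that packets
are flowing.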

Considering that we're in the "DPI is hard" genus of your
anti-censorship taxonomy, and that we have other, much better mechanisms
to guarantee security, I think using a modular, swappable, and secret
obfuscation approach is reasonable here.

> What is the reason for having separate exit nodes? They require extra bandwidth, 
> and don't block any obvious threats.

Mainly it's to prevent retaliation against our difficult-to-move
rack-mounted processing nodes, and to prevent destination website
operators from knowing that a given user is using Haystack. When in
doubt, it's better to not leak information.

> To be politically correct you should really use sha-256 rather than sha-1

I did consider that, but SHA-1 is holding up very well considering
what it's been subjected to, and I don't feel like the extra
computational cost is warranted. We'll be computing a *lot* of hashes.
Show me that SHA-1 is on its way to being as badly broken as MD5,
however, and I'll change my mind in a hurry.
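
The cost difference is easy to measure rather than guess at. A minimal
Python sketch using the standard hashlib and timeit modules follows; the
4 KiB input size and the round count are arbitrary assumptions, and the
numbers will vary by machine and library build.

```python
import hashlib
import os
import timeit


def time_hash(name: str, data: bytes, rounds: int = 2000) -> float:
    """Seconds taken to hash `data` `rounds` times with the named algorithm."""
    return timeit.timeit(lambda: hashlib.new(name, data).digest(), number=rounds)


data = os.urandom(4096)  # arbitrary test input
for name in ("sha1", "sha256"):
    size = hashlib.new(name).digest_size
    print(f"{name}: {size}-byte digest, {time_hash(name, data):.4f} s")
```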

> You should get your random data from /dev/urandom instead of using mersenne 
> twister. /dev/urandom doesn't have the ridiculous blocking properties that 
> /dev/random does, but is still cryptographically strong. There's an equivalent 
> service under Windows - I forget the name, but it's the thing which Python uses 
> for os.urandom

The Windows entropy source is CryptGenRandom, yes. I recall a paper I
read a few months ago exploring its implementation and weaknesses.

Regardless, seeding with /dev/random at least gives you actual, real,
and true randomness as far as the system can muster. A CSPRNG with a
large period seeded with true randomness might as well be truly random.
/dev/urandom just returns a best guess when it runs out of entropy, and
we don't control that guessing.

But like I said, I'm not tied to any particular CSPRNG
implementation. I'm just happy it's not rand(). :)
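
The two approaches under discussion look like this in Python, where the
random module is a Mersenne Twister and os.urandom wraps the operating
system's generator. This is only a sketch of the trade-off; the helper
name seeded_mt and the 32-byte seed size are assumptions.

```python
import os
import random


def seeded_mt(seed_bytes: bytes) -> random.Random:
    """Mersenne Twister instance seeded from the given bytes."""
    return random.Random(int.from_bytes(seed_bytes, "big"))


# Approach 1: seed a Mersenne Twister once from OS entropy, then draw from it.
seed = os.urandom(32)
mt = seeded_mt(seed)
mt_bytes = mt.getrandbits(128).to_bytes(16, "big")

# Approach 2: draw every value directly from the OS generator.
os_bytes = os.urandom(16)

# The Mersenne Twister stream is fully determined by its seed: two
# generators built from the same seed produce identical output.
a, b = seeded_mt(seed), seeded_mt(seed)
print(a.getrandbits(128) == b.getrandbits(128))
print(len(mt_bytes), len(os_bytes))
```

Python's own documentation warns against using the random module for
security purposes and points to os.urandom (or the secrets module)
instead.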
