[liberationtech] Introduction --- Haystack

Sun Sep 12 22:04:21 PDT 2010

On Sat, Sep 11, 2010 at 06:42:56PM -0700, Daniel Colascione wrote:
> I'm Daniel Colascione. I wrote every line of Haystack and I am the
> author of the FAQ on the Haystack website. I'm glad to have been
> invited to this list.

Hi Dan, all,

I also just appeared on the list yesterday. Glad these discussions are
finally started. Here are a few more thoughts. (For context, I wrote
Tor originally and still help to write / manage / etc a lot of it.)

> - Our testers are aware that the test version is not the finished
>   product, and they are aware that it does not provide the security
>   properties we guarantee for the finished product.

As a security person, I find that the phrase 'security properties' or
'security goals' works a lot better than 'security guarantees'. I know
it sounds more technical and/or less confident, and that your users will
be less certain they know what you mean by it, but that's the point.

(Or to say it differently, show me a security guarantee and I'll show you
a case where in retrospect you'll wish you hadn't called it a guarantee.)

> - Our public statements about Haystack's capabilities apply to our
>   design for the finished product, not to the test version.
> 
> - The Haystack client will be released as an open-core system under
>   the GPLv3 or later.

You're free to choose your own license, but here's a side anecdote:
when I first started Tor, I put it out under the GPL. Then I started
interacting with developers of other projects, and wanted to say "Tor
solves that problem you're working on; use our code", but they couldn't.

My experience with Tor has led to this conclusion: BSD licenses are
well-suited for situations where the world doesn't have any good answers
to a problem, and you want to maximize the chances that the problem gets
solved at all. Copyleft licenses are well-suited to situations where the
problem is well-solved but only by proprietary solutions, and it's time
to have an actually free solution.

> - We will not publish our server code.

Is this out of concern that it would make your design weaker, or out of
concern that it would make it easier for somebody to set up a competing
service? Because if you're trying to be a transparent non-profit,
"it won't help your security much" starts to sound like a shaky answer
(even apart from the debates about whether it's accurate).

> - ensure that external connection endpoints (reached after passing
>   through our system) do not learn the IP address of the connection
>   originator except via unavoidable application-layer leakage.

This one is probably tougher than you think. See
https://www.torproject.org/torbutton/design/
for some of the issues you should be considering.

In particular, because you're using a proxy design where users have
to configure their applications to use the proxy, there are a _lot_
of ways that a web site can instruct the browser to reveal information
about the user.

And if we're talking about disasters like Internet Explorer, it's
basically unsolvable. The Tor Project started out saying "that's outside
our threat model", but once we got actual users, it became clear that
we had to do something more than wishing them luck at securing their
applications.

> Haystack is a centralized censorship-circumvention system. We
> fundamentally disagree with the notion (spelled out by Dingledine, et
> al. [1]) that only a decentralized system can protect users

For those following along at home, a more condensed version of this
argument can be found at #5 at
https://svn.torproject.org/svn/projects/articles/circumvention-features.html

>  The purpose of traffic obfuscation is
> *NOT* to make Haystack absolutely undetectable: that is impossible. We
> assume that a human looking at network traces could eventually learn
> to differentiate the traffic.
> 
> The goal is to raise the false positive rate of any automated blocking
> attempt to a level that is unpalatable for firewall administrators.
> Similarly, we aim to make automated differentiation of Haystack and
> non-Haystack users infeasible. For our purposes, a moderate degree of
> obfuscation will suffice.

As Bram has been hinting, this is probably a harder problem than you
think. I won't get into all the technical issues here, but I think it's
fair to say that this is probably the hardest open research question
that your marketing people have claimed you've solved.

I can't imagine this is an arms race that will be easy to play as it
scales. You need to not only come up with cool new high-bandwidth pieces
of http that you can stick stuff in, but you need to keep your attacker
from realizing those are the ones you're using.

Your best bet is to try to guide the situation into one where your
adversary doesn't care to start the arms race; but that strategy would
seem to conflict with your goal of scaling, probably conflicts with
your choice of adversary, and quite clearly conflicts with your (past)
marketing strategy.
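
To put rough numbers on the false-positive goal quoted above, here is a
minimal Python sketch. Everything in it is an assumption for illustration:
the function name, the traffic volume, the share of flows that are Haystack,
and the classifier's detection and false-positive rates are made up, not
measurements of any real firewall.

```python
def blocking_cost(total_flows, haystack_share, detect_rate, false_positive_rate):
    """Return (haystack_blocked, innocent_blocked) for a hypothetical classifier."""
    haystack_flows = total_flows * haystack_share
    innocent_flows = total_flows - haystack_flows
    # Flows the classifier correctly flags as Haystack.
    haystack_blocked = haystack_flows * detect_rate
    # Ordinary flows the classifier wrongly flags (the collateral damage).
    innocent_blocked = innocent_flows * false_positive_rate
    return haystack_blocked, innocent_blocked


# Hypothetical numbers: 1,000,000 flows, 0.1% of them Haystack, and a
# classifier that catches 90% of Haystack flows with a 1% false-positive rate.
hit, collateral = blocking_cost(1_000_000, 0.001, 0.90, 0.01)
ratio = collateral / hit
print(f"Haystack flows blocked: {hit:.0f}")
print(f"Innocent flows blocked: {collateral:.0f}")
print(f"Innocent flows blocked per Haystack flow: {ratio:.1f}")
```

With these made-up numbers the censor breaks about 11 ordinary flows for
every Haystack flow it stops. Whether that counts as "unpalatable" depends
on how much collateral damage the adversary is willing to accept, and the
sketch says nothing about that.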

> Exit nodes are similar to entry nodes in that they merely forward
> traffic. Their primary purpose is to obscure the location of our
> processing nodes. We have considered using Tor as the exit node
> network, but in addition to this being quite impolite, it would
> constrain scalability. We would appreciate community input in this
> area.

Yes, please don't do this. :)

From a technical perspective, it composes the tools in a way that
doesn't actually retain the properties of each. If you want to make use of
the security properties that Tor aims to provide (I'm thinking mainly of
unlinkability), you'll want to put the Tor client on your user's machine,
and use your magic unblockable proxy as transport to the first hop. Then
the user is the one who chooses her path through the Tor network (not
you), and the user gets encryption such that the proxy can't learn the
destination or read the traffic.

Tor can handle using an https proxy, so maybe we're pretty much there
already. Two more things to point out though:

a) Using Tor over a centralized proxy like Haystack would weaken the
anonymity that Tor can provide. Tor's anonymity comes from the diversity
of possible points exiting the Tor network, and the diversity of possible
points entering the Tor network. If these users are all known to enter
the Tor network through Haystack's computer in California, that becomes
a straightforward place to want to tap. So all else being equal (which
it usually isn't), Tor-over-Haystack would provide weaker anonymity than
just Tor.

b) Let's explore this Tor-over-Haystack notion further: there are
actually two properties that we inherit at once, and we're mushing them
together. First, there's the magic voodoo unblockable transport. Second,
there's the centralized bottleneck for the traffic flows. Wouldn't it
be great if there were a way to separate these features? It's kind of
moot while the magic voodoo isn't around yet, but if it does come to exist,
designing things in a modular way could provide building blocks that
can make other tools better.
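
One rough way to picture point (a) is to measure how spread out users are
over entry points. The Python sketch below uses Shannon entropy for that.
Entropy is an assumed metric chosen for illustration, not something the
argument above depends on, and the function name and the count of 256
entry points are hypothetical. Real Tor entry selection is weighted and
more involved than a uniform spread.

```python
import math


def entry_entropy_bits(weights):
    """Shannon entropy, in bits, of the distribution of users over entry points."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return sum(-p * math.log2(p) for p in probs)


# Hypothetical: users spread evenly over 256 distinct entry points.
spread_out = entry_entropy_bits([1] * 256)
# Hypothetical: every user enters through one centralized proxy.
centralized = entry_entropy_bits([1])

print(f"Spread over 256 entries: {spread_out:.1f} bits")
print(f"Single central entry:    {centralized:.1f} bits")
```

Under this toy measure, a single entry point leaves an observer with no
uncertainty about where to tap, while a spread-out population forces the
observer to watch many places.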

> To reduce bandwidth costs, the processing and feeder nodes will run
> locally-caching HTTP proxies, though with a short maximum expiration
> time to minimize the impact of any compromise. Locally-caching proxies
> would be worse than useless on feeder nodes.

I'm assuming you mean that the *exit* nodes will run caching http proxies,
not the feeder nodes.

> I can provide a more formal description of the protocols at another
> time. The goal is to ensure that Haystack is _AT_WORST_, no worse than
> any other encrypted proxy in terms of formal security guarantees. It
> is this property that permits us to tolerate a best-effort approach to
> our obfuscation engine. Privacy is the single most important concern.

I recommend the term 'confidentiality' here rather than
'privacy'. Otherwise there will be people who think they know what you
mean and they'll be wrong.

Heck, going a step further, it would be even clearer to say "Encryption
between the user and our service".

> Some have expressed that the effective consequences of detection might
> be greater for Haystack than for other systems because Haystack has
> been "marketed toward dissidents". First of all, this is not a
> technical objection and should be rejected out of hand. Second, I am
> incredulous: Haystack is designed to bypass Internet censorship, and
> that is its stated purpose. That capability is useful to dissidents,
> but to many others besides. Haystack's userbase will include the same
> kind of people who use Tor, Freegate, and Ultrasurf. One could argue
> that many users of these programs could be targets for authorities.
> Haystack is not special in this respect, and this entire line of
> argumentation is unfounded.

Hoo boy. This is a tricky one, but a very important issue.

If your argument is "the technical design I've described here, considered
in a vacuum, provides its security properties no matter whether you name
it X or Y", then I agree with you.

But you can't say that deploying this design in Iran under the name
Haystack will yield the same results -- including the levels of security
that users get, for a wide variety of types of security -- as deploying
it under some other name.

You have to consider the economics of deployment, user base composition,
publicity and perception, etc as _security parameters_ for your system.

See also items #1 and #10 in
https://svn.torproject.org/svn/projects/articles/circumvention-features.html

For a similar discussion in the context of anonymity systems, see
http://freehaven.net/anonbib/#usability:weis2006

> - If our obfuscation methods were defeated to such an extent that
>   automatic traffic classification became possible and
>   non-resource-prohibitive, authorities could detect and block
>   Haystack. We expect that this will happen, and plan to change and
>   improve obfuscation techniques often enough to defeat this attack
>   overall.

Sounds like your marketing materials need to make it clearer that
"Periodically Haystack will stop working, and at that point it will
also be easy for the bad guys to notice you're a Haystack user, and
this is part of our plan. Oh, and there may also be times where a) it
becomes easy for the bad guys to notice you're a Haystack user, yet b)
we aren't able to recognize that it has."

Hope that helps,
--Roger