[liberationtech] Censorship resistance attacks and counterattacks

Sat Sep 11 17:17:40 PDT 2010

For the sake of clarity of discussion, here are my general thoughts on 
censorship resistance attacks and counterattacks. My apologies if this writeup 
is a little rough; I just threw it together.

For the purposes of these notes, by 'censorship resistance tools', I'll be 
referring to ones for browsing the web from inside of countrywide firewalls 
which are trying to limit access, such as Freegate, Ultrasurf, and the like. 
Obviously there are other forms of censorship and resistance to it, but that's 
what's being discussed for now.

The usage pattern for censorship resistance tools goes something like this:

1) system sends information about proxies to users

2) users use proxies to browse the web freely

3) firewall operator finds out IPs of proxies and blocks them by IP

4) go back to step 1

It's an ongoing cat and mouse game involving cycling through a lot of IPs and a 
lot of careful secrecy.

An attacker might also, instead of outright blocking an IP, artificially create 
a very high packet loss rate going to it, which might make users conclude that 
the anti-censorship system doesn't work very well and give up on it. That could 
be countered by trying to guess when there's an artificially high packet loss 
rate, but that's potentially an insidious game - the attacker might, for 
example, determine where the machines developers use for testing are, and not 
artificially drop packets to those.
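
One way to make that guess concrete is to compare loss toward the proxy with 
loss toward unrelated control hosts measured from the same client. This is a 
minimal sketch, assuming the loss rates have already been measured somehow; the 
function name and the 3x ratio and 5% floor are illustrative choices of mine, 
not values from any deployed tool.

```python
from statistics import median

def looks_throttled(proxy_loss, control_losses, ratio=3.0, floor=0.05):
    """Return True if loss to the proxy is suspiciously above the baseline.

    proxy_loss: fraction of packets lost on the path to the proxy (0.0-1.0)
    control_losses: loss fractions to unrelated hosts from the same client
    ratio, floor: hypothetical thresholds chosen for illustration only
    """
    if not control_losses:
        raise ValueError("need at least one control measurement")
    baseline = median(control_losses)
    # Require both an absolute floor and a large relative gap, so a
    # uniformly bad network is not mistaken for targeted interference.
    return proxy_loss >= floor and proxy_loss >= ratio * max(baseline, 0.01)
```

A check like this only helps if it runs on real users' machines. If it runs 
only from the developers' test machines, the attacker can spare those and the 
check will never fire.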

There's considerable concern about the threat model of the censor finding out 
which users are using the proxies and doing bad things to them. I'll just cut to 
the chase on that issue - the resistance to attacks of that form is inherently 
weak. The censor can simply record the destinations of all outgoing connections, 
and retroactively correlate them to discovered proxies, unveiling the IP of a 
user. This is a vicious attack which can't be completely eliminated. Possession 
of the tool might also be incriminating.
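
To show how little the censor needs for the retroactive attack, here is a rough 
sketch. It assumes the censor keeps a log of (timestamp, source IP, destination 
IP) records; that record layout, the function name and the sample addresses 
(taken from documentation ranges) are all made up for illustration.

```python
from collections import defaultdict

def users_of_proxies(connection_log, discovered_proxies):
    """Map each source IP to the discovered proxies it contacted.

    connection_log: iterable of (timestamp, src_ip, dst_ip) tuples
    discovered_proxies: set of proxy IPs learned after the fact
    """
    hits = defaultdict(set)
    for _timestamp, src_ip, dst_ip in connection_log:
        if dst_ip in discovered_proxies:
            hits[src_ip].add(dst_ip)
    return dict(hits)

# Hypothetical log: the proxy is discovered only after these were recorded.
log = [
    (1, "10.0.0.5", "203.0.113.7"),
    (2, "10.0.0.6", "198.51.100.2"),
    (3, "10.0.0.5", "198.51.100.9"),
]
print(users_of_proxies(log, {"203.0.113.7"}))
```

The lookup is a single pass over stored records, which is why no protocol 
change on the client side can fully defend against it.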

High level methods of avoiding detection include:

* Have lots of cover traffic - that is, lots of users, so attacking them all is 
impractical. This is probably the ultimate solution, because a tool which 
doesn't have enough users to provide cover traffic isn't successful, and a 
successful tool implicitly provides lots of cover traffic.

* Have users use shared/ephemeral IPs. This is a low tech approach having little 
to do with the protocol.

* Use no software, that is, plain HTTP/HTTPS proxies. This leaves no persistent 
evidence on the user's machine, but can expose what the user is doing to 
snooping.

* Use ephemeral or easy to dispose of software. This is a good idea, but the 
techniques for doing it are tricky or rely on physical security.

* Run proxies on web sites running other services which are also used by users 
within the target area. This is a great approach, but requires cooperation of a 
web site which has the willingness to be (or confidence it won't be) blocked.

* Use actual Skype connections. This is an interesting approach which has the 
benefit of lots of cover traffic, but suffers from limitations on the bandwidth 
Skype intermediaries will provide, and could be attacked by an attacker running 
lots of high quality Skype nodes and noticing the very suspicious traffic.

* Dial down the level of paranoia. In the end a certain amount of this may be 
necessary.

Censors have multiple ways of finding IP addresses which are used by the 
anti-censorship system:

* Use the same methods as the software. This is a very insidious approach, 
putting the anti-censorship system in a position of trying to simultaneously 
publish new IPs and keep their distribution limited.

* Correlation attacks on existing known IPs. This is also a very insidious 
attack - the attacker simply takes the IPs of users who are known to use the 
anti-censorship tool, and looks for otherwise unpopular IPs which a lot of those 
users are connecting to. 

* Probing - an attacker can connect to suspected proxies and try to get them to 
give themselves away by doing a handshake. Depending on the type of proxy 
connection used, this can be very effective, sometimes in combination with 
reverse DNS.

* Trick proxy users into hitting a web site and observe what IPs the connections 
come from, observing the IPs of the proxies directly.

* Deep packet inspection and traffic pattern analysis, including packet sizes, 
connection number and duration, etc. These can be extremely effective, but can 
be extremely expensive for a censor to set up. Connection number and 
duration are probably the most telling pieces of information, and the cheapest 
to implement, as well as the easiest for the anti-censor to manipulate.
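
The correlation attack in the list above is cheap to run. Here is a rough 
sketch, assuming the censor has flow records as (source IP, destination IP) 
pairs and a set of already-identified users. The function name and both 
thresholds are hypothetical; a real system would tune them against real 
traffic.

```python
from collections import defaultdict

def suspect_proxies(flows, known_users, min_known=3, min_share=0.5):
    """Return destinations that are mostly visited by known tool users.

    flows: iterable of (src_ip, dst_ip) pairs
    known_users: IPs already known to use the anti-censorship tool
    min_known: minimum number of distinct known users (illustrative)
    min_share: minimum fraction of a destination's sources that are
               known users, i.e. the destination is otherwise unpopular
    """
    known_users = set(known_users)
    sources = defaultdict(set)
    for src, dst in flows:
        sources[dst].add(src)
    suspects = []
    for dst, srcs in sources.items():
        known = len(srcs & known_users)
        if known >= min_known and known / len(srcs) >= min_share:
            suspects.append(dst)
    return sorted(suspects)
```

A popular site visited by the same users is not flagged, because known users 
make up only a small share of its sources. A proxy shared by a handful of 
identified users stands out immediately.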

There are several ways for an anti-censor to make it hard to find their IPs:

* Use lots of IPs. If each user can be given their own dedicated IP then the 
system is extremely hard to attack. Problem is, this approach requires procurement 
of lots of IPs, which isn't easy.

* Limit how many users each IP is given to. This is a good idea, but difficult 
to do.

* Encrypt info with not widely circulated keys. This moves the problem to key 
distribution and management, which is a good idea.

* Distribute fake IPs including stuff the censor would regret blocking. I think 
this is kind of fun.

* Have clients only connect to one IP. This is a very good idea! Should be 
followed as closely as possible.

* Make traffic go through more than one hop, so that web sites on the outgoing 
side never see the IPs of the proxies users connect to. While clearly a good 
idea, this doubles the bandwidth used, which kind of sucks.

* Rely on deep packet inspection being hard. Less unreasonable than you might 
imagine - deep packet inspection systems are very expensive and take a while to 
upgrade, and intelligence on what the deep packet inspection can do is sometimes 
available.

* Steganographically encode connections to proxies - this obviously must be 
done, although it isn't obvious what the best approach is.
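
As an illustration of having clients only connect to one IP, here is a small 
sketch in which the distribution server deterministically maps each user to a 
single proxy. It also shows why this limits what a censor learns by running 
clients. The names assign_proxy and exposed_proxies, the per-user identifier and 
the server-side secret are all assumptions for the example, not part of any 
existing tool.

```python
import hashlib
import hmac

def assign_proxy(user_id, proxies, secret):
    """Pick exactly one proxy for a user, the same one every time.

    user_id: hypothetical stable identifier for the client (str)
    proxies: ordered list of proxy IPs held by the server
    secret: server-side key (bytes), so users cannot predict the mapping
    """
    if not proxies:
        raise ValueError("no proxies available")
    digest = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(proxies)
    return proxies[index]

def exposed_proxies(attacker_ids, proxies, secret):
    """Proxies a censor learns by registering the given fake users."""
    return {assign_proxy(uid, proxies, secret) for uid in attacker_ids}
```

With this mapping a censor running k fake clients learns at most k proxy IPs, 
so the cost of enumeration scales with the number of identities the censor can 
obtain. Note that the simple modulo used here reshuffles most users whenever the 
proxy list changes, which a real system would want to avoid.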

There are several things proxy connections could be made to look like -

* HTTP - while there's plenty of cover traffic for HTTP, deep packet inspection 
and probing can probably be very effective in recognizing patterns in it, making 
it not very appealing for stego connections

* SSL/TLS - there's a decent amount of cover traffic for TLS connections in the 
form of HTTPS, and using the HTTPS port is probably a good approach, especially 
since the traffic patterns are going to match HTTP anyway, since that's what it 
is. There's some concern that man in the middle attacks might be launched, 
although those are difficult, and an attacker might get suspicious if reverse 
DNS doesn't return believable information. Still, this may be the best option, 
and is certainly the simplest to implement.

* BitTorrent - BitTorrent has lots of cover traffic, and the obfuscated version 
of the protocol looks fairly generic, although its traffic patterns are very 
distinctive and wouldn't be closely matched by anti-censorship web browsing.

* uTP - uTP is a UDP-based TCP-alike originally designed for BitTorrent. It has 
the advantage that some deep packet inspection systems just plain don't support 
UDP, and it's easy to use as a swap-in replacement for TCP. It has some of the 
same cover traffic problems as regular BitTorrent.

* SSH - tunneling over SSH is not uncommon, so using SSH connections is no more 
suspicious than having long-lived high-throughput SSH connections is to begin 
with. That's already a high level of suspiciousness, though, so this probably 
isn't a great approach.

* Skype - Skype traffic has good cover traffic, but is a very poor match in 
terms of usage patterns.

* noise - a TCP connection which has just plain garbage going over it is a 
surprisingly reasonable approach. Lots of weird miscellaneous things on the 
internet are hard to classify, and obfuscated BitTorrent provides a decent 
amount of cover.

There are several methods a censorship resistance system can use to get IP 
addresses out -

* offline - this is the most secure way, but it's very slow and expensive

* spam cannon - a spam blast can be sent out containing addresses of proxies. 
This works but is moderately slow and moderately expensive. It's also 
potentially very easy to intercept.

* to existing users - client software can be sent IPs of fallback proxies when 
it makes a proxy connection. This works and is fast, but has the problem that an 
attacker can run client software and use it to find proxies as well.

* via web stego - this technique hasn't been used yet, but IPs could be encoded 
steganographically in real web traffic. Given the tremendous popularity of 
censorship resistance tools in the west, it might be possible to enlist the help 
of lots of web sites, and make it essentially impossible to filter them all out. 
I'm working on technology for this.
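
To make the web stego idea concrete, here is a deliberately naive toy, not the 
technology mentioned above. It hides the 32 bits of an IPv4 address in ordinary 
text, one bit per line, using a trailing space for a 1 bit. The encoding and the 
function names are my own invention for illustration. Trailing whitespace is 
trivial for a censor to detect or strip, so a usable scheme would need a far 
subtler carrier.

```python
import ipaddress

def hide_ip(ip, cover_lines):
    """Hide an IPv4 address in the first 32 lines of cover text.

    A line ending in a single space encodes a 1 bit; no trailing space
    encodes a 0 bit. Lines after the 32nd are passed through stripped.
    """
    bits = format(int(ipaddress.IPv4Address(ip)), "032b")
    if len(cover_lines) < len(bits):
        raise ValueError("need at least 32 cover lines")
    out = []
    for i, line in enumerate(cover_lines):
        line = line.rstrip()
        if i < len(bits) and bits[i] == "1":
            line += " "
        out.append(line)
    return out

def reveal_ip(lines):
    """Recover the address hidden by hide_ip from the first 32 lines."""
    bits = "".join("1" if line.endswith(" ") else "0" for line in lines[:32])
    return str(ipaddress.IPv4Address(int(bits, 2)))
```

For example, reveal_ip(hide_ip("203.0.113.7", page_lines)) gives back 
"203.0.113.7" for any page_lines of at least 32 lines of text.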
