[liberationtech] Concept for takedown-resistant publishing

Daniel Margo dmargo at eecs.harvard.edu
Fri Feb 3 11:46:11 PST 2012


On 02/03/2012 10:26 AM, Arturo Filastò wrote:
> On Feb 3, 2012, at 2:02 AM, Daniel Margo wrote:
>> BitTorrent is an open protocol. There are numerous open-source torrent libraries. Opera can download torrents natively, for Firefox there are extensions to do this, and so far as I know every browser has some kind of pluggable framework for MIME type (Internet media type) handling. To have a browser download and then open a torrent of a page is totally realistic, and probably not that much code.
>>
>> The problem is that this is not the actual technical challenge. The actual challenges are these:
>> 1. Name resolution: How do I find the torrent file, and then the P2P cloud itself?
>> 2. Updates: How do I share a Web site as a living, updating document?
>> 3. Server backend: Is it realistic to run a modern Web site without a client-server relationship?
>>
>> 1. Name resolution comes in two parts: finding the torrent file, and then finding the actual P2P cloud. Web sites like The Pirate Bay function as name resolution services for finding torrent files, but obviously any single such server can be blocked. What I presume we actually desire is a name resolution service just like DNS for Web sites: you type in a URL, it gets resolved to a torrent file.
>>
>> URLs have a hostname and a path, e.g. "www.hostname.com/path/to/torrent.file". "www.hostname.com" is resolved by DNS to a server, and then "/path/to/torrent.file" is resolved by that server. So unfortunately, you don't gain any takedown resistance by hosting your torrent file at "www.myserver.com/my.torrent", because if "www.myserver.com" is taken down, then "/my.torrent" can't be resolved. So what you would actually need is for DNS to provide hostname resolution directly into torrents. This is a sweet idea, but changing DNS is hard.
>>
>> Even if you could do that, well, what's in a "torrent file"? A torrent file contains information to get you joined into the P2P cloud; specifically, it contains the address of a tracker, a central server that gates entry into the cloud. If the tracker is down or blocked by some technology, you can't enter the cloud. This is primarily a weakness of the BitTorrent protocol, and not the idea itself; there are extensions to BitTorrent and other P2P protocols that are resilient to this weakness. But fundamentally, finding your way into the P2P cloud is an act of name resolution to find other peers in the cloud, which is most easily done by a name resolution server, in this case the tracker. If we're going about making changes to DNS, probably the most technically sane (but politically unrealistic) solution would be for DNS itself to provide tracking capability for the cloud. Again, that is a sweet idea, but changing DNS is hard.
>>
>> I don't mean to suggest these issues are insurmountable, merely that this is the actual Hard Part.
>
> This is quite an accurate analogy, though you are only thinking of one site per torrent. Something that would
> make this scheme much more powerful (as it would have more seeders) is one torrent for multiple sites.
>
> You could then use a scheme like the magnet URI scheme to reference a particular site (the file(s) of the site). You
> still have the problem of takedown of a particular tracker, but at least the serving of the link
> (the .torrent that is replaced by the magnet URI) can be done over any other channel.
>
> You are not necessarily worried about DNS censorship, as you could provide the tracker address as an IP address.
> You are still worried about censorship through other means (IP blocking), but this can be overcome by having multiple
> trackers running.
What you are bordering on suggesting here is setting up a parallel DNS 
in the form of multiple trackers for "torrent sites" -- which I actually 
think is a really good idea. I picture something like this: the 
browser/extension/plugin resolves URIs of the form 
"tor://www.mysite.com" by looking them up in the distributed tracker. 
The main issue would be distributing the addresses of the trackers and 
keeping them available -- but that seems to me like a surmountable problem.
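
To make that concrete, here is a rough sketch (Python, with entirely
made-up names; real resolvers would be remote tracker-like services,
not in-process objects) of what the browser-side step could look like:
the plugin holds a list of redundant resolvers, and a name maps to a
content hash (a magnet-style infohash) rather than to a server address.

    # Hypothetical sketch: resolve a "tor://" name to a torrent infohash by
    # asking several independent resolvers and requiring agreement.
    from collections import Counter

    class StaticResolver:
        """Stand-in for a tracker-like name service; real ones would be remote."""
        def __init__(self, records):
            self.records = records              # name -> infohash (hex string)
        def lookup(self, name):
            return self.records.get(name)

    def resolve(name, resolvers, quorum=2):
        """Return the infohash that at least `quorum` resolvers agree on."""
        answers = [r.lookup(name) for r in resolvers]
        answers = [a for a in answers if a is not None]
        if not answers:
            return None
        value, votes = Counter(answers).most_common(1)[0]
        return value if votes >= quorum else None

    resolvers = [
        StaticResolver({"www.mysite.com": "0123456789abcdef0123456789abcdef01234567"}),
        StaticResolver({"www.mysite.com": "0123456789abcdef0123456789abcdef01234567"}),
        StaticResolver({}),                     # this one is down or blocked
    ]
    print(resolve("www.mysite.com", resolvers))     # placeholder infohash, example only

Requiring agreement from more than one resolver means a single blocked
or malicious resolver can't silently redirect you; the unsolved part is
exactly the one above: distributing and refreshing the resolver list.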

>> 2. In order to support dynamic content there has to be a cryptographic distinction between updates coming from the legitimate publisher of the site (Alice), and subterfuge coming from Evil Eve. Authentication might be provided by the data being shared (e.g. this data is signed by Alice, and everybody knows her signature) or by the P2P network itself (e.g. in addition to sharing the torrent, the servers also provide a distributed authentication service that will only accept updates signed by Alice). I am by no means an expert on this subject, so I will refrain from talking about it extensively, but I bring it up merely because cryptography is non-optional for any dynamic scheme, and I'm not aware of any update-able, cryptographically-secured P2P torrents. It sounds like maybe they should exist? It also sounds Hard, and Google isn't turning anything up.
>>
>> Again, I'm not suggesting these issues are insurmountable; in some sense Google Docs does all this. But they do it with a pretty sophisticated backend that glues many technologies together (I guarantee Google has a killer internal name resolution and authentication service), and I have no idea how Hard it would be to make those parts takedown-resistant in the sense of having no central servers (there are unquestionably central servers at Google).
>>
> This is a very tough problem, and such a feature would be a core part of a "multiple sites per torrent" scheme.
> Though I disagree that there should be an update feature. The best option is probably to have an append-only
> mechanism, so that you do not have to worry about Evil Eve changing content that is already there.
>
> This would also benefit the resilience of the network, since the protocol does not understand the concept
> of "modification" (and therefore deletion) of content. Nobody can force you to remove your content, because it is not
> technically possible.
From a distributed systems standpoint, append-only is an extremely 
simplifying assumption -- which is a good thing! It makes this 
technology more realistic to build. In particular, it moves us from the 
realm of P2P file sharing into the realm of P2P publish-subscribe 
networks, which I believe are more evolved with respect to identifying 
trustworthy updates (no promises, I haven't worked on this in a while).
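
To make the Alice-versus-Eve point from above concrete, here is a
minimal sketch of an append-only, signed feed. It assumes every peer
already knows Alice's public key (which is itself a bootstrapping
problem), and it uses the third-party Python "cryptography" package for
Ed25519 signatures; the field names and the hash-chaining rule are
invented for illustration, not a spec.

    # Sketch: peers accept an entry only if Alice signed it and it strictly
    # extends the existing log (append-only: no edits, no deletions).
    import hashlib, json
    from cryptography.hazmat.primitives.asymmetric import ed25519
    from cryptography.exceptions import InvalidSignature

    def entry_bytes(seq, prev_hash, body):
        return json.dumps({"seq": seq, "prev": prev_hash, "body": body},
                          sort_keys=True).encode()

    class Feed:
        def __init__(self, alice_public_key):
            self.pub = alice_public_key
            self.entries = []                   # list of (raw bytes, signature)

        def head_hash(self):
            if not self.entries:
                return ""
            return hashlib.sha256(self.entries[-1][0]).hexdigest()

        def append(self, raw, signature):
            try:
                self.pub.verify(signature, raw)         # only Alice's key verifies
            except InvalidSignature:
                return False                            # Evil Eve is rejected here
            record = json.loads(raw)
            if record["seq"] != len(self.entries) or record["prev"] != self.head_hash():
                return False                            # must extend, never rewrite
            self.entries.append((raw, signature))
            return True

    # Alice publishes; any peer can replay the same bytes and signatures.
    alice_key = ed25519.Ed25519PrivateKey.generate()
    peer = Feed(alice_key.public_key())
    first = entry_bytes(0, "", "first post")
    assert peer.append(first, alice_key.sign(first))
    second = entry_bytes(1, hashlib.sha256(first).hexdigest(), "second post")
    assert peer.append(second, alice_key.sign(second))

Because every entry names its predecessor, a peer that already holds
the log cannot be tricked into swapping out old content, which is the
append-only property being argued for above.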

>
>> 3. The real elephant in the room is that modern Web sites are best thought of as programs, not files, and program distribution is infinitely harder than file distribution. When you visit a Wordpress blog, what you appear to receive is an HTML file: but in actuality that HTML was streamed by a PHP script running on the server talking to a MySQL database. The number of layers in this onion is arbitrary; it's anything you could run on a computer. Arbitrary code, or I could hook MSPaint up to the thing if I wanted it enough. Distributing *and executing* arbitrary code like this is Quite Possibly Impossible, and if it is possible it is Very Far Away. At any rate, BitTorrent can't do this.
>>
>> The word "arbitrary" is important. In specific cases, you can certainly find a case-specific resolution. Diaspora is building something like a distributed social network "program", and I do honestly believe that with a lot of code and hard thinking, you could distribute a Wordpress blog's backend on P2P. It would probably require a total rewrite of Wordpress such that it wouldn't even be the same piece of software, and you would have to solve the other Hard Problems above, but I think it is technically possible at this time. But in the general case, you might have more luck working towards a Singularity and then asking the Machine-Gods for an answer.
>>
> Projects such as unhosted (http://unhosted.org/) lead me to believe that the future of web technologies is in moving
> more and more logic away from the server and into the hands of the client.
> The diffusion of fast computers and the speed of current JavaScript engines have made it possible (and
> in most cases even desirable) to make clients do what was once only done by servers.
>
> While certain web sites will probably never be moved totally into the client's hands (I am thinking of Google, for
> example), a great part of the web we use today could very easily be moved into pure client-side logic, with the
> server only used for data storage. A blog, for example, is just a set of posts (content) that is updated every
> X amount of time. All the processing of that data (pagination, styling) can easily be moved into the client's hands.
>
> The only limit that I see is that the data would only be able to flow in one direction, from the site to the client, though
> I don't see this as a critical factor in the success of such a scheme.
The project you linked is interesting and worthy, but it works precisely 
by virtue of not addressing the general case. What they offer is a 
specific programming model that, from a takedown perspective, is more 
robust, but from a site design perspective is more restrictive.

By way of example, let me push back on the idea that a Wordpress blog is 
just post content and render logic. A Wordpress blog is also a source of 
authentication. This is most obvious when you log in to write posts, 
install addons, whatever. There are different administrative levels, and 
in fact there are something like 4-7 crypto keys being juggled in the 
background.

More subtly (but perhaps more seriously), authentication takes the form 
of trust in the canonicality of server-side code. When I comment on a 
Wordpress blog, my comment is passed to the server in the form of a text 
string which goes through considerable sanitization before it shows up 
as a comment. It's parsed for dangerous SQL strings, usually passed 
through a swear-checking service, and most importantly it ends up 
stored in the right way in the right place. I can be confident 
this is happening correctly because the script that is doing so is 
running on my server *and has restricted write and execute permissions*. 
If that script is running on a client and talking to the data store via 
some protocol, then I have to design that protocol in such a way that 
the client can't cheat, without necessarily having a central server that 
at the end of the day will say "yup, no dangerous SQL."
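
To illustrate just that narrow point (purely a toy; the names and rules
are invented): with no canonical server, the sanitization step has to
be a deterministic check that every honest peer re-runs on every record
it receives, rather than something the submitting client is trusted to
have done.

    # Toy sketch: each peer validates a comment record itself before storing
    # or relaying it, so a cheating submitter's record is simply dropped.
    import html, re

    MAX_LEN = 2000

    def validate_comment(author, text):
        if not (0 < len(text) <= MAX_LEN):
            return None
        if not re.fullmatch(r"[\w .-]{1,64}", author):
            return None
        return {"author": author, "text": html.escape(text)}    # store escaped text

    def accept(store, record):
        clean = validate_comment(record.get("author", ""), record.get("text", ""))
        if clean is None:
            return False
        store.append(clean)
        return True

    store = []
    accept(store, {"author": "mallory", "text": "<script>alert(1)</script>"})
    print(store)        # the markup arrives escaped, whoever submitted it

That only works for checks that are pure functions of the record
itself; anything that needs shared state (rate limiting, spam scoring
against history, "stored in the right place") is where the protocol
design gets genuinely hard.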

Getting philosophical, the kinds of permissions a remote data store 
model permits are read and write. You can do a lot with this; you can 
make sure unauthenticated users can only write comments, for example. 
But POSIX permissions are a good bit more sophisticated, and in 
particular you can have readable, un-writeable, executable code, which 
means you know exactly what code is running and what it should do. This 
is an advantage a canonical server agent provides. In the case of 
distributed applications, individual peers can easily be byzantine, 
which is a badass way of saying they are jerks and can execute 
whatever plan they want. You design around this by designing your 
distributed P2P protocol very well -- usually on a case-by-case basis, 
and more often than not you will eventually come to a point where you 
say forget it, this is too complicated, let's just have a central server 
for this one hard part we can't solve and shadow it 3 times. That's 
robust against random failure, but not against intentional takedown.
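
For a sense of what "design the protocol so peers can't cheat" looks
like in its very simplest form, here is a toy majority-vote read (plain
Python, invented names). Real byzantine fault tolerance is far more
involved; this only covers the trivial case of reading one
self-contained value from peers that may lie.

    # Toy illustration, not a full BFT protocol: ask 2f+1 peers for the same
    # record and trust a value only if at least f+1 of them agree on it.
    from collections import Counter

    def byzantine_read(peers, key, faulty_tolerated=1):
        needed = 2 * faulty_tolerated + 1
        replies = [p.get(key) for p in peers[:needed]]
        value, votes = Counter(replies).most_common(1)[0]
        return value if votes >= faulty_tolerated + 1 else None

    honest = {"post/1": "hello world"}
    peers = [dict(honest), dict(honest), {"post/1": "EVIL EDIT"}]   # one lying peer
    print(byzantine_read(peers, "post/1"))      # prints "hello world"

Anything beyond simple reads (agreeing on ordering, on writes, on
membership) is where the case-by-case design work, or the fallback to a
central server, tends to happen.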

The impression I get from Unhosted is that they want to change the 
person (singular)-to-service relationship such that more power is in 
the hands of the person to manage their (singular) data. I think this 
becomes much, much harder when you have people, that is, peers, 
laboring in the production of a collective illusion, such as a blog 
*with comments* or any other kind of community participation. It's by 
no means impossible; it gets done all the time, but for now it gets 
done on a case-by-case basis. So maybe for now what you can build is an 
append-only blog without those features, which is essentially just a 
pub-sub news service -- and that, I think, is totally realistic right 
now, and a good idea.
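
The reading side of such an append-only, pub-sub style blog really can
be that simple, which is part of why I think it is realistic; as a
sketch (again with invented names, and with the signature checking from
the earlier sketch assumed underneath):

    # Sketch: a subscriber keeps a cursor into the append-only log and asks
    # any peer it meets only for entries it has not yet seen.
    class Subscriber:
        def __init__(self):
            self.seen = 0
            self.posts = []

        def pull(self, peer_feed):
            # peer_feed is whatever copy of the log this peer happens to hold
            for entry in peer_feed[self.seen:]:
                self.posts.append(entry)
            self.seen = max(self.seen, len(peer_feed))

    feed_on_peer_a = ["post 1", "post 2"]
    feed_on_peer_b = ["post 1", "post 2", "post 3"]     # a more up-to-date peer
    reader = Subscriber()
    reader.pull(feed_on_peer_a)
    reader.pull(feed_on_peer_b)                         # only "post 3" is new
    print(reader.posts)

Comments, votes, and anything else interactive are exactly where the
hard parts above come back in.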
- Daniel Margo

>> What seems more realistic to me is taking snapshots of a Web site and distributing those as files instead, rather than trying to distribute the actual Web site program. That's why we designed Mirror As You Link to work that way.
>>
>> These are all the hard issues I can think of, as a technical person with some distributed systems background. There may be others, since in some cases we're really exploring uncharted waters here.
>> - Daniel Margo
> Thanks for your good rationalization of the issue.
>
> - Art.
>
>



