[liberationtech] Concept for takedown-resistant publishing

Daniel Margo dmargo at eecs.harvard.edu
Fri Feb 3 12:35:30 PST 2012


After consulting a shower, I realized the way you would do comments is 
by storing them in a format that required no pre-processing (e.g. SQL 
sanitization) and then doing all post-processing (HTML sanitization, 
BBcodes, swear removal, whatever) at the client, where if they're byzantine 
all they affect is themselves. This is probably extensible to any data 
storage-and-retrieval feature where there is A. no pre-processing, and 
B. no more post-processing than can be realistically done at render 
time. That still expands the universe of possible features a good bit.
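
A minimal sketch of what that client-side render step could look like 
(the BBcode subset and word list here are just illustrative, not a spec):

    # Sketch: comments are stored as raw text; every client runs the same
    # render step locally, so a byzantine client only mangles its own view.
    import html
    import re

    BBCODE = [
        (re.compile(r"\[b\](.*?)\[/b\]", re.S), r"<b>\1</b>"),
        (re.compile(r"\[i\](.*?)\[/i\]", re.S), r"<i>\1</i>"),
    ]
    SWEARS = {"darn", "heck"}  # placeholder word list

    def render_comment(raw):
        text = html.escape(raw)              # HTML sanitization at render time
        for word in SWEARS:                  # swear removal, purely cosmetic
            text = re.sub(word, "*" * len(word), text, flags=re.I)
        for pattern, repl in BBCODE:         # BBcode -> markup, after escaping
            text = pattern.sub(repl, text)
        return text

    print(render_comment("[b]hi[/b] <script>alert(1)</script> darn"))
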
- Daniel Margo

On 02/03/2012 02:46 PM, Daniel Margo wrote:
>
> On 02/03/2012 10:26 AM, Arturo Filastò wrote:
>> On Feb 3, 2012, at 2:02 AM, Daniel Margo wrote:
>>> Torrent is an open protocol. There are numerous open source torrent 
>>> libraries. Opera can download torrents natively, for Firefox there 
>>> are extensions to do this, and so far as I know every browser has 
>>> some kind of plug-able framework for MIME type (Internet media type) 
>>> handling. To have a browser download and then open a torrent of a 
>>> page is totally realistic, and probably not that much code.
>>>
>>> The problem is that this is not the actual technical challenge. The 
>>> actual challenges are these:
>>> 1. Name resolution: How do I find the torrent file, and then the P2P 
>>> cloud itself?
>>> 2. Updates: How do I share a Web site as a living, updating document?
>>> 3. Server backend: Is it realistic to run a modern Web site without 
>>> a client-server relationship?
>>>
>>> 1. Name resolution comes in two parts: finding the torrent file, and 
>>> then finding the actual P2P cloud. Web sites like The Pirate Bay 
>>> function as name resolution services for finding torrent files, but 
>>> obviously any single such server can be blocked. What I presume we 
>>> actually desire is a name resolution service just like DNS for Web 
>>> sites: you type in a URL, it gets resolved to a torrent file.
>>>
>>> URLs have a hostname and a path, e.g. 
>>> "www.hostname.com/path/to/torrent.file". "www.hostname.com" is 
>>> resolved by DNS to a server, and then "/path/to/torrent.file" is 
>>> resolved by that server. So unfortunately, you don't gain any 
>>> takedown resistance by hosting your torrent file at 
>>> "www.myserver.com/my.torrent", because if "www.myserver.com" is 
>>> taken down, then "/my.torrent" can't be resolved. So what you would 
>>> actually need is for DNS to provide hostname resolution directly 
>>> into torrents. This is a sweet idea, but changing DNS is hard.
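
To sketch how that could look (this is not an existing standard: the 
"_torrent." record name, the "torrent=" convention, and the dnspython 
dependency are assumptions for illustration only):

    # Sketch: map a hostname to a torrent reference via a DNS TXT record,
    # e.g. "_torrent.mysite.org" containing "torrent=magnet:?xt=urn:btih:..."
    import dns.resolver  # third-party: dnspython

    def resolve_site_torrent(hostname):
        try:
            answers = dns.resolver.resolve("_torrent." + hostname, "TXT")
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            return None
        for rdata in answers:
            txt = b"".join(rdata.strings).decode()
            if txt.startswith("torrent="):
                return txt[len("torrent="):]
        return None

    print(resolve_site_torrent("mysite.org"))
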
>>>
>>> Even if you could do that, well, what's in a "torrent file"? A 
>>> torrent file contains information to get you joined into the P2P 
>>> cloud; specifically, it contains the address of a tracker, a central 
>>> server that gates entry into the cloud. If the tracker is down or 
>>> blocked by some technology, you can't enter the cloud. This is 
>>> primarily a weakness of the BitTorrent protocol, and not the idea 
>>> itself; there are extensions to BitTorrent and other P2P protocols 
>>> that are resilient to this weakness. But fundamentally, finding your 
>>> way into the P2P cloud is an act of name resolution to find other 
>>> peers in the cloud, which is most easily done by a name resolution 
>>> server, in this case the tracker. If we're going about making 
>>> changes to DNS, probably the most technically sane (but politically 
>>> unrealistic) solution would be for DNS itself to provide tracking 
>>> capability for the cloud. Again, that is a sweet idea, but changing 
>>> DNS is hard.
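
For what it's worth, the tracker dependency is visible in the magnet URI 
format itself: trackers are just optional "tr" parameters, and a 
DHT-capable client can join the swarm from the info-hash alone. A small 
illustrative helper (not tied to any particular client):

    # Sketch: build a magnet URI from an info-hash, with or without trackers.
    # With no "tr" parameters, a DHT-capable client has to find peers itself.
    from urllib.parse import quote

    def magnet_uri(infohash, name, trackers=()):
        uri = "magnet:?xt=urn:btih:" + infohash + "&dn=" + quote(name)
        for tr in trackers:
            uri += "&tr=" + quote(tr, safe="")
        return uri

    example_hash = "c12fe1c06bba254a9dc9f519b335aa7c1367a88a"  # placeholder
    print(magnet_uri(example_hash, "mysite",
                     trackers=["udp://tracker.example.org:6969"]))
    print(magnet_uri(example_hash, "mysite"))  # trackerless, DHT-only entry
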
>>>
>>> I don't mean to suggest these issues are insurmountable, merely that 
>>> this is the actual Hard Part.
>>
>> This is quite an accurate analogy, though you are only thinking of one 
>> site per torrent. Something that would
>> make this scheme much more powerful (as it would have more seeders) 
>> is one torrent for multiple sites.
>>
>> You could then use a scheme like the magnet URI scheme to reference 
>> a particular site (the file(s) of the site). You
>> still have the problem of takedown of a particular tracker, but at 
>> least the serving of the link
>> (the .torrent that is replaced by the magnet URI) can be done over 
>> any other channel.
>>
>> You are not necessarily worried about DNS censorship as you could 
>> provide the tracker address as an IP address.
>> You are still worried about censorship through other means (IP 
>> blocking), but this can be overcome by having multiple
>> trackers running.
> What you are bordering on suggesting here is setting up a parallel DNS 
> in the form of multi-trackers for "torrent sites" -- which I actually 
> think is a really good idea. I picture something in my head like the 
> browser/extension/plugin resolving URIs of the form 
> "tor://www.mysite.com" by reference into the distributed tracker. The 
> main issue would be distributing the addresses of the trackers and 
> keeping them available -- but that seems to me like a surmountable 
> problem.
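
To make that resolution step concrete, a toy sketch, assuming the 
browser extension ships (or discovers) a list of directory mirrors that 
each serve a name-to-info-hash mapping; the endpoints and the JSON shape 
are entirely hypothetical:

    # Sketch: resolve "tor://www.mysite.com" by asking several mirrors of
    # a name -> info-hash directory until one of them answers.
    import json
    from urllib.request import urlopen
    from urllib.error import URLError

    MIRRORS = [  # hypothetical directory mirrors
        "https://resolver-a.example.net/lookup?name=",
        "https://resolver-b.example.net/lookup?name=",
    ]

    def resolve(uri):
        name = uri.split("://", 1)[-1]
        for mirror in MIRRORS:
            try:
                with urlopen(mirror + name, timeout=5) as resp:
                    return json.load(resp)["infohash"]
            except (URLError, KeyError, ValueError):
                continue  # mirror down or blocked; try the next one
        return None
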
>
>>> 2.  In order to support dynamic content there has to be a 
>>> cryptographic distinction between updates coming from the legitimate 
>>> publisher of the site (Alice), and subterfuge coming from Evil Eve. 
>>> Authentication might be provided by the data 
>>> being shared (e.g. this data is signed by Alice, and everybody knows 
>>> her public key) or by the P2P network itself (e.g. in addition to 
>>> sharing the torrent, the servers also provide a distributed 
>>> authentication service that will only accept updates signed by 
>>> Alice). I am by no means an expert on this subject, so I will 
>>> refrain from talking about it extensively, but I bring it up merely 
>>> because cryptography is non-optional for any dynamic scheme, and I'm 
>>> not aware of any update-able, cryptographically-secured P2P 
>>> torrents. It sounds like maybe they should exist? It also sounds 
>>> Hard, and Google isn't turning anything up.
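
The signing primitive itself is routine -- the hard parts are 
distributing and pinning Alice's public key and getting peers to reject 
unsigned updates. A minimal sign/verify sketch using Ed25519 (here via 
the Python "cryptography" package, purely as an example):

    # Sketch: only updates signed by Alice's key are treated as legitimate.
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PrivateKey,
    )

    alice_private = Ed25519PrivateKey.generate()   # Alice keeps this secret
    alice_public = alice_private.public_key()      # peers pin this

    update = b"post 42: new content for the site"
    signature = alice_private.sign(update)

    def accept_update(data, sig):
        try:
            alice_public.verify(sig, data)         # raises if Eve tampered
            return True
        except InvalidSignature:
            return False

    print(accept_update(update, signature))        # True
    print(accept_update(b"evil edit", signature))  # False
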
>>>
>>> Again, I'm not suggesting these issues are insurmountable; in some 
>>> sense Google Docs does all this. But they do it with a pretty 
>>> sophisticated backend that glues many technologies together (I 
>>> guarantee Google has a killer internal name resolution and 
>>> authentication service), and I have no idea how Hard it would be to 
> make those parts takedown-resistant (in the sense that there are no 
> central servers; there are unquestionably central servers at Google).
>>>
>> This is a very tough problem, and such a feature would be a core one 
>> of the "multiple sites per torrent" scheme.
>> I disagree, though, that there should be an update feature. The best 
>> option is probably to have an append-only
>> mechanism so that you are not worried about Evil Eve changing 
>> content that is already there.
>>
>> This would also have a benefit for the resilience of the network 
>> since the protocol does not understand the concept
>> of "modification" (and therefore deletion) of content. Nobody can 
>> force you to remove your content because it is not
>> technically possible.
> From a distributed systems technology standpoint, append-only is an 
> extremely simplifying assumption -- which is a good thing! It makes 
> this technology more realistic to build. In particular, it moves us 
> from the realm of P2P file sharing and into P2P publish-subscribe 
> networks, which I believe are more evolved with respect to identifying 
> trustworthy updates (no promises, I haven't worked on this in a while).
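
One common shape for this (my sketch, not any specific protocol) is a 
hash-chained append-only log: each entry commits to the previous one, so 
peers can verify ordering and integrity, and "modification" simply has 
no representation:

    # Sketch: an append-only log where each entry hashes the previous one.
    # Peers can replay and verify the chain; there is no "edit" operation.
    import hashlib
    import json

    def entry_hash(entry):
        data = json.dumps(entry, sort_keys=True).encode()
        return hashlib.sha256(data).hexdigest()

    def append(log, content):
        prev = entry_hash(log[-1]) if log else "0" * 64
        log.append({"prev": prev, "content": content})

    def verify(log):
        prev = "0" * 64
        for entry in log:
            if entry["prev"] != prev:
                return False
            prev = entry_hash(entry)
        return True

    log = []
    append(log, "first post")
    append(log, "second post")
    print(verify(log))          # True
    log[0]["content"] = "edit"  # any retroactive change breaks the chain
    print(verify(log))          # False

(In practice each entry would also carry Alice's signature, as above.)
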
>
>>
>>> 3. The real elephant in the room is that modern Web sites are best 
>>> thought of as programs, not files, and program distribution is 
>>> infinitely harder than file distribution. When you visit a Wordpress 
>>> blog, what you appear to receive is an HTML file: but in actuality 
>>> that HTML was streamed by a PHP script running on the server talking 
>>> to a MySQL database. The number of layers in this onion is 
>>> arbitrary; it's anything you could run on a computer. Arbitrary 
>>> code, or I could hook MSPaint up to the thing if I wanted it enough. 
>>> Distributing *and executing* arbitrary code like this is Quite 
>>> Possibly Impossible, and if it is possible it is Very Far Away. At 
>>> any rate, BitTorrent can't do this.
>>>
>>> The word "arbitrary" is important. In specific cases, you can 
>>> certainly find a case-specific resolution. Diaspora is building 
>>> something like a distributed social network "program", and I do 
>>> honestly believe that with a lot of code and hard thinking, you 
>>> could distribute a Wordpress blog's backend on P2P. It would 
>>> probably require a total rewrite of Wordpress such that it wouldn't 
>>> even be the same piece of software, and you would have to solve the 
>>> other Hard Problems above, but I think it is technically possible at 
>>> this time. But in the general case, you might have more luck working 
>>> towards a Singularity and then asking the Machine-Gods for an answer.
>>>
>> Projects such as unhosted (http://unhosted.org/) lead me to believe 
>> that the future of web technologies is in moving
>> more and more logic away from the server and into the hands of the 
>> client.
>> The diffusion of fast computers and the speed of current JavaScript 
>> engines have made it possible (and
>> in most cases even desirable) to make clients do what once was only 
>> done by servers.
>>
>> While certain web sites will probably never be moved totally into the 
>> clients' hands (I am thinking of Google, for
>> example), a great part of the web we use today could very easily be 
>> moved into pure client-side logic, with the
>> server only used for data storage. A blog, for example, is just a set 
>> of posts (content) that is updated every
>> so often. All the processing of that data (pagination, styling) can 
>> easily be moved into the clients' hands.
>>
>> The only limit that I see is that the data would only be able to flow in 
>> one direction, from the site to the client, though
>> I don't see this as critical to the success of such a scheme.
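
In that spirit, pagination and rendering really are just a pure function 
of the stored posts, which any client can run locally; a small sketch 
(the language is incidental, the point is that no server logic is 
involved):

    # Sketch: the "server" only stores posts; paging and markup happen
    # entirely on the client at render time.
    import html

    posts = [
        {"title": "Hello", "body": "first post"},
        {"title": "Update", "body": "second post"},
        {"title": "News", "body": "third post"},
    ]

    def render_page(posts, page, per_page=2):
        chunk = posts[page * per_page:(page + 1) * per_page]
        return "\n".join(
            "<article><h2>%s</h2><p>%s</p></article>"
            % (html.escape(p["title"]), html.escape(p["body"]))
            for p in chunk
        )

    print(render_page(posts, page=0))
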
> The project you linked is interesting and worthy, but it works 
> precisely by virtue of not addressing a general case. What they offer 
> is a specific programming model that, from a takedown perspective, is 
> more robust but, from a site design perspective, is more restrictive.
>
> By way of example, let me push back on the idea that a Wordpress blog 
> is just post content and render logic. A Wordpress blog is also a 
> source of authentication. This is most obvious when you log in to 
> write posts, install addons, whatever. There are different 
> administrative levels, and in fact there are something like 4-7 crypto 
> keys being juggled in the background.
>
> More subtly (but perhaps more seriously), authentication takes the 
> form of security in the canonicality of server-side code. When I 
> comment on a Wordpress blog, my comment is passed to the server in the 
> form of a text string which goes through considerable sanitization 
> before it shows up as a comment. It's parsed for dangerous SQL 
> strings, usually passed through a swear-checking service, and most 
> importantly it ends up correctly stored in the right way in the right 
> place. I can be confident this is happening correctly because the 
> script that is doing so is running on the server *and has restricted 
> write and execute permissions*. If that script is running on a client 
> and talking to the data store via some protocol, then I have to design 
> that protocol in such a way that the client can't cheat, without 
> necessarily having a central server that at the end of the day will 
> say "yup, no dangerous SQL."
>
> Getting philosophical, the kinds of permissions a remote data store 
> model permits are read and write. You can do a lot with this; you can 
> make sure unauthenticated users can only write comments, for example. 
> But POSIX permissions are a good bit more sophisticated, and in 
> particular you can have readable, un-writeable, executable code, which 
> means you know exactly what code is running and what it should do. 
> This is an advantage a canonical server agent provides. In the case of 
> distributed applications, individual peers can easily be byzantine, 
> which is a badass way of saying they are jerks and they can execute 
> whatever plan they want. You design around this by designing your 
> distributed P2P protocol very well -- usually on a case by case basis, 
> and more often than not you will eventually come to a point where you 
> say forget it, this is too complicated, let's just have a central 
> server for this one hard part we can't solve and shadow it 3 times. 
> That's robust against random failure, but not intentional takedown.
>
> The impression I get from Unhosted is that they want to change the 
> person (singular)-to-service relationship such that more power is in 
> the hands of the person to manage their (singular) data. I think this 
> becomes much, much harder when you have people, that is, peers, 
> laboring in the production of a collective illusion, such as a blog 
> *with comments* or any other kind of community participation. It's by 
> no means impossible, it gets done all the time, but for now it gets 
> done on a case-by-case basis. So maybe for now what you can build is 
> an append-only blog without those features, which is essentially just 
> a pub-sub news service, which I think is totally realistic right now 
> and a good idea.
> - Daniel Margo
>
>>> What seems more realistic to me is taking snapshots of a Web site 
>>> and distributing those as files instead, rather than trying to 
>>> distribute the actual Web site program. That's why we designed 
>>> Mirror As You Link to work that way.
>>>
>>> These are all the hard issues I can think of, as a technical person 
>>> with some distributed systems background. There may be others, since 
>>> in some cases we're really exploring uncharted waters here.
>>> - Daniel Margo
>> Thanks for your good rationalization of the issue.
>>
>> - Art.
>>
>>
>


