[liberationtech] Concept for takedown-resistant publishing
Daniel Margo
dmargo at eecs.harvard.edu
Fri Feb 3 12:35:30 PST 2012
After consulting a shower, I realized the way you would do comments is
by storing them in a format that requires no pre-processing (e.g. SQL
sanitization) and then doing all post-processing (HTML sanitization,
BBcode, swear removal, whatever) at the client, where if a client is
Byzantine, all it affects is itself. This is probably extensible to any
data storage-and-retrieval feature where there is A. no pre-processing,
and B. no more post-processing than can realistically be done at render
time. That still expands the universe of possible features a good bit.
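
For instance, a rough sketch of the render-time idea in Python (the
real thing would be client-side JavaScript in a browser or extension;
the swear list and BBcode handling here are toy stand-ins):

    import html
    import re

    def render_comment(raw_comment):
        # All processing happens here, at render time on the client; the
        # stored data is just the raw string, with no pre-processing at all.
        safe = html.escape(raw_comment)                            # HTML sanitization
        safe = re.sub(r"(?i)darn", "****", safe)                   # toy swear removal
        safe = safe.replace("[b]", "<b>").replace("[/b]", "</b>")  # minimal BBcode
        return safe

    # A Byzantine client that skips this step only garbles its own view.
    print(render_comment("[b]darn[/b] <script>alert('x')</script>"))
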
- Daniel Margo
On 02/03/2012 02:46 PM, Daniel Margo wrote:
>
> On 02/03/2012 10:26 AM, Arturo Filastò wrote:
>> On Feb 3, 2012, at 2:02 AM, Daniel Margo wrote:
>>> BitTorrent is an open protocol. There are numerous open-source torrent
>>> libraries. Opera can download torrents natively, for Firefox there
>>> are extensions to do this, and so far as I know every browser has
>>> some kind of pluggable framework for MIME type (Internet media type)
>>> handling. To have a browser download and then open a torrent of a
>>> page is totally realistic, and probably not that much code.
>>>
>>> The problem is that this is not the actual technical challenge. The
>>> actual challenges are these:
>>> 1. Name resolution: How do I find the torrent file, and then the P2P
>>> cloud itself?
>>> 2. Updates: How do I share a Web site as a living, updating document?
>>> 3. Server backend: Is it realistic to run a modern Web site without
>>> a client-server relationship?
>>>
>>> 1. Name resolution comes in two parts: finding the torrent file, and
>>> then finding the actual P2P cloud. Web sites like The Pirate Bay
>>> function as name resolution services for finding torrent files, but
>>> obviously any single such server can be blocked. What I presume we
>>> actually desire is a name resolution service just like DNS for Web
>>> sites: you type in a URL, it gets resolved to a torrent file.
>>>
>>> URLs have a hostname and a path, e.g.
>>> "www.hostname.com/path/to/torrent.file". "www.hostname.com" is
>>> resolved by DNS to a server, and then "/path/to/torrent.file" is
>>> resolved by that server. So unfortunately, you don't gain any
>>> takedown resistance by hosting your torrent file at
>>> "www.myserver.com/my.torrent", because if "www.myserver.com" is
>>> taken down, then "/my.torrent" can't be resolved. So what you would
>>> actually need is for DNS to provide hostname resolution directly
>>> into torrents. This is a sweet idea, but changing DNS is hard.
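>>>
>>> To make that concrete, here is a purely hypothetical sketch (real DNS
>>> has no record type like this; the table below stands in for whatever
>>> extension would carry it):
>>>
>>>     # Hypothetical: a DNS-like record that maps a hostname directly
>>>     # to a torrent infohash instead of an IP address.
>>>     TORRENT_RECORDS = {
>>>         "www.myserver.com": "a" * 40,   # placeholder infohash
>>>     }
>>>
>>>     def resolve_to_torrent(hostname):
>>>         # Analogous to an A-record lookup, except the answer is an
>>>         # infohash the client can use to join the P2P cloud.
>>>         infohash = TORRENT_RECORDS.get(hostname)
>>>         if infohash is None:
>>>             raise LookupError("no torrent record for " + hostname)
>>>         return infohash
>>>
>>>     print(resolve_to_torrent("www.myserver.com"))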
>>>
>>> Even if you could do that, well, what's in a "torrent file"? A
>>> torrent file contains information to get you joined into the P2P
>>> cloud; specifically, it contains the address of a tracker, a central
>>> server that gates entry into the cloud. If the tracker is down or
>>> blocked by some technology, you can't enter the cloud. This is
>>> primarily a weakness of the BitTorrent protocol, and not the idea
>>> itself; there are extensions to BitTorrent (e.g. DHT-based trackerless
>>> torrents and peer exchange) and other P2P protocols that are resilient
>>> to this weakness. But fundamentally, finding your
>>> way into the P2P cloud is an act of name resolution to find other
>>> peers in the cloud, which is most easily done by a name resolution
>>> server, in this case the tracker. If we're going about making
>>> changes to DNS, probably the most technically sane (but politically
>>> unrealistic) solution would be for DNS itself to provide tracking
>>> capability for the cloud. Again, that is a sweet idea, but changing
>>> DNS is hard.
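>>>
>>> As a concrete illustration of those trackerless extensions, a magnet
>>> link identifies the content by its infohash alone and relies on the
>>> DHT to find peers, so there is no single tracker to take down. A
>>> minimal sketch of building one (the infohash and name are made up):
>>>
>>>     from urllib.parse import quote
>>>
>>>     def magnet_link(infohash, display_name):
>>>         # No tracker address is embedded; peers are discovered via
>>>         # the DHT, so entry into the cloud has no single gatekeeper.
>>>         return ("magnet:?xt=urn:btih:" + infohash
>>>                 + "&dn=" + quote(display_name))
>>>
>>>     print(magnet_link("a" * 40, "my-site-snapshot"))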
>>>
>>> I don't mean to suggest these issues are insurmountable, merely that
>>> this is the actual Hard Part.
>>
>> This is quite an accurate analysis, though you are only thinking of
>> one site per torrent. Something that would make this scheme much more
>> powerful (as it would have more seeders) is one torrent for multiple
>> sites.
>>
>> You could then use a scheme like the magnet URI scheme to reference a
>> particular site (the file(s) of that site). You still have the problem
>> of takedown of a particular tracker, but at least the serving of the
>> link (the .torrent that is replaced by the magnet URI) can be done
>> over any other channel.
>>
>> You are not necessarily worried about DNS censorship, as you could
>> provide the tracker address as an IP address. You are still worried
>> about censorship through other means (IP blocking), but this can be
>> overcome by having multiple trackers running.
> What you are bordering on suggesting here is setting up a parallel DNS
> in the form of multi-trackers for "torrent sites" -- which I actually
> think is a really good idea. I picture something in my head like the
> browser/extension/plugin resolves URIs of the form
> "tor://www.mysite.com" by reference into the distributed tracker. The
> main issue would be distributing the addresses of the trackers and
> keeping them available -- but that seems to me like a surmountable
> problem.
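>
> A very rough sketch of what that resolution step might look like (the
> "tor://" scheme, the record format, and the tracker list are all
> hypothetical stand-ins):
>
>     # Hypothetical client-side resolver for "tor://"-style URIs. In
>     # reality the tracker set would itself have to be distributed and
>     # kept fresh; here it is a hard-coded stand-in.
>     DISTRIBUTED_TRACKERS = {
>         "udp://203.0.113.1:6969": {"www.mysite.com": "a" * 40},
>         "udp://203.0.113.2:6969": {},
>     }
>
>     def resolve_site_uri(uri):
>         assert uri.startswith("tor://")
>         hostname = uri[len("tor://"):].split("/")[0]
>         # Ask each tracker in turn to map the hostname to an infohash.
>         for tracker, records in DISTRIBUTED_TRACKERS.items():
>             if hostname in records:
>                 return records[hostname], tracker
>         raise LookupError("could not resolve " + uri)
>
>     print(resolve_site_uri("tor://www.mysite.com"))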
>
>>> 2. In order to support dynamic content there has to be a
>>> cryptographic distinction between updates coming from the legitimate
>>> publisher of the site (Alice), and subterfuge coming from Evil Eve.
>>> Authentication might be provided by the data being shared itself
>>> (e.g. the data is signed by Alice, and everybody knows her public
>>> key) or by the P2P network itself (e.g. in addition to sharing the
>>> torrent, the peers also provide a distributed authentication service
>>> that will only accept updates signed by Alice). I am by no means an
>>> expert on this subject, so I will
>>> refrain from talking about it extensively, but I bring it up merely
>>> because cryptography is non-optional for any dynamic scheme, and I'm
>>> not aware of any update-able, cryptographically-secured P2P
>>> torrents. It sounds like maybe they should exist? It also sounds
>>> Hard, and Google isn't turning anything up.
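>>>
>>> For what it's worth, the signing step by itself is standard
>>> public-key machinery; here is a minimal sketch assuming a library
>>> such as PyNaCl (Ed25519). The genuinely hard part -- distributing and
>>> trusting Alice's public key without a central server -- is not shown:
>>>
>>>     from nacl.signing import SigningKey
>>>
>>>     # Alice generates a keypair once; the verify key is what
>>>     # "everybody knows".
>>>     alice = SigningKey.generate()
>>>     alice_public = alice.verify_key
>>>
>>>     update = b"post 42: new content for the site"
>>>     signed_update = alice.sign(update)   # what actually gets shared
>>>
>>>     # Any peer can check the update came from Alice before accepting it.
>>>     assert alice_public.verify(signed_update) == update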
>>>
>>> Again, I'm not suggesting these issues are insurmountable; in some
>>> sense Google Docs does all this. But they do it with a pretty
>>> sophisticated backend that glues many technologies together (I
>>> guarantee Google has a killer internal name resolution and
>>> authentication service), and I have no idea how Hard it would be to
>>> make those parts takedown-resistant (in the sense of having no
>>> central servers; there are unquestionably central servers at Google).
>>>
>> This is a very tough problem, and such a feature would be a core one
>> of the "multiple sites per torrent" scheme. Though I disagree that
>> there should be an update feature: the best option is probably to have
>> an append-only mechanism, so that you are not worried about Evil Eve
>> changing content that is already there.
>>
>> This would also have a benefit for the resilience of the network
>> since the protocol does not understand the concept
>> of "modification" (and therefore deletion) of content. Nobody can
>> force you to remove your content because it is not
>> technically possible.
> From a distributed systems technology standpoint, append-only is an
> extremely simplifying assumption -- which is a good thing! It makes
> this technology more realistic to build. In particular, it moves us
> from the realm of P2P file sharing and into P2P publish-subscribe
> networks, which I believe are more evolved with respect to identifying
> trustworthy updates (no promises, I haven't worked on this in a while).
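>
> To illustrate why append-only simplifies things, here is a toy
> hash-chained log (a generic technique, not any particular protocol):
> each entry commits to everything before it, so Eve cannot quietly
> rewrite old content without every subsequent hash, and thus every
> reader's view, changing.
>
>     import hashlib
>
>     class AppendOnlyLog:
>         def __init__(self):
>             self.entries = []   # list of (content, chained_hash) pairs
>
>         def append(self, content):
>             # Each hash covers the previous hash plus the new content,
>             # so history cannot be edited in place without detection.
>             prev_hash = self.entries[-1][1] if self.entries else b""
>             h = hashlib.sha256(prev_hash + content).hexdigest().encode()
>             self.entries.append((content, h))
>             return h
>
>     log = AppendOnlyLog()
>     log.append(b"post 1: hello world")
>     log.append(b"post 2: an update is a new entry, never an edit")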
>
>>
>>> 3. The real elephant in the room is that modern Web sites are best
>>> thought of as programs, not files, and program distribution is
>>> infinitely harder than file distribution. When you visit a Wordpress
>>> blog, what you appear to receive is an HTML file: but in actuality
>>> that HTML was streamed by a PHP script running on the server talking
>>> to a MySQL database. The number of layers in this onion is
>>> arbitrary; it's anything you could run on a computer -- arbitrary
>>> code, or I could hook MS Paint up to the thing if I wanted to badly enough.
>>> Distributing *and executing* arbitrary code like this is Quite
>>> Possibly Impossible, and if it is possible it is Very Far Away. At
>>> any rate, BitTorrent can't do this.
>>>
>>> The word "arbitrary" is important. In specific cases, you can
>>> certainly find a case-specific resolution. Diaspora is building
>>> something like a distributed social network "program", and I do
>>> honestly believe that with a lot of code and hard thinking, you
>>> could distribute a Wordpress blog's backend on P2P. It would
>>> probably require a total rewrite of Wordpress such that it wouldn't
>>> even be the same piece of software, and you would have to solve the
>>> other Hard Problems above, but I think it is technically possible at
>>> this time. But in the general case, you might have more luck working
>>> towards a Singularity and then asking the Machine-Gods for an answer.
>>>
>> Projects such as unhosted (http://unhosted.org/) lead me to believe
>> that the future of web technologies lies in moving more and more
>> logic away from the server and into the hands of the client. The
>> diffusion of fast computers and the speed of current JavaScript
>> engines have made it possible (and in most cases even desirable) to
>> make clients do what once was only done by servers.
>>
>> While certain web sites will probably never be moved totally into the
>> client's hands (I am thinking of Google, for example), a great part of
>> the web we use today could very easily be moved into pure client-side
>> logic, with the server used only for data storage. A blog, for
>> example, is just a set of posts (content) that is updated every so
>> often. All the processing of that data (pagination, styling) can
>> easily be moved into the client's hands.
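>>
>> As a toy example of that split (the post format is made up), the
>> remote store holds only raw post data and the client does the
>> pagination and rendering:
>>
>>     import html
>>
>>     # The only thing stored remotely: raw post data.
>>     POSTS = [{"title": "Post %d" % i, "body": "..."} for i in range(1, 26)]
>>
>>     def render_page(posts, page, per_page=10):
>>         # Pagination and styling happen entirely on the client.
>>         start = (page - 1) * per_page
>>         return "\n".join(
>>             "<h2>%s</h2><p>%s</p>"
>>             % (html.escape(p["title"]), html.escape(p["body"]))
>>             for p in posts[start:start + per_page])
>>
>>     print(render_page(POSTS, page=2))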
>>
>> The only limit that I see is that the data would only be able to flow
>> in one direction, from the site to the client, though I don't see this
>> as a critical factor to the success of such a scheme.
> The project you linked is interesting and worthy, but it works
> precisely by virtue of not addressing a general case. What they offer
> is a specific programming model that, from a takedown perspective, is
> more robust, but from a site-design perspective is more restrictive.
>
> By way of example, let me push back on the idea that a Wordpress blog
> is just post content and render logic. A Wordpress blog is also a
> source of authentication. This is most obvious when you log in to
> write posts, install addons, whatever. There are different
> administrative levels, and in fact there are something like 4-7 crypto
> keys being juggled in the background.
>
> More subtly (but perhaps more seriously), authentication takes the
> form of trust in the canonicality of server-side code. When I comment
> on a Wordpress blog, my comment is passed to the server in the form of
> a text string which goes through considerable sanitization before it
> shows up as a comment. It's parsed for dangerous SQL strings, usually
> passed through a swear-checking service, and most importantly it ends
> up stored in the right way in the right place. I can be confident this
> is happening correctly because the script that is doing so is running
> on the server *and has restricted write and execute permissions*. If
> that script is running on a client and talking to the data store via
> some protocol, then I have to design that protocol in such a way that
> the client can't cheat, without necessarily having a central server
> that at the end of the day will say "yup, no dangerous SQL."
>
> Getting philosophical, the kinds of permissions a remote data store
> model permits are read and write. You can do a lot with this; you can
> make sure unauthenticated users can only write comments, for example.
> But POSIX permissions are a good bit more sophisticated, and in
> particular you can have readable, un-writeable, executable code, which
> means you know exactly what code is running and what it should do.
> This is an advantage a canonical server agent provides. In the case of
> distributed applications, individual peers can easily be Byzantine,
> which is a badass way of saying they are jerks and they can execute
> whatever plan they want. You design around this by designing your
> distributed P2P protocol very well -- usually on a case-by-case basis,
> and more often than not you will eventually come to a point where you
> say forget it, this is too complicated, let's just have a central
> server for this one hard part we can't solve and shadow it 3 times.
> That's robust against random failure, but not against intentional takedown.
>
> The impression I get from Unhosted is that they want to change the
> person (singular)-to-service relationship such that more power is in
> the hands of the person to manage their (singular) data. I think this
> becomes much, much harder when you have people, that is, peers,
> laboring in the production of a collective illusion, such as a blog
> *with comments* or any other kind of community participation. It's by
> no means impossible, it gets done all the time, but for now it gets
> done on a case-by-case basis. So maybe for now what you can build is
> an append-only blog without those features, which is essentially just
> a pub-sub news service, which I think is totally realistic right now
> and a good idea.
> - Daniel Margo
>
>>> What seems more realistic to me is taking snapshots of a Web site
>>> and distributing those as files instead, rather than trying to
>>> distribute the actual Web site program. That's why we designed
>>> Mirror As You Link to work that way.
>>>
>>> These are all the hard issues I can think of, as a technical person
>>> with some distributed systems background. There may be others, since
>>> in some cases we're really exploring uncharted waters here.
>>> - Daniel Margo
>> Thanks for your good rationalization of the issue.
>>
>> - Art.
>>
>>
>