[liberationtech] WebRTC for in-browser P2P search engines

Fri Jan 24 09:55:45 PST 2014

On Sun, Jan 19, 2014 at 11:53:07AM -0800, Jesse Taylor wrote:
> Centralized corporate search engines such as Google are implicated in
> all three of these, i.e. they:
> 
>   * Monitor what we are searching for
>   * Censor websites by removing them from search engine indexes
>   * Shape traffic via non-transparent algorithms that can sort search
>     results in a way that grants prominence to certain types of sites
>     (corporate media, etc.), in order to suit the interests of
>     multinational corporations and governments.

Indeed, that is no small thing.

> engines. The few existing solutions like YaCy all seem overly complex
> (and thus unusable to most users) and require downloading a standalone
> application to use. These standalone P2P search applications don't
> really make sense from a usability perspective. It's unrealistic to

Well, that depends on your platform. For users of free operating systems
it is the most natural thing to click on a software package name and
description and seconds later you have it - including cryptographic proof
of delivery. For users of smartphones it's even easier, although you have
less of a choice who you are trusting. Only retrohistoric computing
platforms such as Windoze or MacOS need you to install a free software
distribution app first. Do we even have a decent one for Windoze? Last
time I checked there were five or more and none of them did integrity
checks after downloading.

So it is rather an odd constellation that makes browser add-ons more
attractive on certain platforms, and it is hard to tell how much longer
that constellation will persist.

> It seems to me that it would make more sense to use protocols like
> WebRTC <http://www.webrtc.org/> to facilitate P2P connectivity in the
> web browser, so that the searching and indexing can be done via a simple
> browser plugin that can be installed by anyone with one-click. This

WebRTC may allow for some P2P data exchange, but how do you find out
who you need to connect to in order to get information? Usually that
is achieved by means of a DHT and the web browsers don't have such a
thing. So the question is whether you can integrate an add-on with a DHT
and who runs the DHT. Implementing a DHT over WebRTC sounds scary to
me (at the least) and would probably be very annoying performance-wise.
Concerning the choice of DHT I must say what I always say, have a look
at GNUnet's DHT as it has a very nifty distributed and privacy-preserving
look-up mechanism. Christian explains it in last summer's presentation
(see the video on http://youbroketheinternet.org).
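
To make the "who do I connect to" problem concrete, here is a minimal
sketch of how a Kademlia-style DHT maps a search term to the peers
responsible for it. This is a generic illustration and not GNUnet's
actual routing; the peer names and the helper functions key_id and
closest_peers are made up for the example, and a real node would only
know a fraction of the peers and would have to look the rest up
iteratively over the network.

```python
import hashlib


def key_id(text: str) -> int:
    """Map a peer name or a search term into the same 256-bit ID space."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def closest_peers(peers, term, k=2):
    """Return the k peers whose IDs are closest to the term's ID.

    Closeness is the XOR metric used by Kademlia-style DHTs.
    `peers` is a hypothetical, fully known peer list.
    """
    target = key_id(term)
    return sorted(peers, key=lambda peer: key_id(peer) ^ target)[:k]


if __name__ == "__main__":
    known_peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
    # These are the peers a node would contact for index entries on "webrtc".
    print(closest_peers(known_peers, "webrtc"))
```

Every node that runs the same computation arrives at the same peers,
which is what lets a network without a central index agree on where an
entry lives.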

> would simplify indexing (e.g. just use the bookmarks/recent sites
> visited by default, rather than forcing users manually configure it),

That would result in a strong bias towards popular things, but.. huh..
maybe you want that. OTOH if you first check the DHT whether something
has already been indexed, then you slowly crawl deeper - and you get
a popularity ranking as a side effect. You may want to talk to devs of
search engines like ixquick or DuckDuckGo (caution: PRISM prone!)
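
The "check first, then crawl deeper" idea could look roughly like the
sketch below. It assumes a plain dict as a stand-in for the DHT and a
hypothetical link graph given as a dict; record_visit is a made-up
name, and a real add-on would fetch pages and query the network
instead.

```python
from collections import deque


def record_visit(index, links, url):
    """Register a visit to `url` and index at most one new page.

    index: dict mapping URL -> visit count (stand-in for the DHT).
    links: dict mapping URL -> list of outgoing URLs (hypothetical graph).
    Returns the URL that was newly indexed, or None if nothing was new.
    """
    if url not in index:
        index[url] = 1
        return url

    # Already indexed: the counter becomes the popularity ranking.
    index[url] += 1

    # Walk breadth-first through the links until an unindexed page turns up.
    queue = deque(links.get(url, []))
    seen = {url}
    while queue:
        page = queue.popleft()
        if page in seen:
            continue
        seen.add(page)
        if page not in index:
            index[page] = 0  # indexed, but never visited directly
            return page
        queue.extend(links.get(page, []))
    return None
```

Each repeated visit to a popular page pushes the crawl one page further
out, so the well-trodden parts of the web get indexed first and the
rest follows slowly.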

> and would allow people to just use the browser search bar as usual.

A native app can integrate into browsers just the same. It's probably
less cross-porting effort than having to port the entire add-on.

> There would still likely need to be some sort of standalone
> signaling/tracker servers set up to bootstrap search/index nodes into
> the P2P network, but most of the work -- i.e. all of the indexing,
> searching, routing, etc. -- would be done by the nodes using the browser

The way Tor and I2P are evolving suggests that an architecture based on
dumb servers hosting distributed data and computation is probably going
to be more successful. People do not expect to have their computers
slowing down just so they can search the Internet. I would therefore
opt for an app like YaCy being deployed on dumb servers with high
bandwidth and processing power, maybe use GNUnet for the look-up and
again a lightweight end-user GNUnet node on each person's computer to
be able to take advantage of the privacy preserving look-up capabilities.
Since GNUnet intends to be integrated into web browsers anyway, it's no
big deal to also have its search engine available there. But maybe the
YaCy folks are working on something like this already.

> extension. And almost all of the complexity would be hidden from the
> average user. If P2P search could be simplified in this manner, I feel
> that the adoption would be much more rapid than if P2P search is based
> on complex standalone apps.

You just described a pretty complex and heavy browser add-on that would
make the browser even more heavyweight than it already is... I understand
that having a large Java app that does all the database, indexing and DHT
on each PC is indeed a heavy thing as well - but reimplementing the same
things in JavaScript for the browser won't feel much better. The weight
is not in the way the app is served, but in what it does! That's why for
the near future I only see dumb servers as a viable architecture. And by
dumb I mean they really really must not know a thing about what they are
doing for whom. This can be achieved with a smart use of cryptography and
a good choice of backend technologies.
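
One way to picture such a dumb server, as a rough sketch only: the
client derives both the storage key and the encryption key from the
search term, so the server just shuffles opaque bytes around. All names
here (DumbServer, publish, search) are hypothetical, and the XOR
keystream is a toy to keep the example dependency-free; a real system
would use a vetted authenticated cipher. Note also that guessable
keywords can still be tested by a server that hashes a dictionary, so
this alone is not the whole answer.

```python
import hashlib


def _derive(label: bytes, term: str) -> bytes:
    """Derive independent 32-byte keys from the same search term."""
    return hashlib.sha256(label + b":" + term.encode("utf-8")).digest()


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (illustration only, NOT secure for real use)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


class DumbServer:
    """Stores opaque blobs under opaque keys and knows nothing else."""

    def __init__(self):
        self.blobs = {}

    def put(self, lookup_key: bytes, blob: bytes) -> None:
        self.blobs[lookup_key] = blob

    def get(self, lookup_key: bytes):
        return self.blobs.get(lookup_key)


def publish(server: DumbServer, term: str, result: str) -> None:
    """Client side: encrypt the result and store it under a blinded key."""
    blob = _keystream_xor(_derive(b"content", term), result.encode("utf-8"))
    server.put(_derive(b"lookup", term), blob)


def search(server: DumbServer, term: str):
    """Client side: fetch by blinded key and decrypt locally."""
    blob = server.get(_derive(b"lookup", term))
    if blob is None:
        return None
    return _keystream_xor(_derive(b"content", term), blob).decode("utf-8")
```

The server never sees the term or the result in the clear; only someone
who already knows the keyword can compute the look-up key and decrypt
what comes back.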

-- 
	    http://youbroketheinternet.org
 ircs://psyced.org/youbroketheinternet