[liberationtech] ONI Summarized Global Internet Filtering Data Now Available for Download

Jacob Appelbaum jacob at appelbaum.net
Tue Nov 8 18:37:26 PST 2011


On 11/03/2011 09:02 AM, Masashi Nishihata wrote:
> The OpenNet Initiative is pleased to announce the availability of our
> summarized global Internet filtering data as a downloadable CSV file
> under a Creative Commons license. The data provides an overview of the
> most recent ONI ratings of the breadth and depth of Internet censorship
> in seventy-four countries across four content categories (political,
> social, Internet tools and conflict/security). This release makes ONI
> data more accessible to researchers, journalists, and data mash-up
> developers.
> 

Hi,

I went to look and I was really surprised by this release. I've
previously encouraged ONI to release testing methodology, query or test
data and of course to release result data. I talked about this in my
talk at RECon2011:
https://docs.google.com/open?id=0B3oW5rHYXHvMNjBkMGFmMmItNmFjYy00YjhlLThiMDUtODM0MDA1MWE0MmFi

So why was I surprised? I was surprised because the data released is not
what many have been asking ONI to publish for years.

Here are two example lines from ONI_data-20111102.zip:

BY,Belarus,2,selective,2,selective,2,selective,2,selective,Low,Low,2008,http://opennet.net/research/profiles/belarus

CA,Canada,0,no evidence,0,no evidence,0,no evidence,0,no evidence,n/a,n/a,2009,http://opennet.net/research/regions/namerica

Belarus has extensive monitoring and filtering.  Canada has "lawful
interception" as well as filtering/censorship by the major ISPs with a
totally secret list.

So... Selective filtering for Belarus and no evidence for Canada? That
seems a bit odd. Is the data released just a big CSV of summaries of the
web pages without any further technical data?
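To make concrete how little each row carries, here is a minimal sketch that parses the two example lines quoted above. The column names are assumptions: the excerpt shows values only, so the labels below are guesses based on the four categories named in the announcement, and the order of those categories within a row is assumed as well.

```python
import csv
import io

# Hypothetical column names; the excerpt has no header row, so these are assumed.
COLUMNS = [
    "country_code", "country",
    "political_score", "political_description",
    "social_score", "social_description",
    "tools_score", "tools_description",
    "conflict_security_score", "conflict_security_description",
    "rating_a", "rating_b",  # meaning of these two fields is not stated in the excerpt
    "year", "url",
]

# The two example rows from ONI_data-20111102.zip, as quoted in this message.
SAMPLE = (
    "BY,Belarus,2,selective,2,selective,2,selective,2,selective,"
    "Low,Low,2008,http://opennet.net/research/profiles/belarus\n"
    "CA,Canada,0,no evidence,0,no evidence,0,no evidence,0,no evidence,"
    "n/a,n/a,2009,http://opennet.net/research/regions/namerica\n"
)


def parse_rows(text):
    """Return one dict per CSV row, keyed by the assumed column names."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(COLUMNS, fields)) for fields in reader]


rows = parse_rows(SAMPLE)
for row in rows:
    print(row["country"], row["political_description"], row["year"])
```

Each row reduces to a per-category score, a one-word description, and a link to a profile page; nothing in it identifies the URLs tested or the raw measurements behind the score.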

The release contains no open data other than summaries, and even at a
cursory glance some of those summaries are simply incorrect.

Additionally, it lacks an open methodology with enough technical
detail to allow for independent confirmation; it has no actual testing
data or tools for testing, etc.

> The data is available on our research <http://opennet.net/research> page
> (http://opennet.net/sites/opennet.net/files/ONI_data-20111102.zip
> <http://opennet.net/research>), along with a description of our
> methodology
> (http://opennet.net/sites/opennet.net/files/ONIDatareadme_Nov%202011.pdf)
> 

Where is the actual data? Where are the URLs that ONI testers visit?
Where are the tools? Where is the actual technical methodology? Where is
the rTurtle source code? rTurtle was used, right?

In ONIDatareadme_Nov%202011.pdf I see "The list of URLs is accessed
simultaneously over HTTP both in the country suspected of Internet
filtering and a country with no filtering regime (e.g., Canada)."

That is concerning as Canada does actually filter the internet:
https://en.wikipedia.org/wiki/Cleanfeed_(content_blocking_system)#Canada

Also, how? Do you masquerade as known browsers? Do you send HTTP headers
in the right order? Do you connect back to known hosts? How do you avoid
being fingerprinted by the local network and then treated differently?

In ONIDatareadme_Nov%202011.pdf I also see "The data gathered from the
country with no filtering is used as a control to compare the data from
the country suspected of filtering. Additional diagnostic work is
performed to separate normal connectivity errors from intentional
tampering."

What is "additional diagnostic work" in this paragraph?
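To illustrate why that detail matters, here is a rough sketch of what a naive control-versus-field comparison might look like. It is not ONI's method, which the readme does not describe; the `Fetch` record, the labels, and the 30% body-size tolerance are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fetch:
    """Outcome of one HTTP fetch from one vantage point (hypothetical record)."""
    status: Optional[int] = None   # HTTP status code, None if there was no response
    body_length: int = 0           # size of the response body in bytes
    error: Optional[str] = None    # e.g. "timeout", "reset", "dns"


def compare(control: Fetch, field: Fetch, tolerance: float = 0.3) -> str:
    """Naively label a URL by comparing the field fetch against the control."""
    # If the control fetch failed, nothing can be concluded about the field result.
    if control.error is not None or control.status is None:
        return "inconclusive"
    # Field failure while the control succeeded: blocking or an ordinary outage.
    if field.error is not None or field.status is None:
        return "anomaly:" + (field.error or "no-response")
    if field.status != control.status:
        return "anomaly:status"
    # Same status but a very different body size may indicate a block page.
    larger = max(control.body_length, field.body_length, 1)
    if abs(control.body_length - field.body_length) / larger > tolerance:
        return "anomaly:body"
    return "consistent"


print(compare(Fetch(200, 1000), Fetch(error="reset")))
```

Every "anomaly" label here is ambiguous on its own: a reset or a timeout can be tampering or a flaky link, and the sketch also assumes the control network is itself unfiltered. Whatever separates those cases is exactly the part the readme leaves unexplained.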

> This release is in part timed for Hack4Transparency
> (http://www.euhackathon.eu/) being held November 8-9, 2011 in Brussels as
> part of ONI support for the event.
> 

Please do attend the lecture at Hack4Transparency by Arturo, who is my
co-author on ooni-probe. Our project aims to solve all of these issues
in their entirety: open data, open results, open methodology, Free
Software tools, etc.

> We are excited to see what the community does with this data and invite
> feedback and suggestions for future releases.
> 

Release rTurtle as Free Software and open your URL lists to the
community. Please publish your data results, the risks of using your
tools and so on.

For science,
Jacob


