[Bigbang-dev] provenance and sharing collected archives

Harsh Gupta mail at hargup.in
Wed Aug 23 07:33:05 CEST 2017


We can also use GitLab; it has unlimited free private repositories. See
https://about.gitlab.com/gitlab-com/
Regards
Harsh Gupta


On Wed, 23 Aug 2017, at 03:45 AM, Sebastian Benthall wrote:
> GitHub has private repositories. One could manage permissions through
> that system.
>
> U.S. IRB says public data isn't human subjects data. I suppose it
> would be fitting if the EU were stricter. But I believe even the GDPR says
> data that's been explicitly made public is fair game.
>
> Another possibility would be versioned cloud storage like an Amazon S3
> bucket. There must be a sweet open source equivalent one could set up?
>
> On Aug 21, 2017 8:43 AM, "Beraldo, Davide" <d.beraldo at uva.nl> wrote:
>> Hi guys,
>> 
>>  First of all, thanks a lot for keeping this going! And apologies for
>>  the very long inactivity on this side; my resolution for the coming
>>  academic year is to get more involved with programming for good
>>  (aka not for evil marketing people).
>>
>>  On the issue of a public repository: I am not an ethics fanatic myself,
>>  but working with people who are has made me a bit more paranoid; plus,
>>  the DATACTIVE project has made some pretty strict ethical
>>  commitments to the funders.
>>
>>  Consequently, I think that making the repositories public would be
>>  too much. I nonetheless see the good in having them stored somewhere
>>  and letting interested people access them.
>>
>>  Would it be possible to have the data stored and listed, but
>>  accessible only on request?
>>
>>  In the meantime, I can check with the ethics experts here what they
>>  think about it.
>>
>>  Cheers!
>> 
>>  Davide
>> 
>>  ________________________________________
>>  From: Bigbang-dev [bigbang-dev-bounces at data-activism.net] on behalf
>>  of Niels ten Oever [niels at article19.org]
>>  Sent: Sunday, August 20, 2017 2:33 PM
>>  To: Nick Doty
>>  Cc: bigbang-dev at data-activism.net
>>  Subject: Re: [Bigbang-dev] provenance and sharing collected archives
>>
>>  GitHub sounds good to me, but Davide might have some comments re:
>>  (research-)ethics?
>>
>>  Cheers,
>> 
>>  Niels
>> 
>> 
>>  On Fri, Aug 18, 2017 at 03:28:46PM -0700, Nick Doty wrote:
>>  > Yeah, separate git repositories sound like a good way forward. I
>>  > think having the provenance files will make it easier to
>>  > collaborate and see the current status of such a data repository.
>>  >
>>  > Niels, is there a particular reason to use separate server space
>>  > for these data repositories? Or should we just make them public
>>  > GitHub repositories? I could potentially see some privacy
>>  > advantage in not making a public mirror of these mailing list
>>  > archives -- in the occasional case where public mailing list
>>  > archive managers remove sensitive messages, our archives wouldn't
>>  > automatically remove them as well -- but I expect that to be
>>  > notably rare for these groups that make a point of public
>>  > archives.
>>  >
>>  > > On Aug 16, 2017, at 8:31 AM, Sebastian Benthall
>>  > > <sbenthall at gmail.com> wrote:
>>  > >
>>  > > +1 on having data repositories.
>>  > > That's a great idea.
>>  > >
>>  > > Standalone GitHub repositories (not in BigBang but "next to" it)
>>  > > are possible for smaller data sets. Versioning is nice.
>>  > >
>>  > > Not sure how to do the bigger ones.
>>  > >
>>  > > On Aug 11, 2017 10:09 AM, "Niels ten Oever" <niels at article19.org>
>>  > > wrote:
>>  > >
>>  > > Hi Nick,
>>  > >
>>  > > I am happy to work on keeping repositories for the IETF and ICANN
>>  > > mailing lists. I can also provide server space for all three
>>  > > bodies (W3C, IETF, ICANN), which also makes sense because they're
>>  > > connected.
>>  > >
>>  > > I am very sorry that the Datactive fork is still (far) behind my
>>  > > personal fork. We do want to organize a hackathon on this; RIPE
>>  > > has shown interest in supporting this work, so hopefully we can
>>  > > organize something to work on this before the end of the year.
>>  > >
>>  > > Cheers,
>>  > >
>>  > > Niels
>>  > >
>>  > >
>>  > > On Tue, Aug 01, 2017 at 04:50:03PM -0700, Nick Doty wrote:
>>  > > > We've touched on this a couple of times before; I think we've
>>  > > > decided not to include collected mailing list archives in the
>>  > > > BigBang repository itself. There are few archives that would
>>  > > > be relevant to all users, and we're trying to write code for
>>  > > > automated collection so that you can download any archive you
>>  > > > need for your own research.
>>  > > >
>>  > > > That being said, I wonder if it might be useful to have
>>  > > > separate repositories where interested researchers can share
>>  > > > the archives they've downloaded. I've been downloading mailing
>>  > > > list archives for every active W3C Working Group and Interest
>>  > > > Group, and separately for every active IETF Working Group; it
>>  > > > comes to a lot of data, takes a good deal of time to download
>>  > > > and may require some babysitting of those long-running
>>  > > > processes. Would others be interested in separate repos with
>>  > > > snapshots of ML archives for those organizations? Or any other
>>  > > > common organizations/lists it might be useful to have snapshot
>>  > > > data for?
>>  > > >
>>  > > > To that point, I also think we'll need useful provenance
>>  > > > metadata if we get to the point of sharing archives. When were
>>  > > > these downloaded, what was the specific mailing list, what
>>  > > > software was used to download them, etc. Indeed, I feel like I
>>  > > > should have that functionality just for my individual work in
>>  > > > order to maintain good research practice. I opened
>>  > > > https://github.com/datactive/bigbang/issues/283 on that 6
>>  > > > weeks ago, and today I've written code to generate
>>  > > > provenance.yaml files during the mail collection process:
>>  > > > https://github.com/npdoty/bigbang/tree/provenance
>>  > > >
>>  > > > I'd appreciate any feedback on the issue or on this list.
>>  > > >
>>  > > > I could try to create a minimal PR, but that's getting harder
>>  > > > for me as datactive/bigbang's master branch has not been
>>  > > > updated in a long time and my code may rely on other changes
>>  > > > I've made in the intervening months.
>>  > > >
>>  > > > Cheers,
>>  > > > Nick
>>  > >
>>  > >
>>  > >
>>  > > > _______________________________________________
>>  > > > Bigbang-dev mailing list
>>  > > > Bigbang-dev at data-activism.net
>>  > > > https://lists.ghserv.net/mailman/listinfo/bigbang-dev
>>  > >
>>  > >
>>  > > --
>>  > >
>>  > > Niels ten Oever
>>  > > Head of Digital
>>  > >
>>  > > Article 19
>>  > > www.article19.org <http://www.article19.org/>
>>  > >
>>  > > PGP fingerprint    2458 0B70 5C4A FD8A 9488
>>  > >                    643A 0ED8 3F3A 468A C8B3
>>  > >
>>  > >
>>  >
>> 
>> 
>> 
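
The provenance questions Nick raises in the thread (when an archive was downloaded, which specific mailing list it came from, and what software collected it) fit naturally into a small key/value file stored next to each archive snapshot. What follows is a minimal Python sketch under stated assumptions: the helper names (build_provenance, to_yaml, write_provenance) and every field name are hypothetical, and this is not the format produced by the npdoty/bigbang provenance branch. To avoid a PyYAML dependency it serializes flat string values with json.dumps, whose double-quoted output YAML 1.2 parsers also accept.

```python
# HYPOTHETICAL sketch: write a provenance.yaml file next to a downloaded
# mailing list archive. Function and field names are illustrative
# assumptions, not the format used by BigBang's provenance branch.
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def build_provenance(list_url, collector="bigbang", collector_version="unknown", now=None):
    """Record when, from which list, and with what software an archive was collected."""
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        "list_url": list_url,  # the specific mailing list archive
        "downloaded_at": now.isoformat(timespec="seconds"),  # collection time
        "collector": collector,  # software used to download it (assumed name)
        "collector_version": collector_version,
        "python_version": platform.python_version(),
    }


def to_yaml(record):
    """Serialize a flat dict as YAML, one `key: "value"` line per entry.

    json.dumps emits double-quoted strings that YAML 1.2 parsers accept,
    so this simple flat case needs no PyYAML dependency.
    """
    return "".join(f"{key}: {json.dumps(str(value))}\n" for key, value in record.items())


def write_provenance(archive_dir, record, filename="provenance.yaml"):
    """Write the record into archive_dir and return the file path."""
    path = Path(archive_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(to_yaml(record), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Placeholder URL and a fixed timestamp, for illustration only.
    record = build_provenance(
        "https://example.org/archives/some-list/",
        now=datetime(2017, 8, 1, 23, 50, tzinfo=timezone.utc),
    )
    print(to_yaml(record), end="")
```

Committing a file like this alongside each snapshot in a separate data repository would let collaborators see when and how each archive was gathered, which is the status visibility Nick mentions.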


