[Bigbang-dev] provenance and sharing collected archives
Harsh Gupta
mail at hargup.in
Wed Aug 23 07:33:05 CEST 2017
We can also use GitLab; it has unlimited free private repositories. See
https://about.gitlab.com/gitlab-com/
Regards
Harsh Gupta
On Wed, 23 Aug 2017, at 03:45 AM, Sebastian Benthall wrote:
> GitHub has private repositories. One could manage permissions through
> that system.
>
> U.S. IRB says public data isn't human subjects data. I suppose it
> would be fitting if the EU were stricter. But I believe even the GDPR says
> data that's been explicitly made public is fair game.
>
> Another possibility would be versioned cloud storage like an Amazon S3
> bucket. There must be a sweet open source equivalent one could set up?
>
> On Aug 21, 2017 8:43 AM, "Beraldo, Davide" <d.beraldo at uva.nl> wrote:
>> Hi guys,
>>
>> first of all, thanks a lot for keeping this going! and apologies for
>> the very long inactivity on this side; resolution for the coming
>> academic year is to get more involved with programming for the good
>> (aka not for evil marketing people)
>>
>> on the issue of public repositories: i am myself not an ethics fanatic,
>> but working with people who are has made me a bit more paranoid; plus,
>> the DATACTIVE project has made some pretty strict ethical
>> commitments with the funders.
>>
>> consequently, i think that making the repositories public would be
>> too much. i nonetheless see the good in having them stored somewhere
>> and letting interested people access them.
>>
>> would it be possible to have the data stored, listed, but
>> accessible only on request?
>>
>> in the meantime i can check with the ethics experts here what they
>> think about it
>>
>> cheers!
>>
>> Davide
>>
>> ________________________________________
>> From: Bigbang-dev [bigbang-dev-bounces at data-activism.net] on behalf
>> of Niels ten Oever [niels at article19.org]
>> Sent: Sunday, August 20, 2017 2:33 PM
>> To: Nick Doty
>> Cc: bigbang-dev at data-activism.net
>> Subject: Re: [Bigbang-dev] provenance and sharing collected archives
>>
>> Github sounds good to me, but Davide might have some comments re:
>> (research-)ethics?
>>
>> Cheers,
>>
>> Niels
>>
>>
>> On Fri, Aug 18, 2017 at 03:28:46PM -0700, Nick Doty wrote:
>> > Yeah, separate git repositories sounds like a good way forward. I
>> > think having the provenance files will make it easier to
>> > collaborate and see the current status of such a data repository.
>> >
>> > Niels, is there a particular reason to use separate server space
>> > for these data repositories? Or should we just make them public
>> > GitHub repositories? I could potentially see some privacy
>> > advantage in not making a public mirror of these mailing list
>> > archives -- in the occasional case where public mailing list
>> > archive managers remove sensitive messages, our archives wouldn't
>> > automatically remove them as well -- but I expect that to be
>> > notably rare for these groups that make a point of public
>> > archives.
>> >
>> > > On Aug 16, 2017, at 8:31 AM, Sebastian Benthall
>> > > <sbenthall at gmail.com> wrote:
>> > >
>> > > +1 on having data repositories.
>> > > That's a great idea.
>> > >
>> > > Standalone GitHub repositories (not in BigBang but "next to" it)
>> > > are possible for smaller data sets. Versioning is nice.
>> > >
>> > > Not sure how to do the bigger ones.
>> > >
>> > > On Aug 11, 2017 10:09 AM, "Niels ten Oever"
>> > > <niels at article19.org> wrote:
>> > >
>> > > Hi Nick,
>> > >
>> > > I am happy to work on keeping repositories for IETF and ICANN
>> > > mailing lists. I can also provide server space for the three
>> > > bodies (W3C, IETF, ICANN), which also makes sense because they're
>> > > connected.
>> > >
>> > > I am very sorry that the Datactive fork is still (far) behind my
>> > > personal fork. We do want to organize a hackathon on this; RIPE
>> > > has shown interest in supporting this work, so hopefully we can
>> > > organize something to work on this before the end of the year.
>> > >
>> > > Cheers,
>> > >
>> > > Niels
>> > >
>> > >
>> > > On Tue, Aug 01, 2017 at 04:50:03PM -0700, Nick Doty wrote:
>> > > > We've touched on this a couple of times before; I think we've
>> > > > decided not to include collected mailing list archives in the
>> > > > BigBang repository itself. There are few archives that would
>> > > > be relevant to all users, and we're trying to write code for
>> > > > automated collection so that you can download any archive you
>> > > > need for your own research.
>> > > >
>> > > > That being said, I wonder if it might be useful to have
>> > > > separate repositories where interested researchers can share
>> > > > the archives they've downloaded. I've been downloading mailing
>> > > > list archives for every active W3C Working Group and Interest
>> > > > Group, and separately for every active IETF Working Group; it
>> > > > comes to a lot of data, takes a good deal of time to download
>> > > > and may require some babysitting of those long-running
>> > > > processes. Would others be interested in separate repos with
>> > > > snapshots of ML archives for those organizations? Or any other
>> > > > common organizations/lists it might be useful to have snapshot
>> > > > data for?
>> > > >
>> > > > To that point, I also think we'll need useful provenance
>> > > > metadata if we get to the point of sharing archives. When were
>> > > > these downloaded, what was the specific mailing list, what
>> > > > software was used to download them, etc. Indeed, I feel like I
>> > > > should have that functionality just for my individual work in
>> > > > order to maintain good research practice. I opened
>> > > > https://github.com/datactive/bigbang/issues/283 on that 6
>> > > > weeks ago, and today I've written code to generate
>> > > > provenance.yaml files during the mail collection process:
>> > > > https://github.com/npdoty/bigbang/tree/provenance
>> > > >
>> > > > I'd appreciate any feedback on the issue or on this list.
>> > > >
>> > > > I could try to create a minimal PR, but that's getting harder
>> > > > for me as datactive/bigbang's master branch has not been
>> > > > updated in a long time and my code may rely on other changes
>> > > > I've made in intervening months.
>> > > >
>> > > > Cheers,
>> > > > Nick
>> > >
>> > >
>> > >
>> > > > _______________________________________________
>> > > > Bigbang-dev mailing list
>> > > > Bigbang-dev at data-activism.net
>> > > > https://lists.ghserv.net/mailman/listinfo/bigbang-dev
>> > >
>> > >
>> > > --
>> > >
>> > > Niels ten Oever
>> > > Head of Digital
>> > >
>> > > Article 19
>> > > www.article19.org <http://www.article19.org/>
>> > >
>> > > PGP fingerprint 2458 0B70 5C4A FD8A 9488
>> > > 643A 0ED8 3F3A 468A C8B3
>> > >
>> > >
>> >
>>