[Bigbang-user] R: R: Issue with listserv fetching (3GPP)

Riccardo Nanni riccardo.nanni9 at unibo.it
Fri Apr 23 09:25:15 CEST 2021


Dear Niels and Christoph,

thanks a lot for your help!
I tried Niels' suggestion, but I keep getting the 'instant_dump' error.
I did 'git branch' and it shows the following:

* main
  master

I understand I am on the 'main' branch; is that right?
Then I ran 'git pull' again and it says everything is already up to date, but the 'instant_dump' error still appears when I run the usual command.

@Christoph: thank you for sharing the file on the alternative way to gather listserv emails, but I don't think it came through: all I can find is an error message that says an attachment was detected as malware (guess my computer 'misread' your file?). Any chance you can share it again, please?

Thanks a lot again, you're all very helpful! As I'm better at cooking than programming, when you come to Italy I owe you a dinner 🙂🙂
Cheers,

Riccardo
________________________________
From: Christoph Becker <chrbecker01 at gmail.com>
Sent: Friday, 23 April 2021 00:23
To: Niels ten Oever <mail at nielstenoever.net>
Cc: Riccardo Nanni <riccardo.nanni9 at unibo.it>; bigbang-user at data-activism.net <bigbang-user at data-activism.net>
Subject: Re: R: [Bigbang-user] Issue with listserv fetching (3GPP)

Hi Niels & Riccardo,
the argument 'instant_dump' for the ListservArchive class object does not exist anymore in the up-to-date 'main' branch of the git repo.
@Niels: Do you mean that you did a 'git pull' and encountered the TypeError caused by missing 'instant_dump' too?
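
One way to confirm whether a given function still accepts a keyword is to inspect its signature. The sketch below uses two hypothetical stand-in functions (`from_url_old` and `from_url_new`), not BigBang's real `from_url`; only the `inspect` calls come from the Python standard library.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        # Positional-only parameters cannot be passed by keyword.
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer version of a function.
def from_url_old(url, instant_dump=True):
    return url


def from_url_new(url):
    return url


print(accepts_keyword(from_url_old, "instant_dump"))  # True
print(accepts_keyword(from_url_new, "instant_dump"))  # False
```

If such a check returns False for the function that is actually imported while the calling script still passes the keyword, the script and the library probably come from different versions.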

But as I said in another message, we are not quite there yet for 3GPP and IEEE: they cannot yet be scraped with the 'conventional' method that BigBang uses for archives such as W3C.
I attached a small example that shows how you can currently scrape the 3GPP archive and save it to mbox files in the CONFIG.mail_path folder.
Be aware that this could take a very long time and use a lot of memory.
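
For readers without the attachment, here is a minimal sketch of the "save to mbox" step only, using Python's standard-library `mailbox` module. It is not the attached example and does not use BigBang's API: `append_to_mbox`, `fake_messages`, the file name, and the temporary directory standing in for CONFIG.mail_path are all assumptions for illustration. Messages are written one at a time from a generator, which is one way to keep memory use low.

```python
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path


def append_to_mbox(messages, mbox_path):
    """Append each message to an mbox file and return how many were written."""
    box = mailbox.mbox(str(mbox_path))  # created if it does not exist
    count = 0
    try:
        for msg in messages:
            box.add(msg)
            count += 1
        box.flush()  # write pending changes to disk
    finally:
        box.close()
    return count


def fake_messages(n):
    """Hypothetical placeholder for messages produced by a scraper."""
    for i in range(n):
        msg = EmailMessage()
        msg["From"] = "someone@example.org"
        msg["Subject"] = f"Placeholder message {i}"
        msg.set_content("Body text.")
        yield msg


# A temporary directory stands in for the real mail folder.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "3GPP_TSG_RAN.mbox"
    written = append_to_mbox(fake_messages(3), path)
    check = mailbox.mbox(str(path))
    stored = len(check)
    check.close()

print(written, stored)
```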

Best Wishes,
Christoph


On Thu, 22 Apr 2021 at 17:17, Niels ten Oever <mail at nielstenoever.net> wrote:
Hi Riccardo and Christoph,

I see there might be an issue with the use of special characters in the mailing list URLs. To get it working I had to put a '\' in front of the '?', but this could also be fixed by putting " " around the URL. However, after that fetching did not work either, so let's ask Christoph (cc).
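
The quoting point can be illustrated in Python: '?' is a glob character in most shells, so an unquoted URL may be expanded or rejected before the script ever sees it. Below is a minimal sketch using the standard-library `shlex` module. The command string mirrors the one from this thread and is only printed, not executed.

```python
import shlex

url = "https://list.etsi.org/scripts/wa.exe?A0=3GPP_TSG_RAN"

# shlex.quote wraps the string in single quotes when it contains
# characters the shell would otherwise interpret, such as '?'.
quoted = shlex.quote(url)
command = f"python3 bin/collect_mail.py -u {quoted}"
print(command)

# Splitting the command the way a POSIX shell would gives back the URL intact.
assert shlex.split(command)[-1] == url
```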

Cheers,

Niels






On 22-04-2021 17:43, Riccardo Nanni wrote:
> Hi Niels,
>
> thanks for your answer!
> I did, and I found the changes I can see in Github (e.g. the listserv.3GPP.txt file, etc.).
> I did it again when I saw it didn't work and it says 'già aggiornato' (already updated).
>
> Riccardo
> ________________________________
> *From:* Bigbang-user <bigbang-user-bounces at data-activism.net> on behalf of Niels ten Oever <mail at nielstenoever.net>
> *Sent:* Thursday, 22 April 2021 17:38
> *To:* bigbang-user at data-activism.net
> *Subject:* Re: [Bigbang-user] Issue with listserv
>
> Hi Riccardo,
>
> This is not a very informed response - but did you first do:
>
> git pull
>
> to ensure that you have the latest version with all the recent changes?
>
> Best,
>
> Niels
>
> On 22-04-2021 17:31, Riccardo Nanni wrote:
>> Dear all,
>>
>> how are you?
>> I tried to collect email from 3GPP by running these commands:
>> python bin/collect_mail.py -u https://list.etsi.org/scripts/wa.exe?;
>> python3 bin/collect_mail.py -u https://list.etsi.org/scripts/wa.exe?
>> AND
>> python3 bin/collect_mail.py -f examples/url_collections/listserv.3GPP.txt
>>
>> I also tried to scrape a specific group's list with the same commands: https://list.etsi.org/scripts/wa.exe?A0=3GPP_TSG_RAN
>>
>> I get the following error:
>> TypeError: from_url() got an unexpected keyword argument 'instant_dump'
>>
>> I don't understand what I'm missing. Can you help me, please?
>> Thanks a lot in advance! The only similar question I could find on Stack Overflow has no answers...
>>
>> Riccardo
>>
>>
>>
>>
>> _______________________________________________
>> Bigbang-user mailing list
>> Bigbang-user at data-activism.net
>> https://lists.ghserv.net/mailman/listinfo/bigbang-user
>>
>
> --
> Niels ten Oever, PhD
> Postdoctoral Researcher - Media Studies Department - University of Amsterdam
> Research Fellow - Centre for Internet and Human Rights - European University Viadrina
> Associated Scholar - Centro de Tecnologia e Sociedade - Fundação Getúlio Vargas
>
> https://nielstenoever.net - mail at nielstenoever.net - @nielstenoever - +31629051853
> PGP: 2458 0B70 5C4A FD8A 9488 643A 0ED8 3F3A 468A C8B3
>
> Read my latest article on Internet infrastructure governance in New Media & Society here: https://journals.sagepub.com/doi/full/10.1177/1461444820929320
>



--
<><><><><><><><><><><><><><><><>
Christoph Becker (he/him/his)
PhD at the
Institute for Data Science and
Institute for Computational Cosmology
Durham University
United Kingdom
christovis.github.io