[Bigbang-user] R: R: R: Issue with listserv fetching (3GPP)
Riccardo Nanni
riccardo.nanni9 at unibo.it
Fri Apr 23 11:53:35 CEST 2021
Great!
Thank you again, Christoph and Niels, later I'll try it.
Best,
Riccardo
________________________________
From: Niels ten Oever <mail at nielstenoever.net>
Sent: Friday, 23 April 2021 11:50
To: Riccardo Nanni <riccardo.nanni9 at unibo.it>; Christoph Becker <chrbecker01 at gmail.com>
Cc: bigbang-user at data-activism.net
Subject: Re: R: R: [Bigbang-user] Issue with listserv fetching (3GPP)
Thanks Christoph!
This was the content of the file example.py:
import bigbang
from bigbang import listserv
from bigbang.listserv import ListservArchive, ListservList, ListservMessage

url_archive = "https://list.etsi.org/scripts/wa.exe?"
url_list = url_archive + "A0=3GPP_TSG_CT_WG6"

# NOTE: auth_key_mock (the value passed as login) is not defined in this snippet.
ListservArchive.from_url(
    name="3GPP",
    url_root=url_archive,
    url_home=url_archive + "HOME",
    login=auth_key_mock,
    instant_save=True,
    only_mlist_urls=False,
)
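
For anyone hitting the same "unexpected keyword argument" TypeError, one way to see which keywords a function accepts is Python's inspect.signature. What follows is only a minimal sketch under stated assumptions: from_url here is a hypothetical stand-in whose parameter list simply mirrors the keywords used in example.py (it is not BigBang's actual implementation), and accepted_kwargs is a hypothetical helper name.

```python
import inspect


def from_url(name, url_root, url_home, login=None,
             instant_save=True, only_mlist_urls=True):
    """Hypothetical stand-in; parameters mirror the keywords in example.py."""
    return {"name": name, "instant_save": instant_save}


def accepted_kwargs(func, kwargs):
    """Split kwargs into those func accepts by name and those it would reject."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means the function accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    ok = {k: v for k, v in kwargs.items() if k in params}
    rejected = {k: v for k, v in kwargs.items() if k not in params}
    return ok, rejected


# A call written against an older interface, still passing 'instant_dump'.
old_style_call = {
    "name": "3GPP",
    "url_root": "https://list.etsi.org/scripts/wa.exe?",
    "url_home": "https://list.etsi.org/scripts/wa.exe?HOME",
    "instant_dump": True,  # the keyword named in the error message
}

ok, rejected = accepted_kwargs(from_url, old_style_call)
print("accepted:", sorted(ok))
print("rejected:", sorted(rejected))

# Calling with only the accepted keywords avoids the TypeError.
result = from_url(**ok)
```

Against a real installation, passing ListservArchive.from_url instead of the stand-in would show whether the code being imported still expects 'instant_dump'. If it does not, the keyword is being supplied by an outdated caller.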
Best,
Niels
On 23-04-2021 09:25, Riccardo Nanni wrote:
> Dear Niels and Christoph,
>
> thanks a lot for your help!
> I tried Niels' way and I keep getting the 'instant_dump' error.
> I did 'git branch' and it shows the following:
>
> * main
>   master
>
> I understand I am on the 'main' branch, is that right?
> Then I tried 'git pull' again and it says it is already updated, but it keeps showing the 'instant_dump' message when I try the usual command.
>
> @Christoph: thank you for sharing the file on the alternative way to gather listserv emails, but I don't think it came through: all I can find is an error message that says an attachment was detected as malware (guess my computer 'misread' your file?). Any chance you can share it again, please?
>
> Thanks a lot again, you're all very helpful! As I'm better at cooking than programming, when you come to Italy I owe you a dinner 🙂🙂
> Cheers,
>
> Riccardo
> ________________________________
> *From:* Christoph Becker <chrbecker01 at gmail.com>
> *Sent:* Friday, 23 April 2021 00:23
> *To:* Niels ten Oever <mail at nielstenoever.net>
> *Cc:* Riccardo Nanni <riccardo.nanni9 at unibo.it>; bigbang-user at data-activism.net
> *Subject:* Re: R: [Bigbang-user] Issue with listserv fetching (3GPP)
>
> Hi Niels & Riccardo,
> the argument 'instant_dump' for the ListservArchive class object does not exist anymore in the up-to-date 'main' branch of the git repo.
> @Niels: Do you mean that you did a 'git pull' and encountered the TypeError caused by missing 'instant_dump' too?
>
> But as I said in another message, we are not quite there yet for 3GPP and IEEE to use the 'conventional' method by which BigBang scrapes archives such as W3C.
> I attached a small example that shows how you can currently scrape the 3GPP archive and save it to mbox files in the CONFIG.mail_path folder.
> Be aware that this could take a very long time and could use a lot of memory.
>
> Best Wishes,
> Christoph
>
>
> On Thu, 22 Apr 2021 at 17:17, Niels ten Oever <mail at nielstenoever.net> wrote:
>
> Hi Riccardo and Christoph,
>
> I see there might be an issue with the use of special characters in the mailing list URLs. To get it working I had to put a '\' in front of the '?', but this could also be fixed by putting " " around the URL. However, after that, fetching did not work either, so let's ask Christoph (cc).
>
> Cheers,
>
> Niels
>
> On 22-04-2021 17:43, Riccardo Nanni wrote:
> > Hi Niels,
> >
> > thanks for your answer!
> > I did, and I found the changes I can see on GitHub (e.g. the listserv.3GPP.txt file, etc.).
> > I did it again when I saw it didn't work and it says 'già aggiornato' (already updated).
> >
> > Riccardo
> >
> > ________________________________
> > *From:* Bigbang-user <bigbang-user-bounces at data-activism.net> on behalf of Niels ten Oever <mail at nielstenoever.net>
> > *Sent:* Thursday, 22 April 2021 17:38
> > *To:* bigbang-user at data-activism.net
> > *Subject:* Re: [Bigbang-user] Issue with listserv
> >
> > Hi Riccardo,
> >
> > This is not a very informed response - but did you first do:
> >
> > git pull
> >
> > to ensure that you have the latest version with all the recent changes?
> >
> > Best,
> >
> > Niels
> >
> > On 22-04-2021 17:31, Riccardo Nanni wrote:
> >> Dear all,
> >>
> >> how are you?
> >> I tried to collect email from 3GPP by running these commands:
> >> python bin/collect_mail.py -u https://list.etsi.org/scripts/wa.exe?
> >> python3 bin/collect_mail.py -u https://list.etsi.org/scripts/wa.exe?
> >> AND
> >> python3 bin/collect_mail.py -f examples/url_collections/listserv.3GPP.txt
> >>
> >> I also tried to scrape a specific group's list with the same commands: https://list.etsi.org/scripts/wa.exe?A0=3GPP_TSG_RAN
> >>
> >> I get the following error:
> >> TypeError: from_url() got an unexpected keyword argument 'instant_dump'
> >>
> >> I don't understand what I'm missing. Can you help me, please?
> >> Thanks a lot in advance! The only similar question I could find on Stack Overflow has no answers...
> >>
> >> Riccardo
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Bigbang-user mailing list
> >> Bigbang-user at data-activism.net
> >> https://lists.ghserv.net/mailman/listinfo/bigbang-user
> >>
> >
> > --
> > Niels ten Oever, PhD
> > Postdoctoral Researcher - Media Studies Department - University of Amsterdam
> > Research Fellow - Centre for Internet and Human Rights - European University Viadrina
> > Associated Scholar - Centro de Tecnologia e Sociedade - Fundação Getúlio Vargas
> >
> > https://nielstenoever.net - mail at nielstenoever.net - @nielstenoever - +31629051853
> > PGP: 2458 0B70 5C4A FD8A 9488 643A 0ED8 3F3A 468A C8B3
> >
> > Read my latest article on Internet infrastructure governance in New Media & Society here: https://journals.sagepub.com/doi/full/10.1177/1461444820929320
> >
>
> --
> Niels ten Oever, PhD
> Postdoctoral Researcher - Media Studies Department - University of Amsterdam
> Research Fellow - Centre for Internet and Human Rights - European University Viadrina
> Associated Scholar - Centro de Tecnologia e Sociedade - Fundação Getúlio Vargas
>
> https://nielstenoever.net - mail at nielstenoever.net - @nielstenoever - +31629051853
> PGP: 2458 0B70 5C4A FD8A 9488 643A 0ED8 3F3A 468A C8B3
>
> Read my latest article on Internet infrastructure governance in New Media & Society here: https://journals.sagepub.com/doi/full/10.1177/1461444820929320
>
>
>
> --
> <><><><><><><><><><><><><><><><>
> Christoph Becker (he/him/his)
> PhD at the
> Institute for Data Science and
> Institute for Computational Cosmology
> Durham University
> United Kingdom
> christovis.github.io <http://christovis.github.io>
--
Niels ten Oever, PhD
Postdoctoral Researcher - Media Studies Department - University of Amsterdam
Research Fellow - Centre for Internet and Human Rights - European University Viadrina
Associated Scholar - Centro de Tecnologia e Sociedade - Fundação Getúlio Vargas
https://nielstenoever.net - mail at nielstenoever.net - @nielstenoever - +31629051853
PGP: 2458 0B70 5C4A FD8A 9488 643A 0ED8 3F3A 468A C8B3
Read my latest article on Internet infrastructure governance in New Media & Society here: https://journals.sagepub.com/doi/full/10.1177/1461444820929320