[Bigbang-user] R: R: Issue with listserv fetching (3GPP)
Christoph Becker
chrbecker01 at gmail.com
Fri Apr 23 12:35:04 CEST 2021
Hi Riccardo,
I just realised that I forgot to define "auth_key_mock".
You can set up an AuthSession on the 3GPP Listserv archive after creating
an account there <https://list.etsi.org/scripts/wa.exe?GETPW1>; then pass
your credentials to the function as shown below.
---------------------------------------------------------------------------------------------------
import bigbang
from bigbang import listserv
from bigbang.listserv import ListservArchive, ListservList, ListservMessage

# Root URL of the 3GPP Listserv archive
url_archive = "https://list.etsi.org/scripts/wa.exe?"
# URL of a single mailing list (not used in the call below)
url_list = url_archive + "A0=3GPP_TSG_CT_WG6"

# Replace the placeholders with your own Listserv account credentials
auth_key_mock = {"username": "your_username", "password": "your_password"}

ListservArchive.from_url(
    name="3GPP",
    url_root=url_archive,
    url_home=url_archive + "HOME",
    login=auth_key_mock,
    instant_save=True,
    only_mlist_urls=False,
)
---------------------------------------------------------------------------------------------------
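If you would rather not type your password into a script, the login dict can also be built at run time. This is only a minimal sketch: the helper build_auth_key and the environment variable names LISTSERV_USERNAME and LISTSERV_PASSWORD are hypothetical and not part of BigBang; the only thing taken from the example above is the shape of the dict ("username" and "password" keys).

```python
import os
from getpass import getpass


def build_auth_key(env=None, ask_user=input, ask_password=getpass):
    """Return a dict with the same keys as auth_key_mock.

    Hypothetical helper: the environment variable names below are an
    assumption for illustration, BigBang does not read them itself.
    """
    env = os.environ if env is None else env
    # Prefer environment variables, fall back to an interactive prompt.
    username = env.get("LISTSERV_USERNAME") or ask_user("Listserv username: ")
    password = env.get("LISTSERV_PASSWORD") or ask_password("Listserv password: ")
    return {"username": username, "password": password}
```

With this in place, "auth_key_mock = build_auth_key()" could replace the hard-coded dict and be passed as login=auth_key_mock exactly as before.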
Please excuse this awkward way of explaining it.
I will try to update the wiki on the git repo asap.
Best Wishes,
Christoph
On Fri, 23 Apr 2021 at 10:53, Riccardo Nanni <riccardo.nanni9 at unibo.it> wrote:
> Great!
>
> Thank you again, Christoph and Niels, later I'll try it.
> Best,
>
> Riccardo
> ------------------------------
> *From:* Niels ten Oever <mail at nielstenoever.net>
> *Sent:* Friday, 23 April 2021 11:50
> *To:* Riccardo Nanni <riccardo.nanni9 at unibo.it>; Christoph Becker <chrbecker01 at gmail.com>
> *Cc:* bigbang-user at data-activism.net <bigbang-user at data-activism.net>
> *Subject:* Re: R: R: [Bigbang-user] Issue with listserv fetching (3GPP)
>
> Thanks Christoph!
>
> This was the content of the file example.py:
>
> import bigbang
> from bigbang import listserv
> from bigbang.listserv import ListservArchive, ListservList, ListservMessage
>
> url_archive = "https://list.etsi.org/scripts/wa.exe?"
> url_list = url_archive + "A0=3GPP_TSG_CT_WG6"
>
> ListservArchive.from_url(
>     name="3GPP",
>     url_root=url_archive,
>     url_home=url_archive + "HOME",
>     login=auth_key_mock,
>     instant_save=True,
>     only_mlist_urls=False,
> )
>
>
> Best,
>
> Niels
>
> On 23-04-2021 09:25, Riccardo Nanni wrote:
> > Dear Niels and Christoph,
> >
> > thanks a lot for your help!
> > I tried Niels' way and I keep getting the 'instant_dump' error.
> > I did 'git branch' and it shows the following:
> >
> > * main
> >   master
> >
> > I understand I am on the 'main' branch, is that right?
> > Then I tried 'git pull' again and it says it is already up to date, but it keeps showing the 'instant_dump' message when I try the usual command.
> >
> > @Christoph: thank you for sharing the file on the alternative way to gather listserv emails, but I don't think it came through: all I can find is an error message that says an attachment was detected as malware (guess my computer 'misread' your file?). Any chance you can share it again, please?
> >
> > Thanks a lot again, you're all very helpful! As I'm better at cooking than programming, when you come to Italy I owe you a dinner 🙂🙂
> > Cheers,
> >
> > Riccardo
> >
> > ------------------------------
> > *From:* Christoph Becker <chrbecker01 at gmail.com>
> > *Sent:* Friday, 23 April 2021 00:23
> > *To:* Niels ten Oever <mail at nielstenoever.net>
> > *Cc:* Riccardo Nanni <riccardo.nanni9 at unibo.it>; bigbang-user at data-activism.net <bigbang-user at data-activism.net>
> > *Subject:* Re: R: [Bigbang-user] Issue with listserv fetching (3GPP)
> >
> > Hi Niels & Riccardo,
> > the argument 'instant_dump' for the ListservArchive class object does not exist anymore in the up-to-date 'main' branch of the git repo.
> > @Niels: Do you mean that you did a 'git pull' and encountered the TypeError caused by the missing 'instant_dump' argument too?
> >
> > But as I said in another message, we are not quite there yet for 3GPP and IEEE to use the 'conventional' method by which BigBang scrapes archives such as W3C.
> > I attached a small example that shows how you can currently scrape the 3GPP archive and save it to mbox files in the CONFIG.mail_path folder.
> > Be aware that this could take very long and could use a lot of memory.
> >
> > Best Wishes,
> > Christoph
> >
> >
> > On Thu, 22 Apr 2021 at 17:17, Niels ten Oever <mail at nielstenoever.net> wrote:
> >
> > Hi Riccardo and Christoph,
> >
> > I see there might be an issue with the use of special characters in the mailing list URLs. To get it working I had to put a '\' in front of the '?', but this could also be fixed by putting " " around the URL. However, after that fetching did not work either, so let's ask Christoph (cc).
> >
> > Cheers,
> >
> > Niels
> >
> >
> >
> >
> >
> >
> > On 22-04-2021 17:43, Riccardo Nanni wrote:
> > > Hi Niels,
> > >
> > > thanks for your answer!
> > > I did, and I found the changes I can see on GitHub (e.g. the listserv.3GPP.txt file, etc.).
> > > I did it again when I saw it didn't work and it says 'già aggiornato' (already updated).
> > >
> > > Riccardo
> > >
> >
> > > ------------------------------
> > > *From:* Bigbang-user <bigbang-user-bounces at data-activism.net> on behalf of Niels ten Oever <mail at nielstenoever.net>
> > > *Sent:* Thursday, 22 April 2021 17:38
> > > *To:* bigbang-user at data-activism.net <bigbang-user at data-activism.net>
> > > *Subject:* Re: [Bigbang-user] Issue with listserv
> > >
> > > Hi Riccardo,
> > >
> > > This is not a very informed response - but did you first do:
> > >
> > > git pull
> > >
> > > to ensure that you have the latest version with all the recent changes?
> > >
> > > Best,
> > >
> > > Niels
> > >
> > > On 22-04-2021 17:31, Riccardo Nanni wrote:
> > >> Dear all,
> > >>
> > >> how are you?
> > >> I tried to collect email from 3GPP by running these commands:
> > >> python bin/collect_mail.py -u https://list.etsi.org/scripts/wa.exe?;
> > >> python3 bin/collect_mail.py -u https://list.etsi.org/scripts/wa.exe?
> > >> AND
> > >> python3 bin/collect_mail.py -f examples/url_collections/listserv.3GPP.txt
> > >>
> > >> I also tried to scrape a specific group's list with the same commands: https://list.etsi.org/scripts/wa.exe?A0=3GPP_TSG_RAN
> > >>
> > >> I get the following error:
> > >> TypeError: from_url() got an unexpected keyword argument 'instant_dump'
> > >>
> > >> I don't understand what I'm missing. Can you help me, please?
> > >> Thanks a lot in advance! The only similar question I could find on Stack Overflow has no answers...
> > >>
> > >> Riccardo
> > >>
> > >>
> > >>
> > >>
> > >> _______________________________________________
> > >> Bigbang-user mailing list
> > >> Bigbang-user at data-activism.net
> > >> https://lists.ghserv.net/mailman/listinfo/bigbang-user
> > >>
> > >
> > > --
> > > Niels ten Oever, PhD
> > > Postdoctoral Researcher - Media Studies Department - University of Amsterdam
> > > Research Fellow - Centre for Internet and Human Rights - European University Viadrina
> > > Associated Scholar - Centro de Tecnologia e Sociedade - Fundação Getúlio Vargas
> > >
> > > https://nielstenoever.net - mail at nielstenoever.net - @nielstenoever - +31629051853
> > > PGP: 2458 0B70 5C4A FD8A 9488 643A 0ED8 3F3A 468A C8B3
> > >
> > > Read my latest article on Internet infrastructure governance in New Media & Society here:
> > > https://journals.sagepub.com/doi/full/10.1177/1461444820929320
> > >
> >
> > --
> > <><><><><><><><><><><><><><><><>
> > Christoph Becker (he/him/his)
> > PhD at the
> > Institute for Data Science and
> > Institute for Computational Cosmology
> > Durham University
> > United Kingdom
> > christovis.github.io <http://christovis.github.io>
>
--
<><><><><><><><><><><><><><><><>
*Christoph Becker (he/him/his)*
*PhD at the*
*Institute for Data Science and*
*Institute for Computational Cosmology*
*Durham University*
*United Kingdom*
*christovis.github.io* <http://christovis.github.io>