[Bigbang-dev] BigBang discrepancy between quantitative and qualitative findings - explanations?

Sebastian Benthall sbenthall at gmail.com
Tue Jul 31 00:28:17 CEST 2018


Hi Corinne,

Thanks so much for this. Apologies for the late response; I've been
traveling.

A note on process:
To me, these issues look like bugs (if not in the software, then at least
in the documentation).
A good way to report these very technically specific issues is to file a
GitHub Issue for each one:
https://github.com/datactive/bigbang/issues

That way we can easily communicate about each one, with reference to
specific lines of code, and assign the task of fixing them.

As it is (and this isn't anybody's fault), I'm having a hard time
following the discussion, because I gather it spans multiple email
threads.

I propose that you, Davide, or Niels make these tickets. If there's any
particular problem that you think could use a fresh set of eyes, feel free
to assign it to me and I can take a look.

Best regards,
Seb

On Tue, Jul 24, 2018 at 11:57 AM Corinne Cath <corinnecath at gmail.com> wrote:

> Dear all,
>
> I trust this email finds you well.
>
> I have some questions after spending more time with the BigBang
> notebooks, specifically the "basic list statistics_Corinne's questions"
> notebook.
>
> *Question 1:*
> When entering the following time frame:
>
> date_from = pd.datetime(2014,10,1,tzinfo=pytz.utc)
> date_to = pd.datetime(2015,11,30,tzinfo=pytz.utc)
>
> I get conflicting answers in "#Q3: Number of emails in a time frame" and
> "#then you can specify some years and have the break down per month".
>
> [Q3]: 291
> [some years per month]: 281 (11 for 2014 + months 1 through 11 of 2015)
>
>
> 2014:  tot 11
>     1:   0
>     2:   0
>     3:   0
>     4:   0
>     5:   0
>     6:   0
>     7:   0
>     8:   0
>     9:   0
>     10:   4
>     11:   6
>     12:   1
> ____________________
> 2015:  tot 297
>     1:   15
>     2:   12
>     3:   14
>     4:   8
>     5:   46
>     6:   11
>     7:   37
>     8:   11
>     9:   11
>     10:   61
>     11:   44
>     12:   13
>
>
> And if I calculate backwards for the period I am interested in (October 2014 through November 2015): 11 + (297 - 13) = 295
>
> which is again different from the answers I got above.
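Plugging the monthly numbers from your output into a few lines of Python makes the arithmetic easy to check. The figures are copied from the breakdown above; nothing else is assumed.

```python
# Monthly counts copied from the per-month breakdown in the notebook output.
counts_2014 = {10: 4, 11: 6, 12: 1}  # months 1-9 were all 0
counts_2015 = {1: 15, 2: 12, 3: 14, 4: 8, 5: 46, 6: 11,
               7: 37, 8: 11, 9: 11, 10: 61, 11: 44, 12: 13}
reported_total_2015 = 297  # the "tot" line printed for 2015

oct_to_dec_2014 = sum(counts_2014.values())
jan_to_nov_2015 = sum(v for month, v in counts_2015.items() if month <= 11)
sum_of_months_2015 = sum(counts_2015.values())

print(oct_to_dec_2014 + jan_to_nov_2015)  # 281, the per-month figure
print(oct_to_dec_2014 + (reported_total_2015 - counts_2015[12]))  # 295, via the yearly total
print(sum_of_months_2015)  # 283, not 297
print(reported_total_2015 - sum_of_months_2015)  # 14 messages not in any month
```

So the math itself is fine: October 2014 through November 2015 adds up to 281 from the monthly bins. The odd one out is the 2015 "tot" line, which is 14 higher than the sum of its own months, so the yearly total and the monthly bins appear to be computed differently. That, together with the 291 from Q3, seems worth its own GitHub issue.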
>
>
> What explains these discrepancies, and which one is the authoritative answer? (It might just be that my math skills suck; very possible.)
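I haven't checked how the two notebook cells filter internally, so this is only a sketch of one way to make the two figures agree by construction: filter once, using a half-open window [date_from, date_to), and derive the monthly breakdown from the same filtered rows. It assumes the archive can be loaded into a pandas DataFrame with a timezone-aware "Date" column; that column name and the five toy messages are made up for illustration. It uses pd.Timestamp rather than pd.datetime, which is deprecated in later pandas releases.

```python
import pandas as pd

# Toy stand-in for an archive's message table (hypothetical data): one row
# per message, with a timezone-aware "Date" column (column name assumed).
messages = pd.DataFrame({
    "Date": pd.to_datetime([
        "2014-09-30 23:30", "2014-10-01 08:00", "2015-06-15 12:00",
        "2015-11-30 18:45", "2015-12-01 00:10",
    ], utc=True),
})

# Half-open window [date_from, date_to): ending at the first instant of
# December keeps every message sent on 30 November.
date_from = pd.Timestamp("2014-10-01", tz="UTC")
date_to = pd.Timestamp("2015-12-01", tz="UTC")

dates = messages["Date"]
in_window = messages[(dates >= date_from) & (dates < date_to)]
total = len(in_window)

# Monthly breakdown derived from the same filtered rows, so its sum
# always equals `total`.
per_month = in_window.groupby([
    in_window["Date"].dt.year.rename("year"),
    in_window["Date"].dt.month.rename("month"),
]).size()

print(total)            # 3
print(per_month.sum())  # 3

# For comparison: an end point of midnight on 30 November drops that day.
midnight_end = pd.Timestamp("2015-11-30", tz="UTC")
midnight_count = len(messages[(dates >= date_from) & (dates <= midnight_end)])
print(midnight_count)   # 2
```

One thing worth keeping in mind: pd.datetime(2015,11,30,tzinfo=pytz.utc) is midnight at the start of 30 November, so any filter that treats it as an instant leaves most of that day out of the window.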
>
>
>
> *Question 2:*
> #Q5 get threads with most replies
>
> It would be interesting to have that also follow the time frame set in
> "#here you can set the time frame", which it currently does not do. I can
> see this because the thread it ranks highest is out of sync with my
> qualitative analysis for the time period (2014, 2015).
>
> So, for instance, which threads have the most replies for
>
> date_from = pd.datetime(2014,10,1,tzinfo=pytz.utc)
> date_to = pd.datetime(2015,11,30,tzinfo=pytz.utc)
>
> Or for
>
> date_from = pd.datetime(2015,12,1,tzinfo=pytz.utc)
> date_to = pd.datetime(2016,11,30,tzinfo=pytz.utc)
>
> etc.
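I haven't checked how the #Q5 cell builds its ranking, so here is only a sketch of the idea: restrict the messages to the window first, then rank threads. It assumes a pandas DataFrame with "Date" and "Subject" columns (names hypothetical, toy data made up) and groups threads by subject with leading "Re:" prefixes stripped, which is just one possible threading rule.

```python
import re

import pandas as pd

# Toy message table (hypothetical data and column names).
messages = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-01-10", "2015-01-11", "2015-01-12",
        "2016-02-01", "2016-02-02", "2016-02-03", "2016-02-04",
    ], utc=True),
    "Subject": [
        "[hrpc] Thread A", "Re: [hrpc] Thread A", "RE: Re: [hrpc] Thread A",
        "[hrpc] Thread B", "Re: [hrpc] Thread B", "Re: [hrpc] Thread B",
        "Re: [hrpc] Thread A",
    ],
})

# One or more leading "Re:" prefixes, in any letter case.
REPLY_PREFIX = re.compile(r"^\s*(re\s*:\s*)+", flags=re.IGNORECASE)


def top_threads(df, date_from, date_to, n=5):
    """Count messages per normalized subject inside [date_from, date_to)."""
    window = df[(df["Date"] >= date_from) & (df["Date"] < date_to)]
    thread = window["Subject"].str.replace(REPLY_PREFIX, "", regex=True).str.strip()
    return thread.value_counts().head(n)


early = top_threads(messages, pd.Timestamp("2014-10-01", tz="UTC"),
                    pd.Timestamp("2015-12-01", tz="UTC"))
late = top_threads(messages, pd.Timestamp("2015-12-01", tz="UTC"),
                   pd.Timestamp("2016-12-01", tz="UTC"))
print(early)  # Thread A: 3
print(late)   # Thread B: 3, Thread A: 1
```

With the window applied, the same archive yields a different top thread for each period, which is the behaviour you are asking for.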
>
>
>
> *Question 3:*
> In #threads with most replies, I get the following results:
>
> [hrpc] Examining existing Venue Selection criteria   71
> [hrpc] Case three: DDoS   55
> [hrpc] Human Rights Research Group Call on draft-irtf-hrpc-research-07   53
> Re: [hrpc] draft-tenoever-hrpc-research-02   32
> [hrpc] Comments about draft-irtf-hrpc-research-07   26
>
>
> However, these counts don't hold up against my qualitative count (done by
> hand), which finds 34 responses to "[hrpc] Examining existing venue
> selection criteria". They also don't match the IETF mailing list archive:
> https://mailarchive.ietf.org/arch/browse/hrpc/?gbt=1&q=examining+venue
> which says there are 35 responses to this thread.
>
> Similarly, for "[hrpc] Human Rights Research Group Call on
> draft-irtf-hrpc-research-07", my hand-counted notes say there are 53
> responses and BigBang has 53, but the archive has 56; see:
> https://mailarchive.ietf.org/arch/browse/hrpc/?gbt=1&q=Human+Rights+Research+Group+Call+on+draft-irtf-hrpc-research-07+
>
>
> I am perfectly comfortable assuming that my hand count is off by a bit,
> but the discrepancy between the ietf archive and bigbang is odd.
>
> Especially because for "[hrpc] Case three: DDoS", for instance, my notes,
> the archive and BigBang all agree on 55 responses. See:
> https://mailarchive.ietf.org/arch/browse/hrpc/?gbt=1&q=%5Bhrpc%5D+Case+three%3A+DDoS
>
>
> I am sure this has something to do with how the bigbang tool counts versus
> how the ietf counts versus how I count, but this does raise questions again
> about what the authoritative answer is.
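I don't know which rule BigBang, the IETF archive, or a hand count uses, so this is only an illustration with made-up messages. It shows that two reasonable threading rules (grouping by subject with "Re:" prefixes stripped, versus following In-Reply-To links from the first post), plus the choice of whether the first post counts as a "response", already give different numbers for the same thread. The message IDs and subjects are hypothetical.

```python
import re

# Toy data (hypothetical): (message_id, in_reply_to, subject).
MESSAGES = [
    ("m1", None, "[hrpc] Topic"),
    ("m2", "m1", "Re: [hrpc] Topic"),
    ("m3", "m2", "[hrpc] Topic, a side question"),  # reply with an edited subject
    ("m4", None, "Re: [hrpc] Topic"),  # reply-looking subject, no In-Reply-To header
    ("m5", "m1", "Re: [hrpc] Topic"),
    ("m6", "m3", "Re: [hrpc] Topic, a side question"),
]

REPLY_PREFIX = re.compile(r"^\s*(re\s*:\s*)+", flags=re.IGNORECASE)


def normalize(subject):
    """Strip leading "Re:" prefixes and surrounding whitespace."""
    return REPLY_PREFIX.sub("", subject).strip()


def count_by_subject(messages, subject):
    """Messages whose normalized subject equals the normalized `subject`."""
    target = normalize(subject)
    return sum(1 for _, _, s in messages if normalize(s) == target)


def count_by_reply_chain(messages, root_id):
    """Root plus every message reachable from it through In-Reply-To links."""
    children = {}
    for msg_id, parent, _ in messages:
        if parent is not None:
            children.setdefault(parent, []).append(msg_id)
    seen, stack = set(), [root_id]
    while stack:
        msg_id = stack.pop()
        if msg_id not in seen:
            seen.add(msg_id)
            stack.extend(children.get(msg_id, []))
    return len(seen)


by_subject = count_by_subject(MESSAGES, "[hrpc] Topic")
by_chain = count_by_reply_chain(MESSAGES, "m1")
print(by_subject, by_subject - 1)  # 4 3: messages, and "responses" excluding the first post
print(by_chain, by_chain - 1)      # 5 4
```

Which of these definitions each source uses is exactly the kind of detail a GitHub issue could pin down against the actual code.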
>
> Happy to think along! Best,
>
>
>
>
>
> --
> Corinne Cath
> Ph.D. Candidate, Oxford Internet Institute & Alan Turing Institute
>
> Web: www.oii.ox.ac.uk/people/corinne-cath
> Email: ccath at turing.ac.uk & corinnecath at gmail.com
> Twitter: @C_Cath
> _______________________________________________
> Bigbang-dev mailing list
> Bigbang-dev at data-activism.net
> https://lists.ghserv.net/mailman/listinfo/bigbang-dev
>

