<div dir="ltr">Hi Corinne,<div><br></div><div>Thanks so much for this. Apologies for the late response; I've been traveling.</div><div><br></div><div>A note on process: </div><div>To me, these issues look like bugs (if not in the software, then at least in the documentation). </div><div>A good way to report technically specific issues like these is to file a GitHub issue for each one:</div><div><a href="https://github.com/datactive/bigbang/issues">https://github.com/datactive/bigbang/issues</a><br></div><div><br></div><div>That way we can easily discuss each one with reference to specific lines of code, and assign the task of fixing it. </div><div><br></div><div>As it stands (and this isn't anybody's fault), I'm personally having a hard time following the discussion, because I gather it spans multiple email threads.</div><div><br></div><div>I propose that you, Davide, or Niels open these tickets. If there's any particular problem that you think could use a fresh set of eyes, feel free to assign it to me and I can take a look.</div><div><br></div><div>Best regards,</div><div>Seb</div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, Jul 24, 2018 at 11:57 AM Corinne Cath <<a href="mailto:corinnecath@gmail.com">corinnecath@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default"><font face="verdana, sans-serif">Dear all,</font></div><div class="gmail_default"><font face="verdana, sans-serif"><br></font></div><div class="gmail_default"><font face="verdana, sans-serif">I trust this email finds you well.</font></div><div class="gmail_default"><font face="verdana, sans-serif"><br></font></div><div class="gmail_default"><font face="verdana, sans-serif">After spending some more time with the bigbang notebooks, specifically the "basic list statistics_Corinne's questions" notebook, I have a few questions.</font></div><div class="gmail_default"><font 
face="verdana, sans-serif"><br></font></div><div class="gmail_default"><div class="gmail_default" style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><font face="verdana, sans-serif"><u>Question 1:</u> </font></div><div class="gmail_default" style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><font face="verdana, sans-serif">When I enter the following time frame:</font></div><div class="gmail_default" style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><font face="verdana, sans-serif"><br></font></div><div class="gmail_default" style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><div class="gmail_default"><font face="verdana, sans-serif">date_from = pd.datetime(2014,10,1,tzinfo=pytz.utc)</font></div><div class="gmail_default"><font face="verdana, sans-serif">date_to = pd.datetime(2015,11,30,tzinfo=pytz.utc)</font></div><div class="gmail_default"><font face="verdana, sans-serif"><br></font></div><div class="gmail_default"><font face="verdana, sans-serif">I get conflicting answers from "#Q3: Number of emails in a time frame"</font><span style="font-family:verdana,sans-serif"> and </span></div><div class="gmail_default"><font face="verdana, sans-serif"><div class="gmail_default">"#then you can specify some years and have the break down per month".</div><div><br></div></font></div><div class="gmail_default"><font face="verdana, sans-serif">[Q3]: 291</font></div><div class="gmail_default"><font face="verdana, sans-serif">[some years per month]: 281 (11 for 2014 + months 1 through 11 of 2015)</font><pre style="white-space:pre-wrap;box-sizing:border-box;overflow:auto;display:block;padding:1px 
0px;margin:0px;line-height:inherit;word-break:break-all;word-wrap:break-word;color:rgb(0,0,0);background-color:rgb(255,255,255);border:0px;border-radius:0px;vertical-align:baseline;text-align:left;text-decoration-style:initial;text-decoration-color:initial"><font face="verdana, sans-serif"><br></font></pre><pre style="white-space:pre-wrap;box-sizing:border-box;overflow:auto;display:block;padding:1px 0px;margin:0px;line-height:inherit;word-break:break-all;word-wrap:break-word;color:rgb(0,0,0);background-color:rgb(255,255,255);border:0px;border-radius:0px;vertical-align:baseline;text-align:left;text-decoration-style:initial;text-decoration-color:initial"><font face="verdana, sans-serif">2014:  tot 11
    1:   0
    2:   0
    3:   0
    4:   0
    5:   0
    6:   0
    7:   0
    8:   0
    9:   0
    10:   4
    11:   6
    12:   1
____________________
2015:  tot 297
    1:   15
    2:   12
    3:   14
    4:   8
    5:   46
    6:   11
    7:   37
    8:   11
    9:   11
    10:   61
    11:   44
    12:   13</font></pre><pre style="white-space:pre-wrap;box-sizing:border-box;overflow:auto;display:block;padding:1px 0px;margin:0px;line-height:inherit;word-break:break-all;word-wrap:break-word;color:rgb(0,0,0);background-color:rgb(255,255,255);border:0px;border-radius:0px;vertical-align:baseline;text-align:left;text-decoration-style:initial;text-decoration-color:initial"><font face="verdana, sans-serif"><br></font></pre><pre style="white-space:pre-wrap;box-sizing:border-box;overflow:auto;display:block;padding:1px 0px;margin:0px;line-height:inherit;word-break:break-all;word-wrap:break-word;color:rgb(0,0,0);background-color:rgb(255,255,255);border:0px;border-radius:0px;vertical-align:baseline;text-align:left;text-decoration-style:initial;text-decoration-color:initial"><font face="verdana, sans-serif">And if I calculate backwards from the yearly totals for the period I am interested in (October 2014 up to and including November 2015), I get 11 + (297 - 13) = 295,</font></pre><pre style="white-space:pre-wrap;box-sizing:border-box;overflow:auto;display:block;padding:1px 0px;margin:0px;line-height:inherit;word-break:break-all;word-wrap:break-word;color:rgb(0,0,0);background-color:rgb(255,255,255);border:0px;border-radius:0px;vertical-align:baseline;text-align:left;text-decoration-style:initial;text-decoration-color:initial"><font face="verdana, sans-serif">which is again different from the answers I got above.</font></pre><pre style="white-space:pre-wrap;box-sizing:border-box;overflow:auto;display:block;padding:1px 0px;margin:0px;line-height:inherit;word-break:break-all;word-wrap:break-word;color:rgb(0,0,0);background-color:rgb(255,255,255);border:0px;border-radius:0px;vertical-align:baseline;text-align:left;text-decoration-style:initial;text-decoration-color:initial"><font face="verdana, sans-serif"><br></font></pre><pre style="white-space:pre-wrap;box-sizing:border-box;overflow:auto;display:block;padding:1px 
0px;margin:0px;line-height:inherit;word-break:break-all;word-wrap:break-word;color:rgb(0,0,0);background-color:rgb(255,255,255);border:0px;border-radius:0px;vertical-align:baseline;text-align:left;text-decoration-style:initial;text-decoration-color:initial"><font face="verdana, sans-serif">What explains these discrepancies, and which one is the authoritative answer? (It might just be that my math skills suck, which is very possible.)</font></pre></div></div><font face="verdana, sans-serif"><br></font></div><div class="gmail_default"><font face="verdana, sans-serif"><br></font></div><div class="gmail_default"><font face="verdana, sans-serif"><u>Question 2:</u></font></div><div class="gmail_default"><font face="verdana, sans-serif"><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)">#Q5 get threads with most replies</div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)"><br></div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)">It would be interesting to have this cell also respect the time frame set in "#here you can set the time frame", which it currently does not do. 
<div class="gmail_default" style="text-decoration-style:initial;text-decoration-color:initial">I can tell because the thread it ranks highest is out of sync with my qualitative analysis for that period (2014 and 2015).</div></div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)"><br></div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)">So, for instance, which threads have the most replies for </div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)"><br></div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)"><div class="gmail_default" style="font-family:arial,sans-serif;font-size:small;text-decoration-style:initial;text-decoration-color:initial"><font face="verdana, sans-serif">date_from = pd.datetime(2014,10,1,tzinfo=pytz.utc)</font></div><div class="gmail_default" style="font-family:arial,sans-serif;font-size:small;text-decoration-style:initial;text-decoration-color:initial"><font face="verdana, sans-serif">date_to = pd.datetime(2015,11,30,tzinfo=pytz.utc)</font></div><br></div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)">Or for </div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)"><br></div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)"><div class="gmail_default" style="font-family:arial,sans-serif;font-size:small;text-decoration-style:initial;text-decoration-color:initial"><font face="verdana, sans-serif">date_from = pd.datetime(2015,12,1,tzinfo=pytz.utc)</font></div><div class="gmail_default" style="font-family:arial,sans-serif;font-size:small;text-decoration-style:initial;text-decoration-color:initial"><font face="verdana, sans-serif">date_to = pd.datetime(2016,11,30,tzinfo=pytz.utc)</font></div><div class="gmail_default" 
style="font-family:arial,sans-serif;font-size:small;text-decoration-style:initial;text-decoration-color:initial"><font face="verdana, sans-serif"><br></font></div><div class="gmail_default" style="font-family:arial,sans-serif;font-size:small;text-decoration-style:initial;text-decoration-color:initial"><font face="verdana, sans-serif">etc.? </font></div><br></div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)"><br></div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)"><br></div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)"><u>Question 3:</u></div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)">In "#threads with most replies", I get the following results:</div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)"><br></div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)"><pre style="box-sizing:border-box;overflow:auto;font-family:monospace;font-size:14px;display:block;padding:1px 0px;margin:0px;line-height:inherit;word-break:break-all;word-wrap:break-word;color:rgb(0,0,0);border:0px;border-radius:0px;white-space:pre-wrap;vertical-align:baseline;text-align:left;text-decoration-style:initial;text-decoration-color:initial">[hrpc] Examining existing Venue Selection criteria   71
[hrpc] Case three: DDoS   55
[hrpc] Human Rights Research Group Call on draft-irtf-hrpc-research-07   53
Re: [hrpc] draft-tenoever-hrpc-research-02   32
[hrpc] Comments about draft-irtf-hrpc-research-07   26</pre><br></div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)">However, these counts don't hold up against my qualitative count (done by hand), which finds 34 responses to "[hrpc] Examining existing venue selection criteria". They also don't match the IETF mailing list archive, <a href="https://mailarchive.ietf.org/arch/browse/hrpc/?gbt=1&q=examining+venue" target="_blank">https://mailarchive.ietf.org/arch/browse/hrpc/?gbt=1&q=examining+venue</a>, which shows 35 responses to this thread.</div><div class="gmail_default" style="background-color:rgb(255,255,255)"><br></div><div class="gmail_default" style="background-color:rgb(255,255,255)">Similarly, for <pre style="box-sizing:border-box;overflow:auto;display:block;padding:1px 0px;margin:0px;line-height:inherit;word-break:break-all;word-wrap:break-word;color:rgb(0,0,0);border:0px;border-radius:0px;white-space:pre-wrap;vertical-align:baseline;text-align:left;text-decoration-style:initial;text-decoration-color:initial"><font face="verdana, sans-serif">"[hrpc] Human Rights Research Group Call on draft-irtf-hrpc-research-07", my hand-counted notes say there are 53 responses and bigbang has 53, but the archive has 56; see: <a href="https://mailarchive.ietf.org/arch/browse/hrpc/?gbt=1&q=Human+Rights+Research+Group+Call+on+draft-irtf-hrpc-research-07+" target="_blank">https://mailarchive.ietf.org/arch/browse/hrpc/?gbt=1&q=Human+Rights+Research+Group+Call+on+draft-irtf-hrpc-research-07+</a></font></pre></div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)"><br></div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)">I am perfectly comfortable assuming that my hand count is off by a bit, but the discrepancy between the IETF archive and bigbang is odd. 
</div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)"><br></div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)">It is especially odd because for "[hrpc] Case three: DDoS", my notes, the archive, and bigbang all agree on exactly 55 responses. See: <a href="https://mailarchive.ietf.org/arch/browse/hrpc/?gbt=1&q=%5Bhrpc%5D+Case+three%3A+DDoS" target="_blank">https://mailarchive.ietf.org/arch/browse/hrpc/?gbt=1&q=%5Bhrpc%5D+Case+three%3A+DDoS</a> </div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)"><br></div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)">I am sure this has something to do with how the bigbang tool counts versus how the IETF archive counts versus how I count, but it again raises the question of which answer is authoritative. </div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)"><br></div><div class="gmail_default" style="font-size:12.8px;background-color:rgb(255,255,255)">Happy to think along! Best,</div><div class="gmail_default" style="text-decoration:underline;font-size:12.8px;background-color:rgb(255,255,255)"><br></div><div class="gmail_default" style="text-decoration:underline;font-size:12.8px;background-color:rgb(255,255,255)"><br></div><br></font></div><div class="gmail_default"><font face="verdana, sans-serif"><u><br></u></font></div><div><font face="verdana, sans-serif"><br></font></div><font face="verdana, sans-serif">-- <br></font><div class="m_7472392260516440143gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><font face="verdana, sans-serif">Corinne Cath <br>Ph.D. 
Candidate, Oxford Internet Institute & Alan Turing Institute <br><br><span style="color:rgb(68,68,68)">Web: <a href="http://www.oii.ox.ac.uk/people/corinne-cath" target="_blank">www.oii.ox.ac.uk/people/corinne-cath</a> <br>Email: <a href="mailto:ccath@turing.ac.uk" target="_blank">ccath@turing.ac.uk</a> & <a href="mailto:corinnecath@gmail.com" target="_blank">corinnecath@gmail.com</a><br>Twitter: @C_Cath</span></font><br></div></div></div></div></div></div></div></div></div></div></div></div>
</div>
_______________________________________________<br>
Bigbang-dev mailing list<br>
<a href="mailto:Bigbang-dev@data-activism.net" target="_blank">Bigbang-dev@data-activism.net</a><br>
<a href="https://lists.ghserv.net/mailman/listinfo/bigbang-dev" rel="noreferrer" target="_blank">https://lists.ghserv.net/mailman/listinfo/bigbang-dev</a><br>
</blockquote></div>
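<div dir="ltr"><div>A quick re-check of the Question 1 arithmetic, using only the monthly counts printed in the per-month breakdown above (plain Python, nothing here calls BigBang; the numbers are copied by hand). It suggests the arithmetic itself is fine: the 2015 months add up to 283, not the 297 printed as the 2015 "tot", so the month-by-month figure (281) and the "calculate backwards" figure (295) differ by exactly those 14 messages. This sketch cannot say where the 14 messages come from, or why "#Q3" reports 291; both seem like good details to include in the GitHub issue.</div>
<pre>
```python
# Monthly email counts copied by hand from the per-month breakdown in
# Question 1 (only the non-zero months are listed for 2014).
counts_2014 = {10: 4, 11: 6, 12: 1}
counts_2015 = {1: 15, 2: 12, 3: 14, 4: 8, 5: 46, 6: 11,
               7: 37, 8: 11, 9: 11, 10: 61, 11: 44, 12: 13}
reported_total_2014 = 11   # "tot" printed for 2014
reported_total_2015 = 297  # "tot" printed for 2015

# Does each printed yearly total match the sum of its months?
sum_2014 = sum(counts_2014.values())
sum_2015 = sum(counts_2015.values())

# October 2014 through November 2015, added up month by month.
forward = sum_2014 + sum(n for month, n in counts_2015.items() if month != 12)

# Same window, worked backwards from the printed 2015 total.
backwards = reported_total_2014 + (reported_total_2015 - counts_2015[12])

print("2014 months sum to", sum_2014, "vs tot", reported_total_2014)
print("2015 months sum to", sum_2015, "vs tot", reported_total_2015)
print("month by month:", forward, "| backwards:", backwards)
```
</pre>
</div>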