<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">On Feb 5, 2018, at 2:09 PM, Sebastian Benthall <<a href="mailto:sbenthall@gmail.com" class="">sbenthall@gmail.com</a>> wrote:<br class=""><div><blockquote type="cite" class=""><br class="Apple-interchange-newline"><div class=""><div dir="auto" class=""><div dir="auto" class="">2) I just figured out how to make time for this in the short term. So count me in.</div><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Shall we plan a meeting about this?</div></div></div></blockquote><div><br class=""></div><div>Yeah, I'd love to do that! Would folks be interested in an audio chat next week? I will send around a Doodle poll if it's more than just me and Seb.</div><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_extra"><div class="gmail_quote">On Feb 5, 2018 4:24 PM, "Sebastian Benthall" <<a href="mailto:sbenthall@gmail.com" class="">sbenthall@gmail.com</a>> wrote:<br type="attribution" class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="">These are great questions, Nick.<div class=""><br class=""></div><div class="">I'd love to work on them with you, especially because they are such general metrics.</div><div class="">Sadly I've got almost no time to work on it until May, due to dissertation work.</div><div class=""><br class=""></div><div class="">Let me provide some recommendations based on my attempts to address similar questions on SciPy and other lists.<br class=""></div></div></blockquote></div></div></div></blockquote><div><br class=""></div><div>These comments are really helpful, thanks!</div><div><br class=""></div><div>I am interested to understand the math better, and could really use your help on that. 
I definitely get your general point that because there's a long-tail distribution in any case, I need to find cases that don't fit that pattern in order to show meaningful results. </div><div><br class=""></div><div>I'm not sure I understand the concentration parameter, but it does seem like something like that would be useful. I also thought there might be interesting graph analysis metrics -- like centrality? -- in a graph of the nodes of connections between participants and lists.</div><div><br class=""></div><div>Thanks again for your thoughts!</div><div>—Nick</div><div> </div><blockquote type="cite" class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><div class=""><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">* how many participants total in IETF work?<br class=""></blockquote><div class=""><br class=""></div><div class="">The odds are *very* high that the emails-per-person distribution is a heavy-tail distribution.</div><div class="">Based on <a href="https://conference.scipy.org/proceedings/scipy2015/pdfs/sebastian_benthall.pdf" target="_blank" class="">previous work</a>, I would test for fit to log normal and power law distributions.</div><div class="">My money is on log normal being a better fit.</div><div class=""><br class=""></div><div class="">This is important because when interpreting the results, we have to keep in mind that</div><div class="">the log normal distribution is essentially a noise pattern.</div><div class="">So it's easy to read into the data relationships that may not be there,</div><div class="">especially if you're using a linear rather than a log linear relationship as an indicator.</div><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
* how "sticky" is participation?<br class="">
if people participate on a list, do they return? do they show up to f2f meetings?<br class="">
what's the attrition rate?<br class="">
what's the distribution of length of participation?<br class=""></blockquote><div class=""><br class=""></div><div class="">Assuming there is a heavy-tail distribution of participation, about half the contributors</div><div class="">will only contribute once.</div><div class=""><br class=""></div><div class="">The distribution of attrition/retention will look more or less just like the distribution of participation.</div><div class="">The distribution of participation length will look like it as well.</div><div class=""><br class=""></div><div class="">It's not clear how to interpret this, because the reasons why any particular person participates a lot</div><div class="">or a little are very likely </div><div class="">(a) myriad (no single reason, but rather a combination of many reasons), and </div><div class="">(b) exogenous to the data itself.</div><div class=""><br class=""></div><div class="">For these reasons I expect you would get more interesting results if you can segment the population</div><div class="">into categories of interest. You've mentioned gender and firms of employment, which are both good ones.</div><div class=""><br class=""></div><div class="">But for each category, you may want to have more than one parameter to</div><div class="">characterize its participation distribution.</div><div class="">Maybe mean <i class="">and</i> variance?</div><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
* who has participated longest? across the most groups?<br class="">
is there a group of "elites" across working groups?<br class=""></blockquote><div class=""><br class=""></div><div class="">This is a great question.</div><div class="">But keep in mind: the people who participate most are going to send far more messages</div><div class="">across all lists than others do.</div><div class="">So they will have more chances to participate in different lists.</div><div class=""><br class=""></div><div class="">You may want to look at, for each participant, their individual distribution of participation</div><div class="">over many lists, and then look at the concentration parameter of that distribution:</div><div class=""><br class=""></div><div class=""><a href="https://en.wikipedia.org/wiki/Concentration_parameter" target="_blank" class="">https://en.wikipedia.org/wiki/<wbr class="">Concentration_parameter</a><br class=""></div><div class=""><br class=""></div><div class="">The math can be a bit tricky but I think it's worth tackling correctly.</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
how many participants are single-group?<br class=""></blockquote><div class=""><br class=""></div><div class="">Since most participants will only send one message, that's going to skew this metric</div><div class="">unless you take that into account somehow.</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
how many groups does the typical participant join?<br class="">
<br class="">
As I believe I've mentioned to this group before, I've been looking into estimating gender in mailing list participation, including:<br class="">
<br class="">
* What is the gender distribution of participants in Internet and Web technical standard-setting?<br class="">
how does that distribution differ from the population at large? from employment at related firms?<br class="">
does that distribution change over time?<br class="">
are there sub-groups which have distinctly different distributions?<br class="">
* Does the gender distribution of conversation differ from the gender distribution of the participants?<br class=""></blockquote><div class=""><br class=""></div><div class="">Great questions.</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Do you have questions you'd like to add to this list? Would you be interested in trying to measure/answer one of these questions? Which are the easiest and which are the most difficult? What features would we need to add to BigBang to make them answerable?<br class=""></blockquote><div class=""><br class=""></div><div class="">In sum, I think all these questions are great ones and related to each other.</div><div class="">I think the biggest challenge is getting the statistical modeling right,</div><div class="">so that the results are not misinterpreted.</div><div class=""><br class=""></div><div class="">- Seb</div><div class=""> </div></div></div></div></div>
</blockquote></div></div>
</blockquote></div><br class=""></body></html>