<div dir="auto">1) meant to reply-all, whoops<div dir="auto">2) I just figured out how to make time for this in the short term. So count me in.</div><div dir="auto"><br></div><div dir="auto">Shall we plan a meeting about this?</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Feb 5, 2018 4:24 PM, "Sebastian Benthall" <<a href="mailto:sbenthall@gmail.com">sbenthall@gmail.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">These are great questions, Nick.<div><br></div><div>I'd love to work on them with you, especially because they are such general metrics.</div><div>Sadly I've got almost no time to work on it until May, due to dissertation work.</div><div><br></div><div>Let me provide some recommendations based on my attempts to address similar questions on SciPy and other lists.<br><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">* how many participants total in IETF work?<br></blockquote><div><br></div><div>The odds are *very* high that the emails-per-person distribution is a heavy-tail distribution.</div><div>Based on <a href="https://conference.scipy.org/proceedings/scipy2015/pdfs/sebastian_benthall.pdf" target="_blank">previous work</a>, I would test for fit to log normal and power law distributions.</div><div>My money is on log normal being a better fit.</div><div><br></div><div>This is important because when interpreting the results, we have to keep in mind that</div><div>the log normal distribution is essentially a noise pattern.</div><div>So it's easy to read into the data relationships that may not be there,</div><div>especially if you're using a linear rather than a log linear relationship as an indicator.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px 
solid rgb(204,204,204);padding-left:1ex">
* how "sticky" is participation?<br>
if people participate on a list, do they return? do they show up to f2f meetings?<br>
what's the attrition rate?<br>
what's the distribution of length of participation?<br></blockquote><div><br></div><div>Assuming there is a heavy-tail distribution of participation, then about half the contributors</div><div>will only contribute once.</div><div><br></div><div>The distribution of attrition/retention will look more or less just like the distribution of participation.</div><div>The distribution of length of participation will look like it as well.</div><div><br></div><div>It's not clear how to interpret this, because the reasons why any particular person participates a lot</div><div>or a little are very likely </div><div>(a) myriad (no single reason, but rather a combination of many reasons), and </div><div>(b) exogenous to the data itself.</div><div><br></div><div>For these reasons I expect you would get more interesting results if you can segment the population</div><div>into categories of interest. You've mentioned gender and firms of employment, which are both good ones.</div><div><br></div><div>But for each category, you may want to have more than one parameter to</div><div>characterize each one's participation distribution.</div><div>Maybe mean <i>and</i> variance?</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
* who has participated longest? across the most groups?<br>
is there a group of "elites" across working groups?<br></blockquote><div><br></div><div>This is a great question.</div><div>But keep in mind: the people who participate most are going to be participating a lot</div><div>more numerically across all lists than others.</div><div>So they will have more chances to participate in different lists.</div><div><br></div><div>You may want to be looking at, for each participant, their individual distribution of participation</div><div>over many lists, and then look at the concentration parameter of that distribution:</div><div><br></div><div><a href="https://en.wikipedia.org/wiki/Concentration_parameter" target="_blank">https://en.wikipedia.org/wiki/<wbr>Concentration_parameter</a><br></div><div><br></div><div>The math can be a bit tricky but I think it's worth tackling correctly.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
how many participants are single-group?<br></blockquote><div><br></div><div>Since most participants will send only one message, that's going to skew this metric</div><div>unless you take that into account somehow.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
how many groups does the typical participant join?<br>
<br>
As I believe I've mentioned to this group before, I've been looking into estimating gender in mailing list participation, including:<br>
<br>
* What is the gender distribution of participants in Internet and Web technical standard-setting?<br>
how does that distribution differ from the population at large? from employment at related firms?<br>
does that distribution change over time?<br>
are there sub-groups which have distinctly different distributions?<br>
* Does the gender distribution of conversation differ from the gender distribution of the participants?<br></blockquote><div><br></div><div>Great questions.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Do you have questions you'd like to add to this list? Would you be interested in trying to measure/answer one of these questions? Which are the easiest and which are the most difficult? What features would we need to add to BigBang to make them answerable?<br></blockquote><div><br></div><div>In sum, I think all these questions are great ones and related to each other.</div><div>I think the biggest challenge is getting the statistical modeling right,</div><div>so that the results are not misinterpreted.</div><div><br></div><div>- Seb</div><div> </div></div></div></div></div>
</blockquote></div></div>
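<div>For anyone who wants to try the log normal versus power law comparison suggested in the thread, here is a rough sketch using only the Python standard library. It is an illustration under stated assumptions, not BigBang code and not the method from the linked paper: the function names are made up for this example, message counts are treated as continuous values, the power law is a continuous Pareto form with its lower cutoff fixed at the smallest observed count, and the two fits are compared by raw log-likelihood only. A careful analysis would also choose the cutoff properly and test whether the difference is significant.</div>
<pre>
```python
import math
import random
import statistics


def lognormal_loglik(counts):
    """Log-likelihood of the data under a log-normal fitted by maximum likelihood."""
    logs = [math.log(x) for x in counts]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)  # population form, which is the ML estimate
    if sigma == 0:
        raise ValueError("all values are identical; nothing to fit")
    const = math.log(sigma) + 0.5 * math.log(2 * math.pi)
    # log pdf of a log-normal at x, written in terms of lx = log(x)
    return sum(-lx - const - (lx - mu) ** 2 / (2 * sigma ** 2) for lx in logs)


def powerlaw_loglik(counts):
    """Log-likelihood under a continuous power law (Pareto) fitted by maximum likelihood.

    Assumption for this sketch: the lower cutoff xmin is simply the
    smallest observed value, so every data point is inside the support.
    """
    xmin = min(counts)
    total = sum(math.log(x / xmin) for x in counts)
    if total == 0:
        raise ValueError("all values are identical; nothing to fit")
    n = len(counts)
    alpha = 1 + n / total  # ML estimate of the exponent
    # density: ((alpha - 1) / xmin) * (x / xmin) ** (-alpha) for x at or above xmin
    return n * math.log(alpha - 1) - n * math.log(xmin) - alpha * total


# Synthetic stand-in for emails-per-person; real counts would come from the archives.
random.seed(0)
synthetic = [random.lognormvariate(1.0, 1.0) for _ in range(2000)]
print("log-normal log-likelihood:", lognormal_loglik(synthetic))
print("power-law log-likelihood: ", powerlaw_loglik(synthetic))
```
</pre>
<div>The fit with the higher log-likelihood describes the sample better under these assumptions. Both models have two fitted quantities here, so the raw comparison is at least not biased by parameter count, but it says nothing about whether either model fits well in absolute terms.</div>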