[Bigbang-dev] research questions of interest for standard-setting participation
Sebastian Benthall
sbenthall at gmail.com
Fri Feb 16 17:35:22 CET 2018
Is there a web-accessible link to a dump of the IETF data that's ready?
I'm reinstalling BigBang fresh on a new machine and figure I should start
working with the IETF data set, as that's the topic of interest at the
moment.
On Fri, Feb 16, 2018 at 5:31 AM, Niels ten Oever <niels at article19.org> wrote:
> I would love to at least listen-in!
>
> Cheers,
>
> Niels
> On 02/16/2018 01:26 AM, Nick Doty wrote:
> > On Feb 5, 2018, at 2:09 PM, Sebastian Benthall <sbenthall at gmail.com> wrote:
> >>
> >> 2) I just figured out how to make time for this in the short term. So
> >> count me in.
> >>
> >> Shall we plan a meeting about this?
> >
> > Yeah, I'd love to do that! Would folks be interested in an audio chat
> > next week? I will send around a Doodle poll if it's more than just me
> > and Seb.
> >
> >> On Feb 5, 2018 4:24 PM, "Sebastian Benthall" <sbenthall at gmail.com> wrote:
> >>
> >> These are great questions, Nick.
> >>
> >> I'd love to work on them with you, especially because they are
> >> such general metrics.
> >> Sadly I've got almost no time to work on it until May, due to
> >> dissertation work.
> >>
> >> Let me provide some recommendations based on my attempts to
> >> address similar questions on SciPy and other lists.
> >
> > These comments are really helpful, thanks!
> >
> > I'd like to understand the math better, and could really use your
> > help on that. I definitely get your general point that because there's a
> > long-tail distribution in any case, I need to find cases that don't fit
> > that pattern in order to show meaningful results.
> >
> > I'm not sure I understand the concentration parameter, but it does seem
> > like something like that would be useful. I also thought there might be
> > interesting graph analysis metrics -- like centrality? -- in a graph of
> > connections between participants and lists.
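Nick's centrality idea can be tried before reaching for a graph library. The minimal sketch below assumes the archive has already been reduced to (participant, list) pairs, one per list a person posted to; that input shape and the function name are hypothetical, not part of BigBang. It computes bipartite degree centrality, normalizing each node's degree by the size of the opposite node set, which is the convention networkx uses for bipartite graphs.

```python
from collections import defaultdict


def bipartite_degree_centrality(edges):
    """Degree centrality for a participant-list bipartite graph.

    `edges` is an iterable of (participant, mailing_list) pairs; repeated
    pairs are ignored. Each node's degree is divided by the size of the
    *other* node set, so a participant who posted to every list scores 1.0.
    Returns (participant_scores, list_scores).
    """
    lists_of = defaultdict(set)    # participant -> lists they posted to
    members_of = defaultdict(set)  # list -> participants who posted there
    for person, ml in edges:
        lists_of[person].add(ml)
        members_of[ml].add(person)
    n_lists, n_people = len(members_of), len(lists_of)
    people = {p: len(ls) / n_lists for p, ls in lists_of.items()}
    lists = {ml: len(ps) / n_people for ml, ps in members_of.items()}
    return people, lists
```

Degree alone will largely track message volume, so it is subject to the same long-tail caveats discussed in this thread; for betweenness or other measures, networkx's bipartite module is the more complete option.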
> >
> > Thanks again for your thoughts!
> > —Nick
> >
> >>
> >> * how many participants total in IETF work?
> >>
> >>
> >> The odds are *very* high that the emails-per-person distribution
> >> is a heavy-tail distribution.
> >> Based on previous work
> >> <https://conference.scipy.org/proceedings/scipy2015/pdfs/sebastian_benthall.pdf>,
> >> I would test for fit to log normal and power law distributions.
> >> My money is on log normal being a better fit.
> >>
> >> This is important because when interpreting the results, we have
> >> to keep in mind that
> >> the log normal distribution is essentially a noise pattern.
> >> So it's easy to read into the data relationships that may not be
> >> there,
> >> especially if you're using a linear rather than a log linear
> >> relationship as an indicator.
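A minimal, standard-library sketch of the log-normal versus power-law comparison. It simplifies in ways worth stating: message counts are treated as continuous, xmin defaults to the smallest count instead of being estimated, and the log-normal is not truncated at xmin. The sign of the log-likelihood difference says which model fits better but not whether the difference is significant; a dedicated package such as `powerlaw` handles discreteness, xmin selection, and a likelihood-ratio significance test more carefully. Function names are illustrative.

```python
import math


def lognormal_loglik(xs):
    """Log-likelihood of xs under a log-normal with maximum-likelihood mu, sigma^2."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((y - mu) ** 2 for y in logs) / n  # MLE (biased) variance
    return sum(-y - 0.5 * math.log(2 * math.pi * var) - (y - mu) ** 2 / (2 * var)
               for y in logs)


def powerlaw_loglik(xs, xmin):
    """Log-likelihood of xs (all >= xmin) under a continuous power law
    p(x) = (alpha - 1) / xmin * (x / xmin) ** -alpha, with alpha at its MLE.
    Returns (loglik, alpha)."""
    n = len(xs)
    s = sum(math.log(x / xmin) for x in xs)
    alpha = 1 + n / s  # MLE for the exponent given xmin
    return n * math.log(alpha - 1) - n * math.log(xmin) - alpha * s, alpha


def compare_fits(counts, xmin=None):
    """Fit both models to the counts >= xmin and return
    (lognormal_loglik - powerlaw_loglik, alpha). Positive favors the log-normal."""
    if xmin is None:
        xmin = min(counts)
    tail = [x for x in counts if x >= xmin]
    ll_pl, alpha = powerlaw_loglik(tail, xmin)
    return lognormal_loglik(tail) - ll_pl, alpha
```

Pass it a list of per-person message counts; on real archives, also look at the histogram on log-log axes before trusting either fit.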
> >>
> >> * how "sticky" is participation?
> >> if people participate on a list, do they return? do
> >> they show up to f2f meetings?
> >> what's the attrition rate?
> >> what's the distribution of length of participation?
> >>
> >>
> >> Assuming a heavy-tail distribution of participation,
> >> about half the contributors
> >> will contribute only once.
> >>
> >> The distribution of attrition/retention will look more or less
> >> just like the distribution of participation.
> >> The length will look like it as well.
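To check these predictions (roughly half of contributors posting once, and participation length resembling the participation distribution), something like the following could work. It assumes (sender, datetime) pairs have already been extracted from an archive; how BigBang exposes those fields is not shown here, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def participation_summary(messages):
    """Summarize participation from (sender, sent_datetime) pairs.

    Returns (share of senders who posted exactly once,
             median participation span in days, last minus first message).
    """
    counts = defaultdict(int)
    first, last = {}, {}
    for sender, when in messages:
        counts[sender] += 1
        first[sender] = min(first.get(sender, when), when)
        last[sender] = max(last.get(sender, when), when)
    one_timers = sum(1 for c in counts.values() if c == 1) / len(counts)
    spans = [(last[s] - first[s]).days for s in counts]
    return one_timers, median(spans)


# Example input: two messages from "a", one from "b", two from "c".
example = [("a", datetime(2017, 1, 1)), ("a", datetime(2017, 3, 2)),
           ("b", datetime(2017, 2, 1)),
           ("c", datetime(2017, 5, 1)), ("c", datetime(2017, 5, 11))]
```

One-time senders always have a span of zero days, so the span distribution inherits the spike at one message.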
> >>
> >> It's not clear how to interpret this, because the reasons why any
> >> particular person participates a lot
> >> or a little are very likely
> >> (a) myriad (no single reason, but rather a combination of many
> >> reasons), and
> >> (b) exogenous to the data itself.
> >>
> >> For these reasons I expect you would get more interesting results
> >> if you can segment the population
> >> into categories of interest. You've mentioned gender and firms of
> >> employment, which are both good ones.
> >>
> >> But for each category, you may want more than one parameter
> >> to characterize its participation distribution.
> >> Maybe mean /and/ variance?
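One way to give each category two parameters, as suggested above, is to fit a log-normal per category: the mean and (population) variance of the log message counts are exactly its maximum-likelihood parameters. The input mappings (participant to message count, participant to a label such as employer) are assumptions about how the data would be prepared.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def log_params_by_category(counts, category):
    """Mean and variance of log message counts for each category.

    `counts` maps participant -> messages sent (>= 1); `category` maps
    participant -> label. Participants without a label are skipped.
    Under a log-normal model these are the MLE mu and sigma^2.
    """
    logs = defaultdict(list)
    for person, n in counts.items():
        if person in category:
            logs[category[person]].append(math.log(n))
    return {c: (mean(v), pvariance(v)) for c, v in logs.items()}
```

Two groups can share the same mean log count while differing sharply in variance, which a single average per group would hide.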
> >>
> >> * who has participated longest? across the most groups?
> >> is there a group of "elites" across working groups?
> >>
> >>
> >> This is a great question.
> >> But keep in mind: the people who participate most are going to send
> >> many more messages across all lists than others,
> >> so they will have more chances to participate in different lists.
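One hedged way to separate breadth from sheer volume is a simple null model: if each of a person's n messages independently landed on list j with probability q_j (that list's share of all traffic), the expected number of distinct lists reached is sum_j (1 - (1 - q_j)^n). Comparing observed distinct lists against that expectation controls for the extra chances heavy posters get. Independence is a simplifying assumption, and the function names are illustrative.

```python
def expected_distinct_lists(n_messages, list_shares):
    """Expected number of distinct lists reached by n_messages if each
    message independently lands on list j with probability list_shares[j].

    E[distinct] = sum_j (1 - (1 - q_j) ** n)
    """
    return sum(1 - (1 - q) ** n_messages for q in list_shares)


def breadth_ratio(observed_lists, n_messages, list_shares):
    """Observed distinct lists divided by the null expectation. Values well
    above 1 suggest a participant spreads out more than volume alone predicts;
    values below 1 suggest they concentrate on a few lists."""
    return observed_lists / expected_distinct_lists(n_messages, list_shares)
```

`list_shares` should sum to 1 over all lists in the corpus; computing it from the same archive is the natural choice.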
> >>
> >> You may want to be looking at, for each participant, their
> >> individual distribution of participation
> >> over many lists, and then look at the concentration parameter of
> >> that distribution:
> >>
> >> https://en.wikipedia.org/wiki/Concentration_parameter
> >>
> >> The math can be a bit tricky but I think it's worth tackling
> >> correctly.
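Estimating a Dirichlet concentration parameter properly takes more machinery than fits in a short example. As a simpler stand-in (not the concentration parameter itself), the concentration of a single participant's messages can be summarized by the effective number of lists: exp of the Shannon entropy of their per-list shares. It is 1 when everything goes to one list and k for an even split over k lists. The function name is illustrative.

```python
import math


def effective_number_of_lists(list_counts):
    """exp(Shannon entropy) of one participant's message counts per list.

    Returns 1.0 when all messages go to a single list and k when they are
    split evenly over k lists. Lists with zero messages are ignored.
    """
    total = sum(list_counts)
    h = -sum((c / total) * math.log(c / total) for c in list_counts if c > 0)
    return math.exp(h)
```

Like raw list counts, this is capped by volume: a participant with one message always scores 1, so it is best read alongside message counts.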
> >>
> >>
> >> how many participants are single-group?
> >>
> >>
> >> Since most participants will send only one message, that's
> >> going to skew this metric
> >> unless you take that into account somehow.
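One simple way to take one-message senders into account is to compute the single-group share only among participants above an activity threshold, then report how the share changes as the threshold rises. The input mappings and the default threshold of 2 are illustrative assumptions.

```python
def single_group_share(lists_per_person, counts, min_messages=2):
    """Share of participants active in exactly one group, counting only
    those who sent at least `min_messages` messages (one-message senders
    are single-group by construction).

    `lists_per_person` maps participant -> set of lists posted to;
    `counts` maps participant -> messages sent.
    Returns NaN if no participant meets the threshold.
    """
    eligible = [p for p, n in counts.items() if n >= min_messages]
    if not eligible:
        return float("nan")
    single = sum(1 for p in eligible if len(lists_per_person[p]) == 1)
    return single / len(eligible)
```

With the threshold set to 1 this reduces to the naive metric, so the two can be reported side by side.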
> >>
> >>
> >> how many groups does the typical participant join?
> >>
> >> As I believe I've mentioned to this group before, I've been
> >> looking into estimating gender in mailing list participation,
> >> including:
> >>
> >> * What is the gender distribution of participants in Internet
> >> and Web technical standard-setting?
> >> how does that distribution differ from the population at
> >> large? from employment at related firms?
> >> does that distribution change over time?
> >> are there sub-groups which have distinctly different
> >> distributions?
> >> * Does the gender distribution of conversation differ from the
> >> gender distribution of the participants?
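The second question reduces to comparing two shares per category: the share of participants and the share of messages. The sketch assumes an upstream step has already assigned each participant an estimated label; that estimation is the hard part and is not shown, and the labels used with it here are placeholders.

```python
from collections import Counter


def participant_vs_message_shares(counts, label):
    """For each label, return (share of participants, share of messages).

    `counts` maps participant -> messages sent; `label` maps participant ->
    an estimated category. Unlabeled participants are excluded from both
    shares.
    """
    people = Counter()
    msgs = Counter()
    for p, n in counts.items():
        if p in label:
            people[label[p]] += 1
            msgs[label[p]] += n
    n_people, n_msgs = sum(people.values()), sum(msgs.values())
    return {g: (people[g] / n_people, msgs[g] / n_msgs) for g in people}
```

Because message counts are heavy-tailed, a handful of very active people can move the message share a lot, so it is worth checking how the comparison changes with the top posters excluded.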
> >>
> >>
> >> Great questions.
> >>
> >>
> >> Do you have questions you'd like to add to this list? Would
> >> you be interested in trying to measure/answer one of these
> >> questions? Which are the easiest and which are the most
> >> difficult? What features would we need to add to BigBang to
> >> make them answerable?
> >>
> >>
> >> In sum, I think all these questions are great ones and related to
> >> each other.
> >> I think the biggest challenge is getting the statistical
> >> modeling right,
> >> so that the results are not misinterpreted.
> >>
> >> - Seb
> >>
> >>
> >
> >
> >
> > _______________________________________________
> > Bigbang-dev mailing list
> > Bigbang-dev at data-activism.net
> > https://lists.ghserv.net/mailman/listinfo/bigbang-dev
> >
>
>