[Bigbang-dev] research questions of interest for standard-setting participation
Niels ten Oever
niels at article19.org
Fri Feb 16 17:53:08 CET 2018
You can crawl from here: https://ietf.org/mail-archive/text/
Cheers,
Niels
Niels ten Oever
Article 19
www.article19.org
PGP fingerprint 2458 0B70 5C4A FD8A 9488
643A 0ED8 3F3A 468A C8B3
On 02/16/2018 05:51 PM, Sebastian Benthall wrote:
> Not urgent--I'll get the data with the crawler script and test as I go.
>
> On Feb 16, 2018 11:35 AM, "Sebastian Benthall" <sbenthall at gmail.com
> <mailto:sbenthall at gmail.com>> wrote:
>
> Is there a web-accessible link to a dump of the IETF data that's ready?
>
> I'm reinstalling BigBang fresh on a new machine and figure I should
> start working with the IETF data set, as that's the topic of
> interest at the moment.
>
> On Fri, Feb 16, 2018 at 5:31 AM, Niels ten Oever
> <niels at article19.org <mailto:niels at article19.org>> wrote:
>
> I would love to at least listen-in!
>
> Cheers,
>
> Niels
> On 02/16/2018 01:26 AM, Nick Doty wrote:
> > On Feb 5, 2018, at 2:09 PM, Sebastian Benthall <sbenthall at gmail.com> wrote:
> >>
> >> 2) I just figured out how to make time for this in the short term. So
> >> count me in.
> >>
> >> Shall we plan a meeting about this?
> >
> > Yeah, I'd love to do that! Would folks be interested in an audio chat
> > next week? I will send around a Doodle poll if it's more than just me
> > and Seb.
> >
> >> On Feb 5, 2018 4:24 PM, "Sebastian Benthall" <sbenthall at gmail.com> wrote:
> >>
> >> These are great questions, Nick.
> >>
> >> I'd love to work on them with you, especially because they are
> >> such general metrics.
> >> Sadly I've got almost no time to work on it until May, due to
> >> dissertation work.
> >>
> >> Let me provide some recommendations based on my attempts to
> >> address similar questions on SciPy and other lists.
> >
> > These comments are really helpful, thanks!
> >
> > I am interested to understand the math better, and could really use your
> > help on that. I definitely get your general point that because there's a
> > long-tail distribution in any case, I need to find cases that don't fit
> > that pattern in order to show meaningful results.
> >
> > I'm not sure I understand the concentration parameter, but it does seem
> > like something like that would be useful. I also thought there might be
> > interesting graph analysis metrics -- like centrality? -- in a graph of
> > the nodes of connections between participants and lists.
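
As one possible starting point for the graph idea above, here is a minimal sketch of degree centrality in a bipartite graph of participants and lists. It assumes the data can be reduced to (participant, list) pairs; the function name and the sample data are hypothetical and not part of BigBang.

```python
from collections import defaultdict

def bipartite_degree_centrality(memberships):
    """Degree centrality for both sides of a participant/list graph.

    memberships: iterable of (participant, mailing_list) pairs.
    A participant's score is the share of all lists they post to;
    a list's score is the share of all participants who post to it.
    """
    lists_of = defaultdict(set)
    people_on = defaultdict(set)
    for person, mlist in memberships:
        lists_of[person].add(mlist)
        people_on[mlist].add(person)
    n_people, n_lists = len(lists_of), len(people_on)
    person_c = {p: len(ls) / n_lists for p, ls in lists_of.items()}
    list_c = {l: len(ps) / n_people for l, ps in people_on.items()}
    return person_c, list_c

# Hypothetical data, for illustration only.
memberships = [
    ("alice", "httpbis"), ("alice", "quic"), ("alice", "tls"),
    ("bob", "quic"), ("carol", "quic"), ("carol", "tls"),
]
person_c, list_c = bipartite_degree_centrality(memberships)
print(person_c)
print(list_c)
```

Degree centrality ignores message volume, so it would probably need to be read next to the per-person message counts discussed further down the thread.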
> >
> > Thanks again for your thoughts!
> > —Nick
> >
> >>
> >> * how many participants total in IETF work?
> >>
> >>
> >> The odds are *very* high that the emails-per-person distribution
> >> is a heavy-tail distribution.
> >> Based on previous work
> >> <https://conference.scipy.org/proceedings/scipy2015/pdfs/sebastian_benthall.pdf>,
> >> I would test for fit to log normal and power law distributions.
> >> My money is on log normal being a better fit.
> >>
> >> This is important because when interpreting the results, we have
> >> to keep in mind that the log normal distribution is essentially a
> >> noise pattern. So it's easy to read into the data relationships
> >> that may not be there, especially if you're using a linear rather
> >> than a log linear relationship as an indicator.
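
A rough sketch of what comparing a log normal fit against a power law fit could look like, in plain Python. It assumes continuous approximations of both distributions, takes the smallest observation as the power law cutoff, and compares maximized log-likelihoods. All function names are hypothetical and the data is synthetic; a careful analysis would also handle discrete counts, truncation, and a significance test for the difference.

```python
import math
import random

def lognormal_loglik(xs):
    """Log-likelihood of xs under a log normal fitted by maximum likelihood."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((l - mu) ** 2 for l in logs) / n
    return sum(
        -l - 0.5 * math.log(2 * math.pi * var) - (l - mu) ** 2 / (2 * var)
        for l in logs
    )

def power_law_loglik(xs):
    """Log-likelihood of xs under a continuous power law (Pareto) fitted by
    maximum likelihood, with the cutoff xmin assumed to be min(xs)."""
    xmin = min(xs)
    n = len(xs)
    s = sum(math.log(x / xmin) for x in xs)
    alpha = 1 + n / s  # fails if every value is identical (s == 0)
    return n * math.log((alpha - 1) / xmin) - alpha * s

def preferred_fit(xs):
    """Name of the distribution with the higher log-likelihood, and the gap."""
    diff = lognormal_loglik(xs) - power_law_loglik(xs)
    return ("lognormal" if diff > 0 else "power law"), diff

# Synthetic stand-in for emails-per-person; not real IETF data.
random.seed(0)
synthetic = [random.lognormvariate(1.0, 1.0) for _ in range(2000)]
best, diff = preferred_fit(synthetic)
print(best, round(diff, 1))
```

Both candidates are heavy-tailed, so a histogram alone will rarely settle the question; a likelihood comparison at least puts a number on it.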
> >>
> >> * how "sticky" is participation?
> >> if people participate on a list, do they return? do
> >> they show up to f2f meetings?
> >> what's the attrition rate?
> >> what's the distribution of length of participation?
> >>
> >>
> >> Assuming there is a heavy tail distribution of participation, then
> >> about half the contributors will only contribute once.
> >>
> >> The distribution of attrition/retention will look more or less
> >> just like the distribution of participation.
> >> The length will look like it as well.
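
To check the two expectations above against data, a minimal sketch: the share of senders who wrote exactly one message, and each sender's span in days between first and last message. It assumes messages can be reduced to (sender, date) pairs; the names and sample data are hypothetical.

```python
from collections import Counter
from datetime import date

def one_time_share(senders):
    """Fraction of distinct senders who sent exactly one message."""
    counts = Counter(senders)
    singles = sum(1 for c in counts.values() if c == 1)
    return singles / len(counts)

def participation_spans(messages):
    """Days between each sender's first and last message."""
    first, last = {}, {}
    for sender, day in messages:
        first[sender] = min(first.get(sender, day), day)
        last[sender] = max(last.get(sender, day), day)
    return {s: (last[s] - first[s]).days for s in first}

# Hypothetical messages, for illustration only.
messages = [
    ("alice", date(2017, 1, 10)), ("alice", date(2017, 6, 1)),
    ("alice", date(2018, 1, 10)),
    ("bob", date(2017, 3, 5)),
    ("carol", date(2017, 4, 1)), ("carol", date(2017, 4, 15)),
    ("dave", date(2017, 9, 9)),
]
share = one_time_share(sender for sender, _ in messages)
spans = participation_spans(messages)
print(share, spans)
```

Every one-time sender gets a span of zero, which is one concrete way the length distribution ends up mirroring the participation distribution.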
> >>
> >> It's not clear how to interpret this, because the reasons why any
> >> particular person participates a lot or a little are very likely
> >> (a) myriad (no single reason, but rather a combination of many
> >> reasons), and
> >> (b) exogenous to the data itself.
> >>
> >> For these reasons I expect you would get more interesting results
> >> if you can segment the population into categories of interest.
> >> You've mentioned gender and firms of employment, which are both
> >> good ones.
> >>
> >> But for each category, you may want to have more than one parameter
> >> to characterize each one's participation distribution.
> >> Maybe mean /and/ variance?
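
A small sketch of the "mean and variance per category" idea. It assumes the statistics are taken over the log of each person's message count, which is a choice made here because the counts are expected to be roughly log normal. The category labels, names, and numbers are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance

def log_count_summary(counts_by_person, category_of):
    """Per-category (mean, variance) of log message counts.

    counts_by_person: dict of person -> number of messages (>= 1).
    category_of: dict of person -> category label.
    """
    groups = defaultdict(list)
    for person, count in counts_by_person.items():
        groups[category_of[person]].append(math.log(count))
    return {cat: (mean(v), pvariance(v)) for cat, v in groups.items()}

# Hypothetical data: two groups with the same mean log count
# but very different spread.
counts_by_person = {"alice": 100, "bob": 1, "carol": 10, "dave": 10}
category_of = {
    "alice": "firm_a", "bob": "firm_a",
    "carol": "firm_b", "dave": "firm_b",
}
summary = log_count_summary(counts_by_person, category_of)
print(summary)
```

In this toy case a single parameter would make the two groups look identical, which is the argument for tracking at least two.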
> >>
> >> * who has participated longest? across the most groups?
> >> is there a group of "elites" across working groups?
> >>
> >>
> >> This is a great question.
> >> But keep in mind: the people who participate most are going to be
> >> participating a lot more numerically across all lists than others.
> >> So they will have more chances to participate in different lists.
> >>
> >> You may want to be looking at, for each participant, their
> >> individual distribution of participation over many lists, and then
> >> look at the concentration parameter of that distribution:
> >>
> >> https://en.wikipedia.org/wiki/Concentration_parameter
> >>
> >> The math can be a bit tricky but I think it's worth tackling
> >> correctly.
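
Estimating a Dirichlet concentration parameter takes more machinery than fits in a short example. As a simpler stand-in, and explicitly not the same quantity, the sketch below computes a Herfindahl index over one participant's per-list message shares, plus its inverse as an "effective number of lists". Function names and data are hypothetical.

```python
def list_shares(counts):
    """Turn per-list message counts into shares that sum to 1."""
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()}

def herfindahl(counts):
    """Sum of squared shares: 1.0 means all messages go to a single list."""
    return sum(s * s for s in list_shares(counts).values())

def effective_lists(counts):
    """Inverse Herfindahl index: the number of equally used lists that
    would produce the same concentration."""
    return 1 / herfindahl(counts)

# Hypothetical per-list message counts for two participants.
focused = {"quic": 98, "tls": 1, "httpbis": 1}
spread = {"quic": 10, "tls": 10, "httpbis": 10, "dnsop": 10}
print(herfindahl(focused), effective_lists(focused))
print(herfindahl(spread), effective_lists(spread))
```

Because it works on shares, this partly corrects for heavy posters having more chances to appear on many lists. It is still unreliable for people with only a handful of messages, who look maximally concentrated by construction.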
> >>
> >>
> >> how many participants are single-group?
> >>
> >>
> >> Since most participants will only send one message, that's
> >> going to skew this metric
> >> unless you take that into account somehow.
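
One way to take that into account, sketched under the assumption that a minimum message threshold is an acceptable filter: compute the single-list share only among people who sent at least `min_messages` messages. The function, the threshold, and the data are hypothetical.

```python
def single_list_share(posts, min_messages=1):
    """Share of participants active on exactly one list.

    posts: dict of person -> dict of list name -> message count,
           containing only lists with at least one message.
    min_messages: ignore people with fewer total messages than this.
    Returns None if nobody meets the threshold.
    """
    eligible = {
        person: counts for person, counts in posts.items()
        if sum(counts.values()) >= min_messages
    }
    if not eligible:
        return None
    single = sum(1 for counts in eligible.values() if len(counts) == 1)
    return single / len(eligible)

# Hypothetical data, for illustration only.
posts = {
    "alice": {"quic": 40, "tls": 5},
    "bob": {"quic": 1},
    "carol": {"tls": 1},
    "dave": {"httpbis": 12},
    "erin": {"quic": 3, "dnsop": 2},
}
raw = single_list_share(posts)
filtered = single_list_share(posts, min_messages=2)
print(raw, filtered)
```

Someone with a single message is single-list by construction, so the unfiltered number says more about message volume than about how people spread across groups.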
> >>
> >>
> >> how many groups does the typical participant join?
> >>
> >> As I believe I've mentioned to this group before, I've been
> >> looking into estimating gender in mailing list participation,
> >> including:
> >>
> >> * What is the gender distribution of participants in Internet
> >> and Web technical standard-setting?
> >> how does that distribution differ from the population at
> >> large? from employment at related firms?
> >> does that distribution change over time?
> >> are there sub-groups which have distinctly different
> >> distributions?
> >> * Does the gender distribution of conversation differ from the
> >> gender distribution of the participants?
> >>
> >>
> >> Great questions.
> >>
> >>
> >> Do you have questions you'd like to add to this list? Would
> >> you be interested in trying to measure/answer one of these
> >> questions? Which are the easiest and which are the most
> >> difficult? What features would we need to add to BigBang to
> >> make them answerable?
> >>
> >>
> >> In sum, I think all these questions are great ones and related to
> >> each other.
> >> I think the biggest challenge is getting the statistical modeling
> >> right, so that the results are not misinterpreted.
> >>
> >> - Seb
> >>
> >>
> >
> >
> >
> > _______________________________________________
> > Bigbang-dev mailing list
> > Bigbang-dev at data-activism.net
> <mailto:Bigbang-dev at data-activism.net>
> > https://lists.ghserv.net/mailman/listinfo/bigbang-dev
> <https://lists.ghserv.net/mailman/listinfo/bigbang-dev>
> >