[Bigbang-dev] research questions of interest for standard-setting participation

Fri Feb 16 17:53:08 CET 2018

You can crawl from here: https://ietf.org/mail-archive/text/
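
A minimal sketch of enumerating the per-list directories under that URL. It assumes the page is a plain HTML directory index with one relative link per list, which this thread does not confirm. `LinkCollector`, `list_urls_from_index`, and `fetch_index` are hypothetical helper names, not BigBang functions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

ARCHIVE_ROOT = "https://ietf.org/mail-archive/text/"


class LinkCollector(HTMLParser):
    """Collect every href found on <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_urls_from_index(html, base=ARCHIVE_ROOT):
    """Return absolute URLs for relative subdirectory links in an index page.

    Assumption: each mailing list appears as a relative link ending in "/".
    Absolute links, parent-directory links, and query links are skipped.
    """
    parser = LinkCollector()
    parser.feed(html)
    urls = []
    for href in parser.hrefs:
        is_relative_dir = (
            href.endswith("/")
            and not href.startswith(("/", "?", ".."))
            and "://" not in href
        )
        if is_relative_dir:
            urls.append(urljoin(base, href))
    return urls


def fetch_index(url=ARCHIVE_ROOT):
    """Download the index page (network access required; not called here)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Made-up index fragment, used only to exercise the parser offline.
sample_index = (
    '<a href="?C=N;O=D">Name</a>'
    '<a href="/mail-archive/">Parent Directory</a>'
    '<a href="dnsop/">dnsop/</a>'
    '<a href="httpbis/">httpbis/</a>'
)
list_urls = list_urls_from_index(sample_index)
```

Against the live archive, `list_urls_from_index(fetch_index())` would be the starting point, provided the page really has that layout.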

Cheers,

Niels

Niels ten Oever

Article 19
www.article19.org

PGP fingerprint    2458 0B70 5C4A FD8A 9488
                   643A 0ED8 3F3A 468A C8B3

On 02/16/2018 05:51 PM, Sebastian Benthall wrote:
> Not urgent--I'll get the data with the crawler script and test as I go.
> 
> On Feb 16, 2018 11:35 AM, "Sebastian Benthall" <sbenthall at gmail.com> wrote:
> 
>     Is there a web-accessible link to a dump of the IETF data that's ready?
> 
>     I'm reinstalling BigBang fresh on a new machine and figure I should
>     start working with the IETF data set, as that's the topic of
>     interest at the moment.
> 
>     On Fri, Feb 16, 2018 at 5:31 AM, Niels ten Oever
>     <niels at article19.org> wrote:
> 
>         I would love to at least listen in!
> 
>         Cheers,
> 
>         Niels
>         On 02/16/2018 01:26 AM, Nick Doty wrote:
>         > On Feb 5, 2018, at 2:09 PM, Sebastian Benthall <sbenthall at gmail.com> wrote:
>         >>
>         >> 2) I just figured out how to make time for this in the short term. So
>         >> count me in.
>         >>
>         >> Shall we plan a meeting about this?
>         >
>         > Yeah, I'd love to do that! Would folks be interested in an audio chat
>         > next week? I will send around a Doodle poll if it's more than just me
>         > and Seb.
>         >
>         >> On Feb 5, 2018 4:24 PM, "Sebastian Benthall" <sbenthall at gmail.com> wrote:
>         >>
>         >>     These are great questions, Nick.
>         >>
>         >>     I'd love to work on them with you, especially because they are
>         >>     such general metrics.
>         >>     Sadly I've got almost no time to work on it until May, due to
>         >>     dissertation work.
>         >>
>         >>     Let me provide some recommendations based on my attempts to
>         >>     address similar questions on SciPy and other lists.
>         >
>         > These comments are really helpful, thanks!
>         >
>         > I am interested to understand the math better, and could really use your
>         > help on that. I definitely get your general point that because there's a
>         > long-tail distribution in any case, I need to find cases that don't fit
>         > that pattern in order to show meaningful results. 
>         >
>         > I'm not sure I understand the concentration parameter, but it does seem
>         > like something like that would be useful. I also thought there might be
>         > interesting graph analysis metrics -- like centrality? -- in a graph
>         > whose nodes are participants and lists, connected by participation.
>         >
>         > Thanks again for your thoughts!
>         > —Nick
>         >  
>         >>
>         >>         * how many participants total in IETF work?
>         >>
>         >>
>         >>     The odds are *very* high that the emails-per-person distribution
>         >>     is a heavy-tail distribution.
>         >>     Based on previous work
>         >>     <https://conference.scipy.org/proceedings/scipy2015/pdfs/sebastian_benthall.pdf>,
>         >>     I would test for fit to log normal and power law distributions.
>         >>     My money is on log normal being a better fit.
>         >>
>         >>     This is important because when interpreting the results, we have
>         >>     to keep in mind that
>         >>     the log normal distribution is essentially a noise pattern.
>         >>     So it's easy to read into the data relationships that may not be
>         >>     there,
>         >>     especially if you're using a linear rather than a log linear
>         >>     relationship as an indicator.
>         >>
>         >>         * how "sticky" is participation?
>         >>                 if people participate on a list, do they return?
>         >>                 do they show up to f2f meetings?
>         >>                 what's the attrition rate?
>         >>                 what's the distribution of length of participation?
>         >>
>         >>
>         >>     Assuming there is a heavy tail distribution of participation, then
>         >>     about half the contributors
>         >>     will only contribute once.
>         >>
>         >>     The distribution of attrition/retention will look more or less
>         >>     just like the distribution of participation.
>         >>     The length will look like it as well.
>         >>     It's not clear how to interpret this, because the reasons why any
>         >>     particular person participates a lot
>         >>     or a little are very likely
>         >>     (a) myriad (no single reason, but rather a combination of many
>         >>     reasons), and
>         >>     (b) exogenous to the data itself.
>         >>
>         >>     For these reasons I expect you would get more interesting results
>         >>     if you can segment the population
>         >>     into categories of interest. You've mentioned gender and firms of
>         >>     employment, which are both good ones.
>         >>
>         >>     But for each category, you may want to have more than one parameter
>         >>     to characterize each one's participation distribution.
>         >>     Maybe mean /and/ variance?
>         >>
>         >>         * who has participated longest? across the most groups?
>         >>                 is there a group of "elites" across working groups?
>         >>
>         >>
>         >>     This is a great question.
>         >>     But keep in mind: the people who participate most are going to be
>         >>     participating a lot more, numerically, across all lists than
>         >>     others.
>         >>     So they will have more chances to participate in different lists.
>         >>
>         >>     You may want to be looking at, for each participant, their
>         >>     individual distribution of participation
>         >>     over many lists, and then look at the concentration parameter of
>         >>     that distribution:
>         >>
>         >>     https://en.wikipedia.org/wiki/Concentration_parameter
>         >>
>         >>     The math can be a bit tricky but I think it's worth tackling
>         >>     correctly.
>         >>      
>         >>
>         >>                 how many participants are single-group?
>         >>
>         >>
>         >>     Since most participants will only send one message, that's
>         >>     going to skew this metric
>         >>     unless you take that into account somehow.
>         >>      
>         >>
>         >>                 how many groups does the typical participant join?
>         >>
>         >>         As I believe I've mentioned to this group before, I've been
>         >>         looking into estimating gender in mailing list participation,
>         >>         including:
>         >>
>         >>         * What is the gender distribution of participants in Internet
>         >>         and Web technical standard-setting?
>         >>             how does that distribution differ from the population at
>         >>         large? from employment at related firms?
>         >>             does that distribution change over time?
>         >>             are there sub-groups which have distinctly different
>         >>         distributions?
>         >>         * Does the gender distribution of conversation differ from the
>         >>         gender distribution of the participants?
>         >>
>         >>
>         >>     Great questions.
>         >>      
>         >>
>         >>         Do you have questions you'd like to add to this list? Would
>         >>         you be interested in trying to measure/answer one of these
>         >>         questions? Which are the easiest and which are the most
>         >>         difficult? What features would we need to add to BigBang to
>         >>         make them answerable?
>         >>
>         >>
>         >>     In sum, I think all these questions are great ones and related to
>         >>     each other.
>         >>     I think the biggest challenge is getting the statistical
>         >>     modeling right,
>         >>     so that the results are not misinterpreted.
>         >>
>         >>     - Seb
>         >>      
>         >>
>         >
>         >
>         >
>         > _______________________________________________
>         > Bigbang-dev mailing list
>         > Bigbang-dev at data-activism.net
>         > https://lists.ghserv.net/mailman/listinfo/bigbang-dev
>         >
> 
> 
> 
> 
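
Seb's suggestion above to test the emails-per-person distribution against log normal and power law fits can be sketched roughly as follows. This is an illustration only: it treats counts as continuous values, fits both models by maximum likelihood, and compares log-likelihoods on synthetic data. The function names are hypothetical and are not BigBang APIs. Real message counts are integers with many ones, so a discrete fit and a proper likelihood-ratio test would be more appropriate there.

```python
import math
import random


def fit_lognormal(xs):
    """Maximum-likelihood log-normal fit; returns (mu, sigma, log_likelihood)."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    # log pdf(x) = -ln x - ln sigma - 0.5 ln(2 pi) - (ln x - mu)^2 / (2 sigma^2)
    ll = sum(
        -v - math.log(sigma) - 0.5 * math.log(2 * math.pi)
        - (v - mu) ** 2 / (2 * sigma ** 2)
        for v in logs
    )
    return mu, sigma, ll


def fit_power_law(xs):
    """Maximum-likelihood continuous power-law fit with xmin = min(xs).

    Returns (alpha, log_likelihood). Fails with ZeroDivisionError if all
    values are identical, since the exponent is then undefined.
    """
    xmin = min(xs)
    n = len(xs)
    s = sum(math.log(x / xmin) for x in xs)
    alpha = 1.0 + n / s
    # log pdf(x) = ln(alpha - 1) - ln xmin - alpha * ln(x / xmin)
    ll = n * math.log(alpha - 1.0) - n * math.log(xmin) - alpha * s
    return alpha, ll


# Synthetic stand-in for per-person message counts (not real IETF data).
random.seed(0)
sample = [random.lognormvariate(1.0, 1.0) for _ in range(2000)]

mu, sigma, ll_lognormal = fit_lognormal(sample)
alpha, ll_power_law = fit_power_law(sample)
print(f"log-normal: mu={mu:.2f} sigma={sigma:.2f} loglik={ll_lognormal:.1f}")
print(f"power law:  alpha={alpha:.2f} loglik={ll_power_law:.1f}")
```

Both fits use the same data points, so the higher log-likelihood indicates the better-fitting family for this sample. Because the sample is drawn from a log normal, that model is expected to win here.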

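For the per-participant spread across lists discussed above, one simple stand-in is the Herfindahl index of each sender's message shares. To be clear, this is not the Dirichlet concentration parameter from the linked article, which takes more careful estimation. It is a simpler proxy, chosen here only for illustration. The function names and the `min_messages` cutoff are hypothetical. The cutoff reflects Seb's warning that one-message senders are trivially single-list.

```python
from collections import Counter, defaultdict


def herfindahl(counts):
    """Sum of squared shares: 1.0 means all messages on one list."""
    counts = list(counts)
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def concentration_by_sender(messages, min_messages=5):
    """Summarize how each sender's messages are spread across lists.

    `messages` is an iterable of (sender, list_name) pairs. Senders with
    fewer than `min_messages` messages are skipped, because their
    concentration says little about cross-list behavior.
    """
    per_sender = defaultdict(Counter)
    for sender, list_name in messages:
        per_sender[sender][list_name] += 1

    summary = {}
    for sender, by_list in per_sender.items():
        total = sum(by_list.values())
        if total < min_messages:
            continue
        h = herfindahl(by_list.values())
        summary[sender] = {
            "messages": total,
            "lists": len(by_list),
            "herfindahl": h,
            "effective_lists": 1.0 / h,  # inverse index: "equivalent" list count
        }
    return summary


# Made-up (sender, list) pairs, not real archive data.
toy_messages = (
    [("a", "x")] * 4 + [("a", "y")] * 4 + [("b", "x")] * 6 + [("c", "y")]
)
toy_summary = concentration_by_sender(toy_messages)
```

In the toy data, sender `a` splits evenly over two lists, `b` posts to one list only, and `c` is dropped by the cutoff.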
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <http://lists.ghserv.net/pipermail/bigbang-dev/attachments/20180216/de8b9087/attachment.sig>