[Bigbang-dev] Are gender diversity and draft productivity correlated? THE VERDICT

Sebastian Benthall sbenthall at gmail.com
Fri Aug 28 01:16:01 CEST 2020


Ok. Please stand by....

It seems like the datatracking library, when used to crawl for a large
number of drafts, pulls an index and then makes calls to the datatracker
web API for the draft metadata.

So I've had to write a new data collection script, similar to the script we
use for scraping the mailing lists, to get the draft data. It's a slower
process. But I should be able to compute these results once I have them
downloaded locally.
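
A minimal sketch of what such a download-once collection step could look
like, assuming the per-draft request is wrapped in a callable. The names
collect_metadata, fetch and cache_dir are hypothetical and are not part of
the datatracking library or BigBang; the only point is that each draft is
requested once and later runs read the local JSON copies.

```python
import json
from pathlib import Path


def collect_metadata(draft_names, fetch, cache_dir):
    """Return {draft name: metadata dict}, fetching each draft at most once.

    `fetch` is a hypothetical callable (name -> dict) standing in for
    whatever performs the actual web API request. Results are stored as
    one JSON file per draft under `cache_dir`.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    results = {}
    for name in draft_names:
        path = cache / f"{name}.json"
        if path.exists():
            # Already downloaded on an earlier run: read the local copy.
            results[name] = json.loads(path.read_text(encoding="utf-8"))
            continue
        record = fetch(name)  # the slow, remote step
        path.write_text(json.dumps(record), encoding="utf-8")
        results[name] = record
    return results
```

Because the cache is keyed by draft name, an interrupted crawl can be
restarted without repeating the requests that already succeeded.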

On Wed, Aug 26, 2020 at 4:09 PM Joey S <joeysalazar at article19.org> wrote:

> +1 to dnsop, their drafts are also quite numerous and with a very active
> mailing list.
>
> --
> Joey
>
> On 26-Aug-20 1:25 PM, Niels ten Oever wrote:
>
> Httpbis is the one you're looking for :)
>
> DNSops is also a nice big one.
>
> Cheers,
>
> Niels
> On Aug 26, 2020, at 21:17, Sebastian Benthall <sbenthall at gmail.com>
> wrote:
>>
>> Hmmm.
>>
>> Web mail archives of the http list at
>> https://ietf.org/mail-archive/text/http/ only go up to 2012.
>> Does that make sense to you?
>>
>> It looks like there are several DNS working groups. Any one in particular
>> you think would be worth looking at?
>>
>> Genericizing the code so that it can loop through many groups and compute
>> results is the next step towards confirmation. Probably worth looking at a
>> couple other concrete and well-understood examples before doing the big
>> analysis though.
>>
>> - S
>>
>> On Wed, Aug 26, 2020 at 1:52 PM Niels ten Oever < mail at nielstenoever.net>
>> wrote:
>>
>>> Very interesting. I'd say the number of drafts and authors in hrpc is
>>> too low to make a statement about this though. Could we do this for the
>>> HTTP and/or DNS WGs?
>>> On Aug 26, 2020, at 19:30, Sebastian Benthall < sbenthall at gmail.com>
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I'm revisiting the question of whether mailing list gender diversity
>>>> and draft productivity of working groups are correlated.
>>>>
>>>> Putting aside for now all the methodological complications, here is how
>>>> I am operationalizing the question:
>>>>
>>>>    - I'm looking specifically at the HRPC working group, with this
>>>>    data:
>>>>    [image: image.png]
>>>>    - Gender is being detected based on first name birth records.
>>>>    "unknown" is used for cases that cannot be determined as either men
>>>>    or women with the current data set.
>>>>    - I'm measuring "diversity" on any day as: (women's activity +
>>>>    unknown's activity) / (men's activity). Because, you know, this is
>>>>    probably close to what most people mean by diversity. (Recall that
>>>>    non-Western names are more likely to be categorized as "unknown".)
>>>>    - I'm using a 100 day rolling average on the activity counts.
>>>>
>>>> This is the matrix of Pearson correlations between each of these
>>>> values:
>>>>
>>>>
>>>>                 women    unknown        men     drafts  diversity
>>>> women        1.000000   0.910922   0.804869   0.008890   0.160833
>>>> unknown      0.910922   1.000000   0.808168   0.027502   0.245059
>>>> men          0.804869   0.808168   1.000000   0.015406  -0.141915
>>>> drafts       0.008890   0.027502   0.015406   1.000000   0.061884
>>>> diversity    0.160833   0.245059  -0.141915   0.061884   1.000000
>>>>
>>>> Things to note:
>>>>
>>>>    - The activity of each gender is correlated with the activity of
>>>>    other genders.
>>>>    - Diversity is anticorrelated with the number of men. This is
>>>>    expected based on how it was defined, and a good sanity check.
>>>>    - Draft output is MORE correlated with diversity than it is with
>>>>    any individual gender!
>>>>
>>>> This last point is quite nice. It resonates with the work of Scott Page
>>>> on the value of diversity to collective intelligence, for example.
>>>>
>>>> These numbers are a bit hard to interpret. How much should we trust
>>>> them? These are the *p*-values associated with each correlation:
>>>>
>>>>                 women    unknown        men     drafts  diversity
>>>> women               0          0          0     0.6925          0
>>>> unknown             0          0          0      0.221          0
>>>> men                 0          0          0      0.493          0
>>>> drafts         0.6925      0.221      0.493          0     0.0059
>>>> diversity           0          0          0     0.0059          0
>>>>
>>>> Generally, *p*-values below .01 are considered "statistically
>>>> significant", i.e. publishable.
>>>> This correlation between diversity and draft output makes the cut!!
>>>>
>>>> So the verdict is: for HRPC, YES, gender diversity is correlated with
>>>> draft output.
>>>>
>>>> This result is robust to transformations of the activity scores into
>>>> the log space, which is comforting.
>>>> Further work is needed to see if this result is robust across other
>>>> IETF working groups.
>>>>
>>>> Nick, what would you say to including a result like this in the paper
>>>> about IETF and gender?
>>>>
>>>> Cheers,
>>>> Seb
>>>>
>>>>       ------------------------------
>>>>
>>>> Bigbang-dev mailing list
>>>> Bigbang-dev at data-activism.net
>>>> https://lists.ghserv.net/mailman/listinfo/bigbang-dev
>>>>
>>>>
>
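
For reference, the computation described in the quoted message above
(rolling averages, the diversity ratio, then pairwise Pearson correlations
with p-values) could be sketched roughly as follows, using only the Python
standard library. This is not the code that produced the numbers in the
thread. Assumptions: the function names are hypothetical, diversity is
taken on the smoothed counts, days with zero activity from men are treated
as missing, and the p-value uses a normal approximation to the t statistic,
which is only reasonable for long series (scipy.stats.pearsonr would be
the usual choice).

```python
import math
from statistics import NormalDist


def rolling_mean(values, window=100):
    """Trailing rolling average; the first window - 1 days are dropped."""
    out, total = [], 0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # slide the window forward
        if i >= window - 1:
            out.append(total / window)
    return out


def diversity(women, unknown, men):
    """(women + unknown) / men per day; NaN where men's activity is zero."""
    return [(w + u) / m if m > 0 else math.nan
            for w, u, m in zip(women, unknown, men)]


def pearson(xs, ys):
    """Pearson r and an approximate two-sided p-value, skipping NaN pairs."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return math.nan, math.nan  # a constant series has no correlation
    r = max(-1.0, min(1.0, sxy / math.sqrt(sxx * syy)))
    if abs(r) == 1.0:
        return r, 0.0
    t = r * math.sqrt((n - 2) / (1 - r * r))
    # Normal approximation to the t distribution (assumes large n).
    return r, 2 * (1 - NormalDist().cdf(abs(t)))


def correlation_table(series):
    """Map every (name, name) pair to its (r, p) tuple."""
    return {(a, b): pearson(series[a], series[b])
            for a in series for b in series}


# Synthetic daily counts, purely illustrative (not HRPC data).
raw = {
    "women": [1, 0, 2, 1, 3, 2, 1, 0],
    "unknown": [0, 1, 1, 0, 2, 1, 1, 1],
    "men": [4, 3, 5, 4, 6, 5, 4, 3],
    "drafts": [0, 0, 1, 0, 1, 0, 0, 1],
}
window = 3  # the thread uses 100 days
smooth = {name: rolling_mean(counts, window) for name, counts in raw.items()}
smooth["diversity"] = diversity(smooth["women"], smooth["unknown"], smooth["men"])
table = correlation_table(smooth)
```

Here table[("drafts", "diversity")] holds the (r, p) pair that corresponds
to the headline cell in the matrices above.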