[Bigbang-dev] Are gender diversity and draft productivity correlated? THE VERDICT
Sebastian Benthall
sbenthall at gmail.com
Mon Aug 31 15:52:28 CEST 2020
>
> This seems like a bug somewhere – if the code is available, I’m happy to
> do a quick sanity check on how you’re using the datatracker library.
>
Thank you, Colin!
This is the script I've been running:
https://github.com/datactive/bigbang/blob/878d1ad053777ae652adafd468b153d4c0d20c92/bin/datatracker.py
I suppose I made some assumptions about the datatracker which, if false,
would explain a lot. I assumed that the generator returned by the
dt.documents method always yields drafts in the same order. Maybe that's
not the case?
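For illustration, here is a minimal sketch of how a script could avoid relying on that ordering assumption. It does not call the datatracker library: `fake_documents` is a hypothetical stand-in generator, and the `name` and `time` fields are assumed for the example only.

```python
from datetime import date


def fake_documents(order):
    """Hypothetical stand-in for a document generator (NOT the datatracker API)."""
    docs = [
        {"name": "draft-example-a", "time": date(2020, 1, 5)},
        {"name": "draft-example-b", "time": date(2019, 6, 1)},
        {"name": "draft-example-c", "time": date(2020, 3, 9)},
    ]
    for i in order:
        yield docs[i]


def stable_drafts(doc_iter):
    """Deduplicate by name, then sort by (time, name).

    Later steps then see the same list regardless of the order, or the
    repetitions, in which the generator happened to yield documents.
    """
    by_name = {}
    for doc in doc_iter:
        by_name[doc["name"]] = doc
    return sorted(by_name.values(), key=lambda d: (d["time"], d["name"]))


first = stable_drafts(fake_documents([0, 1, 2]))
second = stable_drafts(fake_documents([2, 0, 1, 1]))  # reordered, one repeat
print(first == second)
```

Deduplicating by name also guards against counting the same draft more than once, which is one way a count could end up much larger than expected.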
Many thanks,
Seb
> Looking at https://datatracker.ietf.org/wg/httpbis/documents/ it seems
> that httpbis has 48 documents. Each of these will have gone through
> multiple versions as a draft, but even with ~20 drafts per document (which
> is roughly typical), that’s not close to thousands.
>
> Searching https://mailarchive.ietf.org/arch/browse/i-d-announce/?q=httpbis finds
> announcements for 721 internet drafts containing the string “httpbis”,
> which seems plausible.
>
> Colin
>
>
>
> Another issue here is that the draft output precedes the mailing list
> records (see attachment). Another is that there are very few emails sent by
> women (or at least, so identifiable by our detection method) in httpbisa:
>
> <image.png>
>
>
>
>
> On Wed, Aug 26, 2020 at 3:26 PM Niels ten Oever <mail at nielstenoever.net>
> wrote:
>
>> Httpbis is the one you're looking for :)
>>
>> DNSops is also a nice big one.
>>
>> Cheers,
>>
>> Niels
>> On Aug 26, 2020, at 21:17, Sebastian Benthall <sbenthall at gmail.com>
>> wrote:
>>>
>>> Hmmm.
>>>
>>> Web mail archives of the http list at
>>> https://ietf.org/mail-archive/text/http/ only go up to 2012.
>>> Does that make sense to you?
>>>
>>> It looks like there are several DNS working groups. Any one in
>>> particular you think would be worth looking at?
>>>
>>> Genericizing the code so that it can loop through many groups and
>>> compute results is the next step towards confirmation. Probably worth
>>> looking at a couple other concrete and well-understood examples before
>>> doing the big analysis though.
>>>
>>> - S
>>>
>>> On Wed, Aug 26, 2020 at 1:52 PM Niels ten Oever < mail at nielstenoever.net>
>>> wrote:
>>>
>>>> Very interesting. I'd say the number of drafts and authors in hrpc is
>>>> too low to make a statement about this though. Could we do this for the
>>>> HTTP and/or DNS WGs ?
>>>> On Aug 26, 2020, at 19:30, Sebastian Benthall < sbenthall at gmail.com>
>>>> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I'm revisiting the question of whether mailing list gender diversity
>>>>> and draft productivity of working groups are correlated.
>>>>>
>>>>> Putting aside for now all the methodological complications, here is
>>>>> how I am operationalizing the question:
>>>>>
>>>>> - I'm looking specifically at the HRPC working group, with this
>>>>> data:
>>>>> [image: image.png]
>>>>> - Gender is being detected based on first name birth records.
>>>>> "unknown" is used for cases that cannot with the current data set be
>>>>> determined as either men or women.
>>>>> - I'm measuring "diversity" on any day as: (women's activity +
>>>>> unknown's activity) / (men's activity). Because, you know, this is
>>>>> close to what most people probably mean by diversity. (Recall that
>>>>> non-Western names are more likely to be categorized as "unknown".)
>>>>> - I'm using a 100-day rolling average on the activity counts.
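The operationalization above can be sketched in a few lines of plain Python. This is only one reading of the description: it assumes the rolling average is applied to each group's daily counts before the ratio is taken, that early days average over whatever data exists so far, and that days with no activity from men are left undefined. The counts and the 3-day window are made up for the example.

```python
def rolling_mean(values, window):
    """Trailing mean over the most recent `window` values.

    Early positions average over however many values exist so far
    (an assumption; the thread does not say how the warm-up is handled).
    """
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the value leaving the window
        out.append(total / min(i + 1, window))
    return out


def diversity(women, unknown, men):
    """(women + unknown) / men for each day; None where men's activity is 0."""
    return [
        (w + u) / m if m > 0 else None
        for w, u, m in zip(women, unknown, men)
    ]


# Made-up daily message counts, with a 3-day window instead of 100.
women = rolling_mean([1, 0, 2, 1], window=3)
unknown = rolling_mean([0, 1, 1, 0], window=3)
men = rolling_mean([4, 2, 3, 3], window=3)
scores = diversity(women, unknown, men)
print(scores)
```

Because men's activity is the denominator, the score grows without bound as men's activity approaches zero, so how such days are handled can matter for small groups.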
>>>>>
>>>>> This is the matrix of Pearson correlations between each of these
>>>>> values:
>>>>>
>>>>>                women   unknown       men    drafts diversity
>>>>> women       1.000000  0.910922  0.804869  0.008890  0.160833
>>>>> unknown     0.910922  1.000000  0.808168  0.027502  0.245059
>>>>> men         0.804869  0.808168  1.000000  0.015406 -0.141915
>>>>> drafts      0.008890  0.027502  0.015406  1.000000  0.061884
>>>>> diversity   0.160833  0.245059 -0.141915  0.061884  1.000000
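As a reference for how a matrix like this is computed, here is a small self-contained sketch of the Pearson coefficient and a pairwise matrix over named series. The function names and the toy numbers are illustrative assumptions, not the script's actual code or data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(series):
    """Pairwise Pearson correlations for a dict of name -> values."""
    names = list(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}


# Toy series only; these are not the HRPC numbers.
toy = {
    "men": [5.0, 6.0, 8.0, 9.0],
    "drafts": [0.0, 1.0, 0.0, 2.0],
    "diversity": [0.4, 0.5, 0.3, 0.2],
}
matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

The diagonal is always 1 and the matrix is symmetric, which matches the shape of the table above.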
>>>>>
>>>>> Things to note:
>>>>>
>>>>> - The activity of each gender is correlated with the activity of
>>>>> other genders.
>>>>> - Diversity is anticorrelated with the number of men. This is
>>>>> expected based on how it was defined, and a good sanity check.
>>>>> - Draft output is MORE correlated with diversity than it is with
>>>>> any individual gender!
>>>>>
>>>>> This last point is quite nice. It resonates with the work of Scott
>>>>> Page on the value of diversity to collective intelligence, for example.
>>>>>
>>>>> These numbers are a bit hard to interpret. How much should we trust
>>>>> them? These are the *p*-values associated with each correlation:
>>>>>               women  unknown      men   drafts diversity
>>>>> women             0        0        0   0.6925         0
>>>>> unknown           0        0        0    0.221         0
>>>>> men               0        0        0    0.493         0
>>>>> drafts       0.6925    0.221    0.493        0    0.0059
>>>>> diversity         0        0        0   0.0059         0
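The thread does not say how these p-values were obtained. As one way to sanity-check a correlation, here is a sketch of a two-sided permutation test, which is a swapped-in technique and not necessarily what produced the table. The function names, the seed, and the number of permutations are assumptions for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y.

    Shuffles y repeatedly and counts how often the shuffled |r| is at least
    as large as the observed |r|. Shuffling treats observations as
    exchangeable, which smoothed daily series are not, so this is only a
    rough check.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


days = list(range(20))
linear = [2 * d + 1 for d in days]                      # perfectly correlated
alternating = [1 if d % 2 == 0 else -1 for d in days]   # nearly uncorrelated
p_linear = permutation_p_value(days, linear)
p_alternating = permutation_p_value(days, alternating)
print(p_linear, p_alternating)
```

Note that a rolling average makes neighbouring days strongly dependent, so any test that treats days as independent observations will tend to understate the p-value.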
>>>>>
>>>>> Generally, *p*-values below .01 are considered "statistically
>>>>> significant", i.e. publishable.
>>>>> This correlation between diversity and draft output makes the cut!!
>>>>>
>>>>> So the verdict is: for HRPC, YES, gender diversity is correlated with
>>>>> draft output.
>>>>>
>>>>> This result is robust to transformations of the activity scores into
>>>>> the log space, which is comforting.
>>>>> Further work is needed to see if this result is robust across other
>>>>> IETF working groups.
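A robustness check of this kind can be sketched as follows. The message does not say which log transform was used; `log1p` is assumed here so that days with zero activity stay defined, and the numbers are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_transform(values):
    """log(1 + v) for each value (assumed transform; handles zeros)."""
    return [math.log1p(v) for v in values]


# Toy activity and draft counts, not the HRPC data.
activity = [1.0, 2.0, 4.0, 8.0, 16.0]
drafts = [0.0, 1.0, 1.0, 2.0, 3.0]

r_raw = pearson(activity, drafts)
r_log = pearson(log_transform(activity), log_transform(drafts))
print(round(r_raw, 3), round(r_log, 3))
```

If the sign and rough size of the coefficient survive the transform, the correlation is less likely to be driven by a few days of very high activity.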
>>>>>
>>>>> Nick, what would you say to including a result like this in the paper
>>>>> about IETF and gender?
>>>>>
>>>>> Cheers,
>>>>> Seb
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> Bigbang-dev mailing list
>>>>> Bigbang-dev at data-activism.net
>>>>> https://lists.ghserv.net/mailman/listinfo/bigbang-dev
>>>>>
>>>>> <diversity-productivity-httpbisa.png>