[Bigbang-dev] Are gender diversity and draft productivity correlated? THE VERDICT

Sebastian Benthall sbenthall at gmail.com
Wed Aug 26 19:29:30 CEST 2020


Hello,

I'm revisiting the question of whether mailing list gender diversity and
draft productivity of working groups are correlated.

Putting aside for now all the methodological complications, here is how I
am operationalizing the question:

   - I'm looking specifically at the HRPC working group, with this data:
   [image: image.png]
   - Gender is inferred from first names using birth records. "unknown"
   is used for names that the current data set cannot classify as
   either men or women.
   - I'm measuring "diversity" on any day as: (women's activity + unknown's
   activity) / (men's activity), since this is probably close to what most
   people mean by diversity. (Recall that non-Western names are more likely
   to be categorized as "unknown".)
   - I'm using a 100 day rolling average on the activity counts.
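
A minimal sketch of this operationalization, assuming the daily activity
counts live in a pandas DataFrame with columns named "women", "unknown" and
"men". The function name, the column names, the toy numbers and the choice
of min_periods=1 are all assumptions made for illustration; this is not the
BigBang code itself.

```python
import numpy as np
import pandas as pd


def diversity_ratio(activity: pd.DataFrame, window: int = 100) -> pd.DataFrame:
    """Smooth daily counts, then compute (women + unknown) / men per day."""
    cols = ["women", "unknown", "men"]
    # Rolling mean over `window` days; min_periods=1 (an assumption) keeps the
    # first rows instead of leaving them as NaN.
    smoothed = activity[cols].rolling(window, min_periods=1).mean()
    # Days with zero men's activity would divide by zero, so mark them as NaN.
    men = smoothed["men"].replace(0, np.nan)
    smoothed["diversity"] = (smoothed["women"] + smoothed["unknown"]) / men
    return smoothed


# Toy data, purely illustrative (not the HRPC counts).
days = pd.date_range("2020-01-01", periods=5, freq="D")
toy = pd.DataFrame(
    {"women": [1, 0, 2, 1, 0], "unknown": [0, 1, 0, 1, 0], "men": [2, 2, 4, 2, 2]},
    index=days,
)
result = diversity_ratio(toy, window=2)
```

With the real data the window would be 100 rather than 2.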

This is the matrix of Pearson correlations between each of these values:

               women    unknown        men     drafts  diversity
women       1.000000   0.910922   0.804869   0.008890   0.160833
unknown     0.910922   1.000000   0.808168   0.027502   0.245059
men         0.804869   0.808168   1.000000   0.015406  -0.141915
drafts      0.008890   0.027502   0.015406   1.000000   0.061884
diversity   0.160833   0.245059  -0.141915   0.061884   1.000000
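
A matrix like this one can be produced with pandas in one call. The sketch
below assumes the smoothed series sit side by side in one DataFrame; the
helper name and the toy values are hypothetical.

```python
import pandas as pd


def correlation_matrix(series: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlations between all columns."""
    # Drop days where any series is missing so every pair uses the same rows.
    return series.dropna().corr(method="pearson")


# Toy data, purely illustrative: "men" rises with "women", "drafts" falls.
toy = pd.DataFrame(
    {"women": [1, 2, 3, 4], "men": [2, 4, 6, 8], "drafts": [4, 3, 2, 1]}
)
corr = correlation_matrix(toy)
```

The result is symmetric with ones on the diagonal, which is why each value
in the table appears twice.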

Things to note:

   - The activity of each gender is strongly correlated with the activity
   of the other genders (0.80 to 0.91).
   - Diversity is anticorrelated with men's activity (-0.14). This is
   expected based on how it was defined, and a good sanity check.
   - Draft output is MORE correlated with diversity (0.062) than it is with
   any individual gender (0.009 to 0.028)!

This last point is quite nice. It resonates with the work of Scott Page on
the value of diversity to collective intelligence, for example.

These numbers are a bit hard to interpret. How much should we trust them?
These are the *p*-values associated with each correlation:

               women    unknown        men     drafts  diversity
women              0          0          0     0.6925          0
unknown            0          0          0      0.221          0
men                0          0          0      0.493          0
drafts        0.6925      0.221      0.493          0     0.0059
diversity          0          0          0     0.0059          0

By convention, *p*-values below .05 are considered "statistically
significant", i.e. publishable, and .01 is a stricter cutoff.
The correlation between diversity and draft output (*p* = 0.0059) makes the
cut under either threshold!!
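
For anyone who wants to reproduce a table of p-values like the one above,
here is a sketch using scipy.stats.pearsonr on every pair of columns. The
helper name and the toy data are assumptions, and the diagonal is simply
set to 0 to match the layout of the table.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def pvalue_matrix(data: pd.DataFrame) -> pd.DataFrame:
    """Two-sided p-value of the Pearson correlation for each column pair."""
    clean = data.dropna()
    cols = list(clean.columns)
    pvals = pd.DataFrame(0.0, index=cols, columns=cols)
    for a, b in combinations(cols, 2):
        # pearsonr returns (r, p); its p-value assumes independent samples.
        _, p = pearsonr(clean[a], clean[b])
        pvals.loc[a, b] = pvals.loc[b, a] = p
    return pvals


# Toy data, purely illustrative: y tracks x closely, z is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
toy = pd.DataFrame(
    {"x": x, "y": x + rng.normal(scale=0.1, size=200), "z": rng.normal(size=200)}
)
pvals = pvalue_matrix(toy)
```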

So the verdict is: for HRPC, YES, gender diversity is correlated with draft
output.

This result is robust to transformations of the activity scores into the
log space, which is comforting.
Further work is needed to see if this result is robust across other IETF
working groups.
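
One way such a log-space check could look, as a sketch only: log1p is
applied to the activity columns (an assumption, chosen so that days with
zero activity stay finite) and the correlations with drafts are recomputed.
The function name, column names and toy numbers are hypothetical.

```python
import numpy as np
import pandas as pd


def correlations_with_target(
    data: pd.DataFrame, activity_cols: list, target: str = "drafts", log: bool = False
) -> pd.Series:
    """Pearson correlation of each activity column with `target`."""
    frame = data.copy()
    if log:
        # log1p(x) = log(1 + x), so zero counts map to 0 instead of -inf.
        frame[activity_cols] = np.log1p(frame[activity_cols].astype(float))
    return frame.corr(method="pearson").loc[activity_cols, target]


# Toy data, purely illustrative.
toy = pd.DataFrame({"women": [1, 2, 4, 8, 16], "drafts": [0, 1, 2, 3, 4]})
raw = correlations_with_target(toy, ["women"])
logged = correlations_with_target(toy, ["women"], log=True)
```

Comparing `raw` and `logged` side by side shows whether the sign and rough
size of each correlation survive the transformation.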

Nick, what would you say to including a result like this in the paper about
IETF and gender?

Cheers,
Seb
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 59026 bytes
Desc: not available
URL: <http://lists.ghserv.net/pipermail/bigbang-dev/attachments/20200826/a15b9ea6/attachment-0001.png>

