[Bigbang-dev] Are gender diversity and draft productivity correlated? THE VERDICT

Sebastian Benthall sbenthall at gmail.com
Wed Sep 2 13:39:15 CEST 2020


Hmm.
Thanks for this review.
I'll look into it.


On Tue, Sep 1, 2020, 7:09 PM Niels ten Oever <mail at nielstenoever.net> wrote:

> Hiya,
>
> On 9/1/20 5:04 PM, Sebastian Benthall wrote:
> > Some updates:
> >
> > - A plot of mailing list activity by gender and (final) draft output,
> with the correlation values, is up here:
> > https://github.com/datactive/bigbang/pull/394#issuecomment-684917057
>
> Interesting! In order to be able to judge
> https://github.com/datactive/bigbang/pull/394 it would be great if the
> y-axis of the graphs, and ideally also a data field in the notebook, would
> show the total number of drafts in a specific period. I have the feeling
> that the representation of the drafts is not fully correct.
>
> According to the graph (if I read it correctly) there should be no drafts
> for httpbis in the period 2012 - 2014, but a cursory glance at the
> datatracker [0] shows that RFC7230, 7231, 7232, 7233, 7235, 7236, and 7237
> were published in 2014, and I am pretty sure these RFCs were all preceded
> by quite a number of drafts.
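
A minimal sketch of the per-period count requested above, assuming the draft submission dates are already available as a list of `date` objects. The helper name `drafts_per_year` and the sample dates are hypothetical and are not datatracker records.

```python
from collections import Counter
from datetime import date


def drafts_per_year(submission_dates):
    """Count draft submissions per calendar year."""
    return Counter(d.year for d in submission_dates)


# Hypothetical sample dates, for illustration only.
sample = [
    date(2012, 3, 1),
    date(2013, 7, 15),
    date(2013, 11, 2),
    date(2014, 6, 1),
]

counts = drafts_per_year(sample)

# Print one row per year, including years with zero drafts, so that
# gaps in the data are visible instead of silently missing.
for year in range(2012, 2016):
    print(year, counts.get(year, 0))
```

Printing the zero rows explicitly makes it easier to tell "no drafts in this period" apart from "no data collected for this period".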
>
> Cheers,
>
> Niels
>
> [0]
> https://datatracker.ietf.org/doc/search?name=httpbis&sort=&rfcs=on&activedrafts=on&by=group&group=
>
> >
> > - None of the correlations of mailing list activity with draft output is
> statistically significant! This reverses the previous verdict.
> >
> > - I've made an issue for expanding the draft metadata collection to
> include the submissions:
> > https://github.com/datactive/bigbang/issues/397
> >
> > - Could I request a review of the code for this project thus far? It's
> currently languishing a bit as a PR:
> > https://github.com/datactive/bigbang/pull/394
> >
> > I've got to work on a few other projects for a bit but I'm excited to
> hear where folks think we might go from here.
> >
> > Best regards,
> > Seb
> >
> >
> > On Mon, Aug 31, 2020 at 10:52 AM Sebastian Benthall <sbenthall at gmail.com> wrote:
> >
> >     Thank you!
> >
> >
> >         * The group is “httpbis” not “httpbisa”
> >
> >
> >     Aha!
> >
> >     I found `httpbisa` as the closest acronym to `httpbis` on this list
> of IETF mailing list archives:
> >
> https://github.com/datactive/bigbang/blob/master/examples/url_collections/mm.ietf.org.txt
> >
> >     Niels, does it make sense that the mailing list and the working
> group have different names in this case? Is that common?
> >
> >     I can confirm that the records I pulled using the datatracker
> include drafts for working groups besides `httpbis`.
> group_from_acronym('nonsense') returns None. None passed as a group to the
> documents query results in a default query of all groups, I suppose.
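
A minimal sketch of a guard against the silent fallback described above. It assumes only what the thread reports: that `group_from_acronym` returns `None` for an unknown acronym. The helper `require_group` and the `FakeTracker` stand-in are hypothetical and are not part of any library.

```python
def require_group(dt, acronym):
    """Return the group for `acronym`, or raise instead of passing None on."""
    group = dt.group_from_acronym(acronym)
    if group is None:
        raise ValueError(f"unknown working group acronym: {acronym!r}")
    return group


class FakeTracker:
    """Hypothetical stand-in, used only to exercise the guard offline."""

    def group_from_acronym(self, acronym):
        # Mimics the reported behaviour: None for an unknown acronym.
        return {"httpbis": "httpbis-group"}.get(acronym)


print(require_group(FakeTracker(), "httpbis"))
```

With a guard like this, a mistyped acronym such as `httpbisa` fails loudly rather than producing a query over all groups.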
> >
> >
> >         Also, remember to look at the submissions to find the different
> versions of a draft, else you only get the most recent version.
> >
> >         Try something like:
> >
> >         dt = DataTracker(cache_dir=Path("cache"))
> >
> >         g  = dt.group_from_acronym("httpbis")
> >         for d in dt.documents(group=g, doctype=dt.document_type_from_slug("draft")):
> >             print("")
> >             for sub_url in d.submissions:
> >                 sub = dt.submission(sub_url)
> >                 print(F"{sub.document_date.strftime('%Y-%m-%d')} {sub.name}-{sub.rev}")
> >                 for a in sub.parse_authors():
> >                     print(F"           {a['name']} <{a['email']}>")
> >
> >         This will find each submission of all the working group drafts
> for a particular group. It doesn’t follow the history back to the
> pre-working group individual submissions, but can be extended to do that if
> needed.
> >
> >
> >     I see. Thanks again for this.
> >
> >     I welcome input from any stakeholders about whether
> "productivity" should be operationalized in terms of final draft output
> and/or submissions.
> >
> >
> >
> >>             Looking at
> https://datatracker.ietf.org/wg/httpbis/documents/ it seems that httpbis
> has 48 documents. Each of these will have gone through multiple versions as
> a draft, but even with ~20 drafts per document (which is roughly typical),
> that’s not close to thousands.
> >>
> >>             Searching
> https://mailarchive.ietf.org/arch/browse/i-d-announce/?q=httpbis finds
> announcements for 721 internet drafts containing the string “httpbis”,
> which seems plausible.
> >>
> >>             Colin
> >>
> >>
> >>
> >>>             Another issue here is that the draft output precedes the
> mailing list records (see attachment). Another is that there are very
> few emails sent by women (or, so identifiable by our detection method) in
> httpbisa:
> >>>
> >>>             <image.png>
> >>>
> >>>
> >>>
> >>>
> >>>             On Wed, Aug 26, 2020 at 3:26 PM Niels ten Oever <mail at nielstenoever.net> wrote:
> >>>
> >>>                 Httpbis is the one you're looking for :)
> >>>
> >>>                 DNSops is also a nice big one.
> >>>
> >>>                 Cheers,
> >>>
> >>>                 Niels
> >>>                 On Aug 26, 2020, at 21:17, Sebastian Benthall <sbenthall at gmail.com> wrote:
> >>>
> >>>                     Hmmm.
> >>>
> >>>                     Web mail archives of the http list at
> https://ietf.org/mail-archive/text/http/ only go up to 2012.
> >>>                     Does that make sense to you?
> >>>
> >>>                     It looks like there are several DNS working
> groups. Any one in particular you think would be worth looking at?
> >>>
> >>>                     Genericizing the code so that it can loop through
> many groups and compute results is the next step towards confirmation.
> Probably worth looking at a couple other concrete and well-understood
> examples before doing the big analysis though.
> >>>
> >>>                     - S
> >>>
> >>>                     On Wed, Aug 26, 2020 at 1:52 PM Niels ten Oever <mail at nielstenoever.net> wrote:
> >>>
> >>>                         Very interesting. I'd say the number of drafts
> and authors in hrpc is too low to make a statement about this though. Could
> we do this for the HTTP and/or DNS WGs?
> >>>                         On Aug 26, 2020, at 19:30, Sebastian Benthall <sbenthall at gmail.com> wrote:
> >>>
> >>>                             Hello,
> >>>
> >>>                             I'm revisiting the question of whether
> mailing list gender diversity and draft productivity of working groups are
> correlated.
> >>>
> >>>                             Putting aside for now all the
> methodological complications, here is how I am operationalizing the
> question:
> >>>
> >>>                               * I'm looking specifically at the HRPC
> working group, with this data:
> >>>                                 image.png
> >>>                               * Gender is being detected based on
> first name birth records. "unknown" is used for cases that cannot, with the
> current data set, be determined as either men or women.
> >>>                               * I'm measuring "diversity" on any day
> as: (women's activity + unknown's activity) / (men's activity). Because,
> you know, this is probably close to what most people mean by
> diversity. (Recall that non-Western names are more likely to be categorized
> as "unknown".)
> >>>                               * I'm using a 100 day rolling average on
> the activity counts.
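
A minimal sketch of the operationalization in the list above: smooth the daily activity counts with a rolling average, then take (women + unknown) / men. Several details are assumptions not stated in the thread: the window is trailing, early days use a partial window, and days with zero smoothed men's activity yield `None`. The sample counts are hypothetical.

```python
def rolling_mean(values, window=100):
    """Trailing mean over up to `window` most recent values."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            # Drop the value that just fell out of the window.
            total -= values[i - window]
        out.append(total / min(i + 1, window))
    return out


def diversity(women, unknown, men):
    """(women + unknown) / men per day; None where men's activity is zero."""
    return [(w + u) / m if m else None for w, u, m in zip(women, unknown, men)]


# Hypothetical daily message counts, for illustration only.
women = [1, 0, 2, 1]
unknown = [0, 1, 1, 0]
men = [4, 3, 5, 2]

window = 2  # the thread uses 100 days; a small window keeps the example readable
smoothed = [rolling_mean(series, window) for series in (women, unknown, men)]
ratio = diversity(*smoothed)
print(ratio)
```

Smoothing the counts before taking the ratio follows the wording of the list; averaging the daily ratio instead would be a different measure.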
> >>>
> >>>                             This is the matrix of Pearson correlations
> between each of these values:
> >>>
> >>>                                             women    unknown        men     drafts  diversity
> >>>                             women        1.000000   0.910922   0.804869   0.008890   0.160833
> >>>                             unknown      0.910922   1.000000   0.808168   0.027502   0.245059
> >>>                             men          0.804869   0.808168   1.000000   0.015406  -0.141915
> >>>                             drafts       0.008890   0.027502   0.015406   1.000000   0.061884
> >>>                             diversity    0.160833   0.245059  -0.141915   0.061884   1.000000
> >>>
> >>>
> >>>                             Things to note:
> >>>
> >>>                               * The activity of each gender is
> correlated with the activity of other genders.
> >>>                               * Diversity is anticorrelated with the
> number of men. This is expected based on how it was defined, and a good
> sanity check.
> >>>                               * Draft output is MORE correlated with
> diversity than it is with any individual gender!
> >>>
> >>>                             This last point is quite nice. It
> resonates with the work of Scott Page on the value of diversity to
> collective intelligence, for example.
> >>>
> >>>                             These numbers are a bit hard to interpret.
> How much should we trust them? These are the /p/-values associated with
> each correlation:
> >>>                                             women    unknown        men     drafts  diversity
> >>>                             women               0          0          0     0.6925          0
> >>>                             unknown             0          0          0      0.221          0
> >>>                             men                 0          0          0      0.493          0
> >>>                             drafts         0.6925      0.221      0.493          0     0.0059
> >>>                             diversity           0          0          0     0.0059          0
> >>>
> >>>
> >>>                             Generally, /p/-values below .01 are
> considered "statistically significant", i.e. publishable.
> >>>                             This correlation between diversity and
> draft output makes the cut!!
> >>>
> >>>                             So the verdict is: for HRPC, YES, gender
> diversity is correlated with draft output.
> >>>
> >>>                             This result is robust to transformations
> of the activity scores into the log space, which is comforting.
> >>>                             Further work is needed to see if this
> result is robust across other IETF working groups.
> >>>
> >>>                             Nick, what would you say to including a
> result like this in the paper about IETF and gender?
> >>>
> >>>                             Cheers,
> >>>                             Seb
> >>>
> >>>
> >>>
> >>>
> >>>                             Bigbang-dev mailing list
> >>>                             Bigbang-dev at data-activism.net
> >>>                             https://lists.ghserv.net/mailman/listinfo/bigbang-dev
> >>>
> >>>
>  <diversity-productivity-httpbisa.png>
> >>
> >
> >
> >
> >         --
> >         Colin Perkins
> >         https://csperkins.org/
> >
> >
> >
> >
>
> --
> Niels ten Oever
> Researcher and PhD Candidate - DATACTIVE Research Group - University of
> Amsterdam
> Postdoctoral Scholar (abd) - Communications Department - Texas A&M
> University
> Research Fellow - Centre for Internet and Human Rights - European
> University Viadrina
> Associated Scholar - Centro de Tecnologia e Sociedade - Fundação Getúlio
> Vargas
>
> W: https://nielstenoever.net
> E: mail at nielstenoever.net
> T: @nielstenoever
> P/S/WA: +31629051853
> PGP: 2458 0B70 5C4A FD8A 9488 643A 0ED8 3F3A 468A C8B3
>