[Bigbang-dev] Are gender diversity and draft productivity correlated? THE VERDICT

Sun Aug 30 16:28:16 CEST 2020

Joey,

To your question about non-technical usability: the short answer is no, it
is not yet usable by non-technical people. The work I'm doing on this topic
now is quite technically involved.

Turning BigBang into a tool that is more usable by those with, say, no
programming skills will take a significant effort. That effort would need
to include the participation of prospective users. We would consult them
about the features they are looking for, scope out how to build them, and
invite them to test the product before finalizing it.

I have been a software product lead before and would be happy to work with
you or others on this.

However, at this time, BigBang is an open technical project. The norms are a
bit different from product development. Anybody is welcome to be involved,
and questions about how to contribute or use the technology will be
addressed. But there is no such thing as a "non-technical person" in such a
community. The technical/non-technical binary is quite counterproductive
here: if somebody is writing emails to this mailing list, there is no
reason in principle why they could not also follow the installation
instructions on the project README, at which point they have started a
journey of technical experience and education.

Best regards,
Seb

On Thu, Aug 27, 2020, 7:59 PM Joey S <joeysalazar at article19.org> wrote:

> Oh, that's interesting and unexpected! Thank you for sharing that with us.
> How easily can this be done by non-tech/non-admin people trying to use the
> tool for something similar?
>
> --
> Joey Salazar
> Digital Sr. Programme Officer
> ARTICLE 19
> 6E9C 95E5 5BED 9413 5D08 55D5 0A40 4136 0DF0 1A91
>
> On 27-Aug-20 5:16 PM, Sebastian Benthall wrote:
>
> Ok. Please stand by....
>
> It seems like the datatracking library, when used to crawl a large number
> of drafts, pulls an index and then makes a separate call to the Datatracker
> web API for each draft's metadata.
>
> So I've had to write a new data collection script, similar to the script
> we use for scraping the mailing lists, to get the draft data. It's a slower
> process. But I should be able to compute these results once I have them
> downloaded locally.
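The "download once, then compute locally" collection step can be sketched roughly as below. This is a hypothetical sketch, not the actual BigBang script: it assumes the Datatracker web API paginates Tastypie-style (each page carries meta.next and a list of objects), and the endpoint path and filter names in START_PATH are assumptions to check against the Datatracker API documentation. Fetching is passed in as a function so the pagination and caching logic can be exercised offline.

```python
import json
import os
import urllib.request
from typing import Callable, Dict, Iterator, List, Optional

# ASSUMPTION: illustrative base URL and query; verify path and filters first.
BASE_URL = "https://datatracker.ietf.org"
START_PATH = "/api/v1/doc/document/?type=draft&group__acronym=hrpc&limit=100"


def http_fetch(path: str) -> Dict:
    """Fetch one JSON page from the API (requires network access)."""
    with urllib.request.urlopen(BASE_URL + path) as resp:
        return json.load(resp)


def iter_objects(start: str, fetch: Callable[[str], Dict]) -> Iterator[Dict]:
    """Follow assumed Tastypie-style pagination until meta.next is empty."""
    path: Optional[str] = start
    while path:
        page = fetch(path)
        yield from page["objects"]
        path = page["meta"].get("next")  # None on the last page


def collect_drafts(cache_file: str,
                   fetch: Callable[[str], Dict] = http_fetch,
                   start: str = START_PATH) -> List[Dict]:
    """Download draft metadata once; later calls reuse the local JSON copy."""
    if os.path.exists(cache_file):
        with open(cache_file) as f:
            return json.load(f)
    drafts = list(iter_objects(start, fetch))
    with open(cache_file, "w") as f:
        json.dump(drafts, f)
    return drafts
```

Caching to a local file is what makes a slow crawl tolerable: the expensive network pass happens once, and every later analysis run reads from disk.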
>
> On Wed, Aug 26, 2020 at 4:09 PM Joey S <joeysalazar at article19.org> wrote:
>
>> +1 to dnsop; their drafts are also quite numerous, and it has a very
>> active mailing list.
>>
>> --
>> Joey
>>
>> On 26-Aug-20 1:25 PM, Niels ten Oever wrote:
>>
>> Httpbis is the one you're looking for :)
>>
>> DNSops is also a nice big one.
>>
>> Cheers,
>>
>> Niels
>> On Aug 26, 2020, at 21:17, Sebastian Benthall <sbenthall at gmail.com>
>> wrote:
>>>
>>> Hmmm.
>>>
>>> Web mail archives of the http list at
>>> https://ietf.org/mail-archive/text/http/ only go up to 2012.
>>> Does that make sense to you?
>>>
>>> It looks like there are several DNS working groups. Any one in
>>> particular you think would be worth looking at?
>>>
>>> Genericizing the code so that it can loop through many groups and
>>> compute results is the next step towards confirmation. Probably worth
>>> looking at a couple other concrete and well-understood examples before
>>> doing the big analysis though.
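Genericizing the per-group analysis could look something like the sketch below. All names are hypothetical: load and analyze stand in for whatever per-group loading and correlation code already exists, the "drafts" key is assumed to hold raw daily draft counts, and min_drafts is an arbitrary illustrative cutoff reflecting the concern that small groups may have too few drafts to support a conclusion.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical shape of one group's data: equal-length daily series,
# e.g. {"drafts": [...], "diversity": [...]}.
GroupData = Dict[str, List[float]]


def analyze_groups(
    groups: Iterable[str],
    load: Callable[[str], GroupData],
    analyze: Callable[[GroupData], float],
    min_drafts: int = 20,  # illustrative threshold, not from the thread
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Run the same analysis over many working groups.

    Returns (results, skipped): results maps a group acronym to the
    analysis output; skipped maps a group acronym to why it was left out.
    """
    results: Dict[str, float] = {}
    skipped: Dict[str, str] = {}
    for acronym in groups:
        try:
            data = load(acronym)
        except Exception as exc:  # e.g. missing archive or network error
            skipped[acronym] = f"load failed: {exc!r}"
            continue
        total_drafts = sum(data.get("drafts", []))
        if total_drafts < min_drafts:
            skipped[acronym] = f"too few drafts ({total_drafts:g})"
            continue
        results[acronym] = analyze(data)
    return results, skipped
```

Recording why a group was skipped, rather than silently dropping it, keeps the big analysis auditable.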
>>>
>>> - S
>>>
>>> On Wed, Aug 26, 2020 at 1:52 PM Niels ten Oever <mail at nielstenoever.net>
>>> wrote:
>>>
>>>> Very interesting. I'd say the number of drafts and authors in hrpc is
>>>> too low to make a statement about this, though. Could we do this for the
>>>> HTTP and/or DNS WGs?
>>>> On Aug 26, 2020, at 19:30, Sebastian Benthall <sbenthall at gmail.com>
>>>> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I'm revisiting the question of whether mailing list gender diversity
>>>>> and draft productivity of working groups are correlated.
>>>>>
>>>>> Putting aside for now all the methodological complications, here is
>>>>> how I am operationalizing the question:
>>>>>
>>>>>    - I'm looking specifically at the HRPC working group, with this
>>>>>    data:
>>>>>    [image: image.png]
>>>>>    - Gender is inferred from first names, based on birth records.
>>>>>    "Unknown" is used for cases that cannot be classified as either men or
>>>>>    women with the current data set.
>>>>>    - I'm measuring "diversity" on any day as (women's activity +
>>>>>    unknown's activity) / (men's activity), because this is probably
>>>>>    close to what most people mean by diversity. (Recall that
>>>>>    non-Western names are more likely to be categorized as "unknown".)
>>>>>    - I'm using a 100-day rolling average on the activity counts.
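The diversity measure and smoothing described in the list above can be sketched as follows. This is an illustrative, standard-library-only sketch, not BigBang's implementation. It assumes daily activity counts per category as plain lists and assumes the order "smooth each count first, then take the ratio". Each series gets a trailing 100-day mean that is undefined (None) until the first window fills, much as pandas' rolling(100).mean() yields NaN there. Days with zero smoothed men's activity are left undefined rather than infinite.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence


def rolling_mean(xs: Sequence[float], window: int = 100) -> List[Optional[float]]:
    """Trailing mean over `window` days; None until the window is full."""
    buf: Deque[float] = deque()
    total = 0.0
    out: List[Optional[float]] = []
    for x in xs:
        buf.append(x)
        total += x
        if len(buf) > window:
            total -= buf.popleft()  # drop the day that left the window
        out.append(total / window if len(buf) == window else None)
    return out


def diversity(women: Sequence[float], unknown: Sequence[float],
              men: Sequence[float], window: int = 100) -> List[Optional[float]]:
    """(women + unknown) / men, computed on smoothed daily activity counts."""
    w, u, m = (rolling_mean(s, window) for s in (women, unknown, men))
    out: List[Optional[float]] = []
    for wi, ui, mi in zip(w, u, m):
        if wi is None or ui is None or not mi:
            out.append(None)  # window not yet full, or no men's activity
        else:
            out.append((wi + ui) / mi)
    return out
```

For example, rolling_mean([1, 2, 3, 4], window=2) gives [None, 1.5, 2.5, 3.5].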
>>>>>
>>>>> This is the matrix of Pearson correlations between each of these
>>>>> values:
>>>>>
>>>>>
>>>>>                women   unknown       men    drafts diversity
>>>>> women       1.000000  0.910922  0.804869  0.008890  0.160833
>>>>> unknown     0.910922  1.000000  0.808168  0.027502  0.245059
>>>>> men         0.804869  0.808168  1.000000  0.015406 -0.141915
>>>>> drafts      0.008890  0.027502  0.015406  1.000000  0.061884
>>>>> diversity   0.160833  0.245059 -0.141915  0.061884  1.000000
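A matrix like the one above is the table of all pairwise Pearson coefficients, which is what pandas' DataFrame.corr() computes by default. The standard-library sketch below is illustrative, not the code that produced these numbers. It assumes the series are aligned by day, and complete_rows drops days where any series is undefined (for example, the incomplete first rolling window). A constant series yields NaN, as in pandas.

```python
import math
from typing import Dict, List, Optional, Sequence


def complete_rows(columns: Dict[str, Sequence[Optional[float]]]) -> Dict[str, List[float]]:
    """Keep only days on which every column has a value."""
    names = list(columns)
    rows = [row for row in zip(*(columns[n] for n in names))
            if all(v is not None for v in row)]
    return {n: [row[i] for row in rows] for i, n in enumerate(names)}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # correlation is undefined for a constant series
    return sxy / math.sqrt(sxx * syy)


def corr_matrix(columns: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """All pairwise Pearson correlations, keyed like the table above."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Called with columns named women, unknown, men, drafts and diversity, corr_matrix returns a nested dict with the same layout as the table.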
>>>>>
>>>>> Things to note:
>>>>>
>>>>>    - The activity of each gender is correlated with the activity of
>>>>>    other genders.
>>>>>    - Diversity is anticorrelated with men's activity. This is
>>>>>    expected based on how it was defined, and a good sanity check.
>>>>>    - Draft output is MORE correlated with diversity than it is with
>>>>>    any individual gender!
>>>>>
>>>>> This last point is quite nice. It resonates with the work of Scott
>>>>> Page on the value of diversity to collective intelligence, for example.
>>>>>
>>>>> These numbers are a bit hard to interpret. How much should we trust
>>>>> them? These are the *p*-values associated with each correlation:
>>>>>
>>>>>                women   unknown       men    drafts diversity
>>>>> women              0         0         0    0.6925         0
>>>>> unknown            0         0         0     0.221         0
>>>>> men                0         0         0     0.493         0
>>>>> drafts        0.6925     0.221     0.493         0    0.0059
>>>>> diversity          0         0         0    0.0059         0
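Each p-value above tests the null hypothesis that the true correlation is zero, using t = r * sqrt((n - 2) / (1 - r^2)). The sketch below is a standard-library approximation, not what produced the table. The exact test, as in scipy.stats.pearsonr, is equivalent to comparing t against Student's t distribution with n - 2 degrees of freedom. Here the standard normal is substituted, which is close only when n (the number of days) is large. Both versions assume independent observations, which consecutive days of a 100-day rolling average are not, so p-values computed this way are likely to overstate significance.

```python
import math
from typing import Dict


def pearson_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: true correlation is zero.

    t = r * sqrt((n - 2) / (1 - r**2)) follows Student's t with n - 2
    degrees of freedom under H0; here it is approximated by the standard
    normal, which is only reasonable for large n.
    """
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1.0:
        return 0.0  # perfect correlation, e.g. the matrix diagonal
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # Two-sided standard-normal tail: 2 * (1 - Phi(|t|)) == erfc(|t| / sqrt(2))
    return math.erfc(abs(t) / math.sqrt(2.0))


def p_value_matrix(corr: Dict[str, Dict[str, float]], n: int) -> Dict[str, Dict[str, float]]:
    """Apply pearson_p_value to every entry of a nested correlation dict."""
    return {a: {b: pearson_p_value(r, n) for b, r in row.items()}
            for a, row in corr.items()}
```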
>>>>>
>>>>> Generally, *p*-values below .05 (or, by a stricter standard, .01) are
>>>>> considered "statistically significant", i.e. publishable.
>>>>> This correlation between diversity and draft output makes the cut!!
>>>>>
>>>>> So the verdict is: for HRPC, YES, gender diversity is correlated with
>>>>> draft output.
>>>>>
>>>>> This result is robust to log-transforming the activity scores, which is
>>>>> comforting.
>>>>> Further work is needed to see if this result is robust across other
>>>>> IETF working groups.
>>>>>
>>>>> Nick, what would you say to including a result like this in the paper
>>>>> about IETF and gender?
>>>>>
>>>>> Cheers,
>>>>> Seb
>>>>>
>>>>>       ------------------------------
>>>>>
>>>>> Bigbang-dev mailing list
>>>>> Bigbang-dev at data-activism.net
>>>>> https://lists.ghserv.net/mailman/listinfo/bigbang-dev
>>>>>
>>>>>
>>
>>
>>
>