[Bigbang-dev] Gender diversity and draft productivity
Sebastian Benthall
sbenthall at gmail.com
Fri Jul 10 18:09:00 CEST 2020
Thanks Gurshabad. This is very helpful.
I've done a deeper dive into the gender-detector package, and have a better
sense now of what it's doing.
I've also realized that there was a bug in *my* code, and that this was
part of misgendering Gurshabad. It is now saying "Gurshabad" is of
"unknown" gender.
> Agree. Does it make sense to make this difference explicit, even if it's
> in the same category? eg. "Non-binary or could not be determined"
>
This is a good idea.
Given our current methods, we have no way of determining if somebody
considers themselves non-binary.
So these people will always be of "unknown" gender, from the perspective of
our research.
I see that as good to flag.
An issue that has not yet been settled is how we are measuring "diversity",
and how that measurement should reflect our uncertainty and the possibility
of more than two represented gender categories.
> Non-authoritative as well, but fwiw, in agreement with Juliana that
> 'man' and 'woman' are probably better to use here. Maybe someone can
> also comment on whether 'masculine'/'feminine' also work for this? (The
> advantage I see with this descriptor is that the results then clearly
> remark on names and not people, but there may be other problems with
> this terminology that I'm not aware of. Apologies in advance if this
> suggestion seems misguided; happy to learn.)
>
This sounds very sensible to me.
One counterpoint though is that, digging more into the gender-detector
module, it looks like it's not using data about whether or not a name is
historically or linguistically masculine or feminine.
Rather it has count data for each country: the number of "male" and
"female" (its labels) people that have that name in each country. (I'm not
sure how this data was created. One of the people involved in that project,
Nathan Mathias, is now a professor at Cornell and would probably weigh in
if we asked him to.)
The gender guess is then based on whether or not the preponderance of uses
of the name apply to "male" or "female" people. There's a confidence cutoff
that's actually quite strict; anything below this confidence rate gets an
"unknown" response.
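To make the logic concrete, here is a rough sketch of a count-based guess with a cutoff. This is not the gender-detector code itself: the function name, the in-memory table, and the 0.99 cutoff are assumptions for illustration only. The "stephane" counts are the ones quoted later in this message.

```python
from typing import Dict, Tuple

# Hypothetical per-country tables: name -> (male_count, female_count).
# Only the "stephane" counts come from this email; the layout is assumed.
COUNTS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "us": {"stephane": (655, 1128)},
    "uk": {"stephane": (41, 0)},
}


def guess_gender(name: str, country: str, cutoff: float = 0.99) -> str:
    """Return 'male', 'female', or 'unknown' from per-country count data.

    The cutoff is an assumed value, chosen to be strict.
    """
    male, female = COUNTS.get(country, {}).get(name.lower(), (0, 0))
    total = male + female
    if total == 0:
        return "unknown"  # name is not in this country's table
    if male / total >= cutoff:
        return "male"
    if female / total >= cutoff:
        return "female"
    return "unknown"  # neither label clears the cutoff


if __name__ == "__main__":
    for country in ("us", "uk"):
        print(country, guess_gender("Stephane", country))
```

With a strict cutoff like this, a name only gets a label when nearly all recorded uses fall on one side, so mixed names fall into "unknown" rather than being guessed.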
> Not arguing against the theory that a Western bias may exist in the
> dataset, but just stepping in to say that my name is not a good case to
> determine this: like lots of names following a Sikh naming convention, I
> don't think mine is specific for men/women.
>
Cool. Good to know! The BigBang code now reflects this.
Now I think the only names that are currently giving the code trouble are:
- "Stéphane Bortzmeyer". The dictionary is in ASCII and includes no
accents. In the US dictionary, "Stephane" has a 655/1128 male/female
count. In the UK dictionary, it has a 41/0 male/female count, and is
considered "male". This actually accords with my intuition--without looking
him up, I (from the US) had assumed Stéphane was a woman. Anyway, an
interesting regional difference.
- "=?utf-8?q?St=C3=A9phane_Couture?=" who is "unknown"
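Both of these look like name-normalization problems rather than dictionary problems. A possible preprocessing step, sketched below, is to decode RFC 2047 encoded headers and strip accents before the lookup. The helper names are hypothetical and this is not what BigBang currently does; it only uses the Python standard library.

```python
import unicodedata
from email.header import decode_header


def normalize_name(raw: str) -> str:
    """Decode an RFC 2047 encoded header value and strip accents.

    Hypothetical helper: a sketch of one way to prepare names for an
    ASCII-only name dictionary.
    """
    parts = []
    for chunk, charset in decode_header(raw):
        # decode_header yields bytes for encoded words, str otherwise.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    decoded = "".join(parts)
    # NFKD splits accented letters into base letter + combining mark,
    # then the combining marks are dropped.
    decomposed = unicodedata.normalize("NFKD", decoded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def first_name(raw: str) -> str:
    """Lowercased first token of the normalized name (assumed lookup key)."""
    return normalize_name(raw).split()[0].lower()


if __name__ == "__main__":
    print(normalize_name("=?utf-8?q?St=C3=A9phane_Couture?="))
    print(first_name("St\u00e9phane Bortzmeyer"))
```

Note that stripping accents only makes the lookup possible; it does not resolve the US/UK disagreement on "Stephane".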
My conclusion is that while there's a fairly high error rate, the
gender-detector module is good enough as is to proceed with. The errors
should iron out as it's used at larger scale.
The next step is to get a sense of gendered mailing list participation
change *over time*, which I believe has not been done yet.
[image: image.png]
On the whole, this has been very helpful. Thanks to both Juliana and
Gurshabad.
I hope this effort contributes towards some publishable research down the
line. I anticipate that:
- The substance of this discussion is going to be critical to include in a
Methods section of any research paper
- Depending on how deep we wind up going into it, an audit of the gender
detection module and what we augment it with, the design process around it,
etc., might be a publishable piece in its own right.
Cheers,
Seb