[Bigbang-dev] Gender diversity and draft productivity

Nick Doty npdoty at ischool.berkeley.edu
Tue Jul 21 23:59:16 CEST 2020


On Jul 10, 2020, at 8:43 PM, Sebastian Benthall <sbenthall at gmail.com> wrote:
> 
>> An issue that has not yet been settled is how we are measuring "diversity", and how that measurement should reflect our uncertainty and the possibility of more than two represented gender categories.
> 
> So far I haven’t been trying to capture or record people with non-binary genders both because it’s not easily estimated by gender-detector and similar libraries and for ethical considerations that it could be outing or identifying people. In general, my research has been trying to estimate the gender breakdown of populations but not to record and publish individual people’s genders, to avoid individual misgendering and to avoid the privacy risks of disclosing someone’s gender.
> 
> That makes sense.
> 
> It may make sense to break down the unknown cases further when they dominate. (See below)
> 
> I'm going to, for the sake of honing the intuitions here, push back and say that if we are using only such public and expressed information as one's stated name and public biography to infer gender, nothing we are doing is creating any new risk.
> 
> I guess I'm skeptical of the "outing" case here.

Yeah, I should have been more careful in describing my thinking here. I don’t think there’s an outing risk to applying a library that uses public records on publicly archived data like the name attached to a mailing list. But if we do any manual annotation, particularly annotation of non-binary or transgender people, there could be an outing risk in disclosing that data.

Applying a gender estimation library to publicly recorded names does have the other risk, though, of misgendering: it’s wrong some small but real fraction of the time, and I’m concerned about the individualized results of an automated process being published in a way that others might come across them and misread them as an accurate description of a particular individual’s gender.
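
To make that concrete, what I have in mind publishing is only aggregate proportions, never per-person labels. A toy sketch of what I mean (guess() here is a stand-in for gender-detector or any other backend; this is illustrative, not the actual BigBang code):

from collections import Counter

def gender_breakdown(names, guess):
    # Aggregate estimate over a list of participant names. Per-person
    # labels are discarded immediately; only the overall proportions
    # are kept, so nothing individualized is recorded or published.
    counts = Counter(guess(name.split()[0]) for name in names if name.strip())
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()} if total else {}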

I’ve been referring in particular to this blog post by Nathan Matias from 2014, which has practical suggestions and also some clear ethical principles to follow:

Nathan Matias on “How to Ethically and Responsibly Identify Gender in Large Datasets”:
http://mediashift.org/2014/11/how-to-ethically-and-responsibly-identify-gender-in-large-datasets/

And this reference is also on my reading list but I haven’t gone through it yet:

Larson, Brian. “Gender as a Variable in Natural-Language Processing: Ethical Considerations.” In Proceedings of the First ACL Workshop on Ethics in Natural Language Processing, 1–11. Valencia, Spain: Association for Computational Linguistics, 2017. https://doi.org/10.18653/v1/W17-1601.

> Yes, I still see errors, and most often with names that in the US are strongly gendered but in other countries may not be gendered or may have a different gender balance. Those are cases where the US/Western focus also leads to incorrect data. But those instances have been rare when I’ve done manual checks with groups of people I know; more often the gender-detector library is recording genders as unknown.
> 
> It looks like a method argument can switch the data backend to the UK name set, which might be slightly better for European and maybe other continental names.
> 
> Because IETF is global, we could run both and average the two. Or if we get good national origin metadata about participants, we could use it to map them to the right dictionary.

I had given up on choosing the right dictionary (since I didn’t expect to have per-person country-of-origin data for email addresses), but I do wonder about averaging, or maybe only using the results when multiple dictionaries agree.
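
To be concrete about the “only when multiple dictionaries agree” idea, here is a rough sketch of what I’m imagining. It assumes the gender-detector pattern of constructing a detector per country dictionary and calling guess() on a first name to get 'male', 'female', or 'unknown' (that matches the library’s documented examples, though the exact import path can differ by version); none of this is the actual BigBang code:

from gender_detector import GenderDetector  # import path may differ by library version

# One detector per name dictionary; 'us' and 'uk' are the two backends
# discussed above.
detectors = [GenderDetector('us'), GenderDetector('uk')]

def consensus_guess(first_name):
    # Keep a label only when every dictionary gives the same,
    # non-unknown answer; otherwise fall back to 'unknown'.
    guesses = {detector.guess(first_name) for detector in detectors}
    if len(guesses) == 1 and 'unknown' not in guesses:
        return guesses.pop()
    return 'unknown'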

> I’d be interested in that. I have not looked at estimates of gender participation over time. I have compared different mailing lists/working groups, which seemed of interest. Some rough initial work in the graphs attached.
> 
> Awesome.
> 
> Has anything of theoretical interest explained the differences in the numbers?

I think groups working on accessibility topics have a lower fraction of male participants and a higher fraction of female participants than most working groups, and it seems plausible that privacy, Semantic Web, and digital publishing groups also have higher fractions of female participants.

> For the cases where there's a preponderance of "unknowns", is it possible to break them into smaller categories?

I haven’t done this yet, but it would be possible to look at some manually and see if it’s easy to detect automated accounts, non-Western participation, a few prominent accounts that don’t have names in their email addresses, and so on.
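
For instance, as a rough sketch (the message data structure here is made up; in practice it would come out of the BigBang archive parsing), we could pull out the most prolific senders whose names register as unknown and look at them by hand:

from collections import Counter

def top_unknown_senders(messages, guess, n=25):
    # messages: iterable of (sender_name, sender_email) pairs.
    # guess: function mapping a first name to 'male'/'female'/'unknown'.
    # Returns the n most prolific senders that register as unknown, as
    # candidates for manual review (bots, organizational accounts,
    # non-Western names, addresses with no human name attached, ...).
    counts = Counter()
    for name, email in messages:
        first = name.split()[0] if name and name.strip() else ''
        if not first or guess(first) == 'unknown':
            counts[(name, email)] += 1
    return counts.most_common(n)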

> For example, I wonder if the dataset bias is causing a mailing list with a strong non-Western regional presence to register grey.

Yes, there are definitely a couple of working groups in this W3C corpus that register as “unknown” and whose topics are of particular interest to non-Western participants.

> I think it would be better to use this method to look at the mailing list traffic by gender rather than the document authors: since there’s a small number of document editors, that’s something that could more easily be tagged by hand with higher precision.
> 
> I agree mostly.
> The mailing lists should have more interesting aggregate numbers.
> 
> I mainly started with HRPC drafts because of the close connection between the BigBang community and the HRPC community, and because with a small set of authors I knew we could validate it amongst ourselves. Be our own guinea pig, so to speak.
> 
> It would maybe be notable if the gender breakdown of the drafts were unrepresentative of the breakdown of the corresponding mailing lists.
> 
> Or if draft content varies, on average, with draft author gender.

+1 for starting with groups we already know and where we can do some manual validation. Some of the ethical recommendations are also to do work “in conversation” with the people being studied, and it might be that HRPC or similar research groups would be more open to discussion about gender, participation in standards-setting mailing lists, or the methodological questions.

> Yes, I found the methods and caveats about them to be the most detailed part of working/writing on this topic. In the draft I’d put together so far, I started with all the limitations of the method, and then tried to explain why it still might be useful to look at these estimates. I’m still cautious about publishing that because I don’t know how much we can look past those limitations and whether any harm can be done by publishing estimates, but I’d be interested to hear other perspectives. 
> 
> I think it's good work and you should publish it!
> 
> Maybe it would be best to work on a paper together that could include multiple reviews and perspectives.
> 
> I'm all for that :)

Anyone else interested? I’d be happy to share a first draft (off-list, for now) with anyone who might want to collaborate.

Thanks,
Nick