<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On 31 Aug 2020, at 14:52, Sebastian Benthall <<a href="mailto:sbenthall@gmail.com" class="">sbenthall@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;" class=""><div class=""><div class="">This seems like a bug somewhere – if the code is available, I’m happy to do a quick sanity check on how you’re using the datatracker library.<br class=""></div></div></div></blockquote><div class=""><br class=""></div><div class="">Thank you, Colin!</div><div class=""><br class=""></div><div class="">This is the script I've been running:</div><div class=""><a href="https://github.com/datactive/bigbang/blob/878d1ad053777ae652adafd468b153d4c0d20c92/bin/datatracker.py" class="">https://github.com/datactive/bigbang/blob/878d1ad053777ae652adafd468b153d4c0d20c92/bin/datatracker.py</a></div></div></div></div></blockquote><div><br class=""></div><div>A couple of quick things:</div><div><br class=""></div><div>* The group is “httpbis”, not “httpbisa”.</div><div>* The DocumentTypeURI is missing a trailing slash.</div><div><br class=""></div><div>Also, remember to look at the submissions to find the different versions of a draft; otherwise you only get the most recent version. 
</div><div><br class=""></div><div>Try something like:</div><div><br class=""></div><div><div><font face="Courier" class=""><span style="font-style: normal; font-size: 10px;" class="">from pathlib import Path</span></font></div><div><font face="Courier" class=""><span style="font-style: normal; font-size: 10px;" class="">from ietfdata.datatracker import DataTracker</span></font></div><div><font face="Courier" class=""><span style="font-style: normal; font-size: 10px;" class=""><br class=""></span></font></div><div><font face="Courier" class=""><span style="font-style: normal; font-size: 10px;" class=""># Cache datatracker responses in the local "cache" directory</span></font></div><div><font face="Courier" class=""><span style="font-style: normal; font-size: 10px;" class="">dt = DataTracker(cache_dir=Path("cache"))</span></font></div><div><font face="Courier" class=""><span style="font-style: normal; font-size: 10px;" class=""><br class=""></span></font></div><div><font face="Courier" class=""><span style="font-style: normal; font-size: 10px;" class="">g  = dt.group_from_acronym("httpbis")</span></font></div><div><font face="Courier" class=""><span style="font-style: normal; font-size: 10px;" class="">for d in dt.documents(group=g, doctype=dt.document_type_from_slug("draft")):</span></font></div><div><font face="Courier" class=""><span style="font-style: normal; font-size: 10px;" class="">    print("")</span></font></div><div><font face="Courier" class=""><span style="font-style: normal; font-size: 10px;" class="">    # Each submission is one published version of the draft</span></font></div><div><font face="Courier" class=""><span style="font-style: normal; font-size: 10px;" class="">    for sub_url in d.submissions:</span></font></div><div><font face="Courier" class=""><span style="font-style: normal; font-size: 10px;" class="">        sub = dt.submission(sub_url)</span></font></div><div><font face="Courier" class=""><span style="font-style: normal; font-size: 10px;" class="">        print(F"{sub.document_date.strftime('%Y-%m-%d')} {sub.name}-{sub.rev}")</span></font></div><div><font face="Courier" class=""><span style="font-style: normal; font-size: 10px;" class="">        for a in sub.parse_authors():</span></font></div><div><font face="Courier" class=""><span style="font-style: normal; font-size: 10px;" class="">            print(F"           {a['name']} <{a['email']}>")</span></font></div></div><div><br class=""></div><div>This will find every submission of each working group draft for a particular group. It doesn’t follow the history back to the individual submissions made before the working group adopted each draft, but can be extended to do that if needed. 
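<div class="">If the goal is to count versions rather than print them, a minimal sketch is to collect the (sub.name, sub.rev) pairs from the loop above into a set, so that the totals do not depend on the order in which results arrive or on any duplicates. The count_versions function and the draft names in the example are hypothetical, for illustration only.</div>
<pre class="">
```python
from collections import defaultdict


def count_versions(pairs):
    """Return (number of distinct drafts, number of distinct versions).

    `pairs` is any iterable of (draft_name, rev) tuples. Putting them in a
    set removes duplicates and makes the result independent of order.
    """
    versions = set(pairs)
    revs_by_draft = defaultdict(set)
    for name, rev in versions:
        revs_by_draft[name].add(rev)
    return len(revs_by_draft), len(versions)


# Hypothetical data: one version appears twice and the order is shuffled.
example = [
    ("draft-example-httpbis-foo", "01"),
    ("draft-example-httpbis-foo", "00"),
    ("draft-example-httpbis-bar", "00"),
    ("draft-example-httpbis-foo", "01"),
]
print(count_versions(example))  # → (2, 3)
```
</pre>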
</div><div><br class=""></div><div>Colin</div><div><br class=""></div><div><br class=""></div><div><br class=""></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_quote"><div class="">I suppose I made some assumptions about the datatracker which, if false, would explain a lot. I assumed that the generator returned by the dt.documents method always returns drafts in the same order. Maybe that's not the case?</div><div class=""><br class=""></div><div class="">Many thanks,</div><div class="">Seb</div><div class=""><br class=""></div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;" class=""><div class=""><div class=""></div><div class="">Looking at <a href="https://datatracker.ietf.org/wg/httpbis/documents/" target="_blank" class="">https://datatracker.ietf.org/wg/httpbis/documents/</a>, it seems that httpbis has 48 documents. Each of these will have gone through multiple versions as a draft, but even with ~20 drafts per document (which is roughly typical), that’s not close to thousands. <br class=""></div><div class=""><br class=""></div><div class="">Searching <a href="https://mailarchive.ietf.org/arch/browse/i-d-announce/?q=httpbis" target="_blank" class="">https://mailarchive.ietf.org/arch/browse/i-d-announce/?q=httpbis</a> finds announcements for 721 internet drafts containing the string “httpbis”, which seems plausible.</div><div class=""><br class=""></div>Colin</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="">Another issue here is that the draft output precedes the mailing list records (see attachment). 
Another is that there are very few emails sent by women (or at least, few identifiable as such by our detection method) in httpbisa:</div><div class=""><br class=""></div><div class=""><div class=""><span id="gmail-m_2800774085469158009cid:ii_keg4nimb1" class=""><image.png></span><br class=""></div></div><div class=""><br class=""></div><div class=""><div class=""><br class=""></div></div><div class=""><br class=""></div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 26, 2020 at 3:26 PM Niels ten Oever <<a href="mailto:mail@nielstenoever.net" target="_blank" class="">mail@nielstenoever.net</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="zoom:0%" class=""><div dir="auto" class="">Httpbis is the one you're looking for :)<br class=""><br class=""></div>
<div dir="auto" class="">DNSops is also a nice big one.<br class=""><br class=""></div>
<div dir="auto" class="">Cheers,<br class=""><br class=""></div>
<div dir="auto" class="">Niels</div>
<div class="gmail_quote">On Aug 26, 2020, at 21:17, Sebastian Benthall <<a href="mailto:sbenthall@gmail.com" target="_blank" class="">sbenthall@gmail.com</a>> wrote:<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr" class="">
 Hmmm.
 <div class="">
  <br class="">
 </div>
 <div class="">
  Web mail archives of the http list at 
  <a href="https://ietf.org/mail-archive/text/http/" target="_blank" class="">https://ietf.org/mail-archive/text/http/</a> only go up to 2012.
 </div>
 <div class="">
  Does that make sense to you?
 </div>
 <div class="">
  <br class="">
 </div>
 <div class="">
   It looks like there are several DNS working groups. Is there one in particular you think would be worth looking at?
 </div>
 <div class="">
  <br class="">
 </div>
 <div class="">
   Genericizing the code so that it can loop through many groups and compute results is the next step towards confirmation. It's probably worth looking at a couple of other concrete and well-understood examples before doing the big analysis, though.
 </div>
 <div class="">
  <br class="">
 </div>
 <div class="">
  - S
 </div>
</div>
<br class="">
<div class="gmail_quote">
 <div dir="ltr" class="gmail_attr">
  On Wed, Aug 26, 2020 at 1:52 PM Niels ten Oever <
  <a href="mailto:mail@nielstenoever.net" target="_blank" class="">mail@nielstenoever.net</a>> wrote:
  <br class="">
 </div>
 <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  <div class="">
   <div dir="auto" class="">
     Very interesting. I'd say the number of drafts and authors in hrpc is too low to make a statement about this, though. Could we do this for the HTTP and/or DNS WGs?
   </div> 
   <div class="gmail_quote">
    On Aug 26, 2020, at 19:30, Sebastian Benthall <
    <a href="mailto:sbenthall@gmail.com" target="_blank" class="">sbenthall@gmail.com</a>> wrote:
    <blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> 
     <div dir="ltr" class="">
      Hello,
      <div class="">
       <br class="">
      </div>
      <div class="">
       I'm revisiting the question of whether mailing list gender diversity and draft productivity of working groups are correlated.
      </div>
      <div class="">
       <br class="">
      </div>
      <div class="">
       Putting aside for now all the methodological complications, here is how I am operationalizing the question:
      </div>
      <div class="">
       <ul class="">
        <li class="">I'm looking specifically at the HRPC working group, with this data:<br class="">
         <div class="">
          <img alt="image.png" width="418" height="221" class="">
          <br class="">
         </div></li>
        <li class="">
         <div class="">
           Gender is being inferred from first names, using birth records. "unknown" is used for names that cannot be classified as belonging to either men or women with the current data set.
         </div></li>
         <li class="">I'm measuring "diversity" on any day as: (women's activity + unknown's activity) / (men's activity). Because, you know, this is probably close to what most people mean by diversity. (Recall that non-Western names are more likely to be categorized as "unknown".)<br class=""></li>
         <li class="">I'm using a 100-day rolling average on the activity counts.</li>
       </ul>
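        <div class="">The two definitions above can be sketched in plain Python. This sketch assumes aligned lists of daily activity counts per category, a trailing window (each smoothed value averages that day and the window - 1 days before it), and that diversity is computed from the smoothed counts, with None on days where men's smoothed activity is zero. These are assumptions about the analysis, not a description of the actual code; in pandas, the smoothing step would correspond roughly to df.rolling(100).mean(). The function names and toy counts are hypothetical.</div>
<pre class="">
```python
def rolling_mean(values, window=100):
    """Trailing rolling mean; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 >= window:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
        else:
            out.append(None)
    return out


def diversity(women, unknown, men):
    """(women + unknown) / men for each day; None where undefined."""
    out = []
    for w, u, m in zip(women, unknown, men):
        if w is None or u is None or m is None or m == 0:
            out.append(None)  # incomplete window, or no male activity
        else:
            out.append((w + u) / m)
    return out


# Toy daily counts with a 3-day window so the effect is visible.
women = [0, 1, 2, 0, 1]
unknown = [1, 1, 0, 2, 1]
men = [3, 3, 3, 0, 3]
smoothed = [rolling_mean(s, window=3) for s in (women, unknown, men)]
print(diversity(*smoothed))
```
</pre>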
       <div class="">
         This is the matrix of Pearson correlations between each pair of these values:
       </div>
      </div>
      <div class="">
       <br class="">
      </div>
      <div class="">
       <table border="1" class="">
        <thead class="">
         <tr style="text-align:right" class="">
          <th class=""></th> 
          <th class="">women</th> 
          <th class="">unknown</th> 
          <th class="">men</th> 
          <th class="">drafts</th> 
          <th class="">diversity</th> 
         </tr> 
        </thead> 
        <tbody class=""> 
         <tr class=""> 
          <th class="">women</th> 
          <td class=""><font color="#0000ff" class="">1.000000</font></td> 
          <td class=""><font color="#0000ff" class="">0.910922</font></td> 
          <td class=""><font color="#0000ff" class="">0.804869</font></td> 
          <td class="">0.008890</td> 
          <td class="">0.160833</td> 
         </tr> 
         <tr class=""> 
          <th class="">unknown</th> 
          <td class=""><font color="#0000ff" class="">0.910922</font></td> 
          <td class=""><font color="#0000ff" class="">1.000000</font></td> 
          <td class=""><font color="#0000ff" class="">0.808168</font></td> 
          <td class="">0.027502</td> 
          <td class="">0.245059</td> 
         </tr> 
         <tr class=""> 
          <th class="">men</th> 
          <td class=""><font color="#0000ff" class="">0.804869</font></td> 
          <td class=""><font color="#0000ff" class="">0.808168</font></td> 
          <td class=""><font color="#0000ff" class="">1.000000</font></td> 
          <td class="">0.015406</td> 
          <td class="">-0.141915</td> 
         </tr> 
         <tr class=""> 
          <th class="">drafts</th> 
          <td class=""><font color="#cc0000" class="">0.008890</font></td> 
          <td class=""><font color="#cc0000" class="">0.027502</font></td> 
          <td class=""><font color="#cc0000" class="">0.015406</font></td> 
          <td class="">1.000000</td> 
          <td class=""><font color="#cc0000" class="">0.061884</font></td> 
         </tr> 
         <tr class=""> 
          <th class="">diversity</th> 
          <td class=""><font color="#674ea7" class="">0.160833</font></td> 
          <td class=""><font color="#674ea7" class="">0.245059</font></td> 
          <td class=""><font color="#674ea7" class="">-0.141915</font></td> 
          <td class="">0.061884</td> 
          <td class="">1.000000<br class=""></td>
         </tr>
        </tbody>
       </table>
       <br class="">Things to note:
      </div>
      <div class="">
       <ul class="">
        <li class=""><font color="#0000ff" class="">The activity of each gender is correlated with the activity of other genders.</font></li>
        <li class=""><font color="#674ea7" class="">Diversity is anticorrelated with the number of men. This is expected based on how it was defined, and a good sanity check.</font></li>
        <li class=""><font color="#cc0000" class="">Draft output is MORE correlated with diversity than it is with any individual gender!</font></li>
       </ul>
       <div class="">
        <font class="">This last point is quite nice. It resonates with the work of Scott Page on the value of diversity to collective intelligence, for example.</font>
       </div>
       <div class="">
        <font class=""><br class=""></font>
       </div>
       <div class="">
        <font class="">These numbers are a bit hard to interpret. How much should we trust them? These are the <i class="">p</i>-values associated with each correlation:</font>
       </div>
       <div class="">
        <table border="1" class="">
         <thead class="">
          <tr style="text-align:right" class="">
           <th class=""></th> 
           <th class="">women</th> 
           <th class="">unknown</th> 
           <th class="">men</th> 
           <th class="">drafts</th> 
           <th class="">diversity</th> 
          </tr> 
         </thead> 
         <tbody class=""> 
          <tr class=""> 
           <th class="">women</th> 
           <td class="">0</td> 
           <td class="">0</td> 
           <td class="">0</td> 
           <td class=""><font color="#cccccc" class="">0.6925</font></td> 
           <td class="">0</td> 
          </tr> 
          <tr class=""> 
           <th class="">unknown</th> 
           <td class="">0</td> 
           <td class="">0</td> 
           <td class="">0</td> 
           <td class=""><font color="#cccccc" class="">0.221</font></td> 
           <td class="">0</td> 
          </tr> 
          <tr class=""> 
           <th class="">men</th> 
           <td class="">0</td> 
           <td class="">0</td> 
           <td class="">0</td> 
           <td class=""><font color="#cccccc" class="">0.493</font></td> 
           <td class="">0</td> 
          </tr> 
          <tr class=""> 
           <th class="">drafts</th> 
           <td class=""><font color="#cccccc" class="">0.6925</font></td> 
           <td class=""><font color="#cccccc" class="">0.221</font></td> 
           <td class=""><font color="#cccccc" class="">0.493</font></td> 
           <td class="">0</td> 
           <td class=""><font color="#ff0000" class="">0.0059</font></td> 
          </tr> 
          <tr class=""> 
           <th class="">diversity</th> 
           <td class="">0</td> 
           <td class="">0</td> 
           <td class="">0</td> 
           <td class=""><font color="#ff0000" class="">0.0059</font></td> 
           <td class="">0</td>
          </tr>
         </tbody>
        </table>
       </div>
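       <div class="">For reference, the standard test for a Pearson correlation r over n observations uses t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The stdlib-only sketch below approximates that t-distribution by a standard normal, which is reasonable only for large n; scipy.stats.pearsonr returns the exact two-sided p-value. The function name is hypothetical. The formula also assumes independent observations, which consecutive days of a 100-day rolling average are not, so p-values computed this way on smoothed daily data can overstate significance.</div>
<pre class="">
```python
import math


def pearson_p_value(r, n):
    """Approximate two-sided p-value for Pearson r computed from n observations.

    Computes t = r * sqrt((n - 2) / (1 - r**2)) and approximates the
    t-distribution (n - 2 degrees of freedom) by a standard normal, so the
    result is only trustworthy for large n. Assumes independent observations.
    """
    if not n > 2:
        raise ValueError("need at least three observations")
    if abs(r) >= 1.0:
        return 0.0  # perfect correlation: t is infinite
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # Two-sided normal tail: 2 * (1 - Phi(|t|)) equals erfc(|t| / sqrt(2))
    return math.erfc(abs(t) / math.sqrt(2.0))


# Same r, different n: the p-value shrinks as n grows.
for n in (100, 1000, 5000):
    print(n, pearson_p_value(0.06, n))
```
</pre>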
       <br class="">
      </div>
      <div class="">
       Generally, 
       <i class="">p</i>-values below .05 (or, more strictly, .01) are considered "statistically significant", i.e. publishable.
      </div>
      <div class="">
       This correlation between diversity and draft output makes the cut!!
      </div>
      <div class="">
       <br class="">
      </div>
      <div class="">
       <font color="#0000ff" class="">So the verdict is: for HRPC, YES, gender diversity is correlated with draft output.</font>
      </div>
      <div class="">
       <font color="#0000ff" class=""><br class=""></font>
      </div>
      <div class="">
       <font class="">This result is robust to log-transforming the activity scores, which is comforting.</font>
      </div>
      <div class="">
       <span class="">Further work is needed to see if this result is robust across other IETF working groups.</span>
      </div>
      <div class="">
       <span class=""><br class=""></span>
      </div>
      <div class="">
       <font class="">Nick, what would you say to including a result like this in the paper about IETF and gender?</font>
      </div>
      <div class="">
       <font class=""><br class=""></font>
      </div>
      <div class="">
       <font class="">Cheers,<br class="">Seb</font>
      </div>
      <div class="">
       <br class="">
      </div>
     </div> 
     <pre class="">      <hr class=""><br class="">Bigbang-dev mailing list<br class=""><a href="mailto:Bigbang-dev@data-activism.net" target="_blank" class="">Bigbang-dev@data-activism.net</a><br class=""><a href="https://lists.ghserv.net/mailman/listinfo/bigbang-dev" target="_blank" class="">https://lists.ghserv.net/mailman/listinfo/bigbang-dev</a><br class=""></pre>
    </blockquote>
   </div>
  </div>
 </blockquote>
</div></blockquote></div></div></blockquote></div>
<span id="gmail-m_2800774085469158009cid:f_keg4macr0" class=""><diversity-productivity-httpbisa.png></span></div></blockquote></div><br class=""></div></blockquote></div></div>
</div></blockquote></div><br class=""><div class="">
<br class=""><br class="">-- <br class="">Colin Perkins<br class=""><a href="https://csperkins.org/" class="">https://csperkins.org/</a><br class=""><br class=""><br class=""><br class="">

</div>
<br class=""></body></html>