[Bigbang-dev] parsing I-Ds as well as RFCs (was Re: IETF Affiliation Analysis with BigBang -- Scheduling a call)

Nick Doty npdoty at ischool.berkeley.edu
Tue Apr 7 05:02:13 CEST 2020


I hope you’re all doing as well as can be expected during these trying times.

One action I took away from our last call was that the rfc-analysis code should be able to parse Internet-Drafts as well. That work has been integrated now (along with upgrades to Python 3, thanks Seb). That let me start on some basic comparisons, which I found interesting: in the past I had looked at the lengths of Security Considerations sections in RFCs over time, and now I can compare currently active I-Ds to the published RFCs.

Graph is attached, and notebook with the steps here:
https://github.com/npdoty/rfc-analysis/blob/master/notebooks/Security%20and%20Privacy%20mentions%20in%20RFCs%20and%20IDs.ipynb

Nothing shocking in those results. I had vaguely assumed that the I-Ds might have more depth on security because they are recent, in-progress documents, but they seem to have a little less (perhaps because the documents are unfinished and not as fleshed out, or because they haven’t gone through IESG review yet). The trend over time in the RFCs also seems to have flattened out.

Anyway, that was largely a Jupyter notebook just to confirm and demonstrate the functionality, but I thought I would share a little of my ongoing work just so we’re keeping in touch. 

I think integrating the ietf-data module and the data in the IETF Datatracker will be useful (many documents have marked up XML that has more clearly marked sections and metadata), but it’s also good to be able to parse the raw text, as that still seems to be the only definite, required format.
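
To make the raw-text approach concrete, here is a minimal sketch of measuring the length of a Security Considerations section in a plain-text RFC or I-D. It is not the rfc-analysis code: the function name, the heading pattern, and the rule of counting only indented lines (to skip page headers and footers) are assumptions chosen for illustration, and real documents will have edge cases this misses.

```python
import re

# Assumed heading shape: a top-level numbered heading at column 0,
# e.g. "7.  Security Considerations". Subsections such as "7.1." do not match.
HEADING = re.compile(r"^(\d+)\.?\s+(\S.*)$")


def security_considerations_word_count(text):
    """Return the word count of the Security Considerations section,
    or None if no such top-level section is found.

    Hypothetical helper for illustration only.
    """
    lines = text.splitlines()

    # Find the first top-level heading titled "Security Considerations".
    start = None
    for i, line in enumerate(lines):
        m = HEADING.match(line)
        if m and m.group(2).strip().lower().startswith("security considerations"):
            start = i + 1
            break
    if start is None:
        return None

    count = 0
    for line in lines[start:]:
        if HEADING.match(line):
            break  # next top-level section begins
        # Body text in the plain-text format is indented; unindented lines
        # (subsection headings, page headers/footers) are skipped.
        if line[:1].isspace():
            count += len(line.split())
    return count


sample = """\
1.  Introduction

   Some intro text here.

2.  Security Considerations

   This protocol has three risks.

2.1.  Subsection

   More words here.

Doe                     Expires October 2020                    [Page 2]

3.  IANA Considerations

   None.
"""

print(security_considerations_word_count(sample))  # 8
```

The same function can be run over both published RFCs and active I-Ds, since both share the plain-text layout; unnumbered sections and unusual heading styles would need extra handling.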

Cheers,
Nick




> On Mar 19, 2020, at 12:11 PM, Sebastian Benthall <sbenthall at gmail.com> wrote:
> 
> Thanks everyone who was able to make it onto the call.
> 
> Agenda of the meeting, along with some notes based on what we discussed, are here:
> https://etherpad.wikimedia.org/p/bigbang-affiliation-analytics-2
> 
> The biggest and most productive outcome from the meeting, in my view, was the contributions of the Glasgow IPL group.
> We'll be working to integrate with their project in the next phase:
> https://github.com/glasgow-ipl/ietfdata
> 
> This will help us answer Joey's questions about working group mailing list activity and working group productivity.
> 
> I'll be happy to do a follow-up call with anybody who wasn't available for this call.
> I'll be in touch in another month to schedule another update meeting.
> 
> Best regards,
> Seb
-------------- next part --------------
A non-text attachment was scrubbed...
Name: security-considerations-comparison.png
Type: image/png
Size: 115036 bytes
Desc: not available
URL: <http://lists.ghserv.net/pipermail/bigbang-dev/attachments/20200406/bda67467/attachment-0001.png>

