[Bigbang-dev] parsing I-Ds as well as RFCs (was Re: IETF Affiliation Analysis with BigBang -- Scheduling a call)
Niels ten Oever
mail at nielstenoever.net
Tue Apr 7 10:05:40 CEST 2020
This is really cool, Nick. I will probably try to operationalize this on the 3GPP specs as well. Unfortunately they are all in different kinds of Word documents that do not necessarily follow a strict template, so I am still wrestling with that. Thanks for the great work!
On 4/7/20 5:02 AM, Nick Doty wrote:
> I hope you’re all doing as well as can be expected during these trying times.
>
> One action I took away from our last call was that the rfc-analysis code should be able to parse Internet-Drafts as well. That work has been integrated now (along with upgrades to Python 3, thanks Seb). That let me start to do some basic comparisons which I found interesting; in the past I had looked at the lengths of Security Considerations sections in RFCs over time, and I can compare currently active I-Ds to the published RFCs.
>
> Graph is attached, and notebook with the steps here:
> https://github.com/npdoty/rfc-analysis/blob/master/notebooks/Security%20and%20Privacy%20mentions%20in%20RFCs%20and%20IDs.ipynb
>
> Nothing shocking in those results. I had vaguely assumed that the I-Ds might have more depth on security because they are recent, in-progress documents, but they seem to have a little less (perhaps because the documents are unfinished and not as fleshed out, or because they haven’t gone through IESG review yet). The trend over time in the RFCs also seems to have flattened out.
>
> Anyway, that was largely a Jupyter notebook just to confirm and demonstrate the functionality, but I thought I would share a little of my ongoing work just so we’re keeping in touch.
>
> I think integrating the ietf-data module and the data in the IETF Datatracker will be useful (many documents have marked up XML that has more clearly marked sections and metadata), but it’s also good to be able to parse the raw text, as that still seems to be the only definite, required format.
>
> Cheers,
> Nick
>
>> On Mar 19, 2020, at 12:11 PM, Sebastian Benthall <sbenthall at gmail.com> wrote:
>>
>> Thanks everyone who was able to make it onto the call.
>>
>> The meeting agenda, along with some notes based on what we discussed, is here:
>> https://etherpad.wikimedia.org/p/bigbang-affiliation-analytics-2
>>
>> The biggest and most productive outcome of the meeting, in my view, was the contribution of the Glasgow IPL group.
>> We'll be working to integrate with their project in the next phase:
>> https://github.com/glasgow-ipl/ietfdata
>>
>> This will help us answer Joey's questions about working group mailing list activity and working group productivity.
>>
>> I'll be happy to do a follow-up call with anybody who wasn't available for this call.
>> I'll be in touch in another month to schedule another update meeting.
>>
>> Best regards,
>> Seb
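
For reference, the measurement Nick describes above (the length of the Security Considerations section in a raw-text RFC or I-D) could be sketched roughly as follows. This is not the rfc-analysis code: the function name and the heading regex are assumptions made for illustration. It assumes numbered headings start at column 0 and that body text and table-of-contents entries are indented. It does not strip page headers and footers, and it will run on to the end of the file if the section is followed only by unnumbered headings.

```python
import re

# Assumed heading shape: a section number at column 0 followed by a title,
# e.g. "7.  Security Considerations" or "7.1.  Details".
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(\S.*)$")


def security_section_word_count(text):
    """Return the word count of the Security Considerations section, or 0 if absent."""
    lines = text.splitlines()
    start = None
    top = None
    for i, line in enumerate(lines):
        m = HEADING.match(line)
        if m and "security considerations" in m.group(2).lower():
            start = i + 1
            top = m.group(1).split(".")[0]  # top-level section number
            break
    if start is None:
        return 0

    words = 0
    for line in lines[start:]:
        m = HEADING.match(line)
        if m:
            if m.group(1).split(".")[0] != top:
                break  # next top-level section ends the count
            continue  # subsection heading: skip it, keep counting its body
        words += len(line.split())
    return words
```

Running this over each document and grouping the counts by publication year would give the kind of per-year comparison between RFCs and active I-Ds shown in the notebook.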
--
Niels ten Oever
Researcher and PhD Candidate
DATACTIVE Research Group
University of Amsterdam
PGP fingerprint 2458 0B70 5C4A FD8A 9488
643A 0ED8 3F3A 468A C8B3