[Bigbang-dev] Clarifying theoretical commitments going into IETF 116

Tue Jan 31 14:44:05 CET 2023

Thank you, Seb! I plan to be at IETF all week, in fact until the 4th of
April. Really looking forward to it!

On Tue, 31 Jan 2023 at 19:03, Sebastian Benthall <sbenthall at gmail.com>
wrote:

> Good idea.
> I know Niels is keen for me to work on an accessible dashboard at the IETF
> meeting.
> But I feel I have so much to learn from you, Priyanka, and would love to
> take the time to make some research progress.
>
> I'll be at the IETF meeting all week, and would be happy to do work along
> these lines beyond the hackathon as well.
>
>
> On Tue, Jan 31, 2023 at 4:03 PM Priyanka Sinha <
> priyanka.sinha.iitg at gmail.com> wrote:
>
>> Awesome! I'd like to answer these questions a little better after
>> more thought, later this week.
>>
>> I love the idea of funding proposals as well!
>>
>> Maybe through these discussions we can zero in on one subproblem that
>> would help support such a proposal and hope to get it done by the hackathon?
>>
>> On Tue, 31 Jan 2023 at 16:46, Sebastian Benthall <sbenthall at gmail.com>
>> wrote:
>>
>>> Priyanka, Effy,
>>>
>>>> So, identifying whether slightly different email addresses refer to
>>>> the exact same person is something my algorithm, which I presented at
>>>> the AID workshop, can do.
>>>
>>>
>>> Brilliant.
>>>
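A minimal sketch of one naive baseline for deciding whether two slightly
different addresses belong to the same person. This is not the algorithm
presented at the AID workshop, whose details are not given in this thread;
the function names, the normalization rules, and the 0.85 threshold are all
assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(addr: str) -> str:
    """Lowercase, undo archive-style ' at ' obfuscation, drop any '+tag'."""
    addr = addr.strip().lower().replace(" at ", "@")
    local, _, domain = addr.partition("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}"


def same_person(a: str, b: str, threshold: float = 0.85) -> bool:
    """Guess identity from string similarity of the local parts only.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    local_a = normalize(a).partition("@")[0]
    local_b = normalize(b).partition("@")[0]
    return SequenceMatcher(None, local_a, local_b).ratio() >= threshold
```

A string-similarity baseline like this will produce both false merges and
false splits, so it is probably best treated as a point of comparison
rather than as a replacement for a proper entity-resolution method.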
>>>> Within the email body, entity recognition and perhaps coreference
>>>> resolution (i.e., cases where the name of the person or company is
>>>> not present but is referred to with pronouns such as he/she/they) have
>>>> varying accuracy. I was happy to learn of Effy's work in this direction.
>>>> Myself, I would try to use Effy's published work as well as Lauren
>>>> Berk's (now Lauren Wheelock) work, https://github.com/lauren897
>>>> https://dspace.mit.edu/handle/1721.1/127291?show=full, which, when I
>>>> attended, worked well for cases with short context.
>>>>
>>>
>>> Of course, it would be ideal to work with Effy on this!
>>>
>>>
>>>> This is an interesting question for me, since I haven't thought of the
>>>> graph from the perspective of measures like betweenness centrality,
>>>> etc. I thought of it as a representation that we mine for insights
>>>> using new graph neural network algorithms. For example, suppose we
>>>> represent the discourses as a multi-edged temporal graph, where the
>>>> different types of edges represent different aspects of the communication
>>>> that we take into account, and we then work on extracting, say, graphlets
>>>> (which in my mind are homeomorphic subgraph patterns; one could have maybe
>>>> 15 nodes, which could be one set of folks that hold a particular view).
>>>>
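To make the "multi-edged temporal graph" idea concrete, here is a small
sketch of one possible representation, using only the Python standard
library. The class name, the field names, and the edge kinds ("reply",
"mention") are hypothetical; the thread does not fix a schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: str    # aspect of the communication, e.g. "reply" (hypothetical label)
    time: float  # when the interaction happened, e.g. a Unix timestamp


class TemporalMultigraph:
    """Directed graph allowing many typed, timestamped edges per node pair."""

    def __init__(self):
        self._out = defaultdict(list)

    def add_edge(self, src, dst, kind, time):
        self._out[src].append(Edge(src, dst, kind, time))

    def edges(self, kind=None, start=float("-inf"), end=float("inf")):
        """Return edges of the given kind with start <= time < end."""
        return [
            e
            for out_edges in self._out.values()
            for e in out_edges
            if (kind is None or e.kind == kind) and start <= e.time < end
        ]
```

Slicing by edge kind and time window, as `edges()` does, is one way to get
the per-aspect, per-period subgraphs from which graphlets could later be
extracted.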
>>>
>>> Wow, this is very cool! I think I am following.
>>>
>>> Taking email communications as an example... I suppose this would mean
>>> labeling the messages somehow?
>>> For example, the label could include references to other entities?
>>>
>>> One challenge that has always been a problem for me in representing
>>> these discussions as a network is that while emails may have an
>>> "In-Reply-To" header, which is useful for modeling turn-taking and social
>>> responsiveness, in a 'mailing list' there is also the audience of lurkers,
>>> those on the thread who may be indirectly part of the audience, etc. Not to
>>> mention out-of-band communication. I suppose that at a large scale, one can
>>> chalk this all up to measurement error.
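As a concrete starting point for dataset preparation, the sketch below
builds weighted reply edges from "In-Reply-To" headers using Python's
standard email module. The function name and the choice to key people by
lowercased "From" address are assumptions; this is not BigBang's own
implementation.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr


def reply_edges(raw_messages):
    """Count (replier, original_author) pairs found via In-Reply-To."""
    msgs = [message_from_string(raw) for raw in raw_messages]

    # Map each Message-ID to the address of the person who sent it.
    author_of = {}
    for m in msgs:
        msg_id = (m.get("Message-ID") or "").strip()
        if msg_id:
            author_of[msg_id] = parseaddr(m.get("From", ""))[1].lower()

    edges = Counter()
    for m in msgs:
        parent = (m.get("In-Reply-To") or "").strip()
        if parent in author_of:  # skip replies to messages we do not hold
            sender = parseaddr(m.get("From", ""))[1].lower()
            edges[(sender, author_of[parent])] += 1
    return edges
```

Note that this captures only explicit turn-taking: lurkers, indirect
audiences, and out-of-band communication leave no edge here, which is
exactly the measurement gap described above.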
>>>
>>> But I bring it up because I'm wondering what concretely we might do with
>>> respect to preparing the dataset.
>>>
>>> (Ideally, our data preprocessing steps might support a number of
>>> different downstream 'user stories', which then feed into the dashboard for
>>> the 'users'... but our own use case of this research project can also be a
>>> good source of requirements.)
>>>
>>> I'm also wondering how the significant graphlets are identified. Does
>>> that involve labeling (i.e. supervised) of the graphlets?
>>> Or do these new algorithms extract network motifs based on frequencies
>>> alone?
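For the frequency-only case, here is a minimal unsupervised sketch that
counts the two connected three-node patterns (open wedge and closed
triangle) in an undirected graph. It is a toy stand-in for graphlet or
motif extraction, not one of the graph neural network methods mentioned
above; the function name and the edge-list input format are assumptions.

```python
from itertools import combinations


def count_three_node_motifs(edges):
    """Count wedges (2 of 3 links) and triangles (3 of 3) by brute force."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    counts = {"wedge": 0, "triangle": 0}
    # Brute force over all node triples: fine for small graphs only.
    for a, b, c in combinations(sorted(adj), 3):
        links = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if links == 3:
            counts["triangle"] += 1
        elif links == 2:
            counts["wedge"] += 1
    return counts
```

Raw counts like these need no labels, but judging whether a pattern is
"significant" usually means comparing its frequency against randomized
versions of the same graph.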
>>>
>>>
>>>> Then we could label these graphlets as different viewpoints on
>>>> privacy? I apologize if it doesn't make sense; I haven't yet figured
>>>> this out.
>>>>
>>>
>>> I appreciate you going out on a limb. I think this is very exciting!
>>>
>>> It may be useful to distinguish analytically between:
>>>  - behavioral regularities -- which we could identify from the graph data
>>>  - *reasons for* those behavioral regularities, which could be:
>>>    - endogenous, because of internal dynamics within the system of
>>>      communication (shades of Luhmann here...)
>>>    - exogenous (due to external forces such as the corporate structure
>>>      of Cisco or the geographic distance between people)
>>>
>>> I suppose I would argue that for something to be a "norm", there is
>>> necessarily some endogenous dynamic that maintains it.
>>> (I don't think that's a sufficient condition, but I do think it might be
>>> a good necessary condition.)
>>>
>>> For something to be a 'norm', the endogenous dynamic maybe needs to
>>> involve the shared 'view' that the regularity is how things ought to be.
>>> I think we could set aside the question of whether these are 'privacy
>>> norms' until we have a firmer sense of how we are operationalizing things.
>>>
>>> These are very deep questions but I am into them. I started BigBang to
>>> study questions like this!
>>> But one of the first things I learned with BigBang is that not all
>>> behavioral regularities are due to endogenous factors, and that indeed
>>> exogenous explanations are often precisely what is needed as a kind of
>>> 'null hypothesis'.
>>>
>>>> I mean, we could take a direction where we are not doing this, and
>>>> instead model the problem as an agent simulation where the goals are
>>>> related to the CI. Inside, we represent the agents and their
>>>> interactions in the graph structure, and we create a learning model
>>>> whose weights we learn by trying to reach the goals based on the existing
>>>> dialogue traces (i.e., the mailing list conversations) we have.
>>>>
>>>
>>> I love where you are going with this! You see this as distinct from what
>>> you proposed previously?
>>>
>>> This seems to be a good way of figuring out how, say, an endogenous
>>> dynamic could be responsible for the behavioral regularities.
>>>
>>> If it's based on multiple agents interacting with a learning dynamic,
>>> that could be "normative" in a very rich sense, no?
>>>
>>> Truly, you're setting up an awesome vision here, Priyanka.
>>> It's of course much larger scope than a project for a single hackathon.
>>> It reads to me more like something that would become a funding proposal.
>>> I do very much like funding proposals though!
>>>
>>> - S
>>>
>>>>