[Bigbang-dev] Clarifying theoretical commitments going into IETF 116

Tue Jan 31 14:03:35 CET 2023

Awesome!! I'd like to answer these questions a little better after more
thought, later this week.

I love the idea of funding proposals as well!!

Maybe through these discussions, we can zero in on one subproblem that
would help support such a proposal, and aim to get it done by the hackathon?

On Tue, 31 Jan 2023 at 16:46, Sebastian Benthall <sbenthall at gmail.com>
wrote:

> Priyanka, Effy,
>
>> So, identifying whether email addresses that differ slightly refer to the
>> exact same person is something my algorithm can do; I presented it at the
>> AID workshop.
>
>
> Brilliant.
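
A minimal sketch of the simplest version of this idea, using only the Python standard library. It is not the algorithm presented at the AID workshop, which this thread does not describe. The normalization rules, the function names, and the 0.85 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from email.utils import parseaddr


def normalize(address: str) -> str:
    """Lowercase an address, undo ' at ' obfuscation, and drop '+tag' suffixes."""
    # Archives such as pipermail write "user at host"; restore the "@".
    _, addr = parseaddr(address.replace(" at ", "@"))
    local, _, domain = addr.lower().partition("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}"


def likely_same_person(a: str, b: str, threshold: float = 0.85) -> bool:
    """Guess identity from the similarity of the local parts only.

    The threshold is an illustrative assumption, not a tuned value.
    """
    local_a = normalize(a).partition("@")[0]
    local_b = normalize(b).partition("@")[0]
    return SequenceMatcher(None, local_a, local_b).ratio() >= threshold
```

For instance, `likely_same_person("jdoe at example.org", "J.Doe+lists@example.com")` compares the local parts `jdoe` and `j.doe`. A string heuristic like this will produce false matches on short or common local parts, so it is at best a first pass before any name-based or behavior-based matching.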
>
>> Within the email body, doing the entity recognition as well as perhaps
>> coreference resolution (i.e., the name of the person or company is
>> not present but is referred to with pronouns such as he/she/they) has
>> varying accuracy. I was happy to know of Effy's work in this direction.
>> Myself, I would try to use Effy's published work as well as try Lauren
>> Berk's (now Lauren Wheelock) work https://github.com/lauren897
>> https://dspace.mit.edu/handle/1721.1/127291?show=full which when I had
>> attended worked well for cases with short context.
>>
>
> Of course, it would be ideal to work with Effy on this!
>
>
>> This is an interesting question for me, since I haven't thought of the
>> graph from the perspective of say measures like betweenness centrality,
>> etc. I thought of it as a representation based on which we mine for
>> insights, using new graph neural network algorithms.  For example, if we
>> represent the discourses as a multi edged temporal graph, where the
>> different types of edges represent different aspects of the communication
>> that we take into account, then if we work on extracting say graphlets
>> (which in my mind are homeomorphic subgraph patterns; say, one could have
>> maybe 15 nodes, which could be one set of folks that hold a particular view).
>>
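
One way to picture the multi-edged temporal graph described above is as a list of typed, timestamped edges that can be filtered by edge type and time window. The sketch below is a hypothetical data structure, not anything that exists in BigBang; the class name, the field names, and the edge types "reply" and "mention" are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Edge:
    source: str      # sender of the message
    target: str      # person replied to or mentioned
    kind: str        # hypothetical edge type, e.g. "reply" or "mention"
    time: datetime   # when the message was sent


class TemporalMultiGraph:
    """Directed graph allowing many typed, timestamped edges per node pair."""

    def __init__(self):
        self._edges = []

    def add_edge(self, source, target, kind, time):
        self._edges.append(Edge(source, target, kind, time))

    def edges(self, kind=None, start=None, end=None):
        """Edges of the given type within the half-open interval [start, end)."""
        return [
            e for e in self._edges
            if (kind is None or e.kind == kind)
            and (start is None or e.time >= start)
            and (end is None or e.time < end)
        ]

    def neighbors(self, node, **filters):
        """Targets reachable from `node` over edges passing the filters."""
        return {e.target for e in self.edges(**filters) if e.source == node}
```

Keeping the edge type and timestamp on every edge means the same data can yield a reply network, a mention network, or a snapshot for one month, which is the kind of input a graphlet-extraction step would need.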
>
> Wow, this is very cool! I think I am following.
>
> Taking email communications as an example... I suppose this would mean
> labeling the messages somehow?
> For example, the label could include references to other entities?
>
> One challenge that has always been a problem for me in representing these
> discussions as a network is that while emails may have an "In-Reply-To"
> header, which is useful for modeling turn-taking and social responsiveness,
> in a 'mailing list' there is also the audience of lurkers, those on the
> thread who may be indirectly part of the audience, etc. Not to mention
> out-of-band communication. I suppose that at a large scale, one can chalk
> this all up to measurement error.
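
To make the limitation concrete, here is a sketch of the reply network that the "In-Reply-To" header alone supports, written with the standard library `email` package. The function name and the sample messages are hypothetical. Note what it cannot see: anyone who never posts gets no node at all, and out-of-band communication leaves no trace.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr


def reply_edges(raw_messages):
    """Count (replier, original author) pairs using In-Reply-To headers."""
    messages = [message_from_string(raw) for raw in raw_messages]

    def sender(msg):
        return parseaddr(msg.get("From", ""))[1].lower()

    # Map each Message-ID to the address that sent it.
    author_of = {
        msg["Message-ID"].strip(): sender(msg)
        for msg in messages
        if msg["Message-ID"]
    }
    edges = Counter()
    for msg in messages:
        parent = (msg.get("In-Reply-To") or "").strip()
        # Replies to messages outside the sample are silently dropped.
        if parent in author_of:
            edges[(sender(msg), author_of[parent])] += 1
    return edges


sample = [
    "From: Alice <alice@example.org>\nMessage-ID: <1@example.org>\n\nHello",
    "From: Bob <bob@example.org>\nMessage-ID: <2@example.org>\n"
    "In-Reply-To: <1@example.org>\n\nHi Alice",
    "From: Alice <alice@example.org>\nMessage-ID: <3@example.org>\n"
    "In-Reply-To: <2@example.org>\n\nThanks Bob",
]
edges = reply_edges(sample)
```

Here `edges` holds one reply from Bob to Alice and one from Alice to Bob. Every subscriber who only read the thread is absent, which is the measurement gap described above.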
>
> But I bring it up because I'm wondering what concretely we might do with
> respect to preparing the dataset.
>
> (Ideally, our data preprocessing steps might support a number of different
> downstream 'user stories', which then feed into the dashboard for the
> 'users'... but our own use case of this research project can also be a good
> source of requirements.)
>
> I'm also wondering how the significant graphlets are identified. Does that
> involve labeling (i.e. supervised) of the graphlets?
> Or do these new algorithms extract network motifs based on frequencies
> alone?
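
For reference, the frequency-only reading of this question can be sketched without any labels: enumerate small subgraphs and count how often each shape occurs. The example below counts the two connected three-node patterns in an undirected graph. It is a brute-force illustration with hypothetical names, not the graph neural network approach mentioned earlier in the thread.

```python
from collections import Counter
from itertools import combinations


def triad_counts(edges):
    """Count connected 3-node patterns (open path, triangle) in an undirected graph."""
    # Store each undirected edge once, ignoring self-loops.
    adjacent = {frozenset(e) for e in edges if e[0] != e[1]}
    nodes = sorted({n for e in adjacent for n in e})
    counts = Counter()
    for trio in combinations(nodes, 3):
        # Number of edges present among the three nodes.
        k = sum(frozenset(pair) in adjacent for pair in combinations(trio, 2))
        if k == 2:
            counts["open_path"] += 1
        elif k == 3:
            counts["triangle"] += 1
    return counts


counts = triad_counts([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
```

In this toy graph, `counts` reports one triangle (a, b, c) and two open paths (a-c-d and b-c-d). Whether a pattern is "significant" then depends on comparing its count with some baseline, which is a separate modeling choice.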
>
>
>> Then these graphlets we could label as different viewpoints in how they
>> view privacy? I apologize if it doesn't make sense, I haven't yet figured
>> this out.
>>
>
> I appreciate you going out on a limb. I think this is very exciting!
>
> It may be useful to distinguish analytically between:
>  - behavioral regularities -- which we could identify from the graph data
>  - *reasons for* those behavioral regularities, which could be:
>    - endogenous, because of internal dynamics within the system of
> communication (shades of Luhmann here...)
>    - exogenous (due to external forces such as the corporate structure of
> Cisco or the geographic distance between people)
>
> I suppose I would argue that for something to be a "norm", there is
> necessarily some endogenous dynamic that maintains it.
> (I don't think that's a sufficient condition, but I do think it might be a
> good necessary condition.)
>
> For something to be a 'norm', the endogenous dynamic maybe needs to
> involve the shared 'view' that the regularity is how things ought to be.
> I think we could set aside the question of whether these are 'privacy
> norms' until we have a firmer sense of how we are operationalizing things.
>
> These are very deep questions but I am into them. I started BigBang to
> study questions like this!
> But one of the first things I learned with BigBang is that not all
> behavioral regularities are due to endogenous factors, and that indeed
> exogenous explanations are often precisely what is needed as a kind of
> 'null hypothesis'.
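
As an illustration of checking an exogenous factor, the sketch below asks whether replies stay within the same affiliation more often than chance would predict, by permuting the affiliation labels. This is one assumed way to set up such a comparison, not a method taken from BigBang; the `affiliation` mapping, the function names, and the toy data are hypothetical.

```python
import random


def within_group_share(edges, group):
    """Fraction of edges whose two endpoints share a group label."""
    same = sum(group[s] == group[t] for s, t in edges)
    return same / len(edges)


def permutation_p_value(edges, group, trials=2000, seed=0):
    """Share of label shuffles with within-group share >= the observed one."""
    rng = random.Random(seed)
    observed = within_group_share(edges, group)
    people = list(group)
    labels = [group[p] for p in people]
    hits = 0
    for _ in range(trials):
        rng.shuffle(labels)  # break any link between person and label
        shuffled = dict(zip(people, labels))
        if within_group_share(edges, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)


affiliation = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
replies = [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c")]
observed, p = permutation_p_value(replies, affiliation)
```

A small p-value would suggest that affiliation is associated with who replies to whom, so that regularity would need an exogenous explanation ruled out before it could be read as a norm. With only four people, as in this toy data, the test has almost no power; it shows the mechanics only.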
>
>> I mean we could take the direction where we are not doing this, and we
>> model the problem as an agent simulation where the goals are related to the
>> CI, and inside we represent the agents and their interaction in the graph
>> structure and we create a learning model whose weights we are trying to
>> learn by trying to reach the goals based on the existing dialogue traces
>> (aka mailing list conversations) we have.
>>
>
> I love where you are going with this! You see this as distinct from what
> you proposed previously?
>
> This seems to be a good way of figuring out how, say, an endogenous
> dynamic could be responsible for the behavioral regularities.
>
> If it's based on multiple agents interacting with a learning dynamic, that
> could be "normative" in a very rich sense, no?
>
> Truly, you're setting up an awesome vision here, Priyanka.
> It's of course much larger scope than a project for a single hackathon.
> It reads to me more like something that would become a funding proposal.
> I do very much like funding proposals though!
>
> - S
>