[Bigbang-dev] Clarifying theoretical commitments going into IETF 116
Priyanka Sinha
priyanka.sinha.iitg at gmail.com
Mon Jan 30 07:03:06 CET 2023
I agree with you. Please find my comments inline.
On Wed, 25 Jan 2023 at 18:18, Sebastian Benthall <sbenthall at gmail.com>
wrote:
>> From a computational perspective, in my opinion from what you are saying,
>> doing CI would mean I just look at the flow of dialogues, i.e., turn by
>> turn or order of the messages (posts and comments) that one and others have
>> posted, but in a graph theory sense, I can ignore the temporal aspect and
>> treat all the conversation together. Technically, this may avoid the
>> short-text and noisy-text issues that make some statistical NLP methods
>> struggle when context is limited. This may also be less complex
>> computationally.
>>
>
> Aha. I see what you mean. This does seem computationally tractable.
> It reminds me of some of the earliest work I did with BigBang.
>
> What comes to mind is that different working groups might be different
> 'contexts' and so have different patterns to how the discourse unfolds.
>
> To be honest, this is a bit of a stretch for CI as envisioned by Helen
> Nissenbaum. But when I originally approached Helen after working on
> BigBang, I also was thinking about mailing lists as contexts and messages
> sent as information flows. I suppose making this connection in a
> publication would be worthwhile :)
>
> To really make it work with CI, we would need to also track personal
> identifiers within email bodies. I.e., not only replies to people, but also
> references to people. (Maybe this would potentially include legal persons,
> such as company names.) So entity recognition would be great for this, if
> it was working.
>
So, identifying whether slightly different email addresses refer to the
same person is something my algorithm can do; I presented it at the AID
workshop.
Within the email body, entity recognition, and perhaps also coreference
resolution (i.e., the name of the person or company is not present but is
referred to with pronouns such as he/she/they), has varying accuracy. I was
happy to learn of Effy's work in this direction. Myself, I would try to use
Effy's published work as well as Lauren Berk's (now Lauren Wheelock) work,
https://github.com/lauren897
https://dspace.mit.edu/handle/1721.1/127291?show=full
which, when I attended, worked well for cases with short context.
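
To make the address-matching idea concrete, here is a minimal sketch. It is
not the algorithm presented at the AID workshop; it is only a simple
stand-in heuristic that normalizes addresses and compares their local parts
with Python's standard difflib. The function names, the 0.85 threshold, and
the example addresses are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

def normalize(address: str) -> str:
    """Lowercase, undo the archive's ' at ' obfuscation, drop any '+tag'."""
    addr = address.strip().lower().replace(" at ", "@")
    local, _, domain = addr.partition("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}"

def same_person(a: str, b: str, threshold: float = 0.85) -> bool:
    """Heuristic only: compare local parts with '.', '_' and '-' removed."""
    separators = str.maketrans("", "", "._-")
    local_a = normalize(a).split("@")[0].translate(separators)
    local_b = normalize(b).split("@")[0].translate(separators)
    return SequenceMatcher(None, local_a, local_b).ratio() >= threshold

def group_aliases(addresses, threshold: float = 0.85):
    """Greedily put each address in the first group whose first member matches."""
    groups = []
    for addr in addresses:
        for group in groups:
            if same_person(addr, group[0], threshold):
                group.append(addr)
                break
        else:
            groups.append([addr])
    return groups

# Hypothetical addresses, not taken from any real list.
groups = group_aliases([
    "jane.doe at example.org",
    "janedoe@example.com",
    "john.smith@example.org",
])
```

A heuristic like this will merge unrelated people who happen to share a
local part, so in practice it would need extra evidence such as display
names or signature blocks.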
>
> What kind of graph metrics would you find worth tracking?
>
This is an interesting question for me, since I haven't thought of the
graph from the perspective of measures like betweenness centrality. I
thought of it as a representation that we mine for insights using new graph
neural network algorithms. For example, suppose we represent the discourse
as a multi-edged temporal graph, where the different types of edges
represent the different aspects of the communication that we take into
account. We could then work on extracting graphlets, which in my mind are
homeomorphic subgraph patterns (say, of maybe 15 nodes, which could be one
set of folks who hold a particular view). Then perhaps we could label these
graphlets as different viewpoints on privacy? I apologize if this doesn't
make sense; I haven't yet figured it out. Alternatively, we could take a
different direction and model the problem as an agent simulation where the
goals are related to CI. Inside, we represent the agents and their
interactions in the graph structure, and we create a learning model whose
weights we try to learn by trying to reach the goals based on the existing
dialogue traces (aka mailing list conversations) we have.
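
As a rough illustration of the multi-edged temporal graph idea, the sketch
below stores each interaction as a typed, timestamped edge, using only the
Python standard library. The edge types "reply" and "mention", the person
names, and the helper names are hypothetical; graphlet extraction and any
graph neural network step are left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Edge:
    source: str     # sender of the message
    target: str     # person replied to or referred to
    kind: str       # hypothetical edge type, e.g. "reply" or "mention"
    time: datetime  # when the message was sent

def edges_in_window(edges, start, end):
    """Keep the edges with start <= time < end (one window or chunk)."""
    return [e for e in edges if start <= e.time < end]

def static_adjacency(edges):
    """Ignore time: count edges per (source, target) pair, per edge type."""
    counts = defaultdict(lambda: defaultdict(int))
    for e in edges:
        counts[e.kind][(e.source, e.target)] += 1
    return {kind: dict(pairs) for kind, pairs in counts.items()}

# Hypothetical toy data.
edges = [
    Edge("alice", "bob", "reply", datetime(2023, 1, 25, 18, 18)),
    Edge("bob", "alice", "reply", datetime(2023, 1, 30, 7, 3)),
    Edge("bob", "carol", "mention", datetime(2023, 1, 30, 7, 3)),
    Edge("alice", "bob", "reply", datetime(2023, 2, 2, 9, 0)),
]
january = edges_in_window(edges, datetime(2023, 1, 1), datetime(2023, 2, 1))
adjacency = static_adjacency(january)
```

Keeping the timestamp on every edge means the same data can be collapsed
into one static graph or cut into windows later, without deciding up front.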
>
>> If the WN world view is so fine-grained that we need to look at
>> timestamps and model in continuous time domain, then for me I think that is
>> too challenging, albeit interesting. If WN is just major events and thus we
>> can split our data into windows or chunks manually, then we avoid the
>> problem.
>>
>
> I need to dig deeper to recall exactly how the computational sociology
> components of WN work.
> But my sense is that the qualitative theory in WN is much richer than its
> technical operationalization.
> That leaves a big gap that we can start trying to fill.
>
> I don't think continuous time analysis will be necessary; windows or
> chunks should be fine.
>
Awesome!
>
> - S
>
-priyanka