[liberationtech] Linguistics identifies anonymous users

Wed Jan 9 05:34:02 PST 2013

Such a framework can be socially engineered as easily as SEO can. I make a small
living as a ghostwriter and speechwriter, which is the informal version of that
very process. Several of my clients say my writing sounds more like them in
print than they do, because they are less facile writers; but that is a
fault that could be avoided in competent forgeries. ;)

SN
On Jan 9, 2013 8:25 AM, "Eugen Leitl" <eugen at leitl.org> wrote:

>
>
> http://www.scmagazine.com.au/News/328135,linguistics-identifies-anonymous-users.aspx
>
> Linguistics identifies anonymous users
>
> By Darren Pauli on Jan 9, 2013 9:49 AM
>
> Researchers reveal carders, hackers on underground forums.
>
> Up to 80 percent of certain anonymous underground forum users can be
> identified using linguistics, researchers say.
>
> The techniques compare user posts to track users across forums and could
> even unveil authors of thesis papers or blogs who had taken to underground
> networks.
>
> "If our dataset contains 100 users we can at least identify 80 of them,"
> researcher Sadia Afroz told an audience at the 29C3 Chaos Communication
> Congress in Germany.
>
> "Function words are very specific to the writer. Even if you are writing a
> thesis, you'll probably use the same function words in chat messages.
>
> "Even if your text is not clean, your writing style can give you away."
>
> The analysis techniques could also reveal botnet owners and malware tool
> authors, and provide insight into the size and scope of underground
> markets, making the research appealing to law enforcement.
>
> To achieve their results, the researchers used techniques including
> stylometric analysis, the authorship attribution framework JStylo, and
> latent Dirichlet allocation (LDA), a topic model that can distinguish a
> conversation on stolen credit cards from one on exploit writing and
> similarly help identify interesting people.
>
> The analysis was applied across millions of posts from tens of thousands of
> users of a series of multilingual underground websites including
> thebadhackerz.com, blackhatpalace.com, www.carders.cc, free-hack.com,
> hackel1te.info, hack-sector.forumh.net, rootwarez.org, L33tcrew.org and
> antichat.ru.
>
> It found up to 300 distinct discussion topics in the forums, with some of
> the most popular being carding, encryption services, password cracking and
> blackhat search engine optimisation tools.
>
> While successful, the work faces a series of challenges. Analysis could
> only be performed on users with at least 5000 words of text (this research
> used the "gold standard" of 6500 words), which culled the list of potential
> targets from tens of thousands to mere hundreds.
>
> The work also needs to separate product information such as credit cards,
> exploits and drugs from conversational text so that machine learning can
> automate the process, according to researcher Aylin Caliskan Islam.
>
> Posts must also be translated into English, a process that boosted author
> identification from 66 to around 80 percent but was imperfect using freely
> available tools like Google and Bing.
>
> However, both of these tasks were performed successfully, and further
> development, including the use of "exclusive" language translation tools,
> would only serve to boost identification accuracy.
>
> Leetspeak, an alternative alphabet popular in some forum circles, cannot be
> translated.
>
> The project is ongoing, and future work promises to increase its capacity
> to unmask users. Islam said this would include temporal information, which
> would expose users who logged into forums from the same IP addresses and
> wrote posts at around the same time.
>
> [Image: Antichat user analysis]
>
> "They might finish work, come home and log in," Islam said.
>
> It could also tie user identities to the topics they write about, produce
> a map of their interactions, identify multiple accounts held by a single
> author, and combine forum messages with Internet Relay Chat (IRC) data
> sets.
>
> "We want to automate the whole process."
>
> Afroz said that while the work appeals to law enforcement and government
> agencies, it is not designed to catch users out.
>
> "We aren't trying to identify users, we are trying to show them that this is
> possible," she said.
>
> To this end, the researchers released tools last year, updated last
> December, that help users anonymise their writing.
>
> One tool, Anonymouth, takes a 500-word sample of a user's writing to
> identify unique features, such as function words, that could make them
> identifiable.
>
> The other, JStylo, is the machine learning engine which powers Anonymouth.
>
> The research team, from Drexel University and George Mason University, is
> made up of Sadia Afroz, Aylin Caliskan Islam, Ariel Stolerman, Rachel
> Greenstadt and Damon McCoy.
> --
> Unsubscribe, change to digest, or change password at:
> https://mailman.stanford.edu/mailman/listinfo/liberationtech
>
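The article's central claim, that function words act as a fingerprint, can be illustrated with a minimal sketch. This is not how JStylo or Anonymouth actually work: it assumes a small hypothetical function-word list (FUNCTION_WORDS), represents each author by the relative frequency of those words, and attributes an unknown text to the known author with the highest cosine similarity. The helper names (tokenize, profile, cosine, attribute) are hypothetical, and the min_words default of 5000 is borrowed from the minimum sample size quoted in the article.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list for illustration only;
# real stylometry tools use far larger and richer feature sets.
FUNCTION_WORDS = [
    "the", "a", "an", "and", "but", "or", "of", "to", "in", "on",
    "at", "for", "with", "that", "this", "it", "is", "was", "i", "you",
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def profile(text):
    """Relative frequency of each function word, so long and short texts compare fairly."""
    tokens = tokenize(text)
    if not tokens:
        return [0.0] * len(FUNCTION_WORDS)
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def attribute(unknown_text, known_texts, min_words=5000):
    """Return the known author whose function-word profile is closest to unknown_text.

    Authors with fewer than min_words tokens are skipped, mirroring the point
    that short writing samples cannot be analysed. Returns None if no author
    has enough text.
    """
    target = profile(unknown_text)
    best_author, best_score = None, -1.0
    for author, text in known_texts.items():
        if len(tokenize(text)) < min_words:
            continue  # too little text to build a reliable profile
        score = cosine(target, profile(text))
        if score > best_score:
            best_author, best_score = author, score
    return best_author


if __name__ == "__main__":
    # Two hypothetical authors with 140-word samples each.
    known = {
        "alice": "the cat and the dog went to the park and the sun was out " * 10,
        "bob": "i think that it is a good idea for you to go with it " * 10,
    }
    # Lower the threshold so these toy samples qualify.
    print(attribute("i guess that it is fine for you", known, min_words=50))  # prints "bob"
```

The unknown sentence shares "i", "that", "it", "is", "for" and "you" with bob's sample and none of its function words with alice's, so bob wins. With the default threshold of 5000 both toy samples would be skipped and the function would return None. Practical attribution systems typically combine many more features with trained classifiers rather than a single nearest-neighbour comparison, which is also why a skilled imitator who matches someone's function-word habits can blur the signal.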