This is a follow-up to my first post on a recent project to model hate speech on Reddit. If you haven’t read that one yet, please do!

I recently gave a talk on the technical, data science side of the project, describing not just the final result, but also the trajectory of the whole project: stumbling blocks, dead ends and all. Below is the slide deck, as well as the speaker notes. Enjoy!


Reddit is one of the most popular discussion websites today, and is famously broad-minded in what it allows to be said on its forums. However, where there is free speech, there are invariably pockets of hate speech.

In this talk, I present a recent project to model hate speech on Reddit. In three acts, I chronicle the thought processes and stumbling blocks of the project, with each act applying a different form of machine learning: supervised learning, topic modelling and text clustering. I conclude with the current state of the project, a system that models and summarizes entire subreddits, and with possible future directions. Rest assured that both the talk and the slides have been scrubbed to be safe for work!


(Don’t forget to take a look at the speaker notes! They’re under Open speaker notes.)