The advent of social media has given rise to an unprecedented level of hate speech in public discourse. Every year, more tweets involve hate than mention the Grammys, the Super Bowl, and Major League Baseball. We wanted to find a way to stop it in its tracks.
Stopping hate one tweet at a time
What if the very tools being used to spread hate could also be used to end it? We realized that we could use artificial intelligence to identify hate speech, zero in on the most hateful tweets, and turn them against their authors.
The approach rested on a simple observation: once you reply to a Twitter thread, your reply stays in that thread permanently unless it is deleted. So we devised a system that would identify hate speech using machine learning and then reply to the tweet, explaining that for every retweet it generated, we would make a donation to Life After Hate, a nonprofit dedicated to rehabilitating former white supremacists. This put would-be spreaders of hate in a bind: either they do not retweet, and we win, or they do, and an organization they despise receives a donation.
How do we filter millions of tweets to find the right ones to target? Getting there involved four separate data layers, with a human making the final call. First, we used Spredfast Intelligence to ingest a huge volume of tweets and filter them against 500 words targeting religion, race, gender, ethnicity, and sexual orientation. We then applied machine learning in IBM Watson to classify which of the resulting tweets were hate speech. Next, we used Google's Perspective API to measure the level of toxicity in the text. A human then decided whether each remaining tweet should be targeted. Finally, we used the Twitter API to determine the reach and retweet count of the content.
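The layered filter described above can be sketched as a sequence of increasingly expensive checks. This is a minimal Python sketch, assuming each external service can be wrapped as a plain function; the callables `is_hate_speech`, `toxicity_score`, `human_approves`, and `retweet_count`, the 0.8 toxicity threshold, and the placeholder lexicon are all hypothetical stand-ins. It does not reproduce the Spredfast, IBM Watson, Perspective API, or Twitter API interfaces, and it is not the team's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple


@dataclass
class Tweet:
    tweet_id: str
    text: str


def keyword_filter(tweets: Iterable[Tweet], lexicon: Set[str]) -> List[Tweet]:
    """Layer 1: keep only tweets that contain at least one lexicon term."""
    kept = []
    for tweet in tweets:
        # Normalize each word by stripping common punctuation and lowercasing.
        words = {w.strip(".,!?:;\"'#@").lower() for w in tweet.text.split()}
        if words & lexicon:
            kept.append(tweet)
    return kept


def select_targets(
    tweets: Iterable[Tweet],
    lexicon: Set[str],
    is_hate_speech: Callable[[str], bool],    # hypothetical stand-in for the ML classifier
    toxicity_score: Callable[[str], float],   # hypothetical stand-in for a toxicity scorer
    human_approves: Callable[[Tweet], bool],  # final human decision
    retweet_count: Callable[[Tweet], int],    # hypothetical stand-in for a reach lookup
    toxicity_threshold: float = 0.8,          # assumed cutoff, not from the source
) -> List[Tuple[Tweet, int]]:
    """Run each layer in order and return approved tweets ranked by retweets."""
    candidates = keyword_filter(tweets, lexicon)
    classified = [t for t in candidates if is_hate_speech(t.text)]
    toxic = [t for t in classified if toxicity_score(t.text) >= toxicity_threshold]
    approved = [t for t in toxic if human_approves(t)]
    # Pair each approved tweet with its reach and sort highest first.
    ranked = [(t, retweet_count(t)) for t in approved]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


if __name__ == "__main__":
    sample = [
        Tweet("1", "Nice weather today"),
        Tweet("2", "placeholder_term_a appears here"),
        Tweet("3", "another PLACEHOLDER_TERM_B example!"),
    ]
    results = select_targets(
        sample,
        lexicon={"placeholder_term_a", "placeholder_term_b"},
        is_hate_speech=lambda text: True,
        toxicity_score=lambda text: 0.9,
        human_approves=lambda tweet: True,
        retweet_count=lambda tweet: {"2": 5, "3": 40}[tweet.tweet_id],
    )
    print([(t.tweet_id, n) for t, n in results])
```

Ordering the layers this way means the cheap keyword match runs first, so the costlier classification, toxicity scoring, and human review presumably only see a much smaller candidate set.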
#WeCounterHate far exceeded our expectations for identifying hate speech, achieving a 91% success rate relative to a human moderator. And because it relies on machine learning, the model continues to improve.
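One plausible reading of a "91% success rate relative to a human moderator" is the share of tweets on which the model's verdict matched the moderator's. The exact metric is not specified here, so this minimal Python sketch of an agreement rate is an assumption; the function name `agreement_rate` is hypothetical.

```python
from typing import Sequence


def agreement_rate(model_labels: Sequence[bool], human_labels: Sequence[bool]) -> float:
    """Fraction of tweets where the model's hate-speech verdict matches the human moderator's."""
    if len(model_labels) != len(human_labels):
        raise ValueError("label sequences must be the same length")
    if not model_labels:
        raise ValueError("need at least one labeled tweet")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(model_labels)


if __name__ == "__main__":
    # Toy data: the model flags 100 tweets and the moderator agrees on 91 of them.
    model = [True] * 100
    human = [True] * 91 + [False] * 9
    print(agreement_rate(model, human))
```

If the figure instead counts only tweets the model flagged, this is equivalent to precision; the sketch above treats every labeled tweet equally.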
The platform has also proved highly effective at suppressing hate speech. A reply from #WeCounterHate reduces the spread of a hateful tweet by an average of 54%, and 19% of authors delete the tweet outright. As a result, an estimated 4 million potential impressions of hate speech have been prevented since the launch of #WeCounterHate, making it an extremely successful anti-hate platform.
In addition, we are collecting valuable data on how hate speech propagates online, which will help experts in the field address the problem at a more systemic level.
At a Glance
54% reduction in spread of hate
4 million potential impressions of hate tweets prevented
19% of countered hate tweets deleted by their authors
2019 Effie Awards