Stopping hate one tweet at a time

Challenge

The advent of social media has given rise to an unprecedented level of hate speech in public discourse. More tweets involving hate appear every year than those mentioning the Grammys, Super Bowl, and Major League Baseball. We wanted to find a way to stop this in its tracks.

Inspiration

What if the very tools that are being used to spread hate could also be used to put an end to it? We realized that we could use artificial intelligence to identify and process hate speech to zero in on the most hateful tweets and turn them against their authors.

Idea

The idea rests on one observation: once you reply to a Twitter thread, your tweet remains in the thread forever, unless the tweet is deleted. So we devised a system that would identify hate speech using machine learning and then reply to the tweet, explaining that for every retweet it generated, we would make a donation to Life After Hate, a nonprofit dedicated to rehabilitating former white supremacists. This puts potential spreaders of hate in a bind: either they do not retweet, and we win, or they do, and an organization they despise gets a donation.

How were we able to filter through millions of tweets to find the right ones to target? Getting there involved four separate data layers. We used Spredfast Intelligence to ingest a huge volume of tweets and filter them against a list of 500 words that targeted religion, race, gender, ethnicity, and sexual orientation. We then applied machine learning in IBM Watson to classify which of the resulting tweets were hate speech. Next, we used Google's Perspective API to score the level of toxicity in the copy. A human then made the final determination as to whether each remaining tweet should be targeted. After that, we used the Twitter API to determine the reach and retweets of the content.
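The filtering stages described above can be sketched as a simple funnel. This is only an illustration under stated assumptions, not the campaign's actual code: the `Tweet` type, the function names, and the 0.8 toxicity threshold are all hypothetical, and the classifier, toxicity scorer, and human reviewer are passed in as plain callables rather than real calls to IBM Watson, Perspective API, or any moderation tool.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Tweet:
    """Hypothetical minimal tweet record used only for this sketch."""
    tweet_id: str
    text: str


def keyword_filter(tweets: Iterable[Tweet], terms: Iterable[str]) -> List[Tweet]:
    """Stage 1: keep tweets containing at least one term from the word list."""
    wanted = {term.lower() for term in terms}
    kept = []
    for tweet in tweets:
        words = set(re.findall(r"\w+", tweet.text.lower()))
        if words & wanted:
            kept.append(tweet)
    return kept


def select_targets(
    tweets: Iterable[Tweet],
    terms: Iterable[str],
    is_hate: Callable[[str], bool],       # stand-in for the ML classifier
    toxicity: Callable[[str], float],     # stand-in for a 0..1 toxicity score
    approve: Callable[[Tweet], bool],     # stand-in for the human reviewer
    min_toxicity: float = 0.8,            # assumed threshold, not from the source
) -> List[Tweet]:
    """Run each stage in order; every stage only narrows the candidate set."""
    candidates = keyword_filter(tweets, terms)
    classified = [t for t in candidates if is_hate(t.text)]
    toxic = [t for t in classified if toxicity(t.text) >= min_toxicity]
    # The human decision comes last, so reviewers see the smallest set.
    return [t for t in toxic if approve(t)]
```

Ordering the stages from cheapest to most expensive means the human reviewer, the costliest step, only sees tweets that every automated layer has already flagged.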

Growth

#WeCounterHate radically outperformed our expectations in identifying hate speech, with a 91% success rate relative to a human moderator. And, thanks to machine learning, the model continues to improve.

The platform has also proved highly successful in suppressing hate speech. When it responds to a hateful tweet, the tweet's spread falls by an average of 54%, and 19% of the hate-tweeters delete the tweet outright. As a result, 4 million fewer people have been exposed to hate speech since the launch of #WeCounterHate, making it an extremely successful anti-media platform.

In addition, we are collecting insightful data about how hate speech propagates online, which will in turn enable experts in the field to address the problem at a more systemic level.

At a Glance

55%
Reduction in spread of hate

20 Million
Potential impressions of hate tweets prevented

1 in 5
Countered hate tweets deleted by author

Awards

2019 Effie Awards: Gold, Positive Change

Press

The Power Of Purpose: How We Counter Hate Used Artificial Intelligence To Battle Hate Speech Online

A unique anti-media platform is turning racist, sexist, and homophobic tweets against their creators.

Press

Forbes

Campaign US

VentureBeat

Life After Hate
