Why a YouTube chat about chess was flagged for hate speech

Last June, Antonio Radić, the host of a YouTube chess channel with over a million subscribers, was live-streaming an interview with Grandmaster Hikaru Nakamura when the broadcast was suddenly cut off.

Instead of a lively discussion of chess openings, famous games, and iconic players, viewers were told that Radić’s video had been removed for “harmful and dangerous” content. Radić saw a message saying that the video, which contained nothing more outrageous than a discussion of the King’s Indian Defense, had violated YouTube’s community guidelines. His channel was offline for 24 hours.

Exactly what happened is still unclear. YouTube declined to comment beyond saying that removing Radić’s video was a mistake. But a new study suggests it reflects shortcomings in artificial intelligence programs designed to automatically detect hate speech, abuse, and misinformation online.

Ashique KhudaBukhsh, a project scientist specializing in artificial intelligence at Carnegie Mellon University and a serious chess player himself, wondered whether YouTube’s algorithm might have been confused by discussions involving black and white pieces, attacks, and defenses.

So he and Rupak Sarkar, an engineer at CMU, designed an experiment. They trained two versions of a language model called BERT, one using messages from the far-right racist website Stormfront and the other using data from Twitter. They then tested the algorithms on the transcripts and comments of 8,818 chess videos and found them far from perfect. The algorithms flagged about 1 percent of the transcripts or comments as hate speech, but more than 80 percent of those flagged were false positives; read in context, the language was not racist. “Without a human in the loop,” the pair write in their paper, “relying on off-the-shelf classifiers’ predictions on chess discussions can be misleading.”
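A back-of-the-envelope calculation shows what those figures imply about the classifiers’ precision. The exact counts below are assumptions derived from the rounded numbers reported in the article (8,818 videos, roughly 1 percent flagged, more than 80 percent of flags false), not the study’s raw data:

```python
# Rough precision implied by the study's reported numbers.
# Note: the study flagged individual transcripts/comments, not whole
# videos; treating ~1% of 8,818 items as flagged is a simplification.

total_items = 8_818
flag_rate = 0.01              # ~1 percent flagged as hate speech
false_positive_share = 0.80   # >80 percent of flags were false positives

flagged = round(total_items * flag_rate)               # ≈ 88 items
true_positives = round(flagged * (1 - false_positive_share))
precision = true_positives / flagged

print(f"flagged ≈ {flagged}, precision ≈ {precision:.0%}")
```

In other words, a reviewer acting on these flags would be wrong roughly four times out of five, which is why the authors stress keeping a human in the loop.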

The experiment exposed a core problem for AI language programs. Detecting hate speech or abuse is about more than catching obscene words and phrases. The same words can mean very different things in different contexts, so an algorithm must infer meaning from the surrounding text.

“Fundamentally, language is still a very subtle thing,” says Tom Mitchell, a CMU professor who has previously worked with KhudaBukhsh. “These kinds of trained classifiers are not going to be 100 percent accurate any time soon.”

Yejin Choi, an associate professor at the University of Washington who specializes in artificial intelligence and language, says she is “not at all” surprised by the YouTube takedown, given the limits of language understanding today. Choi says further progress in detecting hate speech will require large investments and new approaches. She says algorithms work best when they analyze more than just a piece of text in isolation, incorporating, for example, a user’s comment history or the nature of the channel where the comments are posted.

But Choi’s research also shows how detecting hate speech can perpetuate prejudice. In a 2019 study, she and others found that human annotators were more likely to label the Twitter posts of users who self-identify as African-American as abusive, and that algorithms trained to identify abuse using those annotations will repeat those biases.

Companies have spent many millions collecting and annotating training data for self-driving cars, but Choi says the same effort has not been put into annotating language. So far, no one has collected or annotated a high-quality data set of hate speech or abuse that includes many “edge cases” with ambiguous language. “If we made that level of investment in data collection, or even a small fraction of it, I’m sure AI can do much better,” she says.

Mitchell, the CMU professor, says YouTube and other platforms probably have more sophisticated AI algorithms than the one KhudaBukhsh built; but even those are still limited.

