Expanding HateCheck into More Languages
The Starting Point: English HateCheck
The original HateCheck introduced functional tests for English hate speech detection models. It includes tests for different forms of hate, such as derogation and threatening language, as well as tests for challenging cases of non-hate, such as counterspeech. All HateCheck test cases were generated from templates covering seven target groups and then validated by a team of trained annotators.
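The template step can be pictured as filling a group placeholder in each template once per target group. The sketch below is a minimal illustration under stated assumptions: the templates, the `{group}` placeholder, the three example groups (a subset, not the full list of seven), and the `expand_templates` helper are all hypothetical and are not taken from the HateCheck code. The annotator validation step is not shown.

```python
from itertools import product

# Hypothetical templates: functional test name -> (template text, gold label).
# "{group}" is an assumed placeholder filled with each target group.
TEMPLATES = {
    "derogation": ("I hate {group}.", "hateful"),
    "counterspeech": ("Saying you hate {group} is wrong.", "non-hateful"),
}

# Illustrative subset of target groups, not the full HateCheck list.
TARGET_GROUPS = ["women", "immigrants", "Muslims"]


def expand_templates(templates, groups):
    """Return one (test, text, label) case per template and target group."""
    cases = []
    for (test_name, (template, label)), group in product(templates.items(), groups):
        cases.append((test_name, template.format(group=group), label))
    return cases


cases = expand_templates(TEMPLATES, TARGET_GROUPS)
for test_name, text, label in cases:
    print(f"{test_name} [{label}]: {text}")
```

Because every template is crossed with every group, adding one template or one group grows the test suite systematically, which is what makes per-group comparisons of model behaviour possible.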
Creating German HateCheck
Expanding HateCheck into other languages required more than a literal translation of the English test cases. That is why we worked with native-speaking language experts to create Multilingual HateCheck (MHC). For the German version of HateCheck, for example, we included hate against Jewish people and refugees, because these forms of hate are unfortunately particularly prevalent in Germany. We also used language-specific phrases and idioms to create more realistic test cases.
Challenges and Opportunities
Some functional tests, such as slur homonyms and reclaimed slurs, were too specific to English to be transferred to other languages in MHC. Conversely, languages with non-Latin scripts, such as Arabic and Mandarin, required entirely new functional tests for spelling variations. MHC highlights how crucial language-specific expert knowledge is, and it can serve as a blueprint for expanding HateCheck into even more languages.
Get In Touch!
If you want to get involved with expanding HateCheck even further, please get in touch. We'd love to hear from you.