HateCheck Methodology
How HateCheck was constructed and how it can be used
Functional Tests for Targeted Model Evaluation
HateCheck covers 11 languages, with 25+ functional tests in each language.
Functional tests are a way of assessing whether a software system meets certain functional requirements. Such tests are widely used in software engineering and, more recently, in natural language processing.
The functional tests in HateCheck were selected based on an extensive literature review as well as interviews with civil society organisations. Through this process, we identified key challenges for existing hate speech detection models and incorporated them into the HateCheck test suites. For example, HateCheck tests model performance on counterspeech, i.e. direct responses that challenge hate, which models often misclassify as hateful.
Each functional test corresponds to a specific type of hateful or non-hateful content.
Hate is a subjective concept without a universal definition. For HateCheck, we define hate as abuse that is targeted at a protected group, or at its members for being part of that group. Protected groups are defined by characteristics such as race, religion or sexual orientation.
For each functional test, experts handcrafted test cases designed to be clear, simple and unambiguous. Trained annotators then validated these handcrafted cases.
Testing Models with HateCheck
HateCheck can be used to test any hate speech detection model in the 11 languages it covers.
HateCheck is model-agnostic: you can use it to compare different model architectures trained on different datasets, and even to evaluate black-box commercial models.
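As a rough illustration, the sketch below runs an arbitrary classifier over a local copy of the HateCheck test cases and reports overall accuracy. The file name, the predict wrapper and its label strings are placeholders, and the column names (test_case, label_gold) follow the released CSVs but should be checked against your download.

```python
import csv

# Placeholder model: swap in any hate speech classifier or black-box API call
# that maps a test case to "hateful" / "non-hateful". The wrapper name and the
# label strings are illustrative assumptions, not part of HateCheck itself.
def predict(text: str) -> str:
    return "hateful" if "hate" in text.lower() else "non-hateful"

# Assumed local copy of the HateCheck test cases in CSV form.
with open("hatecheck_test_cases.csv", encoding="utf-8") as f:
    cases = list(csv.DictReader(f))

correct = sum(predict(c["test_case"]) == c["label_gold"] for c in cases)
print(f"Overall accuracy: {correct / len(cases):.1%}")
```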
HateCheck allows you to zoom in on model performance on particular types of hate, such as dehumanisation or derogation. Functional tests are constructed as contrasts, which makes them particularly challenging for overly simplistic models. For example, you can test model performance on both hateful and non-hateful uses of profanity.
HateCheck allows you to test how hate speech detection models perform across different target groups, and whether they might be biased in their target coverage. If models are worse at detecting hate aimed at some protected groups (e.g. women) than others (e.g. Muslims), this risks reinforcing biases in how different groups are protected in online spaces.
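Continuing the sketch above, a per-functionality and per-target breakdown might look like the following; the functionality and target_ident column names are again assumptions based on the released CSVs.

```python
from collections import defaultdict

# Accuracy broken down by a grouping column, e.g. the functional test
# ("functionality") or the targeted group ("target_ident").
def accuracy_by(cases, key):
    hits, totals = defaultdict(int), defaultdict(int)
    for c in cases:
        totals[c[key]] += 1
        hits[c[key]] += predict(c["test_case"]) == c["label_gold"]
    return {k: hits[k] / totals[k] for k in totals}

# Per-functionality accuracy, e.g. hateful vs. non-hateful uses of profanity.
for functionality, acc in sorted(accuracy_by(cases, "functionality").items()):
    print(f"{functionality:45s} {acc:.1%}")

# Per-target-group accuracy, to check for gaps in target coverage.
for target, acc in sorted(accuracy_by(cases, "target_ident").items()):
    print(f"{target:45s} {acc:.1%}")
```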
Start Using HateCheck
HateCheck is peer-reviewed, published at top academic conferences, and fully open-source. Head to our Download page to access HateCheck and read more details in our research papers.