Why is HateCheck open-source?
We believe that great research is open research. That’s why we have open-sourced the data, code, and annotation guidelines for all of HateCheck. This means that anyone can reproduce our results, and use the HateCheck test suites to test and improve their own hate speech detection models.
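In practice, testing a model with a functional test suite means running it over labelled cases and reporting accuracy per functionality. Here is a minimal sketch of that loop; the classifier and the test cases below are illustrative placeholders, not actual HateCheck entries or code:

```python
# Sketch: evaluating a hate speech classifier against functional test cases,
# in the style of HateCheck. Cases and model are illustrative placeholders.

def toy_classifier(text: str) -> str:
    """Stand-in model: flags any text containing a slur placeholder."""
    return "hateful" if "[SLUR]" in text else "non-hateful"

# Each functional test pairs a case with its gold label and functionality.
test_cases = [
    {"functionality": "derogation",
     "text": "I hate [SLUR] people.",
     "label": "hateful"},
    {"functionality": "counter_speech",
     "text": "Saying 'I hate [SLUR] people' is wrong.",
     "label": "non-hateful"},
]

# Group pass/fail outcomes by functionality.
results = {}
for case in test_cases:
    pred = toy_classifier(case["text"])
    results.setdefault(case["functionality"], []).append(pred == case["label"])

# Report per-functionality accuracy.
for func, outcomes in results.items():
    accuracy = sum(outcomes) / len(outcomes)
    print(f"{func}: {accuracy:.0%} of {len(outcomes)} cases passed")
```

Note how the toy model passes the derogation case but fails the counter-speech one: per-functionality reporting is exactly what surfaces this kind of targeted weakness, which aggregate accuracy would hide.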
Hate speech research should be open!
By making our research outputs readily available, we also make our decision processes fully transparent.
We’ve even shared every label from the annotators who checked the HateCheck entries. This is especially important for socially sensitive and subjective tasks like hate speech detection, where there is no universal agreement on core concepts and definitions. The upshot is: if you are using HateCheck, you can understand in detail how the test suites were created, and evaluate what their strengths and weaknesses are for your application. Only when dataset creators make their assumptions and processes explicit can others decide what the datasets should and shouldn’t be used for.
A call to action…
More and more researchers and developers are open-sourcing their work. Contributing to this movement is as easy as flagging a bug on GitHub or sharing your own code or data. If you’re a researcher, you can use frameworks like data statements or model cards to help you get started.
If you are using HateCheck, you can build and expand on our results, adapt the test suites to your needs, or even build new HateCheck test suites yourself. We are always happy to support open-source hate speech research, so if you have any questions or ideas, please get in touch!
Start using HateCheck!
Read the HateCheck research papers and start testing hate speech detection models today.