Was the split random, or was some heuristic followed?
You can find information about the splits in Section 2.4 of the paper that introduces the dataset.
Thanks, I haven't had time to read the paper yet.
Using the various phases outlined above, and after further filtering to remove low-quality examples,
we end up with a dataset totalling exactly 10k memes. The dataset comprises five different types
of memes: multimodal hate, where benign confounders were found for both modalities; unimodal
hate, where one or both modalities were already hateful on their own; benign image confounders;
benign text confounders; and finally random non-hateful examples.
We construct a dev set and a test set from 5% and 10% of the data respectively, and set aside the rest
to serve as fine-tuning training data. The dev and test sets are fully balanced and are composed of
memes in the following proportions: 40% multimodal hate, 10% unimodal hate, 20% benign text confounders, 20% benign image confounders, 10% random non-hateful.
So dev and test are balanced with that distribution (40% multimodal + 10% unimodal = 50% hateful, and 20% + 20% + 10% = 50% non-hateful), while the training set's distribution is left unspecified.
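For anyone curious, here's a minimal sketch of how a stratified split with that dev/test mix could be reproduced, assuming each meme record carries a `type` field with the five categories above. The field name and category strings here are hypothetical, not the dataset's actual annotation schema:

```python
import random

# Target dev/test category mix from the paper (hypothetical category keys).
MIX = {
    "multimodal_hate": 0.40,
    "unimodal_hate": 0.10,
    "benign_text_confounder": 0.20,
    "benign_image_confounder": 0.20,
    "random_non_hateful": 0.10,
}

def stratified_split(memes, dev_frac=0.05, test_frac=0.10, seed=0):
    """Draw dev and test sets matching MIX; whatever remains is train."""
    rng = random.Random(seed)

    # Bucket the memes by category and shuffle each bucket.
    by_type = {t: [] for t in MIX}
    for m in memes:
        by_type[m["type"]].append(m)
    for pool in by_type.values():
        rng.shuffle(pool)

    def draw(n):
        # Take round(n * frac) memes of each category off its bucket.
        batch = []
        for t, frac in MIX.items():
            k = round(n * frac)
            batch.extend(by_type[t][:k])
            del by_type[t][:k]
        return batch

    n = len(memes)
    dev = draw(int(n * dev_frac))    # 500 memes for a 10k dataset
    test = draw(int(n * test_frac))  # 1000 memes for a 10k dataset
    train = [m for pool in by_type.values() for m in pool]
    return train, dev, test
```

With 10k memes this yields a 500-meme dev set and a 1000-meme test set, each 50% hateful by construction, and the leftover ~8.5k memes form the training set with whatever category distribution remains.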