I am exploring the dataset to understand it better, and I am looking for a way to find memes that share the same image but have different text.
So far I cannot see any pattern in the dataset that would let me pull out all the similar images with different text.
Is there any way to find such patterns without training a machine learning model?
You can try extracting embeddings from a pretrained ResNet (so no training is needed) and then computing the cosine similarity between all pairs of embeddings. Any pair with a very high similarity can be treated as the same image.
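Here is a rough sketch of the comparison step only. It assumes the embeddings have already been extracted (for example from a ResNet with its final classification layer removed) and stored as plain lists of floats keyed by file name. The file names, the toy vectors, the `group_similar` helper and the 0.95 threshold are all made up for illustration; the threshold would need tuning on the real data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def group_similar(embeddings, threshold=0.95):
    """Group ids whose embeddings are linked by similarity >= threshold.

    `embeddings` maps an image id to its vector. Groups are built with a
    small union-find, so A~B and B~C puts A, B and C in one group.
    Only groups with more than one member are returned.
    """
    ids = list(embeddings)
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Compare every pair once (O(n^2), fine for a few thousand images).
    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if cosine_similarity(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]


# Toy vectors standing in for real ResNet embeddings (hypothetical data).
toy = {
    "meme_a.png": [0.90, 0.10, 0.00],
    "meme_b.png": [0.88, 0.12, 0.01],
    "meme_c.png": [0.00, 0.20, 0.95],
}
print(group_similar(toy))
```

The all-pairs loop is quadratic in the number of images, so for a large dataset a vectorised matrix product or an approximate nearest-neighbour index would probably be a better fit.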
You could also try hashing the images and see if that helps. A perceptual hash (rather than an exact file hash) might give the same, or nearly the same, value for the same image with different text, though I'm not really sure about this one.
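To make the hashing idea concrete, here is a minimal sketch of one simple perceptual hash, the average hash, compared with Hamming distance. An exact file hash such as MD5 changes completely when any pixel changes, so it would not match memes with different captions; a perceptual hash only changes in the bits where the image changed. The sketch assumes each image has already been converted to grayscale and shrunk to a tiny grid (usually 8x8, done with an image library such as Pillow). The 4x4 grids and function names below are invented for illustration.

```python
def average_hash(pixels):
    """Average hash of a small grayscale grid (list of rows of 0-255 ints).

    Each pixel contributes one bit: 1 if it is brighter than the mean of
    the whole grid, otherwise 0. Bits are packed into a single integer.
    """
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming_distance(h1, h2):
    """Number of bit positions in which two hashes differ."""
    return bin(h1 ^ h2).count("1")


# Hypothetical 4x4 grids: dark left half, bright right half.
base = [[10, 10, 200, 200] for _ in range(4)]

# Same picture with one pixel overwritten, standing in for caption text.
with_text = [row[:] for row in base]
with_text[0][0] = 255

# A different picture: alternating bright and dark columns.
different = [[200, 10, 200, 10] for _ in range(4)]

print(hamming_distance(average_hash(base), average_hash(with_text)))  # 1
print(hamming_distance(average_hash(base), average_hash(different)))  # 8
```

A small distance suggests the same underlying image. How small is small enough is a judgment call, and captions that cover a large part of the meme may push the distance up considerably.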
I have not tried either of the approaches mentioned above, but I feel something along these lines could work.