I’ve started uploading a resized dataset to Kaggle because:
a) it’s almost impossible for anybody to deal with 5 TB of data
b) 2048x1536 images are completely unnecessary for a problem like this
The resized images are 512x384 (EXIF is preserved).
The list will be updated (upvotes are welcome):
Thank you so much for sharing this! Anything that lowers the barrier to entry is epic
It looks like you have done a wonderful service to everyone here. Can I ask if you have documented the process, i.e., the algorithm(s) that you used to downsize the images? I assume that you reduced the resolution with some loss of information involved, and I guess if we train a model with the smaller images then we would probably want to duplicate that downsizing process with the test set as well.
sure thing https://gitlab.com/ppleskov/snapshot-serengeti/blob/master/resize.ipynb
key line is
img = img.resize((img.size[0]//4, img.size[1]//4), Image.ANTIALIAS)
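For anyone who wants to apply the same downsizing to their own copy of the test set, here is a minimal sketch built around that key line. It assumes each dimension is divided by 4 (which matches 2048x1536 going to 512x384) and that EXIF is carried over by passing the raw bytes back to `save`. The names `quarter_size` and `resize_image` are made up for illustration and are not taken from the linked notebook. Note that `Image.ANTIALIAS` was an alias for the Lanczos filter and was removed in Pillow 10, so the sketch uses `Image.LANCZOS` instead.

```python
def quarter_size(size, factor=4):
    """Return (width, height) divided by `factor`, using integer division."""
    width, height = size
    return (width // factor, height // factor)


def resize_image(src_path, dst_path, factor=4):
    """Downsize one image by `factor` and copy its EXIF block if present."""
    # Pillow is imported here so quarter_size() stays usable without it.
    from PIL import Image

    with Image.open(src_path) as img:
        # Lanczos resampling; same filter the old ANTIALIAS name referred to.
        small = img.resize(quarter_size(img.size, factor), Image.LANCZOS)
        exif = img.info.get("exif")  # raw EXIF bytes, or None if absent
        if exif:
            small.save(dst_path, exif=exif)
        else:
            small.save(dst_path)


print(quarter_size((2048, 1536)))  # (512, 384)
```

Calling `resize_image("in.jpg", "out.jpg")` (placeholder paths) on a 2048x1536 JPEG should produce a 512x384 file. JPEG quality and other save settings are left at Pillow's defaults here, which may differ from what the notebook used.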
As someone with a 3 Mb/s download limit, I can now participate. Thank you.
Thanks very much for this.
BTW, I don’t see season 7 or 9 when I use the Kaggle API (kaggle datasets list), but I do see them in your links above. Not sure why – maybe they have to be registered or something to appear?
Thank you so much! Much appreciated
It’s better to ask Kaggle support.
All the datasets were produced in the same manner.
First of all thanks for sharing!!
BTW, I think some files are missing. For example, S8_Q09_R2_IMAG1456.jpeg is not present in season 8 part 5 here.
May I know the reason?
I’m missing 400k images in the downsized dataset. I have yet to compare against the original, but I’m wondering if others are seeing the same thing.
A few images may be missing.
It should be around 100 in total.
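To check how many files are actually missing from a local copy, a quick comparison of expected names against what is on disk can help. The sketch below is only an illustration: `find_missing` is a made-up helper, the list of expected names is assumed to come from whatever metadata file you are using, and names are matched on lower-cased stems so that extension differences such as .jpeg vs .JPG are ignored.

```python
from pathlib import Path


def find_missing(expected_names, root):
    """Return the expected file names that have no match under `root`, sorted."""
    # Collect lower-cased stems of every file below root (searched recursively).
    present = {p.stem.lower() for p in Path(root).rglob("*") if p.is_file()}
    return sorted(
        name for name in expected_names
        if Path(name).stem.lower() not in present
    )
```

For example, `find_missing(["S8_Q09_R2_IMAG1456.jpeg"], "season8_part5")` would return that name if the file is absent; the directory name here is a placeholder for wherever the download was unpacked.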