Is there a way to download only specific pages of the TIF files (instead of downloading everything and then selecting only the page we are interested in)?
The total size of the training dataset is about 1.5 TB. If we have to download the full TIF files only to keep a downsampled image, that would waste a lot of bandwidth.
Could you provide a way to download only specific pages?
I also have a problem downloading single files. When I run the example:
aws s3 cp s3://drivendata-competition-visiomel-public-us/images/1u4lhlqb.tif ./ --no-sign-request
I get:
[Errno 2] No such file or directory
I do not know what exactly is wrong.
@h.marko Handling large images is indeed part of the challenge of this competition. That said, you might look into partial reads with boto3, though I’m not sure offhand whether that will work with pyramidal TIFFs.
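For anyone who wants to try the partial-read idea, here is a minimal sketch, not a confirmed solution. It assumes boto3 is installed and that the bucket accepts unsigned ranged GET requests, in line with the `--no-sign-request` flag in the command above. The helper names (`byte_range_header`, `parse_tiff_header`, `read_s3_range`) are made up for illustration. It only fetches the first bytes of a file and reads the TIFF header to find where the first page directory (IFD) starts. Getting to one specific page would still mean following the IFD chain with further ranged reads, and I have not checked this against the competition's pyramidal files.

```python
import struct


def byte_range_header(start, length):
    """Build an HTTP Range header value covering `length` bytes from `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    return f"bytes={start}-{start + length - 1}"


def parse_tiff_header(header):
    """Return (byte_order, is_bigtiff, first_ifd_offset) from the first 16 bytes."""
    if len(header) < 16:
        raise ValueError("need at least 16 bytes of header")
    order = {b"II": "<", b"MM": ">"}.get(header[:2])
    if order is None:
        raise ValueError("not a TIFF byte-order mark")
    (magic,) = struct.unpack(order + "H", header[2:4])
    if magic == 42:  # classic TIFF: 32-bit offset at bytes 4-7
        (offset,) = struct.unpack(order + "I", header[4:8])
        return order, False, offset
    if magic == 43:  # BigTIFF: 64-bit offset at bytes 8-15
        (offset,) = struct.unpack(order + "Q", header[8:16])
        return order, True, offset
    raise ValueError("unknown TIFF magic number")


def read_s3_range(bucket, key, start, length):
    """Fetch only `length` bytes of an S3 object (assumes anonymous access works)."""
    import boto3  # third-party; imported here so the helpers above work without it
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    resp = s3.get_object(Bucket=bucket, Key=key, Range=byte_range_header(start, length))
    return resp["Body"].read()


# Example call (needs network access, so it is left commented out):
# head = read_s3_range("drivendata-competition-visiomel-public-us",
#                      "images/1u4lhlqb.tif", 0, 16)
# print(parse_tiff_header(head))
```

Whether this saves bandwidth in practice depends on how the pages are laid out inside each file: if the downsampled page is stored as many small tiles, each tile needs its own ranged read.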
@majabedi I copied and pasted that command exactly and it works for me. Can you double check there is no typo in what you’ve run locally?
@h.marko, thanks for the response. I don’t even have the space to download such data. I would appreciate it if anyone could share a reduced version as well.
If you are constrained by space, you could download one image at a time, save a single page (i.e. a downsampled image) to a new file, and then remove the original file.
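A rough sketch of that loop is below, in case it helps. It assumes the AWS CLI is on your PATH and that the third-party tifffile package can read these files. The function names, directory names, and the page index are placeholders I made up, and which page holds the resolution you want is something to check on one file first. Note that the full file is still downloaded each time, so this saves disk space, not bandwidth.

```python
import os
import subprocess

BUCKET_PREFIX = "s3://drivendata-competition-visiomel-public-us/images"


def download_with_aws_cli(filename, dest_dir):
    """Download one file with the same CLI command used earlier in the thread."""
    dest = os.path.join(dest_dir, filename)
    subprocess.run(
        ["aws", "s3", "cp", f"{BUCKET_PREFIX}/{filename}", dest, "--no-sign-request"],
        check=True,  # raise if the CLI reports a failure
    )
    return dest


def save_single_page(src_path, dst_path, page_index):
    """Write one page of a multi-page TIFF to a new file (assumes tifffile)."""
    import tifffile  # third-party; imported lazily

    with tifffile.TiffFile(src_path) as tif:
        page = tif.pages[page_index].asarray()
    tifffile.imwrite(dst_path, page)


def reduce_one(filename, work_dir, out_dir, page_index,
               download=download_with_aws_cli, extract=save_single_page):
    """Download one image, keep a single page, then delete the full-size file."""
    os.makedirs(work_dir, exist_ok=True)
    os.makedirs(out_dir, exist_ok=True)
    src = download(filename, work_dir)
    dst = os.path.join(out_dir, filename)
    try:
        extract(src, dst, page_index)
    finally:
        os.remove(src)  # free the disk space even if extraction failed
    return dst


# Example (needs network access and tifffile; the page index 3 is a guess):
# for name in ["1u4lhlqb.tif"]:
#     reduce_one(name, "tmp_full", "reduced", page_index=3)
```

Only one full-size file is on disk at any time, so the space needed is roughly the largest single image plus the reduced copies.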