Submission format

Question re: required format for submission.

I’m saving my predictions as tiff images, 1024px a side, single band (shape = 1024x1024), max value 1, min 0, no floats. dtype int32 (tried using PIL and imageio save). All images zipped into (no directories). No luck getting a score so far - are we required to submit some sort of geo data along with the submission?

The error I get when submitting is:


All 11481 IDs are present, names as instructed eg: 0a0a36.TIFF

Could it be an image size issue? Uncompressed, they are rather large (4.2MB each).

Perhaps an example of saving an array to a TIFF file could be provided?

Geodata isn’t required for prediction tiffs.

4MB per image file seems too large - they should be more like 10-20KB per 1024x1024 1-band chip in TIFF file format, with a total zip archive file size between 100-200MB.

Are all pixel values either 0 or 1 in all chips? And can you try saving as dtype uint8?

All pixels are either 0 or 1.

4MB was the uncompressed size - they compress down very small as expected.

Changing to uint8 before saving shrank the uncompressed to 1MB (~87MB for the whole zip after compression) and solved the problem described above - thanks for the help.
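The size difference reported above follows from the bytes stored per pixel: int32 uses 4 bytes and uint8 uses 1. A minimal sketch with NumPy, where the mask below is made-up data standing in for a real prediction:

```python
import numpy as np

# Hypothetical stand-in for a model's 1024x1024 binary prediction mask.
mask_int32 = np.zeros((1024, 1024), dtype=np.int32)
mask_int32[100:200, 100:200] = 1  # a small positive region

# Cast to uint8 before saving: 1 byte per pixel instead of 4.
mask_uint8 = mask_int32.astype(np.uint8)

print(mask_int32.nbytes)  # 4194304 bytes, matching the ~4.2MB files
print(mask_uint8.nbytes)  # 1048576 bytes, about 1MB

# The cast is lossless when every value is 0 or 1.
assert np.array_equal(mask_int32, mask_uint8)
```

The cast only changes the storage type, so it is safe as long as the array really contains nothing but 0 and 1.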

I then got a different error: “Found unexpected file ‘fe7ba8.TIFF’.” I used the naming convention from the problem description. Changing from .TIFF to .tif finally allowed me to submit, and get a score :)

Hey @johnowhitaker,

Awesome, glad the submission worked! The original submission timed out partly because of the large file sizes. Here’s an example of how we wrote small masks from numpy arrays using the Pillow library:

import numpy as np
from PIL import Image

# random generator for the example data
rng = np.random.default_rng()

# id of the tiff to write
file_id = 'abcdef'

# for now, just random data; the builtin bool works across NumPy versions
arr = rng.choice(a=[0, 1], size=(1024, 1024), p=[0.95, 0.05]).astype(bool)

# use compression='tiff_deflate' for space savings
Image.fromarray(arr).save(f'{file_id}.tif', compression='tiff_deflate')

We’ve increased the scoring time limit to make it less likely this happens in the future and ensured that any variation of tif, tiff, TIF, and TIFF will work for the extension. Let us know if you have any future issues!



Thanks @bull! In addition to using PIL to save tiffs with deflate compression, here’s a quick gist showing use of rasterio to save prediction arrays as geotiffs with LZW, jpeg, or other supported compression. Again, submissions are not required to have geodata, but this would be a geospatial way to do it:

The notebook also shows a quick end-to-end workflow that you can drop your inference code into, from loading test chips via STAC to packaging everything into the correct compressed .tgz submission file (verified via a leaderboard submission that evaluated correctly).
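For the packaging step, here is a rough sketch using only the standard library. The function names `pack_predictions` and `list_archive` and the folder layout are assumptions for illustration, not taken from the notebook. The thread mentions both zip and .tgz archives, so check the competition instructions for the required type; this sketch writes a gzipped tar with every file at the archive root (no directories).

```python
import tarfile
from pathlib import Path


def pack_predictions(pred_dir, archive_path):
    """Write every .tif in pred_dir into a gzipped tar with no directories."""
    pred_dir = Path(pred_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for tif in sorted(pred_dir.glob("*.tif")):
            # arcname drops the parent folders so files sit at the archive root
            tar.add(str(tif), arcname=tif.name)
    return archive_path


def list_archive(archive_path):
    """Return the member names stored in the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

Calling `pack_predictions("predictions", "submission.tgz")` and then `list_archive("submission.tgz")` should return bare filenames such as `abcdef.tif`, which makes it easy to confirm that no folder prefixes ended up in the archive.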


@bull is it possible to add an RLE-based CSV submission format?

@SamSepiol nope, but feel free to share any issues you’re having with the archive-of-TIFFs format in the forum, and hopefully the community here can help make that format work for you!


I have a similar problem with my submission. The total archive size is ~50MB, and during the score calculation I get different “unexpected file” errors:
“Found unexpected file ‘d9dac.TIFF’.”
“Found unexpected file ‘68c85.TIFF’.”
What could be the problem?

Your submission contains files like d9dac.TIFF that are not in the test set from the data download page. You can verify that your submission has the correct files and only the correct files by comparing what is in your submission to the submission format (which you can also get on the data download page).

Another hint is that the filename is only 5 characters (68c85), whereas all the actual test filenames are 6 characters (d9db56). I’d check your data loading and writing scripts to make sure the filenames you write for your predictions match the test data.
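The comparison described above can be sketched as a set difference over filename stems. The helper `compare_ids` and the example lists are made up for illustration; in practice the expected names would come from the submission format on the data download page. Extensions are ignored here on the assumption that any tif/tiff variant is accepted.

```python
def compare_ids(submitted_names, expected_names):
    """Return (missing, unexpected) filename stems, ignoring the extension."""
    def stem(name):
        # "d9db56.tif" -> "d9db56"; a name without a dot is kept unchanged
        return name.rsplit(".", 1)[0]

    submitted = {stem(n) for n in submitted_names}
    expected = {stem(n) for n in expected_names}
    missing = sorted(expected - submitted)
    unexpected = sorted(submitted - expected)
    return missing, unexpected


# Made-up example: one ID lost a character when the filename was written.
expected = ["d9db56.tif", "0a0a36.tif"]
submitted = ["d9db5.tif", "0a0a36.tif"]
missing, unexpected = compare_ids(submitted, expected)
print(missing)     # ['d9db56']
print(unexpected)  # ['d9db5']
```

A truncated ID shows up in both lists at once: the correct name is missing and the shortened one is unexpected.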

I used your submission format:
Image.fromarray(arr).save(f'{file_id}.tif', compression='tiff_deflate')
to save the predictions to .tif files. The submission was successful, but I obtained a rather low score (0.25). Based on earlier validation on training data, I suspect something must have gone wrong with the data format. When I open a tiff file (saved using the above-mentioned line of code), I see values 0/255. What could have gone wrong?

Kind regards,

“I see values 0/255.”

This shouldn’t be a problem since we look at zero/non-zero values since these are segmentation masks. It’s common to see TIFs that look like this where the masks are 0 for the negative case and the max value of uint8 for the positive case.

It’s likely that your validation numbers are not calculated correctly or that your model is overfitting the training data. Hope that helps!
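To illustrate the zero/non-zero point, here is a small sketch with NumPy. It assumes the scorer simply treats any non-zero pixel as positive, as described in the reply above; the helper `to_uint8_mask` is a hypothetical name, not part of the competition code.

```python
import numpy as np


def to_uint8_mask(pred):
    """Collapse any array to a 0/1 uint8 mask using the zero/non-zero rule."""
    return (np.asarray(pred) != 0).astype(np.uint8)


# The same made-up mask stored two ways: 0/1 and 0/255.
mask_01 = np.array([[0, 1], [1, 0]], dtype=np.uint8)
mask_0255 = mask_01 * 255

# Under a zero/non-zero comparison both encodings describe identical masks.
assert np.array_equal(to_uint8_mask(mask_01), to_uint8_mask(mask_0255))
print(to_uint8_mask(mask_0255).dtype)  # uint8
```

Running predictions through a helper like this before saving also guards against accidentally writing a float or wider integer dtype.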

Thanks! The problem is solved: I had used a different format (not uint8).