Evaluation metric clarification


After reading the problem description and the evaluation metric section, it is still unclear how IoU is calculated. In particular, there seem to be two possibilities:

  1. The IoU score is calculated for each 512x512 tile independently and then averaged across all test images.
  2. The IoU score is calculated per flooding event and averaged afterwards.

In the case of option 1, there are important edge cases to clarify when the target mask contains no positive pixels. What would the IoU score for such a tile be if the model predicts no positive pixels, and what if it predicts some? I assume 1.0 and 0.0 respectively, but it would be great if you could clarify this.

I also wonder if it would be possible to include the official scoring code in the drivendataorg/floodwater-runtime repository on GitHub (the code execution runtime for the STAC Overflow: Map Floodwater from Radar Imagery competition)?
Thanks in advance, Eugene

Hi @bloodaxe thanks for your question!

IoU is calculated at the pixel-level. It is defined as the sum of the intersection of water pixels divided by the sum of the union of water pixels across all images. Below is some pseudocode to demonstrate this calculation. We will add this to the competition problem description as a reference.

import numpy as np

intersection = 0
union = 0

for pred, actual in file_pairs:
    valid = actual != 255  # 255 marks invalid / missing pixels
    actual = actual[valid]
    pred = pred[valid]

    intersection += np.logical_and(actual, pred).sum()
    union += np.logical_or(actual, pred).sum()

iou = intersection / union
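To see how this global formulation answers the empty-tile question above, here is a small self-contained sketch (the `global_iou` helper and the toy arrays are my own illustration, not official code; it simply wraps the loop from the pseudocode). A tile with no water in either the target or the prediction contributes nothing to either sum, so the per-tile 0/0 edge case never arises:

```python
import numpy as np

def global_iou(file_pairs):
    # Sum intersection and union over ALL tiles, then divide once.
    intersection = 0
    union = 0
    for pred, actual in file_pairs:
        valid = actual != 255  # 255 marks invalid / missing pixels
        a = actual[valid]
        p = pred[valid]
        intersection += np.logical_and(a, p).sum()
        union += np.logical_or(a, p).sum()
    return intersection / union

# Tile 1: two water pixels, the model finds one of them.
actual1 = np.array([[1, 1], [0, 0]])
pred1 = np.array([[1, 0], [0, 0]])

# Tile 2: no water at all, and the model predicts none.
actual2 = np.zeros((2, 2), dtype=int)
pred2 = np.zeros((2, 2), dtype=int)

# Tile 2 adds 0 to both sums, so the score is 1 / 2 = 0.5.
print(global_iou([(pred1, actual1), (pred2, actual2)]))
```

Under per-tile averaging, tile 2 would need a special rule (0/0); under the global metric it is simply neutral.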

@bloodaxe We’ve added a script that makes this concrete: metric.py on the main branch of the drivendataorg/floodwater-runtime repository on GitHub.

There is an example in the README showing usage.
