Evaluation metric clarification for the empty-mask case

Quite a few images do not have clouds at all, and as a consequence their corresponding ground truth masks are also empty. Since the evaluation metric for the competition is the intersection divided by the union, a correctly predicted empty mask makes both the intersection and the union zero, which leads to division by zero. This is a quote from the competition page:

The Jaccard index is a similarity measure between two label sets, A and B. It is defined as the intersection divided by the union and ranges between 0 and 1. The goal is to maximize this value.

How is this situation handled during scoring of submissions?

I think correctly predicting that there is no cloud in the image should be rewarded rather than penalized, but it is unclear to me how such cases are actually handled in this competition. The competition's benchmark notebook also started failing as soon as it began to detect empty masks correctly, after I fine-tuned it slightly.
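For anyone who wants to reproduce it, here is a minimal sketch with made-up 4×4 masks (plain NumPy, not the competition code):

import numpy as np

# A correctly predicted "no clouds" image: both masks are all zeros.
true = np.zeros((4, 4), dtype=np.uint8)
pred = np.zeros((4, 4), dtype=np.uint8)

intersection = np.logical_and(true, pred).sum()  # 0
union = np.logical_or(true, pred).sum()          # 0

iou = intersection / union  # 0 / 0: NumPy warns and yields nan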

If I'm not wrong, they apply the scripts/metric.py provided in the Docker image, so the IoU is computed over the whole image set. The union can be zero for some individual images but not for the full set (unless you predict an empty mask for every image).

import numpy as np
from pathlib import Path

from loguru import logger  # logger.success() below is loguru's API
from tqdm import tqdm


def intersection_over_union(array_pairs, total=None):
    """Calculate the actual metric"""
    intersection = 0
    union = 0
    for pred, actual in tqdm(array_pairs, total=total):
        intersection += np.logical_and(actual, pred).sum()
        union += np.logical_or(actual, pred).sum()
    if union < 1:
        raise ValueError("At least one image must be in the actual data set")
    return intersection / union


def main(submission_dir: Path, actual_dir: Path):
    """
    Given a directory with the predicted mask files (all values in {0, 1}) and the actual
    mask files (all values in {0, 1}), get the overall intersection-over-union score
    """
    n_expected = len(list(actual_dir.glob("*.tif")))
    # iterate_through_mask_pairs is defined elsewhere in scripts/metric.py
    array_pairs = iterate_through_mask_pairs(submission_dir, actual_dir)
    logger.info(f"calculating score for {n_expected} image pairs ...")
    score = intersection_over_union(array_pairs, total=n_expected)
    logger.success(f"overall score: {score}")
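To illustrate the aggregation point, here is a toy check (made-up 2×2 masks) that feeds one correctly predicted empty pair and one cloudy pair through the intersection_over_union() above:

empty = np.zeros((2, 2), dtype=np.uint8)            # no clouds, correctly predicted
cloudy_true = np.array([[1, 1], [0, 0]], np.uint8)
cloudy_pred = np.array([[1, 0], [0, 0]], np.uint8)

pairs = [(empty, empty), (cloudy_pred, cloudy_true)]
print(intersection_over_union(pairs, total=len(pairs)))  # 1 / 2 = 0.5

The empty pair contributes zero to both the intersection and the union, so it neither breaks the division nor changes the score.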

Yes, if it is computed on the entire dataset then technically it works without a problem, though correctly predicted empty masks might still be slightly underrated, since they contribute nothing to either sum. I had not seen this version of the IoU implementation, thanks for sharing. I was referring to a different implementation, namely the intersection_over_union() function provided in the benchmark solution, which is computed on every batch (not the full set) during training. The benchmark notebook uses a batch size of 32, which is big enough to hide the problem, but I cannot fit batch size 32 on my GPU (not enough memory), so I use batch size 2 instead, and whenever a batch contains two images with no clouds, the division by zero happens (producing an invalid value that breaks training). In case anyone runs into the same problem, below is the workaround for the zero division that I use for training: both the original intersection_over_union() function from the benchmark and my update.

Original:

import numpy as np  # already imported earlier in the benchmark notebook


def intersection_over_union(pred, true):
    """
    Calculates intersection over union for a batch of images.

    Args:
        pred (torch.Tensor): a tensor of predictions
        true (torch.Tensor): a tensor of labels

    Returns:
        float: the IoU score for the batch
    """
    valid_pixel_mask = true.ne(255)  # valid pixel mask
    true = true.masked_select(valid_pixel_mask).to("cpu")
    pred = pred.masked_select(valid_pixel_mask).to("cpu")

    # Intersection and union totals
    intersection = np.logical_and(true, pred)
    union = np.logical_or(true, pred)
    return intersection.sum() / union.sum()

My Update (avoids division by zero):

def intersection_over_union(pred, true):
    """
    Calculates intersection over union for a batch of images,
    handling batches where the union is empty.

    Args:
        pred (torch.Tensor): a tensor of predictions
        true (torch.Tensor): a tensor of labels

    Returns:
        float: the IoU score for the batch (1.0 for a correctly
            predicted empty batch)
    """
    valid_pixel_mask = true.ne(255)  # valid pixel mask
    true = true.masked_select(valid_pixel_mask).to("cpu")
    pred = pred.masked_select(valid_pixel_mask).to("cpu")

    union = np.logical_or(true, pred).sum()
    if union == 0:
        # A zero union means both masks are empty on the valid pixels, so
        # pred.sum() is also 0 and a correct "no cloud" prediction scores
        # (0 + 1) / (0 + 1) = 1.0 instead of dividing by zero.
        return (union + 1) / (pred.sum() + 1)
    intersection = np.logical_and(true, pred).sum()
    return intersection / union
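
A quick sanity check of the update on an all-empty batch (batch size 2, made-up tensors, nothing from the competition data):

import torch

true = torch.zeros(2, 2, 2)  # batch of 2 images with no clouds at all
good = torch.zeros(2, 2, 2)  # correct empty prediction
bad = torch.ones(2, 2, 2)    # everything predicted as cloud

print(intersection_over_union(good, true))  # (0 + 1) / (0 + 1) = 1.0
print(intersection_over_union(bad, true))   # normal branch: 0 / 8 = 0.0

With the original version, the first call returns nan (0 / 0), which is what poisons the running training metric.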