Are the callbacks meant to trick us or am I confused?


I was about to post this:

I finished going through the benchmark blog post and noticed that the metric we are monitoring is val_loss, yet it is in mode="max". Shouldn’t it be “min”, since a lower XEDiceLoss score indicates better performance?

But then I realized that val_loss is being logged as epoch_iou instead of xe_dice_loss. This is a mistake, right? We shouldn’t be calling our validation performance metric the validation loss, should we? It ends up working out in the code since we do want to maximize IoU, but it was a bit confusing to me when I saw it being called the loss.


@jacquesthibs Thanks for your note. You are correct that the learning rate scheduler is conditioned on the validation metric (IoU) logged at the end of each epoch, which we seek to maximize. However, PyTorch Lightning used to require that we prepend the name of the metric being monitored with val_ and conventionally expected val_loss (see the docs). We use the name val_loss here to avoid bugs, but acknowledge that IoU is not a loss function.
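
To make the mode point concrete, here is a minimal, library-free sketch of plateau detection. The `PlateauTracker` class and the IoU values are hypothetical and only for illustration; this is not the benchmark code, nor PyTorch Lightning's implementation. It shows what happens when a steadily improving IoU is monitored in each mode.

```python
class PlateauTracker:
    """Toy plateau detector (hypothetical; not a PyTorch or Lightning class)."""

    def __init__(self, mode="min", patience=2):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.mode = mode
        self.patience = patience
        self.best = None
        self.bad_epochs = 0

    def is_better(self, value):
        # The first value always counts as an improvement.
        if self.best is None:
            return True
        return value < self.best if self.mode == "min" else value > self.best

    def step(self, value):
        """Return True once the value has failed to improve for more than `patience` epochs."""
        if self.is_better(value):
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs > self.patience


# Made-up per-epoch IoU values, logged under the name "val_loss".
epoch_iou = [0.40, 0.55, 0.63, 0.68, 0.71]

max_tracker = PlateauTracker(mode="max")
min_tracker = PlateauTracker(mode="min")
max_signals = [max_tracker.step(v) for v in epoch_iou]
min_signals = [min_tracker.step(v) for v in epoch_iou]
```

With mode="max", the rising IoU never registers as a plateau. With mode="min", the same values look like a model that stopped improving after the first epoch, so a scheduler would cut the learning rate even though training is going well. That is why the monitored quantity is in "max" mode despite its name.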

OK, good to know. I do have a follow-up question then. Why are we monitoring the metric rather than the loss function? Typically we would use the loss function, right?