I was about to post this:
I finished going through the benchmark blog post and noticed that the metric we are monitoring is `val_loss`, yet the monitor is in `mode="max"`. Shouldn't it be `"min"`, since a lower `XEDiceLoss` score indicates better performance?
But then I realized that `val_loss` is actually being logged as `epoch_iou`, not `xe_dice_loss`. That's a mistake, right? We shouldn't be calling our validation performance metric the validation loss. It ends up working out in the code, since we do want to maximize `iou`, but it was a bit confusing to me when I saw it being called the loss.
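For context, the `mode` argument in question controls whether the monitored quantity is treated as better-when-higher or better-when-lower. Here is a minimal stdlib sketch of that comparison logic (names are illustrative, not the benchmark's actual code):

```python
def is_improvement(current: float, best: float, mode: str) -> bool:
    """Return True if `current` beats `best` under the given mode."""
    if mode == "max":  # higher is better, e.g. IoU
        return current > best
    if mode == "min":  # lower is better, e.g. a loss such as XEDiceLoss
        return current < best
    raise ValueError(f"unknown mode: {mode!r}")

# A loss should be monitored with mode="min" ...
assert is_improvement(0.30, 0.45, mode="min")   # loss dropped: improvement
# ... while a metric like IoU should use mode="max".
assert is_improvement(0.82, 0.75, mode="max")   # IoU rose: improvement
```

So `mode="max"` only makes sense here if the monitored value really is a metric like IoU rather than a loss, which is exactly the source of the confusion above.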
@jacquesthibs Thanks for your note. You are correct that the learning rate scheduler is conditioned on the validation metric (IoU) logged at the end of each epoch, which we seek to maximize. However, PyTorch Lightning used to require that the name of the monitored metric be prefixed with `val_`, and conventionally expected `val_loss` (see the docs). We use the name `val_loss` here to avoid bugs, but acknowledge that IoU is not a loss function.
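To make the naming concrete, here is a hedged, stdlib-only sketch (hypothetical class and variable names, not the benchmark's actual code) of plateau-style scheduling conditioned on a maximized metric, in the spirit of `ReduceLROnPlateau` with `mode="max"`:

```python
class PlateauChecker:
    """Toy stand-in for a plateau-based LR scheduler in mode="max":
    flag a plateau when the monitored value stops increasing."""

    def __init__(self, patience: int = 2):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, value: float) -> bool:
        """Record one epoch's monitored value; return True on plateau."""
        if value > self.best:       # "max" mode: bigger is better
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Even though the logged name is "val_loss", the value is the epoch IoU,
# so the scheduler must treat bigger as better (illustrative numbers).
logs = {"val_loss": [0.60, 0.70, 0.71, 0.71, 0.70]}
checker = PlateauChecker(patience=2)
plateaued = [checker.step(v) for v in logs["val_loss"]]
# Improvement stalls after epoch 3, so the plateau fires at epoch 5.
```

The point of the sketch is the mismatch being discussed: the key `"val_loss"` is just a label required by the framework convention, while the comparison logic must match what the value actually is (IoU, maximized).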
Ok, good to know. I do have a follow-up question, then: why are we using the metric for monitoring rather than the loss function? Typically we would use the loss, right?