What exactly should we predict?

When I first saw this competition, I thought we should just predict a score (a number between 0.0 and 1.0). But when I read the code submission format, I found that we are predicting result_images and score.

Can someone explain to me what exactly we should predict?


Please read these two pages, which explain the competition task:

In particular, the Procedure for Test Inference section explains what your predictions should be and the expected format.