Can I use statistics computed over the whole training set in my solution?

Suppose I want to standardize the input by subtracting the per-pixel mean and dividing by the per-pixel standard deviation. Can I use the whole training set to compute these statistics? (I guess yes.) And can I use the whole dataset, including the test set? (I guess no.)

Using the whole training set is technically a violation of the rule against using information from future images, but it does not touch the test set, so I think it is legitimate. In other words, I'm asking whether this rule applies only to the images for which a prediction is requested. Thank you.
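As a minimal sketch of the setup being asked about, the NumPy snippet below computes per-pixel statistics from the training images only and then applies those same statistics to the test images. It assumes the images are stored as an array of shape `(N, H, W)`; the function names `fit_pixel_stats` and `standardize`, the `eps` guard, and the random data are illustrative assumptions, not part of any competition code.

```python
import numpy as np

def fit_pixel_stats(train_images: np.ndarray, eps: float = 1e-8):
    """Per-pixel mean and std computed over the TRAINING images only.

    Assumed layout: train_images has shape (N, H, W).
    """
    mean = train_images.mean(axis=0)   # shape (H, W)
    std = train_images.std(axis=0)     # shape (H, W), population std (ddof=0)
    std = np.maximum(std, eps)         # avoid division by zero on constant pixels
    return mean, std

def standardize(images: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Apply precomputed statistics; broadcasting handles the leading N axis."""
    return (images - mean) / std

# Illustrative random data standing in for real training/test images.
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(100, 4, 4))
test = rng.normal(loc=5.0, scale=2.0, size=(10, 4, 4))

mean, std = fit_pixel_stats(train)        # fitted on train only
train_z = standardize(train, mean, std)
test_z = standardize(test, mean, std)     # test reuses the training statistics
```

The key design choice is that `fit_pixel_stats` never sees `test`; the test images are only transformed with statistics that were fixed beforehand.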

Hi @Brasnold - Your guesses are correct. You can imagine that you are running the prediction on a real storm in real time. Your model could use all information gathered from the training set, but may only use images up to the point of prediction for the storm it is making an inference on in the test set.
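One way to picture the real-time constraint for a test storm is a loop that, for each prediction, hands the model only the frames observed up to that point. The sketch below assumes each storm is given as a list of timestamps plus an image array, that timestamps are distinct, and that the frame at the prediction time itself may be used; `predict_storm_causally` and `model_fn` are hypothetical names for illustration.

```python
import numpy as np

def predict_storm_causally(timestamps, images, model_fn):
    """Predict for every frame of one test storm, passing the model only
    frames taken at or before that frame's timestamp (no look-ahead).

    Assumes distinct timestamps; images has shape (T, H, W).
    """
    timestamps = np.asarray(timestamps)
    order = np.argsort(timestamps, kind="stable")  # chronological order
    ts_sorted = timestamps[order]
    imgs_sorted = images[order]
    preds = []
    for i in range(len(ts_sorted)):
        history = imgs_sorted[: i + 1]  # frames 0..i only, never later ones
        preds.append(model_fn(history))
    return ts_sorted, preds

# Illustrative storm with frames delivered out of order.
rng = np.random.default_rng(1)
storm_ts = [30, 0, 20, 10]
storm_imgs = rng.normal(size=(4, 4, 4))

# Using len as a stand-in "model" shows how many frames each prediction saw.
ts, preds = predict_storm_causally(storm_ts, storm_imgs, model_fn=len)
```

With `model_fn=len`, `preds` is `[1, 2, 3, 4]`: each prediction sees exactly the frames up to its own timestamp. Training-set statistics, such as the per-pixel mean and standard deviation discussed above, can still be used inside `model_fn`, because they were fixed before the storm was observed.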
