Is post-processing of model predictions allowed? A metric like log loss leaves a lot of scope for post-processing, so I just wanted to confirm whether it breaks any rules.
Hi @devnikhilmishra, post-processing is allowed, but it must comply with the rule that each test set observation is processed independently. From the rules page:
Unless otherwise specified on the Competition Website, for the purposes of quantitative evaluation of Submissions, Participants agree to process each test data sample independently without the use of information from other cases in the test set. By default, this precludes using information gathered across multiple test samples during training, for instance through pseudo labeling. Eligible Submissions and models must be able to run inference on new test data automatically, without retraining the model.
So, for example, if you are calibrating your test set predictions, you can use a calibration function that was fitted on the training and validation data. You should not use a calibration function that was fitted on the test data.
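To make the distinction concrete, here is a minimal sketch of a compliant setup, assuming a binary classifier and simple temperature scaling as the calibration function. The helper names (`fit_temperature`, `calibrate_one`) and the numbers are hypothetical and only for illustration. The temperature is fitted on validation predictions only, and each test prediction is then transformed on its own, without looking at any other test sample.

```python
import math

def _logit(p, eps=1e-12):
    # Clip to avoid log(0) for probabilities at exactly 0 or 1.
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def _sigmoid(z):
    # Numerically stable sigmoid.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_loss(y_true, probs, eps=1e-12):
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def calibrate_one(p, temperature):
    # Depends only on this one prediction and the already-fitted temperature.
    return _sigmoid(_logit(p) / temperature)

def fit_temperature(val_probs, val_labels):
    # Coarse grid search over temperatures 0.25 .. 5.0 (an arbitrary range
    # chosen for this sketch), minimizing log loss on VALIDATION data only.
    candidates = [0.25 + 0.05 * i for i in range(96)]
    return min(
        candidates,
        key=lambda t: log_loss(val_labels, [calibrate_one(p, t) for p in val_probs]),
    )

# Hypothetical, overconfident validation predictions with known labels.
val_probs = [0.99, 0.98, 0.97, 0.02, 0.01, 0.95, 0.03, 0.99]
val_labels = [1, 1, 0, 0, 0, 1, 1, 1]
temperature = fit_temperature(val_probs, val_labels)

# At inference time, each test prediction is calibrated independently.
test_probs = [0.999, 0.5, 0.01]
calibrated = [calibrate_one(p, temperature) for p in test_probs]
```

By contrast, anything that fits parameters on the test predictions themselves (for example, rescaling them to match a target mean computed over the test set) would use information across test samples and so would fall on the wrong side of the rule.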
So can’t we use techniques like pseudo labeling?
@Loki_K Per the rules quoted above, you are not allowed to use pseudo-labeling that involves training on any test set data. Your model should not be fitted on any test set data.