The current rules and evaluation methods unfortunately do not allow a fair and equal competition for all participants.
Firstly, only the best-performing model from the public leaderboard should be selected for evaluation on the private test set, rather than allowing multiple submissions, or worse, all of them. Evaluating many submissions lets participants effectively probe the private test set and pick whichever submission happens to land at the low end of the log loss distribution, gaining an unfair advantage over others. Restricting selection to a single submission would keep the competition fair and ensure that the final model is chosen based on its performance on the public test set.
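A minimal sketch of why this matters, under hypothetical assumptions (a 1,000-example binary private test set, and 100 submissions whose probabilities are pure noise with no real signal): even with zero modelling skill, the best of many private-set scores looks noticeably better than a typical one, purely by chance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: binary private labels and many submissions
# whose predicted probabilities carry no real information.
n_private = 1_000
y_private = rng.integers(0, 2, n_private)

def log_loss(y_true, p):
    """Average binary cross-entropy; probabilities clipped to avoid log(0)."""
    p = np.clip(p, 1e-15, 1 - 1e-15)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# Score 100 equally uninformative submissions on the private labels.
scores = [log_loss(y_private, rng.uniform(0.3, 0.7, n_private))
          for _ in range(100)]

# The minimum over many submissions beats the typical score by chance
# alone: selecting it rewards probing the private set, not modelling.
print(f"typical private score: {np.mean(scores):.4f}")
print(f"best of 100 scores:    {min(scores):.4f}")
```

The gap between "typical" and "best of 100" here is exactly the unearned advantage that single-submission selection would remove.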
Additionally, I would like to point out that using log loss as the evaluation metric puts some participants at a disadvantage, because it does not favor otherwise scientifically meaningful submissions optimized for metrics such as accuracy or ROC AUC. It may still be accepted that log loss was, for some reason, determined to be the best-suited metric for this challenge; but in that case it is essential that the distribution of both the public and private test sets be published and made available to everyone, to ensure fairness and transparency.
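To illustrate the disadvantage, here is a small hypothetical example (all labels and probability values are made up): two submissions that classify every example identically, and therefore have the same accuracy, can receive very different log loss scores, because log loss also grades the confidence of the predicted probabilities.

```python
import numpy as np

def log_loss(y_true, p):
    """Average binary cross-entropy; probabilities clipped to avoid log(0)."""
    p = np.clip(np.asarray(p, dtype=float), 1e-15, 1 - 1e-15)
    y = np.asarray(y_true, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def accuracy(y_true, p):
    """Fraction of examples classified correctly at a 0.5 threshold."""
    return float(np.mean((np.asarray(p) >= 0.5) == np.asarray(y_true)))

# Two hypothetical submissions with identical class decisions
# but different confidence levels.
y = [1, 1, 0, 0]
confident = [0.95, 0.95, 0.05, 0.05]
cautious = [0.60, 0.60, 0.40, 0.40]

print(accuracy(y, confident), accuracy(y, cautious))  # same accuracy
print(log_loss(y, confident), log_loss(y, cautious))  # very different log loss
```

This is why knowing the test sets' class distribution matters under log loss: participants who cannot calibrate their probabilities to it are penalized even when their classifications are just as good.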