Anyone have a view on the top couple of leaderboard entries? I can conceive of some people getting a result in the 0.2 range, but the top 2 scores look distinctly odd, and I just don’t believe the value of 0 for the person in first place.
You’re correct to not believe that value. Since this dataset is taken from the UCI (public) repository, I am pretty sure that the number one on the leaderboard just grabbed the labels for the test set from there.
I agree with you. Log-loss values that low look suspicious. In principle you can have a perfect model on the training set, with a log-loss close to 0. However, it is quite unlikely that the same model would reach an equally low log-loss on the test set.
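To put rough numbers on that, here is a minimal sketch of log-loss, assuming a binary target (the thread doesn't say which, so treat that as my assumption). The `log_loss` helper and the example probabilities are made up purely for illustration.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary log-loss; p_pred holds the predicted probability of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip so that log(0) never occurs for fully confident predictions.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1, 0, 1]  # hypothetical labels

# Putting 0.8 on the correct class every time lands in the 0.2 range.
decent = log_loss(labels, [0.8, 0.2, 0.8])

# A score of (essentially) 0 needs full confidence AND no mistakes at all.
perfect = log_loss(labels, [1.0, 0.0, 1.0])

print(decent, perfect)
```

So a score around 0.2 is roughly what you get from being about 80% confident and right on every row, while a 0 means probability 1 on the correct label for every single test row.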
There are only 90 predictions to be made. With enough trial and error, you can more or less infer what the correct predictions are.
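A toy simulation of why that works, under some loud assumptions: a binary target, a leaderboard that returns the exact log-loss over all 90 rows, and one submission per row. The `make_leaderboard` and `probe_labels` names are hypothetical, and the hidden labels here are random, not the real ones.

```python
import math
import random

def log_loss(y_true, p_pred):
    """Mean binary log-loss; p_pred holds the predicted probability of class 1."""
    return sum(-(y * math.log(p) + (1 - y) * math.log(1.0 - p))
               for y, p in zip(y_true, p_pred)) / len(y_true)

def make_leaderboard(hidden_labels):
    """Simulated scorer: takes a submission, returns only the overall score."""
    def score(submission):
        return log_loss(hidden_labels, submission)
    return score

def probe_labels(score, n, p=0.9):
    """Infer each label by nudging one prediction away from 0.5 at a time."""
    baseline = score([0.5] * n)  # equals log(2) whatever the labels are
    inferred = []
    for i in range(n):
        submission = [0.5] * n
        submission[i] = p
        # The score drops if row i is really class 1 and rises if it is class 0.
        inferred.append(1 if score(submission) < baseline else 0)
    return inferred

random.seed(0)
hidden = [random.randint(0, 1) for _ in range(90)]  # stand-in for the test labels
recovered = probe_labels(make_leaderboard(hidden), n=90)
print(recovered == hidden)
```

In this toy setup 90 probing submissions are enough to recover every label. A real leaderboard may round scores or only score part of the test set, which would make this slower, but not impossible.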