It seems there is a leak in the data you’ve provided for this competition.
I ran a simple experiment to check this:
- downloaded the micro dataset
- extracted each file's last-modification time and file size
- trained a simple multiclass classifier on just these two features
This model scored 0.061752 on the leaderboard. For comparison, a sample submission with average class probabilities scores 0.090614.
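For anyone who wants to reproduce the feature extraction, here is a minimal sketch of the metadata step. It is not my exact script: the function name `extract_metadata_features` and the demo files are made up for illustration, and it only uses `os.stat` from the standard library.

```python
import os
import tempfile


def extract_metadata_features(paths):
    """Return one (mtime_seconds, size_bytes) tuple per file path."""
    features = []
    for path in paths:
        info = os.stat(path)
        # st_mtime: last modification time, st_size: file size in bytes
        features.append((info.st_mtime, info.st_size))
    return features


# Demo on throwaway files (hypothetical stand-ins for the competition files).
with tempfile.TemporaryDirectory() as tmp:
    demo_paths = []
    for name, payload in [("a.bin", b"x" * 10), ("b.bin", b"x" * 250)]:
        path = os.path.join(tmp, name)
        with open(path, "wb") as fh:
            fh.write(payload)
        demo_paths.append(path)
    demo_features = extract_metadata_features(demo_paths)

demo_sizes = [size for _, size in demo_features]
```

The resulting two-column rows, paired with the training labels, can be passed to any off-the-shelf multiclass classifier that outputs class probabilities. Note that modification times only survive if the archive is unpacked in a way that preserves them.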
Let us know what you think about this issue.