Leakage in files metadata

oleg.panichev · October 20, 2017, 8:00am

Dear Organizers,

It seems that there is leakage in the data you’ve provided for this competition.
I ran a simple experiment to check this:

1. Downloaded the micro dataset.
2. Extracted the last modification time and the file size of each file.
3. Trained a simple multiclass classifier on these two features.

This model scored 0.061752 on the leaderboard. For comparison, a sample submission with average class probabilities scores 0.090614.
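
For readers who want to reproduce the feature extraction step, here is a minimal sketch. It is not the original code from the experiment: the function names and the `*.mp4` pattern are assumptions made for illustration, and the classifier itself is left out.

```python
import os
from pathlib import Path


def metadata_features(path):
    """Return (last modification time in seconds since the epoch, size in bytes)."""
    info = os.stat(path)
    return info.st_mtime, info.st_size


def build_feature_table(video_dir, pattern="*.mp4"):
    """Map each matching file name to its (mtime, size) pair.

    The "*.mp4" pattern is an assumption; adjust it to the dataset's extension.
    """
    return {
        p.name: metadata_features(p)
        for p in sorted(Path(video_dir).glob(pattern))
    }
```

Note that modification times only survive if the archive is unpacked with timestamps preserved, so the features depend on how the files were downloaded and extracted.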

Please let us know what you think about this issue.

Thanks,
Oleg

ZFTurbo · October 20, 2017, 2:32pm

This may not be leakage but rather a property of the video encoder. Static videos generally have smaller file sizes, so videos labeled ‘blank’ are more likely to have a small file size.
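
One rough way to check this hypothesis is to compare the average file size per label. The sketch below assumes one label per file and two hypothetical dictionaries keyed by file name; neither format comes from the competition data.

```python
from collections import defaultdict
from statistics import mean


def mean_size_by_label(sizes, labels):
    """Average file size in bytes for each label.

    sizes:  dict mapping file name -> size in bytes (hypothetical format)
    labels: dict mapping file name -> label (hypothetical format)
    """
    grouped = defaultdict(list)
    for name, size in sizes.items():
        if name in labels:  # skip files without a known label
            grouped[labels[name]].append(size)
    return {label: mean(values) for label, values in grouped.items()}
```

If ‘blank’ videos really compress better, their mean size should come out clearly below that of the other labels.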

oleg.panichev · October 20, 2017, 3:11pm

Yes, you’re right. I also tried building a model with only the last file modification time as a feature, and it also outperforms the baseline with average probabilities.
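
A single-feature check like this does not need a full classifier. As an illustrative sketch (not the model used above), one can bin the feature and predict the class frequencies observed in each bin, falling back to the overall class frequencies for empty bins. All names and the choice of `n_bins` here are assumptions.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def fit_binned_frequencies(values, labels, n_bins=10):
    """Split one feature into roughly equal-count bins and count classes per bin."""
    ordered = sorted(values)
    # Interior bin edges taken at evenly spaced ranks of the sorted feature.
    edges = [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        counts[bisect_right(edges, value)][label] += 1
    overall = Counter(labels)
    return edges, counts, overall


def predict_proba(model, value):
    """Class probabilities for one value, using overall frequencies for empty bins."""
    edges, counts, overall = model
    bin_counts = counts.get(bisect_right(edges, value)) or overall
    total = sum(bin_counts.values())
    return {label: bin_counts.get(label, 0) / total for label in overall}
```

If the per-bin frequencies differ noticeably from the overall frequencies, the feature carries information about the label, which is consistent with the result described above.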
