@organiser - Any chance to increase the limit to 3 submissions per day? We understand the rationale to throttle the number, but one submission per day does not enable the team to work efficiently across different time zones…
I think it will be increased later on near the deadline as is common with most competitions. They are limiting it to prevent spam submissions.
Avoiding spam submissions is reasonable. My opinion is that people tend to ramp up research and prototyping in the first 3 months, then spend the remaining 2 months on tuning plus the writeup. 3 submissions per team per day won’t kill the eval server…
@maddog I understand that you would like to “test your program against the test set” a thousand times per day. So do I. So to get around the 1 submission per day limit, do it this way:
- Preview a few hundred rows in the training set that have correct labels to get a gut feel for what they’re calling hate speech against a government protected class, vs “not”.
- Using the model in your head, go through the test set and label each one yourself using your best judgment. 70% of the time the answer is obvious on first glance. 30% of the time the confounders will have you questioning your own ideology and politics.
- When you finish labelling the set, you can now measure your algorithm against that set. Submit that and see how well you stack up.
Then finally you can test your algorithm against your hand-labelled copy of the test set, and you can “sort of submit” and see how well you’re doing a thousand times a day. But it’s only an approximation, since the real labels for the test set are permanently hidden from us. Which keeps things interesting.
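For anyone who wants to try this, here is a minimal sketch of the "sort of submit" step. It assumes you have already hand-labelled some test items yourself and that your model outputs hard 0/1 predictions. The function name `proxy_accuracy`, the ids, and the labels are all made up for illustration, and plain accuracy is used only as a stand-in for whatever metric the competition actually scores.

```python
def proxy_accuracy(predictions, hand_labels):
    """Fraction of hand-labelled items where the model agrees with our own label.

    Both arguments map an item id to a 0/1 label (1 = hateful, 0 = not).
    Only ids present in both mappings are scored.
    """
    shared = [item_id for item_id in hand_labels if item_id in predictions]
    if not shared:
        raise ValueError("no overlap between predictions and hand labels")
    hits = sum(predictions[item_id] == hand_labels[item_id] for item_id in shared)
    return hits / len(shared)


# Hypothetical ids and labels, invented for this example.
hand_labels = {"a": 1, "b": 0, "c": 0, "d": 1}          # our own best-guess labels
predictions = {"a": 1, "b": 0, "c": 1, "d": 1, "e": 0}  # model output; "e" was never hand-labelled

score = proxy_accuracy(predictions, hand_labels)
print(score)  # 0.75: the model agrees with us on a, b, d and disagrees on c
```

Keep in mind that this measures agreement with your own guesses, not with the hidden labels, so it is only as good as your labelling.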
Also, if you’ve got an eagle eye, you’ll notice the test set and the training set aren’t drawn from the same data sample. So we’ve got some jokers playing games with us. Which means the final stretch, the phase 2 hidden test set, will likely be different from what we’ve seen before, to separate models that are overfitting from those that have captured delicious signal.
@eleschinski I don’t agree, Eric. 3 times != thousands of times. We have different members in different time zones. Whoever is in Australia shouldn’t have to wait for a peer based in Germany to wake up before testing the model performance.
Re: test strategy. I doubt that understanding the test set offered by FB would help either. The examples are artificially generated (albeit via a rigorous approach).