
Prediction format


#1

Hi all,

Got my first model up and running, but when I came to submit the results I ran into trouble: apparently the submission is incorrect, and I’m having a hard time finding the problem. I’m producing a dataset as requested in the Problem Description, and I’ve made sure there are no NaNs or other spurious non-float values in the ‘poor’ column. The top few lines of my CSV are:

id,country,poor
8,A,0.5856
65,A,0.9999
71,A,0.9573
80,A,0.0
…etc.

My file contains data for all 3 countries. For my latest upload attempts I sorted by country and then by ID, but to start out I was sorting by ID only. Is anyone else having issues? Am I just doing something really obviously wrong? Is it the decimal places? Is it the sorting? Any ideas would be really appreciated.

Thanks in advance, R
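A quick way to rule out the usual suspects in a file like the one above is to check it programmatically. The following is a minimal, stdlib-only Python sketch. The function name `check_submission` and the `expected_keys` argument (the `(id, country)` pairs in test-set order, compared as strings because that is how `csv` reads them) are illustrative assumptions, not part of any DrivenData tooling. It flags a wrong header, non-numeric or out-of-range values, duplicate rows, and rows that are not in the expected order.

```python
import csv
import io
import math


def check_submission(text, expected_keys=None):
    """Return a list of problems found in a submission CSV given as text.

    expected_keys (hypothetical input): optional list of (id, country)
    string pairs in the order the test set lists them.
    """
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or rows[0] != ["id", "country", "poor"]:
        problems.append("header must be exactly: id,country,poor")
        return problems
    keys = []
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != 3:
            problems.append(f"line {line_no}: expected 3 fields, got {len(row)}")
            continue
        row_id, country, poor = row
        keys.append((row_id, country))
        try:
            p = float(poor)
        except ValueError:
            problems.append(f"line {line_no}: poor is not a number: {poor!r}")
            continue
        # float() accepts "nan" and "inf", so check the value explicitly.
        if math.isnan(p) or not 0.0 <= p <= 1.0:
            problems.append(f"line {line_no}: poor must be in [0, 1], got {poor}")
    if len(set(keys)) != len(keys):
        problems.append("duplicate (id, country) rows")
    if expected_keys is not None and keys != list(expected_keys):
        problems.append("rows are missing, extra, or not in the test-set order")
    return problems
```

An empty list means the file passed every check; note that a file can be perfectly valid row by row and still fail the order check, which is exactly the situation described above.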


#2

I ran into this too, and if I recall correctly, the ids need to be in the same order as in the test set.

So it would be:

id,country,poor
418,A,0.5
41249,A,0.5
16205,A,0.5
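Put another way, the submission needs one row per test-set row, in test-set order, with each prediction looked up by key rather than taken from whatever order the dataframe happens to be in. A minimal sketch, assuming the predictions are available as a dict keyed by `(id, country)` and that `test_keys` holds those pairs in the order they appear in the test file (both are hypothetical inputs; the probabilities in the test are made up):

```python
def reorder_to_test_order(test_keys, predictions):
    """Return (id, country, poor) rows in the same order as test_keys.

    test_keys: (id, country) pairs as they appear in the test file.
    predictions: dict mapping (id, country) -> predicted probability,
        possibly produced in a different (e.g. re-sorted) order.
    Raises KeyError if any test row has no prediction.
    """
    missing = [key for key in test_keys if key not in predictions]
    if missing:
        raise KeyError(f"{len(missing)} test rows have no prediction, e.g. {missing[0]}")
    # Walk the test keys, not the predictions, so the output order is fixed.
    return [(row_id, country, predictions[(row_id, country)])
            for row_id, country in test_keys]
```

Driving the loop from the test keys, instead of sorting the predictions, means no earlier sort or join can affect the output order.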


#3

Thanks @2h2f
I wondered whether this was the case. I did try this, but on my dataset after joining the individual-level data, so I must have lost the original order in the join. I’ll rebuild it tonight to recover the original order.

@bull: if the ID order is fixed for submission, would it be possible to get a submission template? It might help others who are starting to output results.

Thanks again
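One way to keep the original order through a join with individual-level data is to record each household’s position before the join and sort back on it afterwards. The sketch below assumes rows are plain dicts with an `id` key; `join_member_counts` and its `n_members` feature (a simple count of individuals per household) are hypothetical stand-ins for whatever aggregation the model actually uses. The same idea carries over to a dataframe library: carry an explicit position column through the merge and sort on it at the end.

```python
from collections import Counter


def join_member_counts(households, individuals):
    """Left-join a per-household member count onto household rows,
    returning them in the original household order.

    households: list of dicts with an "id" key, in test-file order.
    individuals: list of dicts whose "id" names their household.
    """
    # 1. Record each household's original position before any reshuffling.
    tagged = [{**h, "_pos": i} for i, h in enumerate(households)]
    by_id = {h["id"]: h for h in tagged}
    # 2. Aggregate individuals. Counter keeps first-seen order from the
    #    individual file, so rows built from it follow *that* order.
    counts = Counter(person["id"] for person in individuals)
    joined = [{**by_id[hid], "n_members": n}
              for hid, n in counts.items() if hid in by_id]
    # Households with no individual rows still get a row (left join).
    joined += [{**h, "n_members": 0} for h in tagged if h["id"] not in counts]
    # 3. Sort back by the saved position and drop the helper key.
    joined.sort(key=lambda row: row["_pos"])
    return [{k: v for k, v in row.items() if k != "_pos"} for row in joined]
```

Sorting on a saved position is more robust than re-sorting by id or country afterwards, because the original order is not necessarily sorted by anything.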


#4

@rhyscoombs, sorry for the delay. It was an oversight on our part that the template was not included on the data download page. We’ve corrected that, and you should now be able to get the submission format template (with 0.5 for all predictions) here.
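With a template available, a simple way to guarantee the expected row order is to copy the template row for row and replace only the `poor` value. A minimal stdlib sketch, assuming the predictions are a dict keyed by `(id, country)` with both parts as strings (that is how `csv` reads them); the function name `fill_template` is hypothetical.

```python
import csv
import io


def fill_template(template_text, predictions):
    """Copy the submission template row for row, replacing the 0.5
    placeholders with predictions keyed by (id, country) strings."""
    reader = csv.DictReader(io.StringIO(template_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        key = (row["id"], row["country"])
        # KeyError here means the model produced no prediction for this row.
        row["poor"] = str(predictions[key])
        writer.writerow(row)
    return out.getvalue()
```

Because the output is built by walking the template, a sort applied anywhere earlier in the pipeline cannot change the submitted order, and a missing prediction fails loudly instead of silently producing a short file.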


#5

The format asks for probabilities for poor listed by id and country. However, the measurement for success is described as mean log loss. These are two different metrics.

I’m getting a mean log loss < 0.0000000009 but I’m placing 247th… it seems something is off with the automated scoring process. Thanks, Laura
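The probabilities in the `poor` column are the input that log loss is computed from, so the two are not competing measures. For each row the loss is −[y·log(p) + (1 − y)·log(1 − p)], averaged over rows. The sketch below assumes that "mean log loss" means averaging the per-country log losses for A, B and C (the competition’s problem description is the authority on the exact definition). It also clips probabilities to [eps, 1 − eps], a common convention whose eps value here is an assumption. As a reference point, the all-0.5 template scores ln 2 ≈ 0.693 per country. A loss near zero requires almost every probability to match its true label, so it is worth computing on rows the model was not trained on before comparing it to the leaderboard.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss: -mean(y*log(p) + (1-y)*log(1-p)), p clipped to [eps, 1-eps]."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("need equal-length, non-empty inputs")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clipping keeps a confidently wrong answer finite (but very costly).
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


def mean_log_loss(by_country):
    """Average of per-country log losses (assumed meaning of 'mean log loss').

    by_country: dict mapping country -> (true labels, predicted probabilities).
    """
    losses = [log_loss(y, p) for y, p in by_country.values()]
    return sum(losses) / len(losses)
```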