Hi Rishabh,
You get that error if your submission file contains numbers that are written without a decimal point (“.”). So, if your submission is 1, 0, 1, 1, 0 you will see this error. If it is 1.0, 0.0, 1.0, 1.0, 0.0 you won’t see the error.
I’m not an R expert, but here is how I would force R to output a certain number of decimal places. If anyone has smarter code for this, please share.
# create our data, which is a random sequence of 0's and 1's
fakePredictions <- rbinom(20, 1, 0.5)
# turn these into a data frame so we can print them
predictionsDataframe <- data.frame(fakePredictions)
# However, these are integers. The below will return:
# [1] "integer"
typeof(predictionsDataframe$fakePredictions)
# So, we can force these to be doubles,
predictionsDataframe$fakePredictions <- as.numeric(predictionsDataframe$fakePredictions)
# Now typeof will return:
# [1] "double"
typeof(predictionsDataframe$fakePredictions)
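# The parameter nsmall to the format function is the minimum number
# of digits to display after the decimal point. format() returns strings,
# so quote=FALSE keeps them from being wrapped in quotes in the file, and
# row.names=FALSE leaves out the extra column of row numbers
write.csv(format(predictionsDataframe, nsmall=2), "submission.csv", row.names=FALSE, quote=FALSE)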
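A shorter route to the same fixed number of decimal places is sprintf, which formats a numeric vector directly into strings. This is only a sketch: the column name prediction is a placeholder I made up, so use whatever header the competition's sample submission expects.

```r
# Example predictions; in practice these would come from your model
predictions <- c(1, 0, 1, 1, 0)

# "%.2f" always prints two digits after the decimal point
formatted <- sprintf("%.2f", predictions)
print(formatted)

# Column name "prediction" is hypothetical; match your competition's format
submission <- data.frame(prediction = formatted)

# quote=FALSE writes 1.00 rather than "1.00"; row.names=FALSE drops row numbers
write.csv(submission, "submission.csv", row.names = FALSE, quote = FALSE)
```

Like format, sprintf gives you character strings, which is why quote=FALSE matters when writing the file.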
More importantly, though, you’re probably not submitting the right values. This question asks for a probability in decimal format between 0 and 1. So, your predictions should look like 0.5, 0.2, 0.7 and not just 0s and 1s. If you are submitting just 0s and 1s, you are predicting each outcome with 0% or 100% probability. The metric for this competition, Log Loss, puts a very high penalty on being confident and wrong, so your score will be quite bad.
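To see how harsh the penalty is, here is a rough sketch of binary log loss in R. The function name logLoss and the clipping value eps are my own assumptions for illustration; the competition's actual scorer may clip differently, so treat the numbers as indicative only.

```r
# Binary log loss. Predictions are clipped away from exactly 0 and 1
# so that log() stays finite (eps is an assumed value, not the official one)
logLoss <- function(actual, predicted, eps = 1e-15) {
  predicted <- pmin(pmax(predicted, eps), 1 - eps)
  -mean(actual * log(predicted) + (1 - actual) * log(1 - predicted))
}

actual <- c(1, 0, 1, 1, 0)

# Hard 0/1 predictions with a single mistake (the last one)
hardPredictions <- c(1, 0, 1, 1, 1)

# Uninformative baseline: 0.5 for every value
baseline <- rep(0.5, length(actual))

logLoss(actual, hardPredictions)  # roughly 6.9
logLoss(actual, baseline)         # roughly 0.693, i.e. log(2)
```

Under these assumptions, one confident miss out of five is enough to score about ten times worse than predicting 0.5 everywhere.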
Hope that helps, and good luck in the competition!
TLDR: If your submission is all 0s and 1s, your score will almost certainly be worse than making no real prediction at all (i.e., 0.5 for every value).