Dtype error in submission

I keep getting a data type error on my submission. I am working in R and exporting a CSV file as my submission.

This is the error I am getting:
Unexpected data types in submission.
Expected dtypes: '[dtype('float64')]'
Submitted dtypes: '[dtype('int64')]'

I’ve checked the data types in the submission format file, and they look the same as those in my output CSV file. I’m not sure what else could be causing this. Any ideas?

Hi Rishabh,

You get that error if you have numbers in your submission file that don’t appear with a decimal point “.”. So, if your submission is 1, 0, 1, 1, 0 you will see this error. If it is 1.0, 0.0, 1.0, 1.0, 0.0 you won’t see the error.
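To see why this can happen even when the column is already a double in R, here is a minimal sketch. The data and the column name `prediction` are made up for illustration. It writes whole-number doubles to a temporary CSV file and reads back the raw text:

```r
# Whole-number predictions stored as doubles (hypothetical data)
predictions <- c(1, 0, 1, 1, 0)
typeof(predictions)
# [1] "double"

# Write them to a temporary CSV file without the row-name column
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(prediction = predictions), csv_path, row.names = FALSE)

# Inspect the raw text that was written, skipping the header line
csv_lines <- readLines(csv_path)
print(csv_lines[-1])
# [1] "1" "0" "1" "1" "0"
```

The values are written without a decimal point, so a reader that infers types from the text will most likely treat the column as integers.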

I’m not an R expert, but here is how I would force R to output a certain number of decimal places. If anyone has smarter code for this, please share.

# create our data, which is a random sequence of 0's and 1's
fakePredictions <- rbinom(20, 1, 0.5)

# turn these into a data frame so we can print them
predictionsDataframe <- data.frame(fakePredictions)

# However, these are integers. The call below returns:
typeof(predictionsDataframe$fakePredictions)
# [1] "integer"

# So, we can force these to be doubles:
predictionsDataframe$fakePredictions <- as.numeric(predictionsDataframe$fakePredictions)

# Now typeof returns:
typeof(predictionsDataframe$fakePredictions)
# [1] "double"

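# The parameter nsmall to the format function is the minimum number
# of digits to display after the decimal point. format() returns strings,
# so turn off quoting, and drop the row-name column R adds by default.
write.csv(format(predictionsDataframe, nsmall = 2), "submission.csv",
          row.names = FALSE, quote = FALSE)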
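One possible alternative, offered only as a sketch: sprintf() with a fixed number of decimals gives every value the same width, which may suit real probabilities better than two decimal places. The column name `prediction`, the six-decimal precision, and the temporary output path are assumptions for illustration, not requirements of the competition.

```r
# Hypothetical probabilities; whole-number values are included on purpose
predictions <- c(0.5, 0.2, 1, 0.7, 0)

# Format every value with exactly six decimals (the result is text)
submission <- data.frame(prediction = sprintf("%.6f", predictions))

# Write without row names and without quotes around the formatted values
out_path <- file.path(tempdir(), "submission.csv")
write.csv(submission, out_path, row.names = FALSE, quote = FALSE)

# Read the raw lines back to confirm every value has a decimal point
written <- readLines(out_path)
print(written)
```

Checking the raw file text before uploading is a quick way to confirm that no value was written as a bare 0 or 1.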

More importantly, however, you’re probably not submitting the right values. This question asks for a probability in decimal format between 0 and 1. So, your predictions should look like 0.5, 0.2, 0.7, not just 0s and 1s. If you are submitting just 0s and 1s, you are predicting each outcome with either 0% or 100% probability. The metric for this competition, Log Loss, puts a very high penalty on being confident and wrong, so your score will be quite bad.
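To make the penalty concrete, here is a small sketch of binary log loss in R. The labels and predictions are made up, and the clipping value `eps = 1e-15` is an assumption, since the competition's exact clipping rule is not stated here.

```r
# Binary log loss with clipping; eps is an assumed value
log_loss <- function(actual, predicted, eps = 1e-15) {
  # Keep predictions away from exactly 0 and 1 so log() stays finite
  predicted <- pmin(pmax(predicted, eps), 1 - eps)
  -mean(actual * log(predicted) + (1 - actual) * log(1 - predicted))
}

# Made-up true labels
actual <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0)

# Hard 0/1 predictions: 8 of 10 correct, all fully confident
hard <- c(1, 0, 1, 1, 0, 1, 0, 0, 0, 1)

# The same calls, but only 80% confident
soft <- c(0.8, 0.2, 0.8, 0.8, 0.2, 0.8, 0.2, 0.2, 0.2, 0.8)

# An uninformative prediction of 0.5 for every row
flat <- rep(0.5, length(actual))

loss_hard <- log_loss(actual, hard)
loss_soft <- log_loss(actual, soft)
loss_flat <- log_loss(actual, flat)

print(c(hard = loss_hard, soft = loss_soft, flat = loss_flat))
```

With these made-up numbers the hard predictions score roughly 6.9, the 80% version roughly 0.50, and the constant 0.5 roughly 0.69 (lower is better). Two confident mistakes are enough to make the hard predictions far worse than guessing 0.5 everywhere.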

Hope that helps, and good luck in the competition!

TLDR: If your submission is all 0s and 1s, your score will very likely be worse than an uninformative prediction (i.e., 0.5 for every value).