Dtype error in submission

rishabh_shukla · February 18, 2015, 10:31am

I keep getting a data type error when I submit. I’m working in R and exporting a CSV file as my submission.

This is the error I am getting:
```
Unexpected data types in submission.
Expected dtypes: '[dtype('float64')]'
Submitted dtypes: '[dtype('int64')]'
```

I’ve checked the data types in the submission format file, and they look the same as the ones in my output CSV file. I’m not sure what else could be causing this. Any ideas?

bull · February 18, 2015, 1:25pm

Hi Rishabh,

You get that error if the numbers in your submission file are written without a decimal point “.”, because they are then read as integers. So, if your submission is 1, 0, 1, 1, 0 you will see this error. If it is 1.0, 0.0, 1.0, 1.0, 0.0 you won’t.
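A quick way to see the cause in R: even when a column is stored as double, `write.csv` writes whole-number values without a decimal point, so converting to numeric alone isn’t enough. This sketch assumes the checker infers column types from the text of the CSV (the `dtype(...)` names in the error message suggest NumPy/pandas, but that is a guess); the column name `prediction` is just a placeholder.

```r
# Whole-number predictions, stored as doubles (numeric) in R
preds <- data.frame(prediction = c(1, 0, 1, 1, 0))

# Write to a temporary file and look at the raw text that would be uploaded
path <- tempfile(fileext = ".csv")
write.csv(preds, path, row.names = FALSE)
raw_lines <- readLines(path)
print(raw_lines)

# The column is double in R, but none of the written values has a decimal
# point, so a reader that guesses types from the text will see integers.
```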

I’m not an R expert, but here is how I would force R to output a certain number of decimal places. If anyone has smarter code for this, please share.

```r
# create our data, which is a random sequence of 0's and 1's
fakePredictions <- rbinom(20, 1, 0.5)

# turn these into a data frame so we can write them out as a CSV
predictionsDataframe <- data.frame(fakePredictions)

# However, these are integers. The below will return:
# [1] "integer"
typeof(predictionsDataframe$fakePredictions)

# So, we can force these to be doubles,
predictionsDataframe$fakePredictions <- as.numeric(predictionsDataframe$fakePredictions)

# Now typeof will return:
# [1] "double"
typeof(predictionsDataframe$fakePredictions)

# The parameter nsmall to the format function is the minimum number
# of digits to display after the decimal point (so 0 is written as 0.00).
# row.names = FALSE stops write.csv from adding an extra unnamed index column.
write.csv(format(predictionsDataframe, nsmall = 2), "submission.csv", row.names = FALSE)
```
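If you want something a bit more compact, one alternative (not the only way) is base R’s `sprintf()`, which always writes a fixed number of decimal places, followed by reading the file back as text to check it before uploading. The column name `prediction` and the values here are placeholders; your real file should keep whatever ID column and headers the submission format specifies.

```r
# Placeholder predicted probabilities, some of them whole numbers
preds <- data.frame(prediction = c(1, 0, 0.25, 0.7, 0.5))

# sprintf("%.4f", x) always writes exactly 4 digits after the decimal point,
# e.g. 1 -> "1.0000", 0.25 -> "0.2500"
preds$prediction <- sprintf("%.4f", preds$prediction)

path <- tempfile(fileext = ".csv")
# row.names = FALSE: don't add an extra unnamed index column
# quote = FALSE: write 1.0000 rather than "1.0000"
write.csv(preds, path, row.names = FALSE, quote = FALSE)

# Sanity check before uploading: every value should contain a decimal point
values <- read.csv(path, colClasses = "character")$prediction
stopifnot(all(grepl(".", values, fixed = TRUE)))
print(readLines(path))
```

`sprintf()` returns character strings, so the column is no longer numeric inside R; that’s fine here because it is only being written to disk.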

More importantly, you’re probably not submitting the right values. This competition asks for a probability between 0 and 1, so your predictions should look like 0.5, 0.2, 0.7, not just 0s and 1s. Submitting only 0s and 1s means you are predicting each outcome with 0% or 100% probability. The metric for this competition, log loss, puts a very high penalty on being confident and wrong, so your score will be quite bad.
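To put numbers on that, here is a minimal sketch of binary log loss in R. Clipping predictions to `[eps, 1 - eps]` with `eps = 1e-15` is an assumption (a common convention so that a prediction of exactly 0 or 1 gives a large but finite penalty); the competition’s exact implementation may differ. The outcomes and predictions are made up.

```r
# Binary log loss; clipping with eps is an assumed convention, not
# necessarily what the competition's scorer does
log_loss <- function(actual, predicted, eps = 1e-15) {
  p <- pmin(pmax(predicted, eps), 1 - eps)
  -mean(actual * log(p) + (1 - actual) * log(1 - p))
}

# Made-up outcomes and predictions for 10 rows
actual <- c(1, 0, 0, 1, 0, 0, 1, 0, 0, 0)
hard   <- c(1, 0, 0, 0, 0, 0, 1, 0, 1, 0)  # 0/1 guesses, 2 of 10 wrong
flat   <- rep(0.5, 10)                       # "no information" baseline

log_loss(actual, flat)  # log(2), about 0.693
log_loss(actual, hard)  # about 6.9: each confident miss costs roughly 34.5
```

With these made-up numbers, getting just two of ten rows wrong with full confidence scores about ten times worse than predicting 0.5 everywhere.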

Hope that helps, and good luck in the competition!

TL;DR: If your submission is all 0s and 1s, your score will almost certainly be worse than making no real prediction at all (i.e., submitting 0.5 for every value).
