Error: IDs for submission are not correct

My submission is in the format below.

id,proba,label
1284,0.6146393418312073,1
1324,0.3828664720058441,0
1325,0.3949085474014282,0
1359,0.6110289692878723,1
1364,0.42948755621910095,0
1459,0.3576911985874176,0
1627,0.5911972522735596,1
1634,0.7979348301887512,1

All IDs match those in the submission_format.csv downloadable from the data page, and there are only 1000 rows overall. All labels are 0 or 1, and all proba values satisfy 0 < proba < 1.

su = pd.read_csv("submission_format.csv")
set(submission_df.id) == set(su.id)

gives True.
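
One caveat with a set comparison like the one above: it ignores row order, so it can return True even when the rows are shuffled relative to submission_format.csv. A minimal sketch of an order-sensitive check follows. The small frames are made-up stand-ins for the real files, and `ids_match_in_order` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

def ids_match_in_order(submission_df: pd.DataFrame, format_df: pd.DataFrame) -> bool:
    """Return True only if both frames list the same ids in the same row order."""
    return submission_df["id"].tolist() == format_df["id"].tolist()

# Toy frames standing in for the real files (values are illustrative).
format_df = pd.DataFrame({"id": [1284, 1324, 1325]})
shuffled_df = pd.DataFrame({"id": [1325, 1284, 1324]})

# The set comparison passes because the ids are the same...
same_set = set(shuffled_df.id) == set(format_df.id)
# ...but the ordered comparison fails because the rows are shuffled.
same_order = ids_match_in_order(shuffled_df, format_df)
```

If the ordered check fails while the set check passes, the IDs themselves are fine and only the row order needs fixing.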

Are you setting df.to_csv('…', index=False)?

You can also try setting the encoding explicitly:

df.to_csv(file_path,index = False, header=True,sep=',',encoding='utf-8-sig') 

The submission file should be 1001 lines, counting the header.
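
A quick way to sanity-check the header and the line count before uploading might look like the sketch below. It writes to an in-memory buffer instead of a real file, the two-row frame is a made-up stand-in for the 1000-row submission, and `count_csv_lines` is a hypothetical helper.

```python
import io
import pandas as pd

def count_csv_lines(csv_text: str) -> int:
    """Count non-empty lines in CSV text, header included."""
    return sum(1 for line in csv_text.splitlines() if line.strip())

# Toy frame standing in for the real 1000-row submission.
df = pd.DataFrame({"id": [1284, 1324], "proba": [0.61, 0.38], "label": [1, 0]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False keeps the row index out of the file
csv_text = buffer.getvalue()

n_lines = count_csv_lines(csv_text)   # data rows plus one header line
header = csv_text.splitlines()[0]     # first line should be the column names
```

For the real file, the same count should come out to 1001 and the header should read id,proba,label.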

Don’t forget that the order of the IDs matters.

Your IDs are not correct! They should come from train.jsonl. Anything else will give the error you are getting, so make predictions for train.jsonl.

Try replacing the proba and label values inside submission_format.csv. If the order is not right, you’ll get this error.
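
One way to do that replacement is sketched below: the predictions are reindexed to the ID order of the format file, so the output rows line up with it. The frames are made-up stand-ins for the real files, and `align_to_format` is a hypothetical helper name.

```python
import pandas as pd

def align_to_format(pred_df: pd.DataFrame, format_df: pd.DataFrame) -> pd.DataFrame:
    """Return the predictions reordered to the id order of format_df."""
    aligned = pred_df.set_index("id").reindex(format_df["id"].tolist())
    # reindex leaves NaN for any id that has no prediction; fail loudly.
    if aligned["proba"].isna().any():
        raise ValueError("some ids in the format file have no prediction")
    aligned["label"] = aligned["label"].astype(int)
    return aligned.reset_index()

# Toy stand-ins: the format file fixes the order, predictions arrive shuffled.
format_df = pd.DataFrame({"id": [1284, 1324, 1325], "proba": [0.5] * 3, "label": [0] * 3})
pred_df = pd.DataFrame(
    {"id": [1325, 1284, 1324], "proba": [0.39, 0.61, 0.38], "label": [0, 1, 0]}
)

aligned_df = align_to_format(pred_df, format_df)
```

The aligned frame can then be written with to_csv(..., index=False) as suggested earlier in the thread.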

I got the same error. My output is as follows, and the error message is not really useful.
Can anyone help with this?

id,proba,label
68394,9.92727705595442e-11,0
39867,6.556213211217354e-11,0
15097,4.209503822494298e-05,0
48916,4.6228714722929e-07,0
84652,8.170956966591092e-11,0
27496,1.0306415788363665e-05,0
19230,6.211123282362507e-11,0
48539,1.0358038732283248e-10,0
97423,6.917780837056853e-08,0
32451,6.979720978961268e-07,0
38401,1.8754791759867905e-15,0
92683,0.9999343156814575,1
89105,0.13309748470783234,0
47931,0.00013741306611336768,0

Fixed the problem. Indeed, order matters: the rows have to be consistent with the order in test.jsonl.
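
For anyone hitting the same error, a rough sketch of taking the row order straight from the JSONL file is below. It assumes each line of test.jsonl is a JSON object with an "id" field. The two-record text, the "text" field, and both helper names are made up for illustration.

```python
import json

def ids_in_file_order(jsonl_text: str) -> list:
    """Collect the "id" of each record, in the order the lines appear."""
    return [json.loads(line)["id"] for line in jsonl_text.splitlines() if line.strip()]

def order_rows(predictions: dict, ordered_ids: list) -> list:
    """predictions maps id -> (proba, label); return rows in ordered_ids order."""
    return [(record_id, *predictions[record_id]) for record_id in ordered_ids]

# Made-up two-record stand-in for the contents of test.jsonl.
jsonl_text = '{"id": 68394, "text": "a"}\n{"id": 39867, "text": "b"}\n'

# Predictions keyed by id, in arbitrary order (values are illustrative).
predictions = {39867: (0.2, 0), 68394: (0.9, 1)}

rows = order_rows(predictions, ids_in_file_order(jsonl_text))
```

A missing prediction raises KeyError here, which is arguably preferable to silently writing a short file.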