Data label errors

Hi,

I noticed some errors in train_labels.csv. There are many instances of gazelles/empty being labeled as lionfemale, for example:

SER_S1#B07#1#162
SER_S1#D03#7#4
SER_S1#D04#6#143
SER_S1#D05#5#312
SER_S1#D06#2#295

For lionfemale, a cursory sample of the labels suggests that most of them are incorrect.

Did anyone else run into this?

Thanks,
O&S

SER_S1#B07#1#162 is images S1_B07_R1_PICT0483, 484 and 485, which show a lioness in the dark.
SER_S1#D03#7#4 is images S1_D03_R7_PICT0010, S1_D03_R7_PICT0011 and S1_D03_R7_PICT0012, in which a lioness walks in from the right.
SER_S1#D04#6#143 - tricky, there might be something in the middle of the frame…
SER_S1#D05#5#312 - in isolation it could be a lion to me, but a couple of minutes later there is certainly a gazelle, so a lion is probably unlikely.
SER_S1#D06#2#295 looks empty.
I guess the quality of the labelling is part of the challenge.
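
In case it helps anyone checking sequences by eye, here is a minimal sketch for getting from a seq_id to the image name prefix. It assumes the pattern seen in the examples above (SER_<season>#<site>#<roll>#<sequence> maps to <season>_<site>_R<roll>_PICT...). The regex and the function name image_prefix are my own guesses for illustration, not an official spec, and the PICT numbers themselves cannot be derived from the seq_id.

```python
import re

# Assumed seq_id layout, inferred only from the examples in this thread:
#   SER_<season>#<site>#<roll>#<sequence number>
SEQ_ID_RE = re.compile(r"^SER_(S\d+)#([A-Z]\d{2})#(\d+)#(\d+)$")


def image_prefix(seq_id: str) -> str:
    """Return the guessed image-name prefix, e.g. 'S1_D03_R7_PICT'."""
    match = SEQ_ID_RE.match(seq_id)
    if match is None:
        raise ValueError(f"unrecognised seq_id: {seq_id!r}")
    season, site, roll, _sequence = match.groups()
    # The sequence number does not appear in the image names quoted above,
    # so it is ignored here.
    return f"{season}_{site}_R{roll}_PICT"


print(image_prefix("SER_S1#D03#7#4"))
print(image_prefix("SER_S1#B07#1#162"))
```

The prefix can then be used to filter the image file list for that camera and roll.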

Yes, there are many. Here are a few for S1 species with <20 sequences:
seq_id             category     correction
SER_S1#P07#1#282   rhinoceros   buffalo
SER_S1#G02#1#53    genet        hare
SER_S1#H11#1#2052  genet        serval
SER_S1#N10#1#40    genet        serval
SER_S1#H06#3#246   civet        jackal
SER_S1#D05#5#430   waterbuck    unsure (wb have round ears, see SER_S1#G03#3#136)
SER_S1#E09#2#386   waterbuck    reedbuck
SER_S1#D04#6#201   bushbuck     hippo
SER_S1#E06#4#4     honeybadger  mongoose
SER_S1#K07#1#541   honeybadger  guineafowl
SER_S1#N04#1#1026  honeybadger  mongoose
SER_S1#N04#1#1096  honeybadger  mongoose
SER_S1#N04#1#750   honeybadger  mongoose
SER_S1#Q12#1#155   honeybadger  mongoose
SER_S1#T13#1#1287  honeybadger  mongoose
SER_S1#T13#1#769   honeybadger  mongoose
SER_S1#U13#1#93    honeybadger  mongoose
SER_S1#I02#1#595   caracal      wildcat
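
If anyone wants to apply corrections like these before training, a rough sketch is below. It assumes a long-format table with seq_id and category columns, which may not match the actual layout of train_labels.csv; the inline sample data and the helper name apply_corrections are made up for illustration. Rows marked "unsure" are left unchanged.

```python
import csv
import io

# Hypothetical long-format labels (seq_id, category). The real
# train_labels.csv may be laid out differently, so adapt as needed.
labels_csv = """seq_id,category
SER_S1#P07#1#282,rhinoceros
SER_S1#G02#1#53,genet
SER_S1#E09#2#386,waterbuck
SER_S1#D05#5#430,waterbuck
"""

# A few corrections taken from the list above. The "unsure" row
# (SER_S1#D05#5#430) is deliberately left out.
corrections = {
    "SER_S1#P07#1#282": "buffalo",
    "SER_S1#G02#1#53": "hare",
    "SER_S1#E09#2#386": "reedbuck",
}


def apply_corrections(rows, corrections):
    """Return copies of rows with category replaced where a correction exists."""
    fixed = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows stay untouched
        new_row["category"] = corrections.get(row["seq_id"], row["category"])
        fixed.append(new_row)
    return fixed


rows = list(csv.DictReader(io.StringIO(labels_csv)))
fixed_rows = apply_corrections(rows, corrections)
for row in fixed_rows:
    print(row["seq_id"], row["category"])
```

Keeping the corrections in a separate mapping rather than editing the CSV by hand makes it easy to review or extend the list later.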