Labels not represented in train data but present in test data

I have just found the competition and I would like to have more information about this line:

Note also that some label classes are not represented in the training data, but may be present in the test set used for final evaluation.

My understanding is that materials other than those in the following list may be present in the test set:

  1. Basalt
  2. Carbonate
  3. Chloride
  4. Iron Oxide
  5. Oxalate
  6. Oxychlorine (chlorate, perchlorate)
  7. Phyllosilicate
  8. Silicate
  9. Sulfate
  10. Sulfide

Is that correct?

Hi @ironbar, welcome to the Mars Spectrometry competition!

What you described is not quite correct.

The line that you quoted is intended to convey that for the subset of the training samples that come from the SAM testbed, you will not find all 10 label classes represented. However, label classes that are missing from those training samples may still be present in the SAM testbed samples in the test set.

The 10 label classes you listed are exactly the label classes you must predict for the test set. You can see this in the “Submission Format” section of the problem description page. There are no additional label classes that you will be required to predict.
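
If it helps to make the distinction concrete, here is a minimal sketch of how you could check which of the 10 classes have no positive example within one subset of the training samples. The toy rows, the field names (`instrument_type`, `labels`), the instrument values, and the `missing_classes` helper are all assumptions made up for illustration; they are not the real competition files or schema.

```python
# Toy illustration only: rows, field names, and instrument values are
# hypothetical and do not reflect the real competition data files.
LABEL_CLASSES = [
    "Basalt", "Carbonate", "Chloride", "Iron Oxide", "Oxalate",
    "Oxychlorine", "Phyllosilicate", "Silicate", "Sulfate", "Sulfide",
]


def missing_classes(rows, instrument):
    """Return the label classes with no positive example among rows from `instrument`."""
    seen = set()
    for row in rows:
        if row["instrument_type"] == instrument:
            seen.update(row["labels"])
    # Preserve the fixed class order so the result is easy to read.
    return [c for c in LABEL_CLASSES if c not in seen]


# Each sample can carry several labels at once.
toy_train = [
    {"instrument_type": "other", "labels": {"Basalt", "Sulfate"}},
    {"instrument_type": "other", "labels": {"Carbonate", "Chloride", "Iron Oxide"}},
    {"instrument_type": "other", "labels": {"Oxalate", "Oxychlorine", "Phyllosilicate"}},
    {"instrument_type": "other", "labels": {"Silicate", "Sulfide"}},
    {"instrument_type": "sam_testbed", "labels": {"Basalt", "Oxychlorine"}},
    {"instrument_type": "sam_testbed", "labels": {"Sulfate"}},
]

print(missing_classes(toy_train, "sam_testbed"))
```

In this toy data the `sam_testbed` subset only covers three classes, yet the set of classes to predict stays the same fixed list of 10. A class that is missing from a training subset is still one of those 10; it is never a new, eleventh class.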

As an additional note: it is true in general that samples can contain materials that are different from the 10 label classes. The labels for this competition are not exhaustive compositional descriptions of the physical samples used to generate the data. Any compositional information about the samples outside of the 10 label classes is not specified in any of the competition data (train, validation, or test) and not expected to be part of your predictions.

I hope that cleared things up. Best of luck in the competition.
