What could possibly be going wrong with the submission?

I have tested my solution locally per the guidelines in the GitHub repo. My code executes successfully, and I also tested the generated submission file with the scoring script.

When I submit the solution for a smoke test, it runs successfully and generates a submission file, but then it fails with the message:

The submission file your code generated is not a valid submission.

What could possibly be going wrong?
Any debugging tips?

Hi @mananjhaveri. I also ran into a lot of submission issues. I learned that the LLM can sometimes produce unexpected outputs, so I suggest including code that compares the submission.csv file to the submission_format.csv file. Check the following:

  1. The index order
  2. The number of columns - should be 23 plus the “uid” column
  3. The column order
  4. All values should be of type int
  5. The range e.g. 0 or 1 for the binary columns, 1-6 for InjuryLocationType and 1-12 for WeaponType1
  6. Also, the file should be named: submission.csv
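Before running checks like these, both files need to be loaded with “uid” as the index. A minimal sketch (inline data stands in for the real files only so the snippet runs on its own; the file names come from the list above):

```python
import io
import pandas as pd

# In practice you would read the real files:
#   df_preds = pd.read_csv("submission.csv", index_col="uid")
#   df_sample = pd.read_csv("submission_format.csv", index_col="uid")
# Inline data is used here only to keep the sketch self-contained.
csv_text = "uid,InjuryLocationType,WeaponType1\nabc123,3,11\n"
df_preds = pd.read_csv(io.StringIO(csv_text), index_col="uid")
```

With "uid" as the index, the index and column comparisons below line up row-for-row against the sample format.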
# These checks from the blog post assume that the "uid" column
# has been set as the index.

# Check that there are 23 features
assert df_preds.shape[1] == 23

# Check that the index order is the same
assert (df_preds.index == df_sample.index).all()

# All values should be integers
assert (df_preds.dtypes == int).all()

# Columns are in the correct order
assert (df_preds.columns == df_sample.columns).all()

# Variables have values within the expected range
# (selecting the binary columns by name avoids assuming their position)
assert df_preds.drop(columns=["InjuryLocationType", "WeaponType1"]).isin([0, 1]).all().all()
assert (df_preds["InjuryLocationType"].isin(range(1, 7))).all()
assert (df_preds["WeaponType1"].isin(range(1, 13))).all()

I found that different LLMs tend to produce different errors. The good thing is that each one produces the same errors consistently, so code can be used to post-process the preds and fix them. It’s also important to set a seed for reproducibility.
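A sketch of that kind of post-processing (the helper name and exact coercion rules are my own assumptions, not the author's code; the column names come from the checks above):

```python
import pandas as pd

def clean_predictions(df_preds: pd.DataFrame, df_sample: pd.DataFrame) -> pd.DataFrame:
    """Coerce raw LLM outputs into the expected submission format."""
    # Keep only the expected columns, in the sample's order, aligned to its index.
    df = df_preds.reindex(index=df_sample.index, columns=df_sample.columns)
    # Coerce to numeric, replace anything unparseable with 0, and cast to int.
    df = df.apply(pd.to_numeric, errors="coerce").fillna(0).astype(int)
    # Clip each variable into its valid range.
    binary_cols = df.columns.drop(["InjuryLocationType", "WeaponType1"])
    df[binary_cols] = df[binary_cols].clip(0, 1)
    df["InjuryLocationType"] = df["InjuryLocationType"].clip(1, 6)
    df["WeaponType1"] = df["WeaponType1"].clip(1, 12)
    return df
```

Running the validation asserts after a cleanup step like this catches anything the coercion could not repair.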
