In the code_execution_development data, the submission_format / test_labels rows all fall on the same date. As a result, many of the weather features only take a subset of their possible values. For example, lightning_prob has 4 categories in the training data but only 1 in the test set. So when both are passed through pd.get_dummies, the training and test data no longer end up with the same number of columns.
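A minimal reproduction of the mismatch, using a toy lightning_prob column whose category labels ("A" to "D") are hypothetical placeholders rather than the real values in the dataset:

```python
import pandas as pd

# Hypothetical toy data: 4 lightning_prob categories in training, only 1 in test
train = pd.DataFrame({"lightning_prob": ["A", "B", "C", "D"]})
test = pd.DataFrame({"lightning_prob": ["A", "A"]})

# get_dummies only creates columns for the values actually present in each frame
train_dummies = pd.get_dummies(train, columns=["lightning_prob"])
test_dummies = pd.get_dummies(test, columns=["lightning_prob"])

print(train_dummies.shape[1], test_dummies.shape[1])  # 4 1
```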
This is easy to fix by concatenating the training and test data, encoding them together, and then splitting them apart again. For efficiency, though, I'm wondering whether the actual test data will contain all the unique values that appear in the provided training data. I'd rather not load all the training data in solution.py if it isn't necessary.
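One way to sidestep the question is to not depend on the test data's values at all. A minimal sketch, assuming you can save the list of dummy column names at training time and ship it with the model. The helper name encode_like_training and the category labels are hypothetical. At inference, only that small list is needed, not the training data:

```python
import pandas as pd

# At training time: build dummies and record the resulting column names.
train = pd.DataFrame({"lightning_prob": ["A", "B", "C", "D"]})  # hypothetical values
train_X = pd.get_dummies(train, columns=["lightning_prob"], dtype=int)
train_columns = list(train_X.columns)  # e.g. save this list to JSON next to the model


# At inference time (e.g. in solution.py): only the saved column list is needed.
def encode_like_training(df, columns):
    """Hypothetical helper: one-hot encode df, then force the training column layout."""
    dummies = pd.get_dummies(df, columns=["lightning_prob"], dtype=int)
    # Missing dummy columns are added as 0; categories unseen in training are dropped.
    return dummies.reindex(columns=columns, fill_value=0)


test = pd.DataFrame({"lightning_prob": ["A", "A"]})
test_X = encode_like_training(test, train_columns)
print(list(test_X.columns))
```

An equivalent option is to cast the column to a pd.CategoricalDtype with the full category list before calling get_dummies, since get_dummies emits a column for every declared category, even ones that do not occur. Either way, a test-only category would be silently dropped, which is usually what a model trained without it needs.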