I am not sure whether supervisor.py runs code using parallel processing. Either that is what is giving me an error, or something is up with the partial_submission_format.
I ran: print(partial_submission_format.tail()) and the log is shown in the image.
Please also note that this print was the first line of code actually running in my predict function.
When printing the tail of the partial submission format, you can see that up until the error it prints the last 5 rows of the dataframe. However, the last tail printed before my error was an empty dataframe, which is causing my upload to throw an error.
Any advice on how to handle this?
Not sure if this will throw anything off, but the only way I can think of so far would be to edit supervisor.py so that it won't pass empty partial_submission_format dataframes into the predict function.
Hi @axj65 , You’re right that sometimes the supervisor requests predictions for an empty partial predictions dataframe (this happens when no flights at a particular airport require predictions for that time). The simplest fix might be to have your solution’s predict function detect when partial_submission_format is empty and return the partial submission format.
from pathlib import Path
from typing import Any

import pandas as pd


def predict(
    config: pd.DataFrame,
    etd: pd.DataFrame,
    first_position: pd.DataFrame,
    lamp: pd.DataFrame,
    mfs: pd.DataFrame,
    runways: pd.DataFrame,
    standtimes: pd.DataFrame,
    tbfm: pd.DataFrame,
    tfm: pd.DataFrame,
    airport: str,
    prediction_time: pd.Timestamp,
    partial_submission_format: pd.DataFrame,
    model: Any,
    solution_directory: Path,
) -> pd.DataFrame:
    """Make predictions for a set of flights at a single airport and prediction time."""
    # Nothing to predict: return the empty format immediately
    if len(partial_submission_format) == 0:
        return partial_submission_format
    # continue as usual (my_great_algorithm is a placeholder for your own code)
    predictions = my_great_algorithm()
    return predictions
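As a quick sanity check on why returning the empty format is safe: an empty pandas DataFrame still carries its columns and dtypes, so handing it back unchanged keeps the expected schema. This is only a minimal sketch. The column names and the helpers `make_empty_format` and `returns_early` are illustrative assumptions, not the competition's actual format or code.

```python
import pandas as pd


def make_empty_format() -> pd.DataFrame:
    """Build a zero-row frame with hypothetical submission columns."""
    return pd.DataFrame(
        {
            "gufi": pd.Series(dtype="object"),
            "minutes_until_pushback": pd.Series(dtype="int64"),
        }
    )


def returns_early(partial_submission_format: pd.DataFrame):
    """Mimic the guard: hand back the same object when there are no rows."""
    if partial_submission_format.empty:  # equivalent to len(...) == 0
        return partial_submission_format
    return None  # a real predict function would carry on with the model here


empty_format = make_empty_format()
returned = returns_early(empty_format)
print(list(returned.columns), len(returned))
```

The guard returns the very same object it was given, so no columns are lost and no model code runs for that call.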
I did already try returning the partial_submission_format for the empty formats. However, detecting that and then returning the partial submission format ended up forcing my predict function to return the original partial submission format even for the non-empty formats where my code did work (not sure if this is because the supervisor.py is using parallel processing, so that if it detected an empty format in one cluster, it would assume all of my partial submission formats were empty). I tried this with an if statement and with a try-except clause. I did find a workaround, but it is quite inefficient: I add one non-empty row to my predictions and to the partial_submission_format, then delete that row before returning the output. For non-empty formats this adds and removes the last row; for empty formats the code runs on a one-row dataframe and then deletes that row, so an empty format is returned.
I am wondering if something is up with the cloud computing system right now, as my submissions seem to be stuck in ‘starting’ and not ‘running’.
Hi, there are no issues with the cluster. There’s a bit of a queue at the moment, but your submission is up next!
"not sure if this is because the supervisor.py is using parallel processing"
The supervisor does not use any parallel processing, so that shouldn’t be causing an issue.
I took a look at one of your recent submissions and saw a commented-out part where you check for the empty partial submission format inside the clean_data function and return from that function, but predict then continues to run. The check should sit directly within the predict function itself (the very first line should do it, see the above example), so that predict returns immediately without running any further code. Let me know if that helps!
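To illustrate the difference, here is a minimal sketch of the two placements of the check. All function names, the `minutes_until_pushback` column, and the constant prediction are hypothetical stand-ins, not the actual submission code. The point is only that a `return` inside a helper exits the helper, not its caller.

```python
import pandas as pd


def clean_data(partial_submission_format: pd.DataFrame) -> pd.DataFrame:
    # This early return only leaves clean_data; the caller keeps running.
    if len(partial_submission_format) == 0:
        return partial_submission_format
    return partial_submission_format.copy()


def run_model(features: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for a model that cannot handle zero rows.
    if features.empty:
        raise ValueError("model received no rows")
    out = features.copy()
    out["minutes_until_pushback"] = 15  # placeholder prediction
    return out


def predict_check_in_helper(partial_submission_format: pd.DataFrame) -> pd.DataFrame:
    cleaned = clean_data(partial_submission_format)
    # Still reached for an empty input, so the model is called anyway.
    return run_model(cleaned)


def predict_check_first(partial_submission_format: pd.DataFrame) -> pd.DataFrame:
    # Guard at the top of predict: nothing below runs for an empty input.
    if len(partial_submission_format) == 0:
        return partial_submission_format
    cleaned = clean_data(partial_submission_format)
    return run_model(cleaned)
```

With an empty dataframe, `predict_check_in_helper` still reaches `run_model` and fails, while `predict_check_first` returns the empty format untouched.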
Thank you! I thought I put the if statement in the beginning of the predict function but just realized it was in my clean_data function. It did end up working.