About the Runway Functions category

This category is for posts about the Runway Functions: Predict Reconfigurations at US Airports competition. Please keep all posts here specific to the competition.

How will we know if our application for the Prescreened Arena has been accepted? Is there a typical timeframe for reviewing applications?

Thanks!

Hi @Jeremyblum, thanks for your patience! Now that the Prescreened Arena is live, approval should only take a few days.


Hello all,

I’m new and have a question about the blog page. It starts out by saying:

You can use snippets from this notebook in your solution and you can use this algorithm as a starting place (but you don’t have to!).

from pathlib import Path
from typing import Sequence, Tuple

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

mpl.rcParams["figure.dpi"] = 100
sns.set_style("whitegrid")
pd.options.display.width = 200

Can someone please help me identify what programming language this is? Is this Python, and if so, is it the best one to use for working with our data? Thanks!

Hi @N222NY, welcome to the challenge! It is indeed Python; specifically, the benchmark was written in a Jupyter notebook. For the Open Arena you're welcome to use any programming language, but all prize-eligible submissions to the Prescreened Arena must be in Python.

Thanks very much for the reply!


I am an international student in the US, and I'm interested in this data challenge. Can I participate, or should I have my university supervisor be the group leader?

@pyt199519 It might be the case that your supervisor is eligible if they are a U.S. citizen or permanent resident and the following does not apply:

Federal employees acting within the scope of their employment and federally-funded researchers acting within the scope of their funding are not eligible to win a prize in this challenge.

Please see the Eligibility section of the Home page for more detail.


Could you please reconsider and make everyone eligible for prizes? This would allow a greater number of people to participate in this challenge. I'm fairly certain that many of the scientists using this platform are neither U.S. citizens nor affiliated with a U.S. university.

What's the point of having only one or two people submitting? They won't have to push their models to be the best because of the lack of competition.

I don't know if this makes sense. Please reconsider. Thanks!


Thanks a lot for organizing this exciting challenge. I have three questions about submissions to the Prescreened Arena, and I would greatly appreciate your input:

1. prediction_time parameter: In the runtime repository I see that it is passed as the only parameter to main.py, but it is not used anywhere in the code. Are we supposed to save to a CSV only the airport/configuration/lookahead rows for that given timestamp, or all of the data in the submission template as in the benchmark code?

2. Time span of prior data: For how many days/hours before prediction_time can we expect raw data to be available at runtime?

3. Relative paths within submission.zip: Just as the data directory is referenced as Path("/codeexecution/data") to access the raw data at prediction time, how can we access relative paths within submission.zip? I.e., if we have main.py and a folder called src within submission.zip, can we call Path("…/src/XXX")? I ask because the documentation specifies that we should not read data from elsewhere.

Thanks in advance!

The prediction_time is provided, but you are not required to use it.

Each time your main.py runs, it should write out /codeexecution/prediction.csv with the airport/config/lookahead predictions for that given timestamp. It should match the partial submission format, which you can read from /codeexecution/data/partial_submission_format.csv; that file is updated each time your main.py runs to contain only the current timestamp.

Good question; you can access any of the files (scripts, assets, etc.) that you include in your submission ZIP, so if you have a src directory in submission.zip that same directory will be accessible as ./src/XXX from the current working directory. I just updated the benchmark example submission to reflect that pattern, which I think will be useful. See src/utils.py and how it is referenced in the benchmark main.py.
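To make that concrete, here is a minimal sketch of a main.py along these lines. It is not the benchmark code: the column names (airport, config, lookahead, active) and the uniform-probability baseline are illustrative assumptions, so check partial_submission_format.csv for the real schema.

from pathlib import Path
import sys

import pandas as pd

# Files bundled in submission.zip are importable from the working
# directory, e.g. `from src.utils import make_features` if you ship a
# (hypothetical) helper in src/utils.py.

DATA_DIR = Path("/codeexecution/data")

def main(prediction_time: str):
    # The runtime regenerates this file on every call so that it holds
    # only the rows for the current prediction time.
    submission = pd.read_csv(DATA_DIR / "partial_submission_format.csv")

    # Placeholder baseline: spread probability uniformly over each
    # airport/lookahead group's candidate configurations.
    group_sizes = submission.groupby(["airport", "lookahead"])["config"].transform("count")
    submission["active"] = 1.0 / group_sizes

    # Write predictions where the runtime expects to find them.
    submission.to_csv("/codeexecution/prediction.csv", index=False)

if __name__ == "__main__":
    main(sys.argv[1])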

You can find additional explanation in the Code submission format section of the README; hopefully it will be clear once you start in on it. Until then, we’ll be here to help, so keep the questions coming!

@putiki76 I understand your concern. The eligibility requirements were determined by the competition hosts with the goal of encouraging participation from US universities. There are no plans to change them at this time.

Thanks a lot for the fast and thorough response; everything is super clear. The last doubt I had was about the format of the new incoming raw data. Can we expect it to have the same format as the historical data (i.e., D_XXX_A_XXX for configurations, the GUFI format separated by periods, timestamp datatypes, etc.), or should we build tests and formatting for edge cases into our submission?

Thanks again.

Yes, you’re right to think about those complications! We will ensure that simple things like data format (configuration, GUFI, timestamp formats) match exactly. What is harder to ensure (and what your submissions should be able to handle) are cases where the data itself might be different. Here are a few of the possibilities:

  1. An airport in the training set is missing from the evaluation set. We might have to drop an airport if there are issues with data quality during the evaluation period.
  2. The distribution of configurations changes considerably. We will ensure that the set of configurations in the evaluation set matches that from the training period, but we can’t ensure that the distribution of how often those configurations are active is the same. I will say that if the distribution is wildly different, we could end up dropping that airport from the evaluation set (see possibility 1).
  3. Missing data. Let’s say you compute the number of arriving flights per hour as a feature. It is possible that no flights are recorded for several hours (due to chance or a temporary reporting outage). Your code should be able to handle these kinds of missing data; see the sketch after this list.
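
As one way to guard against possibility 3, here is a sketch of an hourly arrival-count feature that treats empty hours as explicit zeros instead of letting them silently disappear. The arrival_time column name is an assumption; swap in whatever time column your raw data actually uses.

import pandas as pd

def arrivals_per_hour(arrivals: pd.DataFrame, prediction_time: pd.Timestamp, hours: int = 24) -> pd.Series:
    # Only look at the window before the prediction time; never peek ahead.
    window_start = prediction_time - pd.Timedelta(hours=hours)
    in_window = arrivals["arrival_time"].between(window_start, prediction_time, inclusive="left")

    # Count the recorded arrivals in each hourly bucket.
    counts = arrivals.loc[in_window].set_index("arrival_time").resample("1h").size()

    # Reindex over the full window so hours with no recorded flights
    # (chance or a reporting outage) show up explicitly as zero.
    full_index = pd.date_range(window_start, prediction_time, freq="1h", inclusive="left")
    return counts.reindex(full_index, fill_value=0)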

Amazing, thanks for the detailed response. We will make sure to account for these cases.
In the runtime requirements it is mentioned that “The submission must complete execution in 8 hours or less”. How many calls to the main.py script do those 8 hours cover, i.e., how many prediction times?

Thanks again!

That will be for 1 week of data at 1-hour prediction time spacing, so 7 days * 24 hours = 168 iterations, where each iteration runs main.py once. On average that leaves roughly 480 / 168 ≈ 2.9 minutes of the 8-hour budget per call.

Forgot to reply to this: you will have access to raw data beginning 2 full days (48 hours) before the first prediction time, i.e., a 2-day “warm start” period. Note that for each prediction time you can use any or all of the data up to that prediction time. So for a prediction time on day 7, you can use the raw data from all 7 days before it plus the warm start period, 9 days in total.
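
If it helps, here is one defensive pattern for respecting that cutoff in your own code (a sketch; the timestamp column name is an assumption):

import pandas as pd

def visible_at(raw: pd.DataFrame, prediction_time: pd.Timestamp) -> pd.DataFrame:
    # Keep only rows already observable at the prediction time. The
    # runtime only mounts data up to prediction_time anyway, but
    # filtering again guards against accidental lookahead when you
    # test locally against the full training set.
    return raw.loc[raw["timestamp"] <= prediction_time]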

Hi! If we were to use this as a school project (due after the competition ends), would we need to add our professor as a team member? Thanks!

Hi @Rachelcc, to join the Prescreened Arena, you or a team member will need to submit proof of university affiliation. We do not require that a professor be a team member. Let me know if that addresses your concern; I may not have understood your question correctly.

Hi, thanks for your reply! We have an assignment in school to do a big group project. If we were to use our involvement in the contest as our assignment, would that count as private sharing of code under the official rules? Would we have to add our professor to our team if we wanted him to grade our code after the contest is over?