About the Runway Functions category

Aha, I understand now. No, it isn’t necessary to add your professor to your team. Sharing code with your professor for grading is allowed. The bit about sharing code is intended to prevent sharing among competitors.

Is it possible to see the full content of the logs for a submission to the prescreened arena? The DrivenData pane seems to cut the logs at 1,000 lines and for the rest it shows the following: < … WARNING: logs capped at 1,000 lines; dropping 1,365 more >. It would be great to have access to the entire log file in order to see where it fails. It seems to be a time/memory issue: after different executions with the same code it fails at different timestamps, at about the 4h mark of execution, with the message "Your submission did not output the expected file so it could not be scored." Thanks!

@alsaco Done! I increased the max number of log lines to 5,000 so you should be able to see those later messages now.

Hello, I have a question about using data from the training set in the prescreened arena. Are we allowed to create csv files from the training data, include them in our submission.zip, and read from them during code execution? Thank you!
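To give a concrete example of what I mean (the file and column names below are just placeholders I made up, not anything from the runtime repo):

```python
from pathlib import Path

import pandas as pd

# Placeholder file: a lookup table precomputed offline from the training data
# and bundled inside submission.zip next to this script.
ASSETS_DIR = Path(__file__).resolve().parent
config_priors = pd.read_csv(ASSETS_DIR / "config_priors.csv")


def prior_for(airport: str) -> pd.DataFrame:
    """Return the precomputed rows for one airport (illustrative only)."""
    return config_priors[config_priors["airport"] == airport]
```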

Thanks, much appreciated. I just wanted to double check that the runtime limit was 8h, as specified in one of the comments above. After executing the code again and retrieving all the logs, it failed at the 2h mark without any error registered in the log file. Thanks again!

I don’t think there is any problem with that. If you plan on using that as your final submission, would you mind sending a little more detail about your strategy via email just to be sure (robert@drivendata.org)? Thanks!

My apologies, you are absolutely correct. That time limit was incorrectly specified in the cluster. Please try again, and your submission should get the full 8 hours.

Thanks for the change; it ran perfectly now. Could you confirm whether we will also get the 8h/week limit during the blind test period? And is there any uncertainty in the cluster's capacity we should account for, meaning that the same code could have very different execution times depending on network traffic / time of day? Thanks again!

Yes, you can count on the same 8h/week limit in the final evaluation period. We will use the same cluster as well.

I hear your point about compute requirements varying based on specifics of the features that are impossible to know definitively ahead of time. One important difference is that the final evaluation period is roughly 1 month, whereas the prescreened test set covers 1 week. Do your best to ensure that your solution won't exceed the cluster capacity on the final evaluation set. We will try to be flexible if, for example, the final evaluation features turn out to be vastly more compute intensive than the prescreened test features.

Hi! I just wanted to double check that the log-loss score published in the prescreened arena is not weighted in some way (such as weighting the earlier lookaheads more than the later ones from 30 to 360, or weighting specific airports, etc.). The prescreened arena tests this over a whole week, and I find it a bit suspicious that the score I currently have on the leaderboard seems just a tad lower than any validation score I have gotten haha. Probably just some variance in the data and my algo, but I just wanted to double check. Thanks!

Can we assume that air traffic data will be sorted by timestamp during the prescreened testing period? Thanks!

I just wanted to double check that the log-loss score published in the prescreened arena is not weighted in some way (such as weighting the earlier lookaheads more than the later ones from 30 to 360, or weighting specific airports, etc.)

Correct: the score is not weighted, and the open, prescreened, and final evaluation datasets all use the same loss metric. I would expect some deviation in scores between datasets due to a variety of factors. All of that is part of the challenge!
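To spell out what "not weighted" means in practice, here is a rough sketch of a plain mean log loss over every prediction row (the clipping and renormalization details are illustrative, not an official reference implementation):

```python
import numpy as np


def mean_log_loss(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-15) -> float:
    """Unweighted multiclass log loss.

    y_true: one-hot array, shape (n_rows, n_configs)
    y_pred: predicted probabilities, same shape and column order

    Every row (airport, timestamp, lookahead) counts equally.
    """
    y_pred = np.clip(y_pred, eps, 1 - eps)
    y_pred = y_pred / y_pred.sum(axis=1, keepdims=True)  # renormalize each row
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))
```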

Can we assume that air traffic data will be sorted by timestamp during the prescreened testing period? Thanks!

Yes, for the prescreened and final evaluation periods, all of the features will be sorted by timestamp.
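That said, it costs very little to sort defensively when you load the features; a minimal sketch, assuming a pandas DataFrame with a timestamp column (the column name is just for illustration):

```python
import pandas as pd


def load_sorted(path: str) -> pd.DataFrame:
    """Load a feature CSV and make sure it is ordered by timestamp."""
    df = pd.read_csv(path, parse_dates=["timestamp"])
    # Stable sort preserves the original order of rows that share a timestamp.
    return df.sort_values("timestamp", kind="stable").reset_index(drop=True)
```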

I just wanted to validate my understanding regarding point (1). If I understand correctly, some airports might be dropped from the evaluation and hence will not appear in the submission format csv, but we will still have data for them. Is it safe to assume that there will be data available for all the airports, i.e. past configurations, past weather, etc.?

That’s right, we might say “you don’t need to predict for airport ABC since the data quality is poor” (i.e., ABC is not in submission format) but we will still include airport ABC features (past configurations, past weather, etc.).
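So in practice you can drive your prediction loop off whatever appears in the submission format, while still reading features for every airport; a rough sketch (the file and column names here are assumptions for illustration):

```python
import pandas as pd

# Predict only for airports listed in the submission format; features for
# other airports can still be loaded and used as extra context if helpful.
submission_format = pd.read_csv("submission_format.csv")  # assumed file name

for airport in submission_format["airport"].unique():  # assumed column name
    # ... load this airport's features and fill in its rows of the submission ...
    pass
```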

Was there any noise added to the data in the Open Arena?

Nope, we did not add any noise.

With the new local build, I think there is an issue with typer and the latest click=8.1.0 (Add click 8.1.0 support by madkinsz · Pull Request #375 · tiangolo/typer · GitHub). In the next few days I might make a pull request to add a few data analysis packages to environment-cpu.yml in the repo. Would it be more appropriate to update to typer=0.4.1, which adds click 8.1.0 support, or to "lock" click to a previous version? Thanks!

Runtime updates are a bit of a balancing act: we try to change as little as possible to avoid breaking code that previously ran, but we also don't want there to be too many hurdles to development as a result of using old code versions. That said, I'm not too worried about a minor version update to typer (0.4.0 to 0.4.1) causing any problems, so I'd say go for it. Just a note: we like to keep the CPU and GPU environments as similar as possible, so try to make the same updates to environment-gpu.yml in your PR. Thanks!
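For reference, the change would just be a version bump in both environment files, along these lines (I am writing typer as a conda dependency purely for illustration; match however it is actually listed in the files):

```yaml
# Excerpt sketch of environment-cpu.yml / environment-gpu.yml; only the typer
# bump matters here, the surrounding entries are illustrative.
dependencies:
  - typer=0.4.1  # bumped from 0.4.0 to pick up click 8.1.0 support
```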

Unfortunately I don't have a GPU to test that environment (although I don't particularly see a reason why it would fail). :'')

Ah, that's not a problem! You can make those changes to environment-gpu.yml anyway. When you make the PR, GitHub Actions will build the GPU image and run some automated tests to make sure it works :sparkles:
