Innovation Track Q&A – Monday 28th September

Hello wonderful solvers!

My name’s Will, and I’m Competition Director at altLabs, the sponsor for the Genetic Engineering Attribution Challenge.

We’re a little over halfway through the competition now, with just under four weeks to go until submissions to the Prediction Track close and submissions to the Innovation Track open. Since the Innovation Track has less of a standard format than the Prediction Track, now seems like a good time to open the floor to your questions, to help make sure people know what they need to do to compete in that track, and what to expect from the assessment process.

I intend to start answering questions on Monday, September 28th, at midday BST. I’m posting this here to enable people to think about what they’d like to ask, and to post their questions in advance if they feel like it. I plan to do most of my answering on the 28th; after that I’ll continue checking this thread periodically but it may take me longer to respond to new questions.

Please feel free to ask anything you like about the Innovation Track. You can also ask questions related to the Prediction Track, but I’ll prioritise those below Innovation Track questions.

Will you take the submission’s prediction track score into account when evaluating innovation track submissions?
Can there be multiple use cases (like the examples in the Innovation Track description) in the submission?
What if additional data used in the submission figures is very large?
The following questions are about both tracks: How strict are the requirements for the code? Are you going to run it as part of evaluation, or just review it? How much effort are you going to devote to fixing problems when running the code? Are there any restrictions on hardware? Machine learning code is often stochastic; do you expect a bit-perfect reproduction of the submission materials?

Thank you!


I was just wondering how the reports we produce should compare to an academic paper? Should we assume no subject knowledge at all in machine learning? And are there any examples you can point us to of the style of report you are expecting?

Thanks


I was just wondering how the reports we produce should compare to an academic paper?

Good question.

At a maximum of 4 pages (including figures and references), the reports will be quite a bit shorter than most academic papers, especially given the lack of any Supplementary Information or similar. Because of this, it won’t be possible to go into the same level of detail you would in an academic paper – you’ll need to be sparing with both words and citations.

The report also needs to be written for experts with diverse backgrounds, which means less emphasis on technical detail and more on a high-level explanation of what your model can do and why it’s useful (though you can and should include technical details where they’re key to understanding the value of your approach). More on this later.

Aside from length and number of figures, there aren’t any ironclad formatting rules you need to follow in the report. That said, thinking of the report as a short academic paper provides some good guidelines: key external contributions should be cited, figures should have captions, et cetera. Devoting some of your first page to an abstract/summary would also be a very good idea.

Should we assume no subject knowledge at all in machine learning?

The judges reviewing a given submission will vary quite a bit in their ML knowledge; some will be subject-matter experts, others will have a cursory understanding, and others will have very little ML or statistics background.

This being the case, you’ll need to make sure your report is interesting/comprehensible to people both with and without an ML background. While you should make clear what techniques you used (and cite references for the most important), you don’t need to spend large amounts of space explaining the technical detail of how your techniques work. The most important thing is that you make it clear why you selected the techniques you did, and how they contribute to solving attribution problems that a non-ML expert might care about.

And are there any examples you can point us to of the style of report you are expecting?

Good question. I will get back to you about this one.


Will you take the submission’s prediction track score into account when evaluating innovation track submissions?

Fair question! We don’t want to provide the judges with detailed accuracy scores, both because we want assessment to be blinded and because we don’t want them to put too much weight on small differences in accuracy (which is what the prediction track is for).

On the other hand, accuracy is an important part of a holistic evaluation of a model, so we’ll provide judges with each submission’s approximate accuracy score.

Can there be multiple use cases (like the examples in the Innovation Track description) in the submission?

As long as you think you can cover them all in sufficient detail within the 4-page length limit, you can cover as many use cases as you like.

What if additional data used in the submission figures is very large?

This is fine, with the caveat that you’d need to find a way to provide us with the data when you submit the report.

If the additional data is open source (e.g. in a public database somewhere) and you can provide the accession, that seems fine.

The following questions are about both tracks: How strict are the requirements for the code? Are you going to run it as part of evaluation, or just review it? How much effort are you going to devote to fixing problems when running the code? Are there any restrictions on hardware? Machine learning code is often stochastic; do you expect a bit-perfect reproduction of the submission materials?

DrivenData will be managing the code verification process, and will follow up with clear instructions for submitting code at the end of the challenge. We will be running the code, though, not just reviewing it, and it will need to be usable and reproducible to be eligible for a prize.
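On the stochasticity point, one common way to keep reruns close to the submitted results is to pin your random seeds at the start of the pipeline. Purely as an illustrative sketch (assuming a Python stack with NumPy and PyTorch – swap in the equivalents for whatever framework you actually use):

```python
import random

import numpy as np
import torch


def set_global_seed(seed: int = 42) -> None:
    """Pin the common sources of randomness so reruns stay close to the submitted results."""
    random.seed(seed)            # Python's built-in RNG
    np.random.seed(seed)         # NumPy RNG (data shuffling, splits, etc.)
    torch.manual_seed(seed)      # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)  # PyTorch GPU RNGs, if CUDA is used
    # Deterministic cuDNN kernels trade some speed for repeatability on GPU.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_global_seed(42)
```

This isn’t an official requirement or checklist, just the sort of thing that makes stochastic training runs much easier for someone else to reproduce.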


And are there any examples you can point us to of the style of report you are expecting?

Good question. I will get back to you about this one.

It looks like this is the first DrivenData competition to request this kind of report.

I think the closest analogue is probably a section from an academic paper (say, this one or this one), keeping in mind the disanalogies (more general readership, tighter space limitations) I discuss above.
