General Approach to Flu Shot Learning Problem


I’ve participated in a couple of DrivenData competitions and completed lots of modules on DataCamp, but they’ve all been regression or single-label problems. Does anyone have guidance on how to set up a multi-label process in Python? Should I build two models, one for each prediction label, and combine the results? Any help, or links to examples, would be appreciated. Thank you!


Unfortunately I can’t tell you much, as I only started this one yesterday.
Have a look at:
from sklearn.multioutput import MultiOutputClassifier

Thank you, I’ll have a look.

Hi @nathan, great question!

We’ve just published a new blog post this evening that has a walkthrough of our benchmark model for this competition. Hot off the presses!

The walkthrough also uses scikit-learn’s sklearn.multioutput.MultiOutputClassifier that @Slevin11 suggested. You can check it out as an example of how to use MultiOutputClassifier.

The approach that you suggested, building two models or two pipelines separately and then combining the results at the end, also works. That is one of the more “from scratch” ways of doing it. Something like MultiOutputClassifier will simplify things for you, but it also imposes constraints: in this case, it uses the same estimator class for both label variables.
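To make the comparison concrete, here is a minimal sketch of both options. It is not the benchmark code: the data is synthetic, and the choice of LogisticRegression and RandomForestClassifier is only an assumption for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Synthetic stand-in data (an assumption, not the competition data):
# 200 rows, 5 features, and two binary label columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = np.column_stack([
    (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int),
    (X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int),
])

# Option 1: MultiOutputClassifier fits one clone of the same estimator per label.
multi = MultiOutputClassifier(LogisticRegression())
multi.fit(X, Y)
pred_multi = multi.predict(X)  # one column of predictions per label

# Option 2: two independent models, which may be different estimator classes.
model_a = LogisticRegression().fit(X, Y[:, 0])
model_b = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, Y[:, 1])
pred_separate = np.column_stack([model_a.predict(X), model_b.predict(X)])
```

With option 1, multi.estimators_ holds one fitted LogisticRegression per label. With option 2 you do the bookkeeping yourself, but each label can get its own model and its own tuning.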

Good luck to both of you, and have fun!


Hi Nathan, the MultiOutputClassifier is convenient since it simplifies the process and lets you predict both labels in one step. On the other hand, predicting the two labels separately gives you more control: you can fine-tune the model for each label, or even use two completely different models. It’s up to you.
If you use MultiOutputClassifier, look at the competition benchmark or look here to get a clear understanding of the return value and which part of it you should pick.
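Assuming the return value in question is that of predict_proba (my reading, since the post does not say), here is a small sketch on synthetic data of what comes back and how to pull out one probability column per label:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Synthetic stand-in data (an assumption): 100 rows, 4 features, two binary labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
Y = np.column_stack([(X[:, 0] > 0).astype(int), (X[:, 1] > 0).astype(int)])

clf = MultiOutputClassifier(LogisticRegression()).fit(X, Y)

# predict_proba returns a list with one array per label.
# Each array has shape (n_samples, n_classes), with columns ordered
# like that label's classes_ (here [0, 1]).
proba = clf.predict_proba(X)

# Keep column 1 of each array: the probability of the positive class.
positive = np.column_stack([p[:, 1] for p in proba])
```

Here positive has one row per sample and one column per label, which is usually the shape you want when a submission asks for a probability for each label.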
