General Approach to Flu Shot Learning Problem

nathan · April 15, 2020, 7:41pm

Hi–

I’ve participated in a couple of DrivenData competitions and completed lots of modules on DataCamp, but they’ve only covered regression or single-label problems. Does anyone have guidance on how to set up a multi-label process in Python? Should I build two models, one for each prediction label, and combine the results? Any help, or links to examples, would be appreciated. Thank you!

Slevin11 · April 15, 2020, 8:04pm

Unfortunately, I cannot tell you too much, as I only started this one yesterday.
Have a look at:
from sklearn.multioutput import MultiOutputClassifier

nathan · April 15, 2020, 8:28pm

Thank you, I’ll have a look.

jayqi · April 16, 2020, 12:24am

Hi @nathan, great question!

We’ve just published a new blog post this evening that has a walkthrough of our benchmark model for this competition. Hot off the presses!

https://www.drivendata.co/blog/predict-flu-vaccine-data-benchmark/

The walkthrough also uses scikit-learn’s sklearn.multioutput.MultiOutputClassifier that @Slevin11 suggested. You can check it out as an example of how to use MultiOutputClassifier.

The approach that you suggested, building two models or two pipelines separately and then combining the results at the end, also works. That is one of the more “from scratch” ways of doing it. Something like MultiOutputClassifier will simplify things for you, but it also imposes a constraint: the same estimator class is used for both label variables.
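As a rough illustration of the “from scratch” route, here is a minimal sketch that fits a different estimator for each label and then combines the predicted probabilities. The data is synthetic, and the label names `h1n1_vaccine` and `seasonal_vaccine` and the choice of estimators are assumptions made for illustration, not the benchmark's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the competition data): 5 features, 2 binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
Y = np.column_stack([
    (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int),
    (X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int),
])
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# Assumed label names, used here only to key the models.
label_names = ["h1n1_vaccine", "seasonal_vaccine"]

# One model per label; the estimator classes may differ.
models = {
    "h1n1_vaccine": LogisticRegression(max_iter=1000),
    "seasonal_vaccine": RandomForestClassifier(n_estimators=100, random_state=0),
}

columns = []
for i, name in enumerate(label_names):
    model = models[name]
    model.fit(X_train, Y_train[:, i])
    # Column 1 of predict_proba is the probability of the positive class (1).
    columns.append(model.predict_proba(X_test)[:, 1])

# Combine into one array: one row per sample, one column per label.
combined = np.column_stack(columns)
```

Each model can be tuned independently here, which is the extra control you give up when a single wrapper fits the same estimator class for both labels.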

Good luck and have fun, both of you!

adalseno · May 8, 2020, 11:54am

Hi Nathan, MultiOutputClassifier is convenient because it simplifies the process and lets you predict both labels in one step. On the other hand, predicting the two labels separately gives you more control: you can fine-tune the model for each label, or even use two completely different models. It’s up to you.
If you use MultiOutputClassifier, look at the competition benchmark or at the following link to get a clear understanding of what predict_proba returns and which values you should pick: https://datascience.stackexchange.com/questions/22762/understanding-predict-proba-from-multioutputclassifier
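To make the return value concrete, here is a small sketch on synthetic data (not the competition data; the estimator choice is just an assumption for illustration). With two binary labels, `predict_proba` on a `MultiOutputClassifier` returns a list with one array per label, each of shape `(n_samples, 2)`, and the second column of each array is the probability of the positive class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Synthetic stand-in data: 5 features, 2 binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
Y = np.column_stack([
    (X[:, 0] + rng.normal(scale=0.5, size=400) > 0).astype(int),
    (X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int),
])

# One LogisticRegression is cloned and fitted per label column.
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)

# A list with one (n_samples, 2) array per label.
proba_list = clf.predict_proba(X)

# Keep only P(label == 1) for each label: shape (n_samples, 2).
positive = np.column_stack([p[:, 1] for p in proba_list])
```

The `[:, 1]` index relies on the classes being ordered as 0 then 1, which you can confirm through `clf.estimators_[i].classes_`.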
