I’ve participated in a couple of DrivenData competitions and completed lots of modules on DataCamp, but they’ve only covered regression or single-label problems. Does anyone have some guidance on how to set up a multi-label process in Python? Should I build two algorithms, one for each prediction label, and combine the results? Any help, or links to examples, would be appreciated. Thank you!
The approach you suggested, building two models or two pipelines separately and then combining the results at the end, also works. It is one of the more “from scratch” ways of doing it. Something like MultiOutputClassifier will simplify things for you, but it also imposes a constraint: the same estimator class is used for both label variables.
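Here is a rough sketch of the “from scratch” route, assuming two binary labels. The data is synthetic (make_multilabel_classification stands in for the real competition features and labels), and the choice of LogisticRegression for one label and RandomForestClassifier for the other is arbitrary, just to show that each label can get its own model.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for competition data (assumption): Y has two binary label columns.
X, Y = make_multilabel_classification(
    n_samples=500, n_features=10, n_classes=2, n_labels=1, random_state=0
)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# One independent model per label; each can be a different estimator and tuned on its own.
model_a = LogisticRegression(max_iter=1000).fit(X_train, Y_train[:, 0])
model_b = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, Y_train[:, 1])

# Combine: take P(label == 1) from each model and stack into one (n_samples, 2) array.
probs = np.column_stack([
    model_a.predict_proba(X_test)[:, 1],
    model_b.predict_proba(X_test)[:, 1],
])
```

Each column of probs then lines up with one label, which is usually what a submission file with two prediction columns needs (check the competition's submission format to be sure).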
Hi Nathan, MultiOutputClassifier is convenient because it simplifies the process and lets you predict both labels in one step. On the other hand, predicting the two labels separately gives you more control: you can fine-tune the model for each label or even use two completely different models. It’s up to you.
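For comparison, a minimal sketch of the one-step version with MultiOutputClassifier, again on synthetic data (an assumption, not the competition data) and with LogisticRegression picked arbitrarily as the shared estimator. MultiOutputClassifier fits one clone of that estimator per label column.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

# Synthetic stand-in for competition data (assumption): two binary label columns.
X, Y = make_multilabel_classification(
    n_samples=500, n_features=10, n_classes=2, n_labels=1, random_state=0
)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# A single wrapper fits one copy of the same estimator class per label.
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, Y_train)

# predict returns an (n_samples, n_labels) array: both labels in one call.
Y_pred = clf.predict(X_test)
```

The trade-off mentioned above shows up here: clf.estimators_ holds one fitted model per label, but they all share the same class and hyperparameters.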
If you use MultiOutputClassifier, look at the competition benchmark, or at the thread below, to get a clear understanding of what predict_proba returns and which values you should pick: https://datascience.stackexchange.com/questions/22762/understanding-predict-proba-from-multioutputclassifier
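As a rough illustration of that return value (synthetic data again, an assumption): predict_proba on a MultiOutputClassifier gives a list with one array per label, and each array has shape (n_samples, n_classes). For binary labels encoded as 0/1, column 1 of each array is P(label == 1). Whether the competition wants that positive-class probability is an assumption here, so check the benchmark.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

X, Y = make_multilabel_classification(
    n_samples=500, n_features=10, n_classes=2, n_labels=1, random_state=0
)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X_train, Y_train)

# A Python list with one (n_samples, n_classes) array per label, not a single 2-D array.
proba_list = clf.predict_proba(X_test)

# Confirm which column is the positive class before slicing.
print([est.classes_ for est in clf.estimators_])

# Keep P(label == 1) for each label and stack into an (n_samples, n_labels) array.
probs = np.column_stack([p[:, 1] for p in proba_list])
```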