XGBoost Multilabel Classification

Hello everyone,

In my multilabel classification model for predicting the h1n1 and seasonal vaccine labels, the predicted probability for one of the classes is always greater than 0.5 (i.e., it always predicts label 1). I am not sure why this is happening. Here are some of the predicted probabilities:

[0.8590254 0.43965816]
[0.86548555 0.42004538]
[0.8504172 0.53533596]
[0.84696984 0.40868753]
[0.84735835 0.43021488]
[0.64695126 0.75616175]
[0.8121675 0.57143956]
[0.77992 0.5903182 ]
[0.86847746 0.3757938 ]
[0.75904495 0.692877 ]
[0.72046316 0.7369681 ]
[0.8597982 0.42459705]

Any insights would be much appreciated.

Here's my model:

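from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from xgboost import XGBClassifier

def fit_model(X_train, Y_train):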
    clf = OneVsRestClassifier(XGBClassifier(n_jobs=-1,
                                            silent=0,
                                            verbose=True,
                                            # eval_metric = ["auc","error"],
                                            objective='multi:softprob',
                                            # nclasses=2,
                                            num_class=2,
                                            learning_rate=0.05,
                                            colsample_bylevel=0.20,
                                            colsample_bynode=0.20,
                                            colsample_bytree=0.20,
                                            min_child_weight=5,
                                            max_depth=12,
                                            subsample=1,
                                            n_estimators=100))
    mlb = MultiLabelBinarizer()
    Y_train = mlb.fit_transform(Y_train)
    clf.fit(X_train, Y_train)
    return clf
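One possible explanation, offered as a hypothesis rather than a confirmed diagnosis: MultiLabelBinarizer expects each row to be a *collection of label names*, not a row of 0/1 flags. If Y_train already holds two 0/1 columns (h1n1, seasonal), the binarizer treats the values 0 and 1 themselves as the "labels". The two output columns then mean "this row contains a 0" and "this row contains a 1" rather than h1n1 and seasonal. If many rows have at least one vaccine flag equal to 0, the "contains a 0" column is positive for most rows, which could push that column's predicted probability above 0.5 almost everywhere. The small, made-up target matrix below (hypothetical data, not from the question) shows the transformation:

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical targets laid out like the question's data:
# column 0 = h1n1 flag, column 1 = seasonal flag (already 0/1).
Y = np.array([
    [0, 0],
    [0, 1],
    [1, 1],
    [0, 0],
])

mlb = MultiLabelBinarizer()
Y_mlb = mlb.fit_transform(Y)

# Each row is read as a set of labels, so the learned "classes"
# are the values 0 and 1, not the two vaccine columns.
print(mlb.classes_)
# Column 0 now means "row contains a 0", column 1 "row contains a 1".
print(Y_mlb)
```

Because the targets are already a binary indicator matrix, they can be passed to the classifier without this step.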

Try using MultiOutputClassifier() instead of MultiLabelBinarizer():
https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputClassifier.html?highlight=multioutputclassifier#sklearn.multioutput.MultiOutputClassifier
