Dealing with no dependent variable in the test data: interpreting results with logloss


I have built some models using the logloss metric, and I get results like this:

nIter  logLoss
   11  0.5675282
   21  0.5544149
   31  0.5745408

So I take the setting with the smallest logloss (nIter = 21).
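You can pick the setting with the lowest logloss in code rather than by eye. Here is a minimal base-R sketch that uses the three values from the table. The names `res` and `best` are hypothetical.

```r
# Log-loss results copied from the table above
res <- data.frame(
  nIter   = c(11, 21, 31),
  logLoss = c(0.5675282, 0.5544149, 0.5745408)
)

# Lower log loss is better, so keep the row with the minimum value
best <- res[which.min(res$logLoss), ]
print(best)
```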

I am predicting with:

predict(mymodel, newdata = test_data)

and I get something like:

no no no no no yes no no no no no no no no no no no no no no no no no no no no no no no no no no no ....

These are the predictions.

I am not sure how to interpret these results.
My predictions are yes or no. I coded the outcome that way because the model requires a character label rather than a number (1 or 0).
The logloss values above are reported by the model itself.

How can I finalize the results?
If test_data contained the dependent variable, I would use a confusion matrix.
But what about logloss?
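Both a confusion matrix and logloss compare predictions against known labels. This means neither can be computed on test_data if it lacks the outcome. They only work on labelled rows, such as a held-out part of the training data. For labels y in {0, 1} and predicted probabilities p, binary logloss is -(1/N) Σ [y log p + (1 - y) log(1 - p)]. The base-R sketch below computes it and a confusion matrix on made-up values. The function name `log_loss`, the clipping constant, and the 0.5 threshold are assumptions, not part of any package.

```r
# Binary log loss: -mean(y*log(p) + (1-y)*log(1-p)).
# `actual` is 0/1; `prob` is the predicted probability of class 1.
# Probabilities are clipped away from 0 and 1 so log() stays finite.
log_loss <- function(actual, prob, eps = 1e-15) {
  prob <- pmin(pmax(prob, eps), 1 - eps)
  -mean(actual * log(prob) + (1 - actual) * log(1 - prob))
}

# Toy labelled data (hypothetical values, for illustration only)
actual <- c(1, 0, 1, 1, 0)
prob   <- c(0.9, 0.2, 0.6, 0.4, 0.1)

ll <- log_loss(actual, prob)

# A confusion matrix needs hard labels, so threshold the probabilities at 0.5
predicted <- ifelse(prob > 0.5, 1, 0)
cm <- table(Predicted = predicted, Actual = actual)

print(ll)
print(cm)
```

Logloss is computed from the probabilities and the confusion matrix from the thresholded labels. This is why the same model can look different under the two measures.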



With that code you are predicting on the whole dataset.

What you really want to predict is the last column, which holds a categorical feature (0/1 or no/yes). I recommend the following code, which predicts only that outcome and returns a probability for it. Log loss is minimized by good probabilities, not hard labels.

prediction <- predict(model, data.matrix(test[,-1]))

I hope this helps, regards!
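In R, `test[, -1]` drops the first column, usually the ID. `data.matrix()` turns the remaining columns into a numeric matrix, which is a format xgboost's `predict()` accepts. The sketch below shows only that preparation step on a hypothetical data frame and does not load xgboost. The column names are made up.

```r
# Hypothetical test set: an ID column followed by the predictors
test <- data.frame(
  id = c(101, 102, 103),
  x1 = c(0.5, 1.2, 3.4),
  x2 = factor(c("a", "b", "a"))
)

# test[, -1] removes the first column (the ID); data.matrix() then converts
# the rest to a numeric matrix, replacing factors with their integer codes
X <- data.matrix(test[, -1])
print(X)
```

If the ID has already been removed from the data, as in the reply below, `test[, -1]` would instead drop a real predictor. You should apply it only once.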

Hi, and thanks for the answer.

I don't know what you are trying to do with test[,-1]. If you just omit the first column, which holds the IDs, then fine: I have already dropped that column from test_data.

I have figured out how to interpret the results.
You just need to add type="prob":

predict(model, test_data, type="prob")

and you get the class probabilities.
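If the model is a caret `train` object (an assumption, since the thread does not name the package), `type = "prob"` returns a data frame with one probability column per class level. The probability of "yes" is what logloss uses. If you still need yes/no labels, you can get them from a threshold. The values below are hardcoded stand-ins for real `predict()` output, and the 0.5 cutoff is an assumption.

```r
# Hypothetical stand-in for predict(model, test_data, type = "prob"):
# one column per class level, each row summing to 1
probs <- data.frame(no = c(0.85, 0.40, 0.70), yes = c(0.15, 0.60, 0.30))

# Probability of the positive class, which is what log loss is computed from
p_yes <- probs$yes

# Hard labels, if still needed, come from a threshold (0.5 is an assumption)
labels <- ifelse(p_yes > 0.5, "yes", "no")
print(labels)
```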


I dropped the ID column before predicting. Whether you need to add type="prob" depends on your model. I'm using xgboost, which returns probabilities (with a binary:logistic objective), but if you use randomForest the predictions are only class labels (0 or 1), so you need to add that argument.
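Packages differ in both the argument name and the default output. Base R's `glm`, which is not used in this thread, illustrates this well. Its `predict()` method returns log-odds by default and probabilities with `type = "response"` rather than `"prob"`. Here is a self-contained sketch on simulated data. The data and variable names are hypothetical.

```r
set.seed(42)

# Simulated labelled data (hypothetical): one predictor, binary outcome
n <- 100
x <- rnorm(n)
y <- rbinom(n, 1, plogis(1.5 * x))
fit <- glm(y ~ x, family = binomial)

new <- data.frame(x = c(-1, 0, 1))

# Default for predict.glm is the link scale (log-odds)
link_pred <- predict(fit, newdata = new)

# type = "response" gives probabilities of y = 1
prob_pred <- predict(fit, newdata = new, type = "response")

print(link_pred)
print(prob_pred)
```

The two outputs carry the same information: applying the logistic function `plogis()` to the log-odds gives the probabilities.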