Zeros and the Logarithmic Loss Metric

log(0) is -infinity. How is this prevented when calculating the metric? Some sort of limit on the probabilities would work. In other words, zero probabilities are changed to some number very close to zero, and probabilities of 1 are changed to some number very close to 1. Or am I missing something?

Happy mining

Hi Washier,

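You’re absolutely right. In our implementation the predicted probabilities are clipped before the log is taken: anything below a small epsilon is raised to epsilon, and anything above 1 minus epsilon is lowered to 1 minus epsilon. This way no poor person scores infinity on a metric they are trying to minimize!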
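To see why the clipping matters, here is a minimal base-R sketch. The helper name `clipProbs` and the value eps = 1e-15 are assumptions for illustration, not necessarily what the competition's own implementation uses.

```r
# Hypothetical helper: clip probabilities into [eps, 1 - eps] before taking logs
clipProbs <- function(p, eps = 1e-15) pmin(pmax(p, eps), 1 - eps)

log(0)              # -Inf: one confident wrong prediction makes the mean infinite
log(clipProbs(0))   # finite: about -34.54 with eps = 1e-15
clipProbs(c(0, 0.3, 1))  # 0 and 1 are pulled inside the interval, 0.3 is unchanged
```

With clipping, the worst a single row can contribute is -log(eps), so the overall score stays finite.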

Happy mining to you!


If you are using R, this is the function I use to calculate Log Loss:

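LogLoss <- function(actual, predicted, eps = 0.00001) {
  predicted <- pmin(pmax(predicted, eps), 1 - eps)  # keep log() finite
  -mean(actual * log(predicted) + (1 - actual) * log(1 - predicted))  # actual is 0/1
}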
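Written out, the quantity this function is meant to compute is the average binary log loss over N rows, where y_i is the actual 0/1 label and p_i the predicted probability, with each p_i clipped to [eps, 1 - eps] first:

```latex
\text{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i \log \hat p_i + (1 - y_i)\log\big(1 - \hat p_i\big) \Big],
\qquad \hat p_i = \min\big(\max(p_i,\ \varepsilon),\ 1 - \varepsilon\big)
```

Clipping caps any single row's contribution at -log(eps); with eps = 0.00001 that is about 11.51.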


Thanks for that.

For those using the caret package in R, here is an implementation of a Log Loss function that can be used with the caret framework during cross-validation:

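LogLoss <- function(data, lev = NULL, model = NULL) {
  # data$T is the predicted probability of the class named "T";
  # clip it away from 0 and 1 so log() never returns -Inf
  probs <- pmax(pmin(as.numeric(data$T), 1 - 1e-15), 1e-15)
  logPreds <- log(probs)
  log1Preds <- log(1 - probs)
  # factor levels map to 1, 2; subtracting 1 makes the second level ("T") equal 1
  real <- as.numeric(data$obs) - 1
  out <- -mean(real * logPreds + (1 - real) * log1Preds)
  names(out) <- "LogLoss"
  out
}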
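caret calls a summary function with a data frame holding the observed classes (`obs`), the predicted classes (`pred`) and, when classProbs = TRUE, one probability column per class level, and it passes the level names in `lev`. The function above hard-codes the column `T`. A minimal sketch of a variant that picks the column from `lev` instead might look like this (the name `LogLossByLevel` and the toy data are hypothetical):

```r
# Hypothetical variant: read the positive-class column from `lev`
# instead of hard-coding "T"
LogLossByLevel <- function(data, lev = NULL, model = NULL, eps = 1e-15) {
  if (is.null(lev)) lev <- levels(data$obs)
  pos <- lev[2]                                  # treat the second level as 1
  probs <- pmin(pmax(data[[pos]], eps), 1 - eps) # clip away from 0 and 1
  real <- as.numeric(data$obs == pos)            # 1 for the positive class, 0 otherwise
  out <- -mean(real * log(probs) + (1 - real) * log(1 - probs))
  names(out) <- "LogLoss"
  out
}

# Toy data frame in the shape caret passes to a summary function:
# observed classes, predicted classes, and one probability column per level
toy <- data.frame(
  obs  = factor(c("T", "F", "T", "F"), levels = c("F", "T")),
  pred = factor(c("T", "F", "F", "F"), levels = c("F", "T")),
  F    = c(0.2, 0.9, 0.6, 1.0),
  T    = c(0.8, 0.1, 0.4, 0.0)
)
LogLossByLevel(toy, lev = levels(toy$obs))  # about 0.3112
```

Because binary log loss is symmetric, treating the other level as the positive class gives the same value, as long as the probability column matches that level.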

@washier, how do you use this with caret?


The Log Loss function is passed to caret’s train function via the trainControl object. Some code:
First, create the trainControl object, with the Log Loss function specified in the summaryFunction parameter:

tc <- trainControl(method = "repeatedcv", summaryFunction = LogLoss,
                   number = 10, repeats = 1, verboseIter = TRUE, classProbs = TRUE)

Then pass the trainControl object to the train function. Here you also need to specify the metric to be used for cross-validation. The metric must match one of the names given to the output of the LogLoss function I posted previously, and maximize = FALSE tells caret that lower values are better.

model <- train(x = train, y = target, method = "rf", trControl = tc,
               metric = "LogLoss", maximize = FALSE)

Hope that makes sense. If not, this caret documentation should do the trick (read the "Alternate Performance Metrics" section).


Thanks @washier, I am trying it out now.

There’s also a built-in mnLogLoss function in caret that can be used:

tc <- trainControl(method = "repeatedcv", summaryFunction = mnLogLoss,
                   number = 10, repeats = 1, verboseIter = TRUE, classProbs = TRUE)

# mnLogLoss names its result "logLoss" (lower-case l), so the metric must match
model <- train(x = train, y = target, method = "rf", trControl = tc,
               metric = "logLoss", maximize = FALSE)
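mnLogLoss also covers more than two classes. Assuming it follows the usual multi-class definition (the mean negative log of the probability assigned to each row's true class, after clipping), a base-R sketch of that calculation could look like this; `multiclassLogLoss` is a hypothetical name, not part of caret:

```r
# Hypothetical sketch of multi-class log loss (not caret's own code)
multiclassLogLoss <- function(obs, probs, eps = 1e-15) {
  # obs: factor of true classes; probs: one probability column per level
  probs <- as.matrix(probs[, levels(obs), drop = FALSE])
  # pick each row's probability for its observed class
  p_true <- probs[cbind(seq_along(obs), as.integer(obs))]
  p_true <- pmin(pmax(p_true, eps), 1 - eps)  # clip away from 0 and 1
  -mean(log(p_true))
}

obs <- factor(c("a", "b", "c", "a"), levels = c("a", "b", "c"))
probs <- data.frame(
  a = c(0.7, 0.1, 0.2, 0.5),
  b = c(0.2, 0.8, 0.3, 0.25),
  c = c(0.1, 0.1, 0.5, 0.25)
)
multiclassLogLoss(obs, probs)  # about 0.4915
```

With two classes this reduces to the binary formula used earlier in the thread.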

Thanks Alex, exactly what I was looking for!!