0's and Logarithmic Loss Metric

washier · January 19, 2015, 11:08am

log(0) is -infinity. How is this prevented when calculating the metric? Some sort of limit on the probabilities would work. In other words, zero probabilities are changed to some number very close to zero and probabilities of 1 are changed to some number very close to 1. Or am I missing something?

Happy mining

bull · January 19, 2015, 1:37pm

Hi Washier,

You’re absolutely right. In our implementation, probabilities of exactly 0 or 1 are clipped to values very close to 0 or very close to 1. This way no poor person scores infinity on a metric they are trying to minimize!

Happy mining to you!

Peter
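
To make the clipping idea concrete, here is a minimal R sketch. It is not DrivenData's actual scoring code: the function names `raw_log_loss` and `clipped_log_loss`, the example vectors, and the `eps` value of 1e-15 are all assumptions chosen for illustration.

```r
# Binary log loss with no protection against log(0).
raw_log_loss <- function(actual, predicted) {
  -mean(actual * log(predicted) + (1 - actual) * log(1 - predicted))
}

# Same formula, but predictions are first clipped into [eps, 1 - eps].
# eps = 1e-15 is an assumed value, not one confirmed in this thread.
clipped_log_loss <- function(actual, predicted, eps = 1e-15) {
  predicted <- pmin(pmax(predicted, eps), 1 - eps)
  -mean(actual * log(predicted) + (1 - actual) * log(1 - predicted))
}

# Made-up data: the third row predicts 0 for an observation that is 1.
actual    <- c(1, 0, 1)
predicted <- c(0.9, 0.2, 0)

raw_log_loss(actual, predicted)      # Inf, because of log(0)
clipped_log_loss(actual, predicted)  # finite

# Even a confident *correct* prediction breaks the raw formula,
# since 0 * log(0) evaluates to NaN in R.
raw_log_loss(1, 1)                   # NaN
```

Clipping therefore guards against two failure modes: an infinite score from a confident miss, and a NaN from a prediction of exactly 0 or 1 that happens to be right.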

BKR · January 21, 2015, 4:24am

If you are using R, this is the function I use to calculate Log Loss:

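# Binary log loss; eps keeps predictions away from exactly 0 and 1 so log() stays finite
LogLoss <- function(actual, predicted, eps = 0.00001) {
  predicted <- pmin(pmax(predicted, eps), 1 - eps)
  -1 / length(actual) * sum(actual * log(predicted) + (1 - actual) * log(1 - predicted))
}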
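
A quick check of this function, plus one thing worth knowing: `eps` sets the largest penalty any single row can receive, namely `-log(eps)`. The sketch below repeats the function so that it runs on its own; the example vectors are made up.

```r
LogLoss <- function(actual, predicted, eps = 0.00001) {
  predicted <- pmin(pmax(predicted, eps), 1 - eps)
  -1 / length(actual) * sum(actual * log(predicted) + (1 - actual) * log(1 - predicted))
}

# Made-up example: three reasonable predictions.
actual    <- c(1, 0, 1)
predicted <- c(0.9, 0.1, 0.8)
score <- LogLoss(actual, predicted)        # about 0.1446

# A single confident miss (actual 1, predicted 0) is capped at -log(eps).
worst_1e5  <- LogLoss(1, 0, eps = 1e-5)    # about 11.51
worst_1e15 <- LogLoss(1, 0, eps = 1e-15)   # about 34.54
```

So two scores computed with different `eps` values are not directly comparable once any prediction hits exactly 0 or 1. The `eps` used by the competition's scorer is not stated in this thread.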

washier · January 21, 2015, 8:52am

@BKR

Thanks for that.

For those using the caret package in R, here is an implementation of a Log Loss function that can be used with the caret framework during cross-validation:

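LogLoss <- function(data, lev = NULL, model = NULL)
{
    # data$T holds the predicted probability of the class named "T";
    # clip it to [1e-15, 1 - 1e-15] so log() stays finite
    probs <- pmax(pmin(as.numeric(data$T), 1 - 1e-15), 1e-15)
    logPreds <- log(probs)
    log1Preds <- log(1 - probs)
    # 0/1 outcome; assumes "T" is the second level of the obs factor
    real <- (as.numeric(data$obs) - 1)
    out <- c(mean(real * logPreds + (1 - real) * log1Preds)) * -1
    names(out) <- c("LogLoss")
    out
}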
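
Because the function above reads `data$T`, it only works when the positive class is literally named "T" and is the second level of the outcome factor. A possible variant, assuming caret passes the class levels in `lev` (as it does for its built-in classification summary functions), looks the column up by name instead. The name `LogLossLev` and the toy data frame are hypothetical, and the sketch runs without caret loaded.

```r
LogLossLev <- function(data, lev = NULL, model = NULL) {
  # Treat the second factor level as the positive class (an assumption).
  positive <- lev[2]
  # Clip the positive-class probability so log() stays finite.
  probs <- pmax(pmin(as.numeric(data[[positive]]), 1 - 1e-15), 1e-15)
  # 1 where the observed class is the positive class, 0 otherwise.
  real <- as.numeric(data$obs == positive)
  out <- -mean(real * log(probs) + (1 - real) * log(1 - probs))
  names(out) <- "LogLoss"
  out
}

# Toy input shaped like what a summaryFunction receives:
# observed class, predicted class, and one probability column per level.
toy <- data.frame(
  obs  = factor(c("yes", "no", "yes"), levels = c("no", "yes")),
  pred = factor(c("yes", "no", "no"),  levels = c("no", "yes")),
  no   = c(0.1, 0.8, 0.6),
  yes  = c(0.9, 0.2, 0.4)
)
res <- LogLossLev(toy, lev = levels(toy$obs))   # about 0.4149
```

Calling the function by hand like this is also a cheap way to confirm that the output name matches the `metric` string you plan to give to `train`.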

BKR · January 21, 2015, 9:18am

@washier, How do you use this with Caret?

washier · January 21, 2015, 10:49am

@BKR

The Log Loss function is passed to caret’s train function via the trainControl structure. Some code:
First create the trainControl structure, with the Log Loss function specified in the summaryFunction parameter:

tc <- trainControl(method = "repeatedcv", summaryFunction=LogLoss,
                   number = 10, repeats = 1, verboseIter=TRUE, classProbs=TRUE)

Then pass the created trainControl structure to the train function. Here you need to also specify the metric to be used for cross validation. The metric should match one of the names given to the output of the LogLoss function I posted previously.

model <- train(x= train, y=target, method="rf", trControl=tc, 
               metric="LogLoss", maximize=FALSE)

Hope that makes sense. If not, this caret documentation should do the trick (read the "Alternate Performance Metrics" section).

BKR · January 22, 2015, 1:24am

Thanks @washier, I am trying it out now.

AlexPerrier · July 19, 2015, 2:23am

There’s also a mnLogLoss function in caret that can be used. Note that it names its output logLoss (lowercase "l"), so the metric argument has to match:

tc <- trainControl(method = "repeatedcv", summaryFunction=mnLogLoss,
                   number = 10, repeats = 1, verboseIter=TRUE, classProbs=TRUE)

And

model <- train(x= train, y=target, method="rf", trControl=tc,
               metric="logLoss", maximize=FALSE)

abcdesai · March 28, 2017, 11:02pm

Thanks Alex, exactly what I was looking for!!
