log(0) is -infinity. How is this prevented when calculating the metric? Some sort of limit on the probabilities would work. In other words, zero probabilities are changed to a number very close to zero, and probabilities of 1 are changed to a number very close to 1. Or am I missing something?

You're absolutely right. In our implementation the probabilities are clipped away from exactly 0 and exactly 1, so every prediction lies within a tiny epsilon of the open interval (0, 1). This way no poor person scores infinity on a metric they are trying to minimize!
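To make the clipping idea concrete, here is a small R sketch. The 1e-15 threshold matches the one in the LogLoss function below, but the data values are just made-up examples:

```r
# Clip predicted probabilities into [eps, 1 - eps] so log() never sees 0
eps <- 1e-15
p <- c(0, 0.3, 1)    # raw predictions, including an exact 0 and an exact 1
y <- c(0, 0, 1)      # true 0/1 labels

p_clipped <- pmax(pmin(p, 1 - eps), eps)

# Log loss on the clipped probabilities: finite
loss <- -mean(y * log(p_clipped) + (1 - y) * log(1 - p_clipped))

# The same formula on the raw probabilities would involve log(0) = -Inf
```

Without the clip, the third prediction (a confident, correct 1) contributes log(1) = 0, but the first prediction's `(1 - y) * log(1 - p)` term is fine too; the danger is a confident *wrong* prediction, where log(0) makes the whole mean infinite.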

For those using the caret package in R, here is an implementation of a Log Loss function which can be used with the caret framework during cross-validation:

LogLoss <- function(data, lev = NULL, model = NULL) {
  # data$T is the predicted probability of the positive class; caret names the
  # probability columns after the factor levels, so this assumes a level "T"
  probs <- pmax(pmin(as.numeric(data$T), 1 - 1e-15), 1e-15)  # clip away from 0 and 1
  logPreds <- log(probs)
  log1Preds <- log(1 - probs)
  real <- as.numeric(data$obs) - 1  # factor -> 0/1 (second level becomes 1)
  out <- -mean(real * logPreds + (1 - real) * log1Preds)
  names(out) <- "LogLoss"
  out
}

The Log Loss function is passed to caret's train function via the trainControl structure. Some code:
First, create the trainControl structure, specifying the Log Loss function in the summaryFunction parameter.
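A sketch of what that step might look like, assuming the LogLoss function above is already defined; the cross-validation settings (5-fold CV) are arbitrary example choices:

```r
library(caret)

# classProbs = TRUE is required so caret computes class probabilities,
# which the LogLoss summary function reads from the data argument
tc <- trainControl(method = "cv",
                   number = 5,
                   summaryFunction = LogLoss,
                   classProbs = TRUE)
```

Note that `classProbs = TRUE` only works when the outcome factor levels are valid R variable names, which is one reason a level like "T" is used.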

Then pass the created trainControl structure to the train function. Here you also need to specify the metric to be used for cross-validation. The metric must match one of the names given to the output of the LogLoss function I posted previously.

model <- train(x = train, y = target, method = "rf", trControl = tc,
               metric = "LogLoss", maximize = FALSE)

Hope that makes sense. If not, this caret documentation should do the trick (read the "Alternate Performance Metrics" section).