The competition uses some form of multi-class weighted Brier score.
My implementation seems to be a bit off (or it could be my cross-validation that is broken). Can someone see if there is something wrong with it:
    # get w_c for the weighted Brier formula
    weights = np.genfromtxt('../data/class_weights.json', delimiter=',', skip_header=1, skip_footer=1, usecols=)

    # y is the activities annotations vector, e.g. [4,4,7,7], which we need to convert
    # into a probability matrix to compare with our predictions, so I one-hot-encode it
    # yp is our probabilistic predictions
    def brier_score(y, yp):
        from sklearn.preprocessing import OneHotEncoder
        yy = OneHotEncoder(, sparse=False).fit_transform(y[:, np.newaxis])
        return (1./len(yy)) * np.sum(weights * ((yy-yp)**2))
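For comparison, here is a minimal NumPy-only sketch of the same formula that I would expect to give the same number. It is not the competition's official scorer. The function name `weighted_brier_score` is made up for illustration, and it assumes that the labels in `y` are integers in `0..C-1` that index the columns of `yp` directly, and that `weights` is a 1-D array with one entry per class.

```python
import numpy as np

def weighted_brier_score(y, yp, weights):
    """Weighted multi-class Brier score (illustrative sketch).

    y       : 1-D array of integer labels, assumed to lie in [0, n_classes)
    yp      : (n_samples, n_classes) array of predicted probabilities
    weights : 1-D array of per-class weights w_c, length n_classes
    """
    y = np.asarray(y, dtype=int)
    yp = np.asarray(yp, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n_samples, n_classes = yp.shape

    # One-hot encode against the full set of classes (the columns of yp),
    # not only against the labels that happen to occur in y.
    yy = np.zeros((n_samples, n_classes))
    yy[np.arange(n_samples), y] = 1.0

    # Weighted squared error, summed over classes and averaged over samples.
    return np.sum(weights * (yy - yp) ** 2) / n_samples
```

One difference from the version above: an encoder that is fitted on `y` alone only creates columns for the labels that actually appear in `y`, so on a cross-validation fold that is missing a class the one-hot matrix can end up with fewer columns than `yp`. Sizing the one-hot matrix from `yp` avoids that, provided the label-to-column assumption holds.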