Implementation of the bias penalty

In the calculation section of the problem description, the bias penalty is described as:

2C. Add a bias penalty (BP) of 0.25 if the sum of the raw, unzeroed privatized counts is more than 500 off from the ground truth for row i.

The diagram agrees with this: in the example they take counts [0, 2, 28, 20] and [26, 0, 2, 22], calculate the sum of each to be 50, and conclude that the absolute difference is 0.

In the implementation of the metric provided in the problem repo, however, it appears that the code takes the sum of the term-wise absolute differences, the relevant line being:

bias_mask = np.abs(actual - predicted).sum() > self.allowable_raw_bias
(I couldn’t link to GitHub, but it’s line 35 in runtime/scripts/

So in the original example above this would be the sum:
|0 - 26| + |2 - 0| + |28 - 2| + |20 - 22| = 26 + 2 + 26 + 2 = 56
In this example it is still well below 500, but the difference between the two calculations has significant ramifications when applied to the real data.
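To make the two readings concrete, here is a quick NumPy check on the example counts. The names `diff_of_sums` and `sum_of_diffs` are mine, not the repo's; only `actual` and `predicted` come from the line quoted above.

```python
import numpy as np

# Example counts from the diagram in the problem description.
actual = np.array([0, 2, 28, 20])
predicted = np.array([26, 0, 2, 22])

# Reading 1 (problem description): difference between the two row totals.
diff_of_sums = abs(actual.sum() - predicted.sum())

# Reading 2 (the quoted line): sum of term-wise absolute differences (L_1).
sum_of_diffs = np.abs(actual - predicted).sum()

print(diff_of_sums, sum_of_diffs)  # 0 56
```

Both readings agree only when every error in the row has the same sign; otherwise the term-wise version is always the larger of the two.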

Which of these two metrics is the intended measure? If it is the implemented (L_1) measure, then this puts a very tight constraint on the accuracy required: e.g. with 178 incident types, each count can deviate by less than 500/178 ≈ 2.8 on average.
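A rough sketch of that constraint, using a made-up row in which every one of the 178 counts is off by 3 with alternating sign (the error pattern is purely hypothetical, chosen so that the errors cancel in the total):

```python
import numpy as np

n_types = 178     # incident types, as stated above
threshold = 500   # allowable difference from the problem description

# Average per-count deviation allowed under the L_1 reading.
per_count_budget = threshold / n_types
print(round(per_count_budget, 2))  # 2.81

# Hypothetical errors: +3, -3, +3, -3, ... across all 178 counts.
errors = np.tile([3, -3], n_types // 2)

l1_error = np.abs(errors).sum()   # what the quoted line measures
net_error = abs(errors.sum())     # what the problem description measures

print(l1_error > threshold, net_error > threshold)  # True False
```

So a row whose total matches the ground truth exactly would still pick up the 0.25 penalty under the implemented measure.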

Thanks for the keen eye @odaniel1! That is a bug in the implementation; we’ve fixed it in the runtime repo here:

It has also been fixed on the DrivenData platform and the submissions have been rescored. There were some slight changes to submission-level scores, but the leaderboard is unchanged.

Let us know if you have any other issues!

Cheers @bull.

One smaller thing I spotted: the description of the metric on the Submissions page (sub-section Primary Evaluation Metric), as well as in the hover-text on the leaderboard page (hover over Best Public) state the metric to be:

PieChartJSD = ∑[1−max(0,𝖩𝖲𝖣i+𝖡𝖯i+𝖬𝖯𝖯i)]

This should be

PieChartJSD = ∑max(0, 1 − (𝖩𝖲𝖣i+𝖡𝖯i+𝖬𝖯𝖯i))

or an alternative formulation; the important difference being that 𝖩𝖲𝖣i+𝖡𝖯i+𝖬𝖯𝖯i ≥ 0 by definition, so the max in the first formula never has any effect and the per-row score is not clipped to [0,1].
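A small sketch of the difference for a single row. The function names and the input values are hypothetical (only the 0.25 bias penalty comes from the problem description), and this is not the repo's code:

```python
def score_as_displayed(jsd, bp, mpp):
    # Formula as shown on the Submissions page: the max acts on a
    # non-negative total, so it never changes anything.
    return 1 - max(0, jsd + bp + mpp)


def score_clipped(jsd, bp, mpp):
    # Intended formula: the max acts on the score, flooring it at 0.
    return max(0, 1 - (jsd + bp + mpp))


# Hypothetical row with a large divergence plus the 0.25 bias penalty.
bad_row = (0.9, 0.25, 0.0)
print(score_as_displayed(*bad_row) < 0)  # True: score goes negative
print(score_clipped(*bad_row))           # 0

# Hypothetical row with a small total penalty: both formulas agree.
good_row = (0.1, 0.0, 0.0)
print(score_as_displayed(*good_row) == score_clipped(*good_row))  # True
```

The two only diverge once the total penalty for a row exceeds 1.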

This has been updated too, thanks @odaniel1 :+1: