In the calculation section of the problem description, the bias penalty is described as:
2C. Add a bias penalty (BP) of 0.25 if the sum of the raw, unzeroed privatized counts is more than 500 off from the ground truth for row i.
The diagram agrees with this: in the example they take counts [0, 2, 28, 20] and [26, 0, 2, 22], calculate the sum of each to be 50, and conclude that the absolute difference is 0.
In the implementation of the metric provided in the problem repo, however, it appears that the metric takes the sum of the term-wise absolute differences, the relevant line being:
bias_mask = np.abs(actual - predicted).sum() > self.allowable_raw_bias
(I couldn’t link to GitHub, but it’s line 35 in runtime/scripts/metric.py)
So in the original example above this would be the sum:
|0 - 26| + |2 - 0| + |28 - 2| + |20 - 22| = 26 + 2 + 26 + 2 = 56
In this example it is still well below 500, but obviously this has significant ramifications when applied to the real data.
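A minimal sketch of the two readings on the example counts, in plain Python. The variable names are illustrative and not taken from the repo, and which list is "actual" versus "predicted" is an assumption (both calculations are symmetric, so it does not change the result):

```python
# Counts from the example in the problem description's diagram.
actual = [0, 2, 28, 20]
predicted = [26, 0, 2, 22]

# Reading 1 (problem description): absolute difference of the row totals.
diff_of_sums = abs(sum(actual) - sum(predicted))

# Reading 2 (the metric.py line quoted above): sum of the term-wise
# absolute differences, i.e. the L_1 distance between the two rows.
sum_of_abs_diffs = sum(abs(a - p) for a, p in zip(actual, predicted))

print(diff_of_sums)      # 0
print(sum_of_abs_diffs)  # 56
```

The first value can never exceed the second, so any row that is penalized under the description's reading is also penalized under the implemented one, but not the other way round.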
Which of these two metrics is the intended measure? If it is the implemented (L_1) measure then this puts a very tight constraint on the accuracy required: e.g. with 178 incident types, on average each count has to deviate by less than 500/178 ≈ 2.8.