Anomaly Detection | Why not allow submission of raw anomaly scores?

Is there a specific reason for not allowing the submission of anomaly scores (e.g. a probability in [0,1]) instead of binary detections? The threshold on the anomaly scores has to be tuned in any case, but since no labels are available for the competition, the only option we have is to ‘guess’ an optimal threshold, which will in any case be biased toward the subset used for the public leaderboard.

Wouldn’t it be useful to let competitors submit raw scores, or at least to give an indication of the expected number of anomalies overall? Without such an indication, deciding how extreme a value must be to count as an anomaly becomes rather subjective, and probably only domain experts can judge that accurately.
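
To illustrate why such an indication would help: if an expected anomaly fraction were published, turning scores into binary detections would be a mechanical step rather than a guess. Below is a minimal Python sketch under that assumption. The 1% fraction, the random scores, and the helper name `binarize_by_expected_fraction` are all made up for illustration and are not taken from the competition.

```python
import random


def binarize_by_expected_fraction(scores, expected_fraction):
    """Flag the highest-scoring `expected_fraction` of points as anomalies (1)."""
    n = len(scores)
    # Number of points to flag; always flag at least one.
    k = max(1, round(n * expected_fraction))
    # Indices of the k highest anomaly scores.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    flagged = set(top)
    return [1 if i in flagged else 0 for i in range(n)]


# Hypothetical anomaly scores in [0, 1); a real model would produce these.
random.seed(0)
scores = [random.random() for _ in range(1000)]

# Assumed (made-up) expected anomaly fraction of 1%.
labels = binarize_by_expected_fraction(scores, expected_fraction=0.01)
print(sum(labels), "points flagged as anomalies")
```

Without the expected fraction, the `expected_fraction` argument is exactly the value each competitor currently has to guess.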