@sinty Good question! As noted in the problem description, the test set is not guaranteed to follow the exact same distribution as the training set. We recommend designing your solution to handle variability in audio duration robustly, rather than assuming a strict 60-second maximum.
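
If it helps, here is a minimal sketch of one way to make a pipeline independent of clip length. It is not an official recommendation. It splits a clip into fixed-length windows and pads the last one, so clips shorter or longer than the window both yield equal-sized inputs. The function name `split_into_windows`, the `window_len` parameter, and zero-padding are assumptions chosen for illustration; nothing here comes from the problem description.

```python
from typing import List, Sequence


def split_into_windows(
    samples: Sequence[float],
    window_len: int,
    pad_value: float = 0.0,
) -> List[List[float]]:
    """Split a clip of any length into fixed-length, non-overlapping windows.

    Hypothetical helper: the last (or only) window is right-padded with
    pad_value, so every returned window has exactly window_len samples.
    """
    if window_len <= 0:
        raise ValueError("window_len must be positive")

    windows = []
    # max(len, 1) makes an empty clip still produce one all-padding window.
    for start in range(0, max(len(samples), 1), window_len):
        chunk = list(samples[start:start + window_len])
        chunk.extend([pad_value] * (window_len - len(chunk)))
        windows.append(chunk)
    return windows


# A 5-sample clip with a 2-sample window gives 3 windows, the last one padded.
print(split_into_windows([1.0, 2.0, 3.0, 4.0, 5.0], window_len=2))
```

You could then run your model on each window and aggregate the per-window outputs (for example by averaging), which keeps the approach working whether a test clip is 5 seconds or several minutes long. Whether averaging is the right aggregation depends on your task.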