I’m fine-tuning on a Blackwell GPU and seeing a gap (> 1 WER) when testing locally with the Docker container (same as runtime’s example, just with CUDA 12.8). Is the entire 9k smoketest scored when it is run through the Docker container? I’m trying to determine whether the gap comes from only part of the smoketest being scored or from the GPU architecture difference.
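
For context, here is roughly how I would check the "partial scoring" hypothesis locally: compute corpus-level WER over the full set and over a subset, then see whether the difference is on the order of the gap. This is only a sketch. The function names are my own, not taken from the runtime, and I am assuming plain whitespace tokenization with no text normalization, which may not match what the actual scorer does.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Corpus-level WER in percent: total word errors / total reference words."""
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()  # assumption: whitespace tokenization, no normalization
        errors += edit_distance(ref_words, hyp.split())
        words += len(ref_words)
    return 100.0 * errors / words


# Toy data only, not real smoketest output.
refs = ["the cat sat", "hello world"]
hyps = ["the cat sat", "hello there world"]

full_wer = corpus_wer(refs, hyps)            # scored on everything
subset_wer = corpus_wer(refs[:1], hyps[:1])  # scored on only the first utterance
print(full_wer, subset_wer)
```

If the WER over a subset of my local predictions swings by more than a point compared with the full set, partial scoring alone could explain the gap; if not, the architecture difference looks more likely.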