Empty DATA folder

Hi!
I am using SNOTEL data, but the FY2023 folder (‘/code_execution/data/snotel/FY2023’) is empty even though the directory exists. Shouldn't this data already be available? “wsfr_download” is also not available; otherwise I could just use that, since the endpoints are already approved. Moving ‘snotel.py’ to the root causes “ModuleNotFoundError: No module named ‘stamina’”.
Sorry if I missed a notice about mounting data or any special rules for accessing the approved data.
Thanks,
saket

My guess is that the code is currently running with an issue_date of 2023-01-01, so the only applicable SNOTEL data is in 2022.

Hi @saket,

The data is there and I am able to see it in my run:

2023-12-31 17:17:20.147 | INFO     | __main__:main:34 - Beginning code execution...
2023-12-31 17:17:20.147 | INFO     | __main__:main:36 - IS_SMOKE: False
2023-12-31 17:17:20.147 | INFO     | __main__:main:37 - src_directory: /code_execution/src
2023-12-31 17:17:20.147 | INFO     | __main__:main:38 - data_directory: /code_execution/data
2023-12-31 17:17:20.147 | INFO     | __main__:main:39 - preprocessed_directory: /code_execution/preprocessed
2023-12-31 17:17:29.313 | INFO     | __main__:main:58 - data_directory.iterdir returned results after waiting 9 seconds.
2023-12-31 17:17:29.313 | INFO     | __main__:main:71 - Running function 'preprocess'
/code_execution/data/snotel/FY2023
/code_execution/data/snotel/FY2023/1005_CO_SNTL.csv
/code_execution/data/snotel/FY2023/1008_MT_SNTL.csv
/code_execution/data/snotel/FY2023/1009_MT_SNTL.csv

...[removed for brevity]...

/code_execution/data/snotel/FY2023/992_UT_SNTL.csv
/code_execution/data/snotel/FY2023/998_WA_SNTL.csv
/code_execution/data/snotel/FY2023/999_WA_SNTL.csv
/code_execution/data/snotel/sites_to_snotel_stations.csv
/code_execution/data/snotel/station_metadata.csv

This is the code I used to generate those logs.

from collections.abc import Hashable
from pathlib import Path
from typing import Any

def preprocess(src_dir: Path, data_dir: Path, preprocessed_dir: Path) -> dict[Hashable, Any]:
    for path in sorted((data_dir / "snotel").glob("**/*")):
        print(path)
    raise Exception("Stop")

From looking at your job’s logs, I think your problem is that you are executing code on import rather than in the preprocess or predict functions. You should not put code that does any computation in the global namespace of scripts that are being imported. This means you’re running into the problem discussed in this thread.

We have a fix for that issue in the runtime—if you look at my logs, you’ll see there’s a line: 2023-12-31 17:17:29.313 | INFO | __main__:main:58 - data_directory.iterdir returned results after waiting 9 seconds. However, because your code is running immediately when your solution.py is imported, it’s trying to scan the data directory before the supervisor has had a chance to wait.
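
To illustrate the difference, here is a minimal sketch of a solution.py that defers the directory scan until the runtime calls preprocess. The helper name list_snotel_files and the "snotel_files" key are hypothetical and used only for illustration; the preprocess signature is the one from my snippet above. The commented-out line shows the kind of module-level code that would run at import time, before the supervisor has waited for the data directory.

```python
from collections.abc import Hashable
from pathlib import Path
from typing import Any

# Anti-pattern: this would run as soon as solution.py is imported,
# possibly before the data directory is ready, and could see an empty folder.
# SNOTEL_FILES = sorted(Path("/code_execution/data/snotel/FY2023").glob("*.csv"))


def list_snotel_files(data_dir: Path, fiscal_year: int) -> list[Path]:
    """Hypothetical helper: list the SNOTEL CSV files for one fiscal year."""
    return sorted((data_dir / "snotel" / f"FY{fiscal_year}").glob("*.csv"))


def preprocess(src_dir: Path, data_dir: Path, preprocessed_dir: Path) -> dict[Hashable, Any]:
    # The scan happens here, only when the runtime calls preprocess,
    # so nothing touches the data directory at import time.
    return {"snotel_files": list_snotel_files(data_dir, 2023)}
```

With this layout, importing the module does no file system work at all; the data directory is only read inside preprocess, using the data_dir argument the runtime passes in.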
