Empty DATA folder

Hi @saket,

The data is there and I am able to see it in my run:

2023-12-31 17:17:20.147 | INFO     | __main__:main:34 - Beginning code execution...
2023-12-31 17:17:20.147 | INFO     | __main__:main:36 - IS_SMOKE: False
2023-12-31 17:17:20.147 | INFO     | __main__:main:37 - src_directory: /code_execution/src
2023-12-31 17:17:20.147 | INFO     | __main__:main:38 - data_directory: /code_execution/data
2023-12-31 17:17:20.147 | INFO     | __main__:main:39 - preprocessed_directory: /code_execution/preprocessed
2023-12-31 17:17:29.313 | INFO     | __main__:main:58 - data_directory.iterdir returned results after waiting 9 seconds.
2023-12-31 17:17:29.313 | INFO     | __main__:main:71 - Running function 'preprocess'
/code_execution/data/snotel/FY2023
/code_execution/data/snotel/FY2023/1005_CO_SNTL.csv
/code_execution/data/snotel/FY2023/1008_MT_SNTL.csv
/code_execution/data/snotel/FY2023/1009_MT_SNTL.csv

...[removed for brevity]...

/code_execution/data/snotel/FY2023/992_UT_SNTL.csv
/code_execution/data/snotel/FY2023/998_WA_SNTL.csv
/code_execution/data/snotel/FY2023/999_WA_SNTL.csv
/code_execution/data/snotel/sites_to_snotel_stations.csv
/code_execution/data/snotel/station_metadata.csv

This is the code I used to generate those logs.

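from pathlib import Path
from typing import Any, Hashable

def preprocess(src_dir: Path, data_dir: Path, preprocessed_dir: Path) -> dict[Hashable, Any]:
    # Print every path under data/snotel to confirm the data is visible at this point.
    for path in sorted((data_dir / "snotel").glob("**/*")):
        print(path)
    raise Exception("Stop")  # Deliberately abort the run once the listing is printed.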

From looking at your job’s logs, I think your problem is that you are executing code on import rather than in the preprocess or predict functions. You should not put code that does any computation in the global namespace of scripts that are being imported. This means you’re running into the problem discussed in this thread.

We have a fix for that issue in the runtime. If you look at my logs, you’ll see this line: 2023-12-31 17:17:29.313 | INFO | __main__:main:58 - data_directory.iterdir returned results after waiting 9 seconds. However, because your code runs as soon as your solution.py is imported, it tries to scan the data directory before the supervisor has had a chance to wait for the data to appear.
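
To make the difference concrete, here is a minimal sketch of how a solution.py could be laid out so that nothing touches the data directory at import time. The helper name list_snotel_files, the "*.csv" filter, and the "snotel_files" key in the returned dict are my own assumptions for illustration, not something the runtime requires. Only the preprocess signature is taken from the code above.

```python
from pathlib import Path
from typing import Any, Hashable


def list_snotel_files(data_dir: Path) -> list[Path]:
    """Hypothetical helper: collect the CSV files under data_dir / "snotel"."""
    return sorted((data_dir / "snotel").glob("**/*.csv"))


# Avoid this at module level. It would run as soon as the module is imported,
# before the runtime has waited for the data directory:
# SNOTEL_FILES = list_snotel_files(Path("/code_execution/data"))


def preprocess(src_dir: Path, data_dir: Path, preprocessed_dir: Path) -> dict[Hashable, Any]:
    # The scan happens only when the runtime calls preprocess, which is after
    # the wait shown in the logs above.
    # The dict contents are an assumption made for this sketch.
    return {"snotel_files": list_snotel_files(data_dir)}
```

Importing this module only defines functions, so the directory scan is deferred until preprocess is actually called.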
