Hi @tabumis, @riverbend,
Thanks for the feedback. Python is the primary supported language in this challenge. Solutions may also use R so that the challenge is accessible to more people. While R will not be supported to the same level as Python, the available resources and requirements should not be overly limiting for R users.
- The feature data download code is provided as a command-line program. As a prerequisite, you will need to set up a Python virtual environment and install the package in order to use it, but you do not need to have any proficiency in reading or writing Python code to use it. There should be many resources online for different ways to install Python and set up a virtual environment—here’s a guide for setting up Python with conda. Instructions for setup and use of the data download program are in the README of the data and runtime repository.
- During code execution, rehosted feature data is available as files on disk in their raw formats (e.g., CSV, netCDF). You can use whatever you’d like to read these data files. Use of the provided sample Python code is not required.
- For any data sources where direct API access is permitted during code execution, you may use any method you’d like to download data from the approved sources.
- The requirement to wrap your prediction code in Python is fairly lightweight. See below for a simple example of calling a `model.R` script from the required `predict` function in a `solution.py`. You’d include both of these files together in your `submission.zip`.
We are happy to provide further tips to help you implement your solution, and to accept dependency requests for R packages that are available via conda.
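As a rough illustration of the second bullet above, the sketch below reads a CSV file from disk using only the Python standard library. The file name `example_feature.csv`, its columns, and the helper name `read_feature_rows` are hypothetical placeholders, not the challenge's actual data layout. R users could do the equivalent with whichever reader they prefer.

```python
import csv
import tempfile
from pathlib import Path


def read_feature_rows(csv_path: Path) -> list[dict[str, str]]:
    """Read a CSV file into a list of row dictionaries keyed by column name."""
    with csv_path.open(newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Hypothetical file name and columns, created here only so the sketch runs
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = Path(tmp) / "example_feature.csv"
        csv_path.write_text("date,value\n2015-03-01,1.5\n2015-03-08,2.0\n")
        rows = read_feature_rows(csv_path)
        # Values are read as strings, so cast numeric columns explicitly
        print(rows[0]["date"], float(rows[0]["value"]))
```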
## model.R
# Read site_id and issue_date from command-line arguments
args <- commandArgs(trailingOnly = TRUE)
site_id <- args[1]
issue_date <- args[2]
print("Printing from model.R")
print(paste("site_id:", site_id))
print(paste("issue_date:", issue_date))
# Calculate your predictions here
predictions <- c(100.5, 110.9, 120.4)
# Write predictions to a file so Python code can read them in
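out_file <- paste0("preprocessed/predictions/", site_id, "_", issue_date, ".txt")
# Create the output directory if it does not already exist
dir.create(dirname(out_file), recursive = TRUE, showWarnings = FALSE)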
print(paste("Writing predictions to", out_file))
write(predictions, file = out_file, sep = ",")
## solution.py
import subprocess
from pathlib import Path
from typing import Any, Hashable
from loguru import logger


def predict(
    site_id: str,
    issue_date: str,
    assets: dict[Hashable, Any],
    src_dir: Path,
    data_dir: Path,
    preprocessed_dir: Path,
) -> tuple[float, float, float]:
    logger.info("Logging from solution.py")

    logger.info("Prediction for site_id={}, issue_date={}", site_id, issue_date)

    logger.info("Using subprocess to call model.R via shell command.")
    subprocess.run(("Rscript", "model.R", site_id, issue_date))
    logger.info("model.R completed")

    # Read text file containing "100.5,110.9,120.4", split on comma, cast to float
    preds_path = preprocessed_dir / "predictions" / f"{site_id}_{issue_date}.txt"
    logger.info("Reading predictions from {}", preds_path)
    preds_text = preds_path.read_text()
    preds = tuple(float(y) for y in preds_text.split(","))

    logger.success("Successfully read predictions: {}", preds)
    return preds
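The `subprocess.run` call in `predict` does not check whether `model.R` succeeded, so a failure in R would only surface later as a missing predictions file. Below is a minimal sketch of a stricter variant. The helper name `run_script` is hypothetical, and the demonstration uses the Python interpreter as a stand-in command so that it runs without R installed.

```python
import subprocess
import sys
from collections.abc import Sequence


def run_script(command: Sequence[str]) -> str:
    """Run a command, raise if it exits with a nonzero status, and return its stdout."""
    result = subprocess.run(
        command,
        check=True,           # raise CalledProcessError on a nonzero exit code
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output as str rather than bytes
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for ("Rscript", "model.R", site_id, issue_date)
    print(run_script((sys.executable, "-c", "print('hello from child')")))
    try:
        run_script((sys.executable, "-c", "import sys; sys.exit(3)"))
    except subprocess.CalledProcessError as err:
        print("child failed with exit code", err.returncode)
```

Because the output is captured rather than printed directly, anything the R script prints would need to be passed on explicitly, for example with `logger.info`, to show up in the logs.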
Here’s some example logging output from running the `solution.py` and `model.R` example above:
2023-12-05 11:39:21.519 | INFO | solution:predict:15 - Logging from solution.py
2023-12-05 11:39:21.520 | INFO | solution:predict:17 - Prediction for site_id=hungry_horse_reservoir_inflow, issue_date=2015-03-15
2023-12-05 11:39:21.520 | INFO | solution:predict:19 - Using subprocess to call model.R via shell command.
[1] "Printing from model.R"
[1] "site_id: hungry_horse_reservoir_inflow"
[1] "issue_date: 2015-03-15"
[1] "Writing predictions to preprocessed/predictions/hungry_horse_reservoir_inflow_2015-03-15.txt"
2023-12-05 11:39:21.781 | INFO | solution:predict:21 - model.R completed
2023-12-05 11:39:21.782 | INFO | solution:predict:25 - Reading predictions from preprocessed/predictions/hungry_horse_reservoir_inflow_2015-03-15.txt
2023-12-05 11:39:21.783 | SUCCESS | solution:predict:29 - Successfully read predictions: (100.5, 110.9, 120.4)
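The parsing step in `predict` assumes the file holds exactly three comma-separated numbers, matching the `tuple[float, float, float]` return type. A hedged sketch of a stricter parser that fails loudly otherwise is shown below. The function name `parse_predictions` is hypothetical.

```python
def parse_predictions(text: str) -> tuple[float, float, float]:
    """Parse text such as "100.5,110.9,120.4" into a tuple of three floats."""
    # strip() removes the trailing newline that R's write() adds
    values = [float(part) for part in text.strip().split(",")]
    if len(values) != 3:
        raise ValueError(f"expected 3 predictions, got {len(values)}")
    return (values[0], values[1], values[2])


if __name__ == "__main__":
    print(parse_predictions("100.5,110.9,120.4\n"))  # (100.5, 110.9, 120.4)
```

In `predict`, this could replace the `tuple(float(y) for ...)` line, so that a malformed file raises an error at the point of reading.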