A couple of questions.
For providing model weights, is a pickle file sufficient?
Does the model freeze mean that we can’t rerun certain data processing transformations like:
Thanks,
Matt
@oshbocker Thanks for your questions.
1. Pickle: Pickle allows arbitrary code execution on loading and is therefore not a safe format for sharing data. We recommend using the weight format specific to the framework you’re using (e.g. .h5, .ckpt, etc.).
2. Transformations: Stateful feature transformations that have been fit on training data cannot be refit after the model freeze period. At the time of inference, these transformations can only be applied to new input data (i.e., transform).
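To make the pickle concern concrete, here is a minimal, harmless sketch using only the Python standard library. The Payload class is hypothetical and exists purely for illustration; it uses the builtin eval to stand in for whatever code an attacker might choose to run.

```python
import pickle


class Payload:
    """Hypothetical class, for illustration only."""

    def __reduce__(self):
        # __reduce__ tells pickle how to rebuild the object on load.
        # Here it instructs the loader to call eval("6 * 7") instead of
        # reconstructing a Payload, so loading runs code chosen by
        # whoever created the file.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading does not return a Payload; it returns whatever the call produced.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The arithmetic here is harmless, but the same mechanism can invoke any importable callable, which is why a pickle file from an untrusted source should not be loaded.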
Thank you for the answers, @tglazer!
So if we are fitting Transformations on the incoming ground measures, that’s okay?
Is this an example of a framework-specific weight format, taking XGBoost as the example?
xgb_reg.save_model('xgb_reg.txt')
And then, in the evaluation code:
xgb_reg.load_model('xgb_reg.txt')
For sklearn, it seems the recommended approach to model persistence is joblib or pickle; see section 9, "Model persistence", of the scikit-learn 1.0.2 documentation.
In my case, I am using sklearn too, so I have no option but to use pickle or joblib. I would be happy to use another format, but I can’t find an alternative. Is it OK to use pickle if you have no other option?
@oshbocker If you are performing transformations entirely on approved incoming data sources, are not refitting after the model freeze period, and are not incorporating data past the day of estimation, then you should be okay using the transformations you mentioned.
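As a rough illustration of the fit-once, transform-only rule, here is a plain-Python sketch of a standard scaler. The function names (fit_scaler, transform) and the sample values are assumptions made up for this example and are not part of the competition code; the fit/transform split simply mirrors the convention used by sklearn transformers.

```python
import json
import statistics


def fit_scaler(values):
    """Fit step: learn the mean and standard deviation from training data."""
    return {"mean": statistics.mean(values), "std": statistics.pstdev(values)}


def transform(params, values):
    """Transform step: apply already-fitted parameters without updating them."""
    return [(v - params["mean"]) / params["std"] for v in values]


# Before the model freeze: fit on training data and persist the parameters.
train = [2.0, 4.0, 6.0, 8.0]  # made-up training values
frozen = json.dumps(fit_scaler(train))

# At inference time: load the frozen parameters and only transform new inputs.
params = json.loads(frozen)
new_inputs = [5.0, 11.0]  # made-up incoming values
scaled = transform(params, new_inputs)
print(scaled)
```

The point of the sketch is that fit_scaler never runs on the new inputs: the mean and standard deviation stay exactly as they were at freeze time, and only transform touches incoming data.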
@oshbocker @Galeros93 For sklearn, you can stick to the official recommendations from the docs for this framework and use pickle or joblib.