How to do pre-processing with spaCy or NLTK when they require an external download?

Hi there everyone. My team is using spaCy for pre-processing the narratives before modelling, and we don’t really know how to make it work given that we can’t download the pre-trained models on the fly in the competition environment. I’ve made a PR to the runtime repo to add the model we use, but I’m not sure it’ll be reviewed in time.

Is there a way around that? Has anyone done pre-processing using these libraries but without needing any external dependencies?
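
To make the question concrete, here's a minimal sketch of the kind of download-free pre-processing I mean. It assumes spaCy itself is installed in the runtime and uses `spacy.blank("en")`, which builds a tokenizer-only pipeline from rules bundled with the package. It is not the pre-trained model we use, so lemmas, POS tags and entities are not available. The `preprocess` helper and the example sentence are made up for illustration, and the regex fallback is only there so the snippet runs even where spaCy isn't importable.

```python
import re

try:
    import spacy
    # blank("en") creates a tokenizer-only English pipeline from rules that
    # ship inside the spaCy package; no pre-trained model is downloaded.
    _nlp = spacy.blank("en")
except ImportError:
    _nlp = None  # fall back to the standard library below


def preprocess(text):
    """Hypothetical helper: return the lowercase alphabetic tokens of a narrative."""
    if _nlp is not None:
        # is_alpha and lower_ are lexical attributes, so they work without a model.
        return [tok.lower_ for tok in _nlp(text) if tok.is_alpha]
    # Assumption: a plain regex split is an acceptable stand-in without spaCy.
    return re.findall(r"[a-z]+", text.lower())


# Made-up example narrative, not taken from the competition data.
tokens = preprocess("The worker slipped on a wet floor.")
print(tokens)
```

This only covers tokenisation and simple filtering. Anything that depends on trained weights would still need the model files to be present in the environment.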

Thanks!

I honestly should’ve asked this earlier, but we focused on developing the solution a bit too much (":

I think I might have found a solution to my own problem haha. Will try this out later, but if anyone’s confirmed a way around it I’d still love to hear about it!