How about Pachyderm?

Hi there.
I read about the Concept to Clinic challenge and I like the idea. I think there has often been a lack of opportunities to apply cool machine learning models to actual life-saving production use. It seems this challenge tries to address that in an “end-to-end” manner in the context of cancer research and diagnostics.

Personally, I have some amateur experience with machine learning and production experience with continuous integration and delivery. I have just started spending my learning time on gaining experience in reproducible research and getting familiar with Pachyderm. Pachyderm is a platform for tracking multiple pipelines and data sources in as easy a way as possible while preserving the history of their changes. It is programming-language agnostic and open source.

I’m wondering why it couldn’t be used in the Concept to Clinic project, and I can’t find any reasonable arguments against it ;). It should enable tracking changes to the implementations of the various proposed models and should help with tracking data transformations. The documentation says that the only requirement for Pachyderm is Kubernetes, installed either locally or on a VPS, and there is even an Enterprise edition if someone would like to use it in actual clinical production.

What do you think about it?

Thanks for the suggestion. As the challenge ramps up and people get familiar with the concept, we’re going to focus on reproducibility - more to come on that.

In the meantime, I just want to clarify one point:

if someone would like to use it in actual clinical production.

For challenge scope, the ML training pipeline is separated from the clinical use prototype. Everyone can assume that only wrapped, frozen models will ever be used in “production.”


I’m looking forward to hearing from people interested in exploring this idea :slight_smile:
