Data not starting with the Pre-rinse

How should we interpret the time series that do not start with the pre-rinse (e.g. process_id=20005, which starts with the acid phase)?

In my opinion there are at least two reasonable explanations (that require different approaches in modelling):

  • simply for those series there were no previous phases (i.e. pre-rinse, caustic phases)
  • there were previous phases, but for the sake of the competition they were not disclosed in the train/test set

Which interpretation is correct?

I am also curious about missing phases.
I noticed there are also a lot of processes that have a pre_rinse and a caustic phase and then skip to final_rinse.
For example, processes 27939, 27844 and 20111.
Are these aborted processes?

Hello Twalen and CarrieSmith,

The order and number of phases depend mainly on the recipe used. For normal (non-aborted) cleaning we can distinguish 3 main families:

  • Only 1 phase: a rinse, called a flush. This is not very relevant here unless it is a pre-rinse (with recovery water) followed by a final rinse (with clean water).

  • 3 phases: Pre-rinse -> Caustic -> Final rinse. This is a normal cleaning with a short recipe.

  • 5 phases: Pre-rinse -> Caustic -> Intermediate rinse -> Acid -> Final rinse. This is a normal cleaning with a long recipe.

That’s pretty much everything in terms of usual recipes. Then there are unusual cleanings (recipes that are used less often), such as:

Cleanings starting directly with the caustic phase or with the acid phase. Although these are less usual, it is still of interest to predict, if possible, the risk of turbidity during the final rinse phase.

Generally speaking, whatever the recipe, and even if it’s an aborted cleaning, it would be interesting to predict the amount of turbidity during the final rinse so that the final rinse phase can be fine-tuned accordingly.
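
For anyone who wants to see which of these families each process falls into, here is a minimal sketch in plain Python. It assumes the data can be reduced to time-ordered (process_id, phase) pairs and that the phase names are the ones quoted in this thread. The function names, the family labels and the toy rows are made up for illustration and are not part of the competition data. Treating any single rinse phase as a flush is my own reading of the description above.

```python
from collections import Counter

# Recipe families as described in this thread (labels are illustrative).
FAMILIES = {
    ("pre_rinse", "caustic", "final_rinse"): "short recipe (3 phases)",
    ("pre_rinse", "caustic", "intermediate_rinse", "acid", "final_rinse"):
        "long recipe (5 phases)",
}


def phase_sequences(rows):
    """Map each process_id to the tuple of phases it contains.

    `rows` is an iterable of (process_id, phase) pairs in time order.
    Repeated rows of the same phase are collapsed to a single entry.
    """
    seen = {}
    for process_id, phase in rows:
        phases = seen.setdefault(process_id, [])
        if phase not in phases:
            phases.append(phase)
    return {pid: tuple(phases) for pid, phases in seen.items()}


def classify(sequence):
    """Assign a phase sequence to one of the families, or flag it as unusual."""
    if sequence in FAMILIES:
        return FAMILIES[sequence]
    # Assumption: a process made of one rinse phase only is a flush.
    if len(sequence) == 1 and sequence[0].endswith("rinse"):
        return "flush (single rinse)"
    return "unusual or aborted"


# Toy rows with made-up process ids, one row per measurement.
rows = [
    (1, "pre_rinse"), (1, "pre_rinse"), (1, "caustic"), (1, "final_rinse"),
    (2, "pre_rinse"), (2, "caustic"), (2, "intermediate_rinse"),
    (2, "acid"), (2, "final_rinse"),
    (3, "acid"), (3, "final_rinse"),
    (4, "final_rinse"),
]

sequences = phase_sequences(rows)
labels = {pid: classify(seq) for pid, seq in sequences.items()}
family_counts = Counter(labels.values())
```

Counting the labels this way gives a quick view of how many processes follow a usual recipe and how many start with the caustic or acid phase. It cannot tell an aborted cleaning apart from a rarely used recipe, since both simply show up as a sequence outside the usual families.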

Is it OK to check which phases and groups of phases exist in the test set in order to adjust my model? Or does that violate the rule against using test data for predictions?


Hello Thomas,

Is the recipe that will be used for a cleaning process known in advance, or is it adjusted based on the intermediate results from each phase?

More importantly: if the recipe is known in advance, is this information available in the test data?

Hello Ian,

Yes, the recipe is known when the cleaning starts.
The recipe contains the setting parameters for each phase, such as temperature setting, flow setting, conductivity and, of course, duration.
The recipe information is available in the data provided to you (both training and test).


Thank you for the information.

Just for clarification: are you saying that for the 1182 processes in test_values for which we have pre_rinse + caustic phases, there is a way to know whether the full recipe is:

  • pre_rinse + caustic + intermediate_rinse + acid + final_rinse, or:
  • pre_rinse + caustic + final_rinse?

Hi all, we just made an announcement releasing “recipes” that specify the phases you can expect (for the most part) for each process. Find the announcement here (you must be logged in to see this link):

