Also curious about missing phases.
I noticed there are also a lot of processes that have a pre_rinse and a caustic phase and then skip to final_rinse.
For example, processes 27939, 27844 and 20111.
Are these aborted processes?
The order and number of phases depend mainly on the recipe used. For normal (non-aborted) cleaning we can distinguish 3 main families:
Only 1 phase: rinse. This is called a flush and is not very relevant here, unless it is a pre-rinse (with recovery water) followed by a final rinse (with clean water);
3 phases: Pre-rinse -> Caustic -> Final rinse. This is a normal cleaning with a short recipe;
5 phases: Pre-rinse -> Caustic -> Intermediate rinse -> Acid -> Final rinse. This is a normal cleaning with a long recipe.
That’s pretty much everything in terms of usual recipes. Then there are unusual cleanings (recipes that are used less often), such as:
Cleanings starting directly with the caustic phase or with the acid phase. Although these are less usual, it is still of interest to predict, if possible, the risk of turbidity during the final rinse phase.
Generally speaking, whatever the recipe, and even for an aborted cleaning, it would be interesting to predict the amount of turbidity during the final rinse so that the final rinse phase can be fine-tuned accordingly.
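For what it's worth, the families described above could be turned into a simple labelling rule. This is only a rough sketch: `classify_cleaning` is a hypothetical helper, and the phase spellings `intermediate_rinse` and `acid` are assumed by analogy with `pre_rinse`, `caustic` and `final_rinse`. The "other" bucket does not prove that a cleaning was aborted.

```python
# Hypothetical helper: maps an ordered list of phase names to a recipe family.
# "intermediate_rinse" and "acid" are assumed spellings, not confirmed names.
USUAL_RECIPES = {
    ("pre_rinse", "caustic", "final_rinse"): "short recipe (3 phases)",
    ("pre_rinse", "caustic", "intermediate_rinse", "acid", "final_rinse"):
        "long recipe (5 phases)",
    ("pre_rinse", "final_rinse"): "rinse only (pre-rinse then final rinse)",
}


def classify_cleaning(phases):
    """Return a family label for the ordered phases of one cleaning."""
    seq = tuple(phases)
    if seq in USUAL_RECIPES:
        return USUAL_RECIPES[seq]
    # A single rinse phase is what the thread calls a flush.
    if len(seq) == 1 and seq[0].endswith("rinse"):
        return "flush (1 phase)"
    # Less usual recipes start directly with caustic or acid.
    if seq and seq[0] in ("caustic", "acid"):
        return "unusual: starts with " + seq[0]
    # Anything else does not match a known family; it may or may not be aborted.
    return "other (possibly aborted)"


print(classify_cleaning(["pre_rinse", "caustic", "final_rinse"]))
```

A label like this could be used as a categorical feature or to split the data before fitting separate models, if that turns out to help.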
Is it OK to check which phases and groups of phases exist in the test set in order to adjust my model? Does that violate the rule against using test data for predictions?
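To make the question concrete, the kind of check meant here would look roughly like the sketch below. It assumes the data can be read as time-ordered `(process_id, phase)` pairs; those field names, the `phase_combinations` helper and the toy rows are illustrative only and are not taken from the actual files.

```python
from collections import Counter


def phase_combinations(rows):
    """Count how many processes show each ordered combination of phases.

    rows: iterable of (process_id, phase) pairs, assumed to be in time order.
    """
    per_process = {}
    for process_id, phase in rows:
        seq = per_process.setdefault(process_id, [])
        # Record a phase only when it changes, so repeated rows collapse.
        if not seq or seq[-1] != phase:
            seq.append(phase)
    return Counter(tuple(seq) for seq in per_process.values())


# Toy rows, made up for illustration (not real process ids).
toy_rows = [
    (1, "pre_rinse"), (1, "pre_rinse"), (1, "caustic"),
    (2, "pre_rinse"), (2, "caustic"),
    (3, "pre_rinse"),
]
for combo, n in phase_combinations(toy_rows).most_common():
    print(" -> ".join(combo), n)
```

Running the same count on the training and test sets would show whether the two contain the same groups of phases in similar proportions.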
Yes, the recipe is known when the cleaning starts.
The recipe contains setting parameters for each phase, such as temperature setting, flow setting, conductivity and, of course, duration.
The recipe information is available in the data provided to you (both training and test).
Just for clarification: are you saying that for the 1182 processes in test_values for which we have pre_rinse + caustic phases, there is a way to know whether the full recipe is: