Data not starting with the Pre-rinse

twalen · January 14, 2019, 9:05am

What is the interpretation of the time series not starting with the pre-rinse (i.e. process_id=20005 starting with the acid phase)?

In my opinion there are at least two reasonable explanations (that require different approaches in modelling):

simply for those series there were no previous phases (i.e. pre-rinse, caustic phases)
there were previous phases, but for the sake of the competition they were not disclosed in the train/test set

Which interpretation is correct?

CarrieSmith · January 15, 2019, 6:35am

Also curious about missing phases.
I noticed there are also a lot of processes that have pre_rinse and caustic phases and then skip to final_rinse.
For example, processes 27939, 27844 and 20111.
Is this an aborted process?

ThomasF · January 21, 2019, 10:36am

Hello Twalen and CarrieSmith,

The order and number of phases depend mainly on the recipe used. For normal (non-aborted) cleanings we can distinguish 3 main families:

1 phase only: Rinse. This is called a flush and is not very relevant here, unless it is a pre-rinse (with recovery water) followed by a final rinse (with clean water).
3 phases: Pre-rinse -> Caustic -> Final rinse. This is a normal cleaning with a short recipe.
5 phases: Pre-rinse -> Caustic -> Intermediate rinse -> Acid -> Final rinse. This is a normal cleaning with a long recipe.

That’s pretty much everything in terms of usual recipes. Then there are unusual cleanings (recipes that are used less often), such as:

Cleanings starting directly with the caustic phase or with the acid phase. Although these are less common, it is still of interest to predict, if possible, the risk of turbidity during the final rinse phase.

Generally speaking, whatever the recipe, and even if it’s an aborted cleaning, it would be interesting to predict the amount of turbidity during the final rinse so that the final rinse phase can be fine-tuned accordingly.
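For readers who want to see which family each process falls into, here is a minimal sketch in plain Python. It assumes the data can be read as (process_id, phase) pairs in time order, using the phase names that appear elsewhere in this thread (pre_rinse, caustic, intermediate_rinse, acid, final_rinse). The helper names, the family labels, and the toy rows are hypothetical and not part of the competition materials.

```python
# Known "usual" families from the description above; labels are illustrative only.
FAMILIES = {
    ("pre_rinse", "caustic", "final_rinse"): "short recipe (3 phases)",
    ("pre_rinse", "caustic", "intermediate_rinse", "acid", "final_rinse"):
        "long recipe (5 phases)",
}


def phase_sequences(rows):
    """Map each process_id to its distinct phases, in order of first appearance.

    `rows` is an iterable of (process_id, phase) pairs, assumed to be time-ordered.
    """
    seqs = {}
    for process_id, phase in rows:
        phases = seqs.setdefault(process_id, [])
        if phase not in phases:
            phases.append(phase)
    return {pid: tuple(phases) for pid, phases in seqs.items()}


def classify(sequence):
    """Label a phase sequence as one of the usual families, or as unusual."""
    if sequence in FAMILIES:
        return FAMILIES[sequence]
    # Assumption: a single rinse phase corresponds to the "flush" case.
    if len(sequence) == 1 and sequence[0].endswith("rinse"):
        return "flush (1 phase)"
    return "unusual (starts with " + sequence[0] + ")"


# Toy rows with made-up process ids, not taken from the real data.
rows = [
    (1, "pre_rinse"), (1, "pre_rinse"), (1, "caustic"), (1, "final_rinse"),
    (2, "acid"), (2, "final_rinse"),
]
for pid, seq in phase_sequences(rows).items():
    print(pid, seq, classify(seq))
```

On the training data, where final_rinse is present, this gives a quick count of how often each family occurs. On truncated test processes the observed sequence is only a prefix, so the same lookup will not identify the full recipe by itself.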

lytkarinskiy · January 22, 2019, 5:09pm

Is it OK to check which phases and groups of phases exist in the test set in order to adjust my model? Does that violate the rule against using test data for predictions?

Thanks!

ian-contiamo · January 24, 2019, 11:34am

Hello Thomas,

Is the recipe that will be used for a cleaning process known in advance, or is it adjusted based on the intermediate results from each phase?

More importantly: if the recipe is known in advance, is this information available in the test data?

ThomasF · January 24, 2019, 6:54pm

Hello Ian,

Yes, the recipe is known when the cleaning starts.
The recipe contains setting parameters for each phase, such as temperature setting, flow setting, conductivity, and of course duration.
The recipe information is available in the data provided to you (both training and test).

Thomas

ian-contiamo · January 25, 2019, 9:57am

Thank you for the information.

Just for clarification: are you saying that for the 1182 processes in test_values for which we have pre_rinse + caustic phases, there is a way to know whether the full recipe is:

pre_rinse + caustic + intermediate_rinse + acid + final_rinse, or:
pre_rinse + caustic + final_rinse

bull · February 2, 2019, 12:36am

Hi all, we just made an announcement releasing “recipes” that specify the phases you can expect (for the most part) for each process. Find the announcement here (you must be logged in to see this link):
https://www.drivendata.org/competitions/56/predict-cleaning-time-series/announcements/
