Rows with phase=='final_rinse' and target_time_period==false

luiztauffer · January 12, 2019, 7:12pm

After removing the rows with target_time_period==true, the dataframe still keeps, for each process_id, at least one row with phase=='final_rinse', which by the description should not be used.

Shouldn't target_time_period==true for all rows with phase=='final_rinse'?
Or are those remaining rows to be used?

test_values does not contain any rows with phase=='final_rinse'.
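A minimal sketch of the check described above, assuming the training data is loaded in a pandas DataFrame. The toy frame and its values are invented for illustration; only the column names process_id, phase and target_time_period come from this thread.

```python
import pandas as pd

# Toy stand-in for the training data. Column names follow the thread;
# the rows themselves are made up for illustration only.
train = pd.DataFrame({
    "process_id": [1, 1, 1, 1, 2, 2, 2],
    "phase": ["pre_rinse", "caustic", "final_rinse", "final_rinse",
              "pre_rinse", "final_rinse", "final_rinse"],
    "target_time_period": [False, False, False, True,
                           False, False, True],
})

# Drop the target rows, as described in the post.
non_target = train[~train["target_time_period"]]

# Count the final_rinse rows that survive the filter, per process.
leftover = (
    non_target[non_target["phase"] == "final_rinse"]
    .groupby("process_id")
    .size()
)
print(leftover)
```

A non-empty result means some final_rinse rows fall outside the target time period, which is the situation the question is about.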

twalen · January 12, 2019, 8:42pm

From what I understand up to now, the target period is part of final_rinse after the last closing of the caustic and acid valves (so it is the final portion of final_rinse time series).

In the problem description there is the following note:

The target time period is the portion of the final rinse phase when the return caustic and return acid valves have been closed for the last time

luiztauffer · January 12, 2019, 9:04pm

Hi twalen, thanks for your response!

Yes, I understand. However, it sounds suspicious that we're supposed to be using some rows with final_rinse for training but not for prediction (the test_values do not have a single row with final_rinse).

twalen · January 14, 2019, 8:50am

I agree that this is strange, but like you said, the lack of such non_target rows (from final_rinse) in the test set makes the usage of such rows almost impossible.

bull · January 14, 2019, 5:10pm

Hi @luiztauffer and @twalen,

Good questions—this is by design. We want to be able to predict the turbidity before the start of the final rinse phase (so it can be adjusted if needed). This means you are not provided any final rinse data for the test set. However, the turbidity that matters (which we want to predict) is only during the times marked target_time_period, which is not all of the final rinse.

For the training set, you are provided all of the observations, which you can split with whatever strategy works best.
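One possible way to split the training observations so that the inputs look like the test set, sketched with pandas. This is an illustration under assumptions, not the organizers' method: the helper name split_like_test and the toy rows are hypothetical, and only the column names process_id, phase and target_time_period come from this thread.

```python
import pandas as pd

# Toy stand-in for the training data; values are invented for illustration.
train = pd.DataFrame({
    "process_id": [1, 1, 1, 1, 2, 2, 2],
    "phase": ["pre_rinse", "caustic", "final_rinse", "final_rinse",
              "pre_rinse", "final_rinse", "final_rinse"],
    "target_time_period": [False, False, False, True,
                           False, False, True],
})


def split_like_test(df):
    """Hypothetical helper: separate model inputs from target rows.

    Inputs exclude every final_rinse row, mirroring the test set,
    which contains no final rinse data. Target rows are those
    flagged by target_time_period.
    """
    inputs = df[df["phase"] != "final_rinse"]
    target_rows = df[df["target_time_period"]]
    return inputs, target_rows


inputs, target_rows = split_like_test(train)
print(len(inputs), len(target_rows))
```

With this split, final_rinse rows outside the target period end up in neither frame. Whether to use them some other way during training is left open, as the post says.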
