For practice, I am trying to solve this problem. As I understand it, my approach is:
1. Dummy-code the phase variable.
2. Get the unique phase combinations (stages) per process ID. This gives me 7 different stages: 11111, 11011, and so on.
3. Split the dataset by these stages: all rows belonging to process IDs that went through all 5 phases (11111) are collected into one dataset, and likewise for each other stage. This way I get 7 different datasets.
4. Set return flow values to 0 where they are negative.
5. Predict return_flow, then predict return_turbidity, and then calculate final_turbidity using the formula.
Is this approach correct? Kindly guide.
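
For reference, here is a minimal sketch of steps 1 to 4 as I understand them, in plain Python so it runs without extra libraries. The phase names (phase_1 to phase_5) and the toy rows are placeholders I made up for illustration, and the helper names phase_signature and split_by_signature are hypothetical; only process_id, phase, and return_flow come from the description above. Step 5 (the models and the formula) is not covered.

```python
from collections import defaultdict

# Hypothetical phase labels; the real dataset's labels may differ.
PHASES = ["phase_1", "phase_2", "phase_3", "phase_4", "phase_5"]

# Toy rows standing in for the real data (values are made up).
rows = [
    {"process_id": 1, "phase": "phase_1", "return_flow": 5.0},
    {"process_id": 1, "phase": "phase_2", "return_flow": -2.0},
    {"process_id": 1, "phase": "phase_3", "return_flow": 3.0},
    {"process_id": 1, "phase": "phase_4", "return_flow": 4.0},
    {"process_id": 1, "phase": "phase_5", "return_flow": 1.0},
    {"process_id": 2, "phase": "phase_1", "return_flow": 2.0},
    {"process_id": 2, "phase": "phase_2", "return_flow": 6.0},
    {"process_id": 2, "phase": "phase_4", "return_flow": -1.0},
    {"process_id": 2, "phase": "phase_5", "return_flow": 7.0},
]


def phase_signature(phases_seen):
    """Steps 1-2: encode which phases occurred as a string such as '11011'."""
    return "".join("1" if p in phases_seen else "0" for p in PHASES)


def split_by_signature(rows):
    """Steps 3-4: group rows by their process's signature, clipping negative flow."""
    # Collect the set of phases each process ID went through.
    seen = defaultdict(set)
    for r in rows:
        seen[r["process_id"]].add(r["phase"])
    signature = {pid: phase_signature(ph) for pid, ph in seen.items()}

    # Put each row into the dataset for its process's signature.
    groups = defaultdict(list)
    for r in rows:
        cleaned = dict(r, return_flow=max(0.0, r["return_flow"]))  # step 4
        groups[signature[r["process_id"]]].append(cleaned)
    return dict(groups)


groups = split_by_signature(rows)
for sig, group_rows in sorted(groups.items()):
    print(sig, len(group_rows))
```

With the real data, each entry of groups would be one of the 7 datasets from step 3, and a separate pair of models could then be fitted per entry.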