I understand this paragraph as giving the fraction of test processes that end on a particular phase.
But a simple calculation on test_values.zip gives me different percentages.
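# test_data is the DataFrame loaded from test_values.zip
n = test_data['process_id'].nunique()  # number of distinct test processes
# Take the last recorded phase of each process, then express the number of
# processes ending on each phase as a percentage of all processes.
(test_data.sort_values(by=['process_id', 'timestamp'])
    .groupby('process_id', as_index=False)['phase'].last()
    .groupby('phase').size() * 100 / n)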
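To make the calculation easy to check by hand, here is a minimal sketch of the same idea on a tiny made-up frame. The column names match the snippet above, but the phase labels ("a", "b", "c"), the timestamps, and the helper name last_phase_percentages are invented for illustration and are not taken from the competition data.

```python
import pandas as pd

# Toy stand-in for test_data: same column names as above, invented values.
toy = pd.DataFrame({
    "process_id": [1, 1, 1, 2, 2, 3, 3, 3, 4],
    "timestamp": pd.to_datetime([
        "2018-01-01 00:00", "2018-01-01 00:01", "2018-01-01 00:02",
        "2018-01-02 00:00", "2018-01-02 00:01",
        "2018-01-03 00:00", "2018-01-03 00:01", "2018-01-03 00:02",
        "2018-01-04 00:00",
    ]),
    "phase": ["a", "b", "c", "a", "b", "a", "b", "c", "a"],
})


def last_phase_percentages(df):
    """Percentage of processes whose last recorded row is in each phase."""
    # Sort so that .last() picks the latest row of every process.
    last_phase = (
        df.sort_values(["process_id", "timestamp"])
          .groupby("process_id")["phase"]
          .last()
    )
    # normalize=True divides by the number of processes.
    return last_phase.value_counts(normalize=True).sort_index() * 100


percentages = last_phase_percentages(toy)
print(percentages)
```

In this toy frame, processes 1 and 3 end on "c", process 2 on "b", and process 4 on "a", so the result can be verified by counting.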
Those percentages are approximate; the true percentages are subject to a number of other constraints. We provide this estimate in case participants want to take a similar approach with the training data.