Hello all!
A question about removing data from our model fit that wouldn't be seen in the test data.
The logic in the benchmark analysis was to drop a pipeline (L12) that was measured in the training data but not found in the testing data. We've noticed that object_id has 94 unique values in the cleaned training data but only 88 in the testing data. Does it make sense to drop the 6 "non-shared" objects from the training data before training the model?
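
For reference, here is a minimal sketch of how that filtering could look in pandas, assuming the data are loaded as DataFrames named `train_df` and `test_df` that each have an `object_id` column (the names and file paths are placeholders, not the actual benchmark code):

```python
import pandas as pd

# Placeholder file names; substitute the actual training/testing files.
train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")

# object_ids that appear in both splits, and those only in training
shared_ids = set(train_df["object_id"]) & set(test_df["object_id"])
train_only_ids = set(train_df["object_id"]) - shared_ids
print(f"{len(train_only_ids)} object_ids appear only in the training data")

# Keep only training rows whose object_id also appears in the test set
train_filtered = train_df[train_df["object_id"].isin(shared_ids)].copy()
```

This just illustrates the mechanics of dropping the non-shared objects; whether doing so is appropriate is the question above.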