I tried running a test submission through metric.py, and it gave an error expecting trip_day_of_week and trip_hour_of_day, but the problem description says to leave those columns out of the submission?
Also, I missed the webinar; please let me know where I can access the recording.
Hi @jimking100, you might want to make sure you are using the right version of metric.py and that you have passed the arguments in the right order (ground truth first, then submission). Try this:
```
❯ python runtime/scripts/metric.py data/ground_truth.csv data/submission_format.csv
2021-04-06 11:27:03.938 | INFO | __main__:score_submission:405 - reading in submission from data/submission_format.csv
2021-04-06 11:27:03.943 | DEBUG | __main__:score_submission:426 - binning submission
2021-04-06 11:27:03.953 | INFO | __main__:score_submission:429 - reading in ground truth from data/ground_truth.csv
2021-04-06 11:27:12.801 | DEBUG | __main__:score_submission:431 - binning ground truth
2021-04-06 11:27:24.004 | INFO | __main__:score_submission:443 - initializing metric for epsilon=1.0 (100 rows of 200 in submission)
2021-04-06 11:27:24.005 | INFO | __main__:__init__:133 - created working directory at /tmp/kmarginal
2021-04-06 11:27:24.006 | WARNING | __main__:__init__:135 - found existing submission counts; removing
2021-04-06 11:27:24.006 | INFO | __main__:score_submission:452 - starting calculation for epsilon=1.0
2021-04-06 11:27:24.006 | INFO | __main__:overall_score:367 - computing k-marginals...
2021-04-06 11:27:24.007 | INFO | __main__:_precompute_marginal_counts:222 - precomputing ground truth counts for each permutations ...
100%|██████████████████████████████████████████████| 57/57 [00:00<00:00, 63080.56it/s]
2021-04-06 11:27:24.013 | INFO | __main__:_precompute_marginal_counts:230 - precomputing submitted counts for each permutation ...
100%|████████████████████████████████████████████████| 57/57 [00:00<00:00, 142.59it/s]
2021-04-06 11:27:24.413 | INFO | __main__:k_marginal_scores:238 - running k-marginal count comparisons in parallel with None processes...
100%|█████████████████████████████████████████████████| 56/56 [00:00<00:00, 76.95it/s]
2021-04-06 11:27:25.293 | SUCCESS | __main__:overall_score:370 - RESULT [KMARGINAL]: 0.0
2021-04-06 11:27:25.293 | INFO | __main__:overall_score:372 - computing pickup-dropoff marginal...
2021-04-06 11:27:25.308 | SUCCESS | __main__:overall_score:375 - RESULT [SPATIAL]: -2.220446049250313e-16
2021-04-06 11:27:25.308 | INFO | __main__:overall_score:377 - computing higher order conjunction...
100%|█████████████████████████████████████████████████| 50/50 [00:21<00:00, 2.34it/s]
2021-04-06 11:28:03.366 | DEBUG | __main__:higher_order_conjunction:358 - proportion errors for each of 50 iterations: [ 0.0201 0.0111 0.3282 -0.7993 -0.3087 -0.663 -0.4769 -0.5381 0.0429
-0.4442 -0.4313 0.2447 0.0021 0.125 -0.692 -0.6532 0.5101 0.0091
-0.8375 0.0226 0.3444 0.2015 0.0181 0.0293 0.2468 -0.741 0.0009
0.12 0.2923 0.0125 -0.4459 0.0403 0.0686 0.0772 0.0328 0.0775
0.0887 0.0029 0.0203 -0.4299 -0.3034 -0.301 0.0061 -0.5372 0.4503
0.1861 0.1171 0.185 -0.6274 0.1271]
2021-04-06 11:28:03.378 | SUCCESS | __main__:overall_score:380 - RESULT [HOC]: 0.734172244316548
2021-04-06 11:28:03.378 | SUCCESS | __main__:overall_score:385 - RESULT [OVERALL]: 244.72408143884928
2021-04-06 11:28:03.378 | SUCCESS | __main__:score_submission:459 - score for epsilon 1.0: 244.72408143884928
2021-04-06 11:28:03.379 | INFO | __main__:score_submission:443 - initializing metric for epsilon=10.0 (100 rows of 200 in submission)
2021-04-06 11:28:03.380 | INFO | __main__:__init__:133 - created working directory at /tmp/kmarginal
2021-04-06 11:28:03.380 | WARNING | __main__:__init__:135 - found existing submission counts; removing
2021-04-06 11:28:03.381 | INFO | __main__:score_submission:452 - starting calculation for epsilon=10.0
2021-04-06 11:28:03.381 | INFO | __main__:overall_score:367 - computing k-marginals...
2021-04-06 11:28:03.382 | INFO | __main__:_precompute_marginal_counts:222 - precomputing ground truth counts for each permutations ...
100%|██████████████████████████████████████████████| 57/57 [00:00<00:00, 57803.51it/s]
2021-04-06 11:28:03.383 | INFO | __main__:_precompute_marginal_counts:230 - precomputing submitted counts for each permutation ...
100%|████████████████████████████████████████████████| 57/57 [00:00<00:00, 273.27it/s]
2021-04-06 11:28:03.592 | INFO | __main__:k_marginal_scores:238 - running k-marginal count comparisons in parallel with None processes...
100%|█████████████████████████████████████████████████| 56/56 [00:00<00:00, 76.20it/s]
2021-04-06 11:28:04.509 | SUCCESS | __main__:overall_score:370 - RESULT [KMARGINAL]: 0.0
2021-04-06 11:28:04.510 | INFO | __main__:overall_score:372 - computing pickup-dropoff marginal...
2021-04-06 11:28:04.523 | SUCCESS | __main__:overall_score:375 - RESULT [SPATIAL]: -2.220446049250313e-16
2021-04-06 11:28:04.523 | INFO | __main__:overall_score:377 - computing higher order conjunction...
100%|█████████████████████████████████████████████████| 50/50 [00:12<00:00, 4.09it/s]
2021-04-06 11:28:34.301 | DEBUG | __main__:higher_order_conjunction:358 - proportion errors for each of 50 iterations: [ 0.0201 0.0111 0.3282 -0.7993 -0.3087 0.337 -0.4769 -0.5381 0.0429
-0.4442 -0.4313 -0.7553 0.0021 0.125 -0.692 -0.6532 0.5101 0.0091
0.1625 0.0226 0.3444 0.2015 0.0181 0.0293 0.2468 0.259 0.0009
0.12 0.2923 0.0125 -0.4459 0.0403 0.0686 0.0772 0.0328 0.0775
0.0887 0.0029 0.0203 -0.4299 -0.3034 0.199 0.0061 -0.5372 0.4503
0.1861 0.1171 0.185 -0.6274 0.1271]
2021-04-06 11:28:34.313 | SUCCESS | __main__:overall_score:380 - RESULT [HOC]: 0.7556553167205631
2021-04-06 11:28:34.313 | SUCCESS | __main__:overall_score:385 - RESULT [OVERALL]: 251.88510557352095
2021-04-06 11:28:34.314 | SUCCESS | __main__:score_submission:459 - score for epsilon 10.0: 251.88510557352095
2021-04-06 11:28:34.314 | SUCCESS | __main__:score_submission:464 - finished scoring all epsilons: OVERALL SCORE = 248.3045935061851 (per epsilon: {1.0: 244.72408143884928, 10.0: 251.88510557352095})
```
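The numbers in this log fit together in a simple way. The sketch below is inferred only from the logged values, not from the metric.py source, so treat it as an assumption: each per-epsilon OVERALL appears to be 1000 times the mean of the KMARGINAL, SPATIAL and HOC results, and the final OVERALL SCORE appears to be the plain mean over epsilons. The function names are hypothetical.

```python
from statistics import mean


def epsilon_score(kmarginal: float, spatial: float, hoc: float) -> float:
    """Hypothetical: per-epsilon OVERALL as 1000 x the mean of the three component results."""
    return 1000 * mean([kmarginal, spatial, hoc])


def overall_score(per_epsilon: dict) -> float:
    """Hypothetical: final score as the unweighted mean of the per-epsilon scores."""
    return mean(per_epsilon.values())


# Component results copied from the log output above.
scores = {
    1.0: epsilon_score(0.0, -2.220446049250313e-16, 0.734172244316548),
    10.0: epsilon_score(0.0, -2.220446049250313e-16, 0.7556553167205631),
}
print(scores)
print(overall_score(scores))
```

Up to floating-point rounding, these reproduce the per-epsilon scores and the final OVERALL SCORE printed by the run above.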
I am using the correct version, and I get the following error:
trip_day_of_week - expected column trip_day_of_week in data but it was not present;
trip_hour_of_day - expected column trip_hour_of_day in data but it was not present
This seems to indicate that trip_day_of_week and trip_hour_of_day should be in the submission.csv file, yet the problem description says:
There are a couple differences between the ground truth data file and the synthetic file you will submit:
The submission file should include shift but not trip_day_of_week or trip_time_of_day, since shift is a summary of the other two variables used for evaluation. See the data section for details on how shift is computed.
So, could you please answer the question: are trip_day_of_week and trip_time_of_day supposed to be in submission.csv or not?
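Going by the problem-description passage quoted above, a submission should carry shift and omit the two raw time columns. Below is a minimal stdlib sketch of that preparation step. It assumes the column names from the validator's error message (trip_day_of_week and trip_hour_of_day; the description calls the second one trip_time_of_day), uses a toy header, and the helper name drop_time_columns is hypothetical, not part of metric.py.

```python
import csv
import io

# Assumed column names, taken from the validator's error message above.
DROP_COLUMNS = ("trip_day_of_week", "trip_hour_of_day")


def drop_time_columns(src, dst):
    """Copy CSV text from src to dst without the raw time columns; require 'shift'."""
    reader = csv.DictReader(src)
    fieldnames = reader.fieldnames or []  # None for an empty file
    if "shift" not in fieldnames:
        raise ValueError("submission must include a 'shift' column")
    kept = [name for name in fieldnames if name not in DROP_COLUMNS]
    # extrasaction="ignore" makes the writer skip the dropped keys in each row dict.
    writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# Toy data only; a real submission has many more columns.
src = io.StringIO("epsilon,shift,trip_day_of_week,trip_hour_of_day\n1.0,3,2,14\n")
dst = io.StringIO()
drop_time_columns(src, dst)
print(dst.getvalue())
```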
and got this error. If the --parameters-json option is left out, the error goes away.
The error is raised from this code snippet:
```python
if parameters_json is not None:
    logger.debug("validating submission ...")
    parameters = json.loads(parameters_json.read_text())
    logger.debug("checking that submission matches schema ...")
    TidyFormatKMarginalMetric._assert_sub_matches_schema(submission_df, parameters)
    logger.debug(
        "checking that submission meets length limits and has proper epsilons ..."
    )
    TidyFormatKMarginalMetric._assert_sub_less_than_limit_and_epsilons_valid(
        submission_df, parameters
    )
    logger.success("... submission is valid")
```
So the error only happens when the --parameters-json option is given. I guess that is why the command python runtime/scripts/metric.py data/ground_truth.csv data/submission_format.csv runs successfully: without that option, the validation block is skipped entirely.
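The gating described here can be reproduced in isolation. This is a stripped-down sketch, not metric.py itself: it assumes an argparse-style CLI (the real script may use a different library), mirrors the two positional paths and the --parameters-json flag from the thread, and uses params.json purely as a placeholder path. The hypothetical main() only reports whether the validation branch would run.

```python
import argparse
from pathlib import Path
from typing import List


def main(argv: List[str]) -> bool:
    """Parse a metric.py-style command line; return True if validation would run."""
    parser = argparse.ArgumentParser()
    parser.add_argument("ground_truth", type=Path)
    parser.add_argument("submission", type=Path)
    # Optional flag: argparse stores it as args.parameters_json and defaults to None.
    parser.add_argument("--parameters-json", type=Path, default=None)
    args = parser.parse_args(argv)

    validated = False
    if args.parameters_json is not None:
        # This is the point where metric.py runs its schema and length checks.
        validated = True
    return validated


print(main(["data/ground_truth.csv", "data/submission_format.csv"]))
print(main(["data/ground_truth.csv", "data/submission_format.csv", "--parameters-json", "params.json"]))
```

Because the flag defaults to None, the plain two-argument command never reaches the schema check, which matches the behaviour reported above.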
@jimking100 @yuchaotao You were right: validation wasn't skipping those two unused columns. It should be fixed now! My apologies, I didn't realize --parameters-json was the culprit.
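The fix amounts to letting schema validation ignore columns that the metric never scores. The actual change to metric.py isn't shown in this thread, so the following is only a sketch. The function schema_errors, the UNSCORED_COLUMNS set and the toy schema list are hypothetical. The message format is copied from the error reported earlier in the thread.

```python
from typing import Iterable, List

# Hypothetical: columns present in the ground truth but not used for scoring,
# so a submission may omit them.
UNSCORED_COLUMNS = frozenset({"trip_day_of_week", "trip_hour_of_day"})


def schema_errors(submission_columns: Iterable[str], schema_columns: Iterable[str]) -> List[str]:
    """Return one message per required schema column missing from the submission."""
    present = set(submission_columns)
    return [
        f"{col} - expected column {col} in data but it was not present"
        for col in schema_columns
        if col not in UNSCORED_COLUMNS and col not in present
    ]


# Toy schema; the real parameters file lists more columns.
schema = ["epsilon", "shift", "trip_day_of_week", "trip_hour_of_day"]
print(schema_errors(["epsilon", "shift"], schema))  # unscored columns are skipped
print(schema_errors(["epsilon"], schema))           # 'shift' is still required
```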