The metric.py program

jimking100 · April 6, 2021, 3:20am

Hi,

I tried running a test submission using metric.py. It gave an error expecting trip_day_of_week and trip_hour_of_day, but the problem description says to leave those columns out of the submission?

Also, I missed the webinar. Please let me know where I can access the recording.

Thanks,

Jim

isms · April 6, 2021, 3:28pm

Hi @jimking100 – you might want to make sure you are using the right version of metric.py and that you have passed the arguments in the right order (ground truth comes first, then submission). Try this:

❯ python runtime/scripts/metric.py data/ground_truth.csv data/submission_format.csv
2021-04-06 11:27:03.938 | INFO     | __main__:score_submission:405 - reading in submission from data/submission_format.csv
2021-04-06 11:27:03.943 | DEBUG    | __main__:score_submission:426 - binning submission
2021-04-06 11:27:03.953 | INFO     | __main__:score_submission:429 - reading in ground truth from data/ground_truth.csv
2021-04-06 11:27:12.801 | DEBUG    | __main__:score_submission:431 - binning ground truth
2021-04-06 11:27:24.004 | INFO     | __main__:score_submission:443 - initializing metric for epsilon=1.0 (100 rows of 200 in submission)
2021-04-06 11:27:24.005 | INFO     | __main__:__init__:133 - created working directory at /tmp/kmarginal
2021-04-06 11:27:24.006 | WARNING  | __main__:__init__:135 - found existing submission counts; removing
2021-04-06 11:27:24.006 | INFO     | __main__:score_submission:452 - starting calculation for epsilon=1.0
2021-04-06 11:27:24.006 | INFO     | __main__:overall_score:367 - computing k-marginals...
2021-04-06 11:27:24.007 | INFO     | __main__:_precompute_marginal_counts:222 - precomputing ground truth counts for each permutations ...
100%|██████████████████████████████████████████████| 57/57 [00:00<00:00, 63080.56it/s]
2021-04-06 11:27:24.013 | INFO     | __main__:_precompute_marginal_counts:230 - precomputing submitted counts for each permutation ...
100%|████████████████████████████████████████████████| 57/57 [00:00<00:00, 142.59it/s]
2021-04-06 11:27:24.413 | INFO     | __main__:k_marginal_scores:238 - running k-marginal count comparisons in parallel with None processes...
100%|█████████████████████████████████████████████████| 56/56 [00:00<00:00, 76.95it/s]
2021-04-06 11:27:25.293 | SUCCESS  | __main__:overall_score:370 - RESULT [KMARGINAL]: 0.0
2021-04-06 11:27:25.293 | INFO     | __main__:overall_score:372 - computing pickup-dropoff marginal...
2021-04-06 11:27:25.308 | SUCCESS  | __main__:overall_score:375 - RESULT [SPATIAL]: -2.220446049250313e-16
2021-04-06 11:27:25.308 | INFO     | __main__:overall_score:377 - computing higher order conjunction...
100%|█████████████████████████████████████████████████| 50/50 [00:21<00:00,  2.34it/s]
2021-04-06 11:28:03.366 | DEBUG    | __main__:higher_order_conjunction:358 - proportion errors for each of 50 iterations: [ 0.0201  0.0111  0.3282 -0.7993 -0.3087 -0.663  -0.4769 -0.5381  0.0429
 -0.4442 -0.4313  0.2447  0.0021  0.125  -0.692  -0.6532  0.5101  0.0091
 -0.8375  0.0226  0.3444  0.2015  0.0181  0.0293  0.2468 -0.741   0.0009
  0.12    0.2923  0.0125 -0.4459  0.0403  0.0686  0.0772  0.0328  0.0775
  0.0887  0.0029  0.0203 -0.4299 -0.3034 -0.301   0.0061 -0.5372  0.4503
  0.1861  0.1171  0.185  -0.6274  0.1271]
2021-04-06 11:28:03.378 | SUCCESS  | __main__:overall_score:380 - RESULT [HOC]: 0.734172244316548
2021-04-06 11:28:03.378 | SUCCESS  | __main__:overall_score:385 - RESULT [OVERALL]: 244.72408143884928
2021-04-06 11:28:03.378 | SUCCESS  | __main__:score_submission:459 - score for epsilon 1.0: 244.72408143884928
2021-04-06 11:28:03.379 | INFO     | __main__:score_submission:443 - initializing metric for epsilon=10.0 (100 rows of 200 in submission)
2021-04-06 11:28:03.380 | INFO     | __main__:__init__:133 - created working directory at /tmp/kmarginal
2021-04-06 11:28:03.380 | WARNING  | __main__:__init__:135 - found existing submission counts; removing
2021-04-06 11:28:03.381 | INFO     | __main__:score_submission:452 - starting calculation for epsilon=10.0
2021-04-06 11:28:03.381 | INFO     | __main__:overall_score:367 - computing k-marginals...
2021-04-06 11:28:03.382 | INFO     | __main__:_precompute_marginal_counts:222 - precomputing ground truth counts for each permutations ...
100%|██████████████████████████████████████████████| 57/57 [00:00<00:00, 57803.51it/s]
2021-04-06 11:28:03.383 | INFO     | __main__:_precompute_marginal_counts:230 - precomputing submitted counts for each permutation ...
100%|████████████████████████████████████████████████| 57/57 [00:00<00:00, 273.27it/s]
2021-04-06 11:28:03.592 | INFO     | __main__:k_marginal_scores:238 - running k-marginal count comparisons in parallel with None processes...
100%|█████████████████████████████████████████████████| 56/56 [00:00<00:00, 76.20it/s]
2021-04-06 11:28:04.509 | SUCCESS  | __main__:overall_score:370 - RESULT [KMARGINAL]: 0.0
2021-04-06 11:28:04.510 | INFO     | __main__:overall_score:372 - computing pickup-dropoff marginal...
2021-04-06 11:28:04.523 | SUCCESS  | __main__:overall_score:375 - RESULT [SPATIAL]: -2.220446049250313e-16
2021-04-06 11:28:04.523 | INFO     | __main__:overall_score:377 - computing higher order conjunction...
100%|█████████████████████████████████████████████████| 50/50 [00:12<00:00,  4.09it/s]
2021-04-06 11:28:34.301 | DEBUG    | __main__:higher_order_conjunction:358 - proportion errors for each of 50 iterations: [ 0.0201  0.0111  0.3282 -0.7993 -0.3087  0.337  -0.4769 -0.5381  0.0429
 -0.4442 -0.4313 -0.7553  0.0021  0.125  -0.692  -0.6532  0.5101  0.0091
  0.1625  0.0226  0.3444  0.2015  0.0181  0.0293  0.2468  0.259   0.0009
  0.12    0.2923  0.0125 -0.4459  0.0403  0.0686  0.0772  0.0328  0.0775
  0.0887  0.0029  0.0203 -0.4299 -0.3034  0.199   0.0061 -0.5372  0.4503
  0.1861  0.1171  0.185  -0.6274  0.1271]
2021-04-06 11:28:34.313 | SUCCESS  | __main__:overall_score:380 - RESULT [HOC]: 0.7556553167205631
2021-04-06 11:28:34.313 | SUCCESS  | __main__:overall_score:385 - RESULT [OVERALL]: 251.88510557352095
2021-04-06 11:28:34.314 | SUCCESS  | __main__:score_submission:459 - score for epsilon 10.0: 251.88510557352095
2021-04-06 11:28:34.314 | SUCCESS  | __main__:score_submission:464 - finished scoring all epsilons: OVERALL SCORE = 248.3045935061851 (per epsilon: {1.0: 244.72408143884928, 10.0: 251.88510557352095})

jimking100 · April 6, 2021, 3:56pm

I have used the correct version and I get the following error:

trip_day_of_week - expected column trip_day_of_week in data but it was not present;
trip_hour_of_day - expected column trip_hour_of_day in data but it was not present

In the metric.py code you have the following:

COL_TYPES = {
    "trip_day_of_week": "int8",
    "trip_hour_of_day": "int8",
    "shift": "uint8",
    "company_id": "int8",
    "pickup_community_area": "int8",
    "dropoff_community_area": "int8",
    "payment_type": "int8",
    "fare": "int16",
    "tips": "int16",
    "trip_miles": "int16",
    "trip_seconds": "int32",
    "trip_total": "int16",
}

This seems to indicate that trip_day_of_week and trip_hour_of_day should be in the submission.csv file yet the problem description says:

There are a couple differences between the ground truth data file and the synthetic file you will submit:

The submission file should include shift but not trip_day_of_week or trip_time_of_day, since shift is a summary of the other two variables used for evaluation. See the data section for details on how shift is computed.

So, could you please answer the question - are trip_day_of_week and trip_time_of_day supposed to be in the submission.csv or not?

isms · April 6, 2021, 7:39pm

No, they are not. COL_TYPES is also used to read in the ground truth; it in no way implies that those columns must be in the submission.

(When a dtype mapping is passed to pandas.read_csv, only the entries for columns actually present in the data are applied; entries for columns that are absent are ignored.)
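To illustrate the point about pandas.read_csv, here is a minimal sketch. The dtype map is a trimmed copy of COL_TYPES from the post above, and the two-row CSV is made-up toy data, not a real submission. It should show that dtype entries for columns missing from the file do not cause an error on read.

```python
import io

import pandas as pd

# Trimmed copy of COL_TYPES from the thread; the CSV below is toy data.
COL_TYPES = {
    "trip_day_of_week": "int8",
    "trip_hour_of_day": "int8",
    "shift": "uint8",
    "fare": "int16",
}

# A toy "submission" that has shift but neither trip_* column.
submission_csv = io.StringIO("shift,fare\n3,12\n7,45\n")

# The dtype entries for the absent trip_* columns are simply not used.
df = pd.read_csv(submission_csv, dtype=COL_TYPES)

print(list(df.columns))
print({name: str(dtype) for name, dtype in df.dtypes.items()})
```

So a missing-column error like the one reported is more likely to come from a separate validation step than from the read itself.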

jimking100 · April 6, 2021, 10:21pm

Great, I’ve got it working now, you’ve been very helpful. Thanks!

yuchaotao · April 16, 2021, 3:18pm

I also got this error. It appears when I pass the option

--parameters-json ./data/parameters.json

If I omit this option, the error goes away.

The error is raised from this code snippet:

    if parameters_json is not None:
        logger.debug("validating submission ...")
        parameters = json.loads(parameters_json.read_text())
        logger.debug("checking that submission matches schema ...")
        TidyFormatKMarginalMetric._assert_sub_matches_schema(submission_df, parameters)
        logger.debug(
            "checking that submission meets length limits and has proper epsilons ..."
        )
        TidyFormatKMarginalMetric._assert_sub_less_than_limit_and_epsilons_valid(
            submission_df, parameters
        )
        logger.success("... submission is valid ✓")

So the error happens only when --parameters-json is passed. I guess that is why the command python runtime/scripts/metric.py data/ground_truth.csv data/submission_format.csv runs successfully.

jimking100 · April 16, 2021, 3:36pm

Yes, that is the same behavior I observed. isms gave the example without the json, so I assume the json was from Sprint 2.

isms · April 16, 2021, 3:45pm

@jimking100 @yuchaotao You were right, validation wasn’t skipping those two unused columns — should be fixed now! My apologies, didn’t realize it was the --parameters-json that was the culprit.
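For readers curious what "skipping those two unused columns" during validation might look like, here is a rough sketch. It is not the code in metric.py: the function name, the excluded-column set, and the shortened schema list are all hypothetical and chosen only for illustration.

```python
# Hypothetical names throughout; this is not the actual fix in metric.py.
SUBMISSION_EXCLUDED = {"trip_day_of_week", "trip_hour_of_day"}


def missing_columns(schema_columns, submission_columns, excluded=SUBMISSION_EXCLUDED):
    """Return schema columns absent from the submission, ignoring excluded ones."""
    required = [c for c in schema_columns if c not in excluded]
    present = set(submission_columns)
    return [c for c in required if c not in present]


# Shortened, made-up schema for the example.
schema = ["trip_day_of_week", "trip_hour_of_day", "shift", "company_id", "fare"]
ok_submission = ["shift", "company_id", "fare"]
bad_submission = ["shift", "fare"]

print(missing_columns(schema, ok_submission))   # []
print(missing_columns(schema, bad_submission))  # ['company_id']
```

With a check along these lines, a submission that omits the two trip_* columns passes, while one that omits a genuinely required column is still flagged.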
