Trouble with meter_id in the submission file

Hi All,

I apologize if my question is obvious, but I need help understanding the submission file.
I am participating in Power Laws: Detecting Anomalies in Usage.

The rules say that "Competitors … should only submit the meters and times that appear in the submission format."
In submission_format.csv there are the following unique meter_ids:
['234_203' '334_61' '38_9686']. But in the training data I see a different set of unique meter_ids:
[ 2 863 869 872 875 878 881 884 887 890 896 902 911 920 925 928 930]

So as I understand it, the meter_ids in the training data and in the submission format are different. I feel like I am missing something…
I would appreciate any help and suggestions from the community.

Kind regards,
Anton

At first I ran into the same problem, but after a quick scan everything seems fine:

awk -F',' 'FNR > 1 {print $2}' train.csv | sort -u

The unique meter_ids in train.csv are:

  • 2
  • 234_203
  • 334_61
  • 38_0
  • 38_1
  • 38_10106
  • 38_10107
  • 38_10108
  • 38_10109
  • 38_10110
  • 38_10111
  • 38_10112
  • 38_10113
  • 38_10114
  • 38_10115
  • 38_10116
  • 38_10117
  • 38_10118
  • 38_10119
  • 38_10120
  • 38_10121
  • 38_10122
  • 38_10123
  • 38_10124
  • 38_10125
  • 38_10126
  • 38_2
  • 38_52306
  • 38_52322
  • 38_52323
  • 38_52324
  • 38_52325
  • 38_52326
  • 38_52327
  • 38_52328
  • 38_52329
  • 38_52332
  • 38_52333
  • 38_52375
  • 38_52379
  • 38_52467
  • 38_52468
  • 38_52469
  • 38_52470
  • 38_52471
  • 38_52472
  • 38_52473
  • 38_52474
  • 38_52475
  • 38_52476
  • 38_52477
  • 38_52478
  • 38_52479
  • 38_52480
  • 38_52481
  • 38_52482
  • 38_56030
  • 38_56031
  • 38_56032
  • 38_56033
  • 38_56034
  • 38_56727
  • 38_56728
  • 38_56729
  • 38_56730
  • 38_56731
  • 38_56732
  • 38_56733
  • 38_56734
  • 38_56735
  • 38_56736
  • 38_56737
  • 38_56738
  • 38_56739
  • 38_56740
  • 38_56741
  • 38_56742
  • 38_56743
  • 38_56744
  • 38_56973
  • 38_59654
  • 38_59804
  • 38_9678
  • 38_9679
  • 38_9680
  • 38_9681
  • 38_9682
  • 38_9683
  • 38_9684
  • 38_9685
  • 38_9686
  • 38_9687
  • 38_9688
  • 38_9689
  • 38_9693
  • 38_9694
  • 38_9695
  • 38_9696
  • 38_9697
  • 38_9698
  • 38_9699
  • 38_9702
  • 38_9703
  • 38_9704
  • 38_9705
  • 38_9706
  • 38_9707
  • 38_9708
  • 38_9709
  • 38_9710
  • 38_9711
  • 38_9712
  • 38_9713
  • 38_9714
  • 38_9715
  • 38_9716
  • 38_9717
  • 38_9718
  • 38_9719
  • 38_9720
  • 38_9725
  • 38_9726
  • 38_9727
  • 38_9728
  • 38_9729
  • 38_9730
  • 38_9731
  • 38_9732
  • 38_9733
  • 38_9737
  • 38_9738
  • 38_9739
  • 38_9740
  • 38_9741
  • 38_9742
  • 38_9743
  • 38_9747
  • 38_9748
  • 38_9749
  • 38_9751
  • 38_9752
  • 38_9753
  • 38_9754
  • 38_9755
  • 38_9756
  • 38_9757
  • 38_9758
  • 38_9759
  • 38_9760
  • 38_9761
  • 38_9762
  • 38_9763
  • 38_9764
  • 38_9765
  • 38_9766
  • 38_9787
  • 38_9788
  • 38_9789
  • 38_9790
  • 38_9791
  • 38_9792
  • 38_9793
  • 38_9794
  • 38_9795
  • 38_9796
  • 38_9797
  • 38_9798
  • 38_9799
  • 38_9801
  • 863
  • 869
  • 872
  • 875
  • 878
  • 881
  • 884
  • 887
  • 890
  • 896
  • 902
  • 911
  • 920
  • 925
  • 928
  • 930
  • 935
  • 938

In the submission file there are:

  • 234_203
  • 334_61
  • 38_9686

All of the IDs in the submission format also appear in the training data.

I suspect that at some point you cast the meter_id column as an integer, which discarded all of the IDs that have an underscore character in the ID (for example, 234_203). Treating meter_id as a string should resolve that issue.
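To make that concrete, here is a minimal Python sketch of the check. It uses a made-up five-row sample rather than the real train.csv, and the column names (row_id, meter_id, value) are assumptions for illustration only. The IDs are kept as strings, so the ones with an underscore survive, and the sketch also lists which IDs would be left if only purely numeric values were kept.

```python
import csv
import io

# Hypothetical miniature of train.csv. The column names are assumed;
# only the meter_id values are taken from the lists in this thread.
SAMPLE_CSV = """row_id,meter_id,value
0,2,1.5
1,234_203,2.0
2,334_61,0.7
3,38_9686,3.1
4,863,4.2
"""


def unique_meter_ids(csv_text):
    """Return the sorted unique meter_id values, kept as strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted({row["meter_id"] for row in reader})


all_ids = unique_meter_ids(SAMPLE_CSV)

# IDs that consist only of digits: this is all that remains if rows
# whose meter_id is not a plain number get dropped along the way.
numeric_only = [m for m in all_ids if m.isdigit()]

# IDs from the submission format, as listed in this thread.
submission_ids = {"234_203", "334_61", "38_9686"}
missing = submission_ids - set(all_ids)

print("all ids:     ", all_ids)
print("numeric only:", numeric_only)
print("missing:     ", missing)
```

If you load the data with pandas, passing dtype={"meter_id": str} to read_csv should keep the column as strings in the same way, assuming the column really is named meter_id.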

Hope that helps!

Hi dlinke05,

Thank you for your help!

Anton

Hi bull,

Thank you for the help and the explanation! I will fix my code.
Anton