Understanding train_labels time distribution

Hi! The hour in train_labels’ datetime column varies across grids, but the competition’s page states: "datetime (string): The UTC datetime of the measurement in the format YYYY-MM-DDTHH:mm:ssZ. A value represents the average between 12:00am to 11:59pm local time. The datetime provided represents the start of that 24 hour period in UTC time."

Can someone help me reconcile the above - I was expecting all datetime objects to be tagged at 00:00:00 based on the above - but I see they are not. Taking the first row for example:
2018-02-01T08:00:00Z,3S31A,11.4
Can someone help me understand:

  • if 11.4 is an average
  • if so, what are the start date and end date of this average
  • how does 8 AM come into play here?

Many thanks for your guidance!
Regards, Eyas

Hello,

For the example line you give, the location grid ID is in Los Angeles; there, 08:00:00Z is midnight local time. I think that 11.4 represents the average for Feb. 2, 2018 (12:00AM to 11:59PM local time), but it is timestamped at what 12:00AM local time would be in UTC time, which is 8AM. The number would thus be the average from 2018-02-01T08:00:00Z to 2018-02-02T07:59:59Z, in the “datetime” format they are using.

At least, that is my understanding.

@Carl_Malings’ explanation is correct, thanks!

In the Labels (outputs) section, it is stated: “The datetime provided represents the start of that 24 hour period in UTC time.” So, in this example, the average should be from 2018-02-02T08:00:00Z to 2018-02-03T07:59:00Z. Could you please check?

I think that is almost correct. If the timestamp given is 2018-02-01T08:00:00Z, then the interval of the average is from 2018-02-01T08:00:00Z to 2018-02-02T07:59:00Z. The “start” refers to the start of the 24 hour period, lasting from midnight to 11:59PM in local time, but (in this time zone) from 8AM to 7:59AM the next day in UTC time.

I am having doubts because of another entry by cszc under a different query “clarification on features’ dates”. I copied the text here:

The label’s datetime represents the start of a 24 hour period over which the air quality is averaged. For example, a label with datetime 2019-01-31T08:00:00Z represents an average taken from 2019-01-31T08:00:00Z to 2019-02-01T07:59:00Z (inclusive).

Therefore, you can use satellite data with an endtime on or before 2019-02-01T07:59:00Z for a label with datetime 2019-01-31T08:00:00Z. Note that this endtime is 11:59pm local time (Pacific Time) on January 31.
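The rule quoted above can be sketched in Python as a rough check. This is only an illustration: `usable_for_label` and `parse_utc` are hypothetical helper names, and it assumes the satellite endtime is available as a string in the same YYYY-MM-DDTHH:mm:ssZ format as the label datetime, which may not match the actual metadata.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%dT%H:%M:%SZ"  # assumed format for both timestamps


def parse_utc(text):
    """Parse a 'YYYY-MM-DDTHH:mm:ssZ' string into a timezone-aware UTC datetime."""
    return datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)


def usable_for_label(granule_end, label_datetime):
    """Hypothetical check: True if the satellite endtime is on or before
    the last minute of the label's 24 hour averaging period."""
    cutoff = parse_utc(label_datetime) + timedelta(hours=23, minutes=59)
    return parse_utc(granule_end) <= cutoff


# Example from the quoted post: label starting 2019-01-31T08:00:00Z
print(usable_for_label("2019-02-01T07:59:00Z", "2019-01-31T08:00:00Z"))
print(usable_for_label("2019-02-01T08:00:00Z", "2019-01-31T08:00:00Z"))
```

The first call falls exactly on the cutoff and is accepted; the second is one minute past it and is rejected.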

This would be the correct answer, then. I think that is consistent with the explanation I gave, but maybe I am misunderstanding something or explaining poorly. Regardless, the explanation of cszc is the one to go with.

Not sure that I fully understand the source of confusion.

Let’s take the example from this thread:

  • 2018-02-01T08:00:00Z is the start of a 24 hour period from 2018-02-01T08:00:00Z - 2018-02-02T07:59:00Z.
  • This is equivalent to 2018-02-01T00:00:00 - 2018-02-01T23:59:00 local Pacific Time (UTC is 8 hours ahead of Pacific Standard Time).
  • I think @Carl_Malings made a mistake earlier when he said it was February 2 - the local date is actually February 1.
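For anyone who wants to verify the conversion themselves, here is a minimal Python sketch. `label_window` is a hypothetical helper, and it assumes a fixed UTC-8 offset (Pacific Standard Time), which holds for this February example; a grid in another time zone, or a date under daylight saving time, would need a different offset.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-8 offset (Pacific Standard Time). This is valid for the
# February example in this thread, but not for dates under daylight saving time.
PST = timezone(timedelta(hours=-8), "PST")


def label_window(label_datetime, local_tz=PST):
    """Hypothetical helper: return the averaging window for a label timestamp,
    in both UTC and local time."""
    start_utc = datetime.strptime(label_datetime, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    # Last minute covered by the 24 hour period (inclusive end)
    end_utc = start_utc + timedelta(hours=23, minutes=59)
    return {
        "start_utc": start_utc,
        "end_utc": end_utc,
        "start_local": start_utc.astimezone(local_tz),
        "end_local": end_utc.astimezone(local_tz),
    }


window = label_window("2018-02-01T08:00:00Z")
for name, value in window.items():
    print(name, value.isoformat())
```

Under that assumption, both local timestamps fall on February 1, which matches the local date given above.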

I hope that clears things up!

Yes. It is clear now. Thanks.

Yes, I did! Unfortunately I can’t seem to go back and edit that post.

This is a mistake; it should read “Feb. 1, 2018”
