
Planetary Computer train/additional data consistency

Hi,

Thanks for the tutorial; it helps with downloading additional bands. As the tutorial shows, there is a perfect match between the train image and the image downloaded through the STAC API for chip_id=mtmo.

However, the datetimes differ:

Train: 2020-04-30 08:43:07+00:00
STAC API (closest one): 2020-04-30 08:15:59.024000+00:00

How is it possible to have the same image (same cloud locations) with a time shift of around 30 minutes? The clouds should have moved.

In the following thread we can read that:

Train datetime comes from Sentinel-2 meta data. It’s possible it won’t match exactly to the chip in the planetary computer.

So, can we rely on the datetimes provided? Are they really comparable? Is the 2x30min time range used when searching for items calibrated to make sure we can find an accurate match?

Update: 2x30min captures 93% of existing chips.
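To make the window concrete, here is a minimal sketch of how a 2x30min range around the train datetime could be built and checked against the STAC datetime for chip mtmo. It assumes "2x30min" means 30 minutes on either side of the train timestamp; the helper names `time_window` and `to_stac_range` are hypothetical and not taken from the tutorial.

```python
from datetime import datetime, timedelta, timezone


def time_window(center, minutes=30):
    """Return (start, end) spanning `minutes` on either side of `center`."""
    delta = timedelta(minutes=minutes)
    return center - delta, center + delta


def to_stac_range(start, end):
    """Format a window as a 'start/end' interval string in UTC."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start_utc = start.astimezone(timezone.utc)
    end_utc = end.astimezone(timezone.utc)
    return f"{start_utc.strftime(fmt)}/{end_utc.strftime(fmt)}"


# Timestamps quoted above for chip_id=mtmo
train_dt = datetime(2020, 4, 30, 8, 43, 7, tzinfo=timezone.utc)
stac_dt = datetime(2020, 4, 30, 8, 15, 59, 24000, tzinfo=timezone.utc)

# Assumption: the search window is the train datetime plus/minus 30 minutes
start, end = time_window(train_dt, minutes=30)

print(to_stac_range(start, end))  # 2020-04-30T08:13:07Z/2020-04-30T09:13:07Z
print(start <= stac_dt <= end)    # True
print(train_dt - stac_dt)         # 0:27:07.976000
```

For this chip the STAC item falls inside the window with under three minutes to spare, so a tighter window would have missed it.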

Thanks.


Hi,

I am wondering about pulling in additional bands. If we use an extra band, our model has 5 input channels. How would this work with the test data, which would only have 4 input bands? Does that dimension not matter to the model, or do we need to keep the training data at 4 input bands? Thanks!


@MPWARE Good question!

Sentinel-2 imagery is scanned over a period of time rather than captured instantaneously. Timestamps are recorded based on the time it took to scan a tile, which is a larger area than the specific chip. Neither timestamp is right or wrong: both fall within the possible range of when a chip was captured, based on the scanning time for its larger tile.

We used the stac-sentinel library to generate our competition chips, and the timestamps provided are based on the SENSING_TIME metadata property. This is defined on page 299 of Sentinel-2 specs as:

This value is set to the average sensing time over the tile.

This means that we don’t have an exact timestamp for each chip, but the times in STAC will be consistently close to the provided timestamps. I hope that helps!


@MPWARE One additional note. The datetimes for Sentinel-2 data in the Planetary Computer are PRODUCT_START_TIME, which is defined on page 346 of the specs as:

Actual User Product start time defined as the Sensing Time of the first line of the first scene in the product

This explains why the timestamp on the STAC API item you included is earlier than the competition chip, which uses SENSING_TIME as mentioned above.


Thanks @kwetstone! I’ve also noticed that you’ve updated the tutorial with SENSING_TIME.

Another question: can we use the “SCL” asset? It sounds like it gives access to an existing model.

As a final consistency check on the additional bands, I downloaded the RGB bands through the STAC API and compared them with the RGB images in the train data (from Azure). Most match, which means we’re safe to use additional bands.

However, I’ve found some weird items: some train RGB images differ a bit from the RGB images downloaded through the STAC API, even though the datetimes match within 57s to 24min.

I would expect them to be the same.

Left: RGB (B04, B03, B02) Image from train
Middle: RGB Image from bands downloaded (B04, B03, B02) through STAC API
Right: “Visual” asset downloaded through STAC API

You can reproduce the problem on “qslg” with the tutorial by replacing:
example_chip = train_meta.sample(n=1, random_state=13).iloc[0]
with:
example_chip = train_meta[train_meta["chip_id"] == "qslg"].iloc[0]
Other chips with high distance: ['qslg', 'qtfc', 'qscm', 'qtgl', 'cmmt', 'qpal', 'qres', 'qqrx','rrpn']
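For anyone who wants to run a similar check, here is a minimal sketch of one way to score the distance between a train RGB image and the RGB image rebuilt from STAC bands. The metric (mean absolute difference after min-max normalization) is an assumption chosen for illustration, not necessarily the one used to produce the list above, and the arrays are synthetic stand-ins for real chips.

```python
import numpy as np


def normalize(img):
    """Rescale an array to [0, 1] so a global brightness scaling is factored out."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # Constant image: avoid dividing by zero
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)


def rgb_distance(a, b):
    """Mean absolute difference between two min-max normalized images."""
    a, b = normalize(a), normalize(b)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return float(np.mean(np.abs(a - b)))


# Synthetic stand-ins for (bands, height, width) arrays; real chips would be
# loaded from the train data and from the STAC assets B04, B03, B02.
rng = np.random.default_rng(0)
train_rgb = rng.integers(0, 4000, size=(3, 64, 64))
stac_same = train_rgb * 2                    # same content, different scaling
stac_shifted = np.roll(train_rgb, 5, axis=2) # content moved by 5 pixels

print(rgb_distance(train_rgb, stac_same))     # close to 0
print(rgb_distance(train_rgb, stac_shifted))  # clearly above 0
```

Normalizing first means a pure rescaling of pixel values scores as a match, so a high distance points to a difference in content or a nonlinear preprocessing step rather than a simple brightness change.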

Note: Some “visual” assets look different (their normalization might be different), but the RGB image built from the initial bands should be the same, like this one:

@MPWARE apologies for the slow response!

Another question: can we use the “SCL” asset? It sounds like it gives access to an existing model.

Yes, you can use any available band from the Planetary Computer. See this page for a description of the algorithms behind the SCL band: Level-2A Algorithm - Sentinel-2 MSI Technical Guide - Sentinel Online

I’ve found some weird items: some train RGB images differ a bit from the RGB images downloaded through the STAC API, even though the datetimes match within 57s to 24min.

I’m not sure why the image derived from the STAC RGB bands appears slightly different from the competition data, but since the cloud cover looks to be the same it shouldn’t be an issue. That indicates that the STAC bands match the competition chip both geographically and temporally. As you suggested, it may be due to normalization or another preprocessing step that was applied to certain competition chips. I hope that helps!
