How many test images are missing imagery?

jon.crall · January 15, 2023, 9:17pm

I’m double-checking that the data I’ve downloaded is correct, and I’d like to verify that the number of test regions without any associated Sentinel or Landsat imagery is roughly what is expected.

When using a 240 x 240 meter box centered on the annotated point and a time range from 60 days before up to the annotated date, I’m able to pull at least one image for 4713 of the 6510 test rows, leaving 1797 test rows without any image data.
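For reference, the query geometry described above can be sketched as follows. This is a minimal sketch, not the benchmark's helpers: `bounding_box` and `date_range` are hypothetical names, and the box uses a spherical-Earth approximation (about 111,320 m per degree of latitude, with longitude scaled by the cosine of latitude).

```python
import math
from datetime import date, timedelta

# Approximate meters per degree of latitude (spherical-Earth assumption).
METERS_PER_DEG_LAT = 111_320.0


def bounding_box(lat: float, lon: float, size_m: float = 240.0) -> list[float]:
    """Return [min_lon, min_lat, max_lon, max_lat] for a square of side size_m centered on (lat, lon)."""
    half = size_m / 2.0
    dlat = half / METERS_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of latitude.
    dlon = half / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
    return [lon - dlon, lat - dlat, lon + dlon, lat + dlat]


def date_range(sample_date: date, days_before: int = 60) -> str:
    """STAC-style 'start/end' interval that ends on the sample date."""
    start = sample_date - timedelta(days=days_before)
    return f"{start.isoformat()}/{sample_date.isoformat()}"
```

For the evep sample mentioned later in the thread, `date_range(date(2013, 1, 4))` gives `"2012-11-05/2013-01-04"`, a string that can be passed as the `datetime` argument of a STAC search.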

Is this a reasonable number of missing test regions? Or am I messing up my queries somehow?

jon.crall · January 16, 2023, 3:34pm

The issue was my cloud cover filter. I had it set to less than 10% (which I believe applies to the entire image, not just the area of interest). Loosening the threshold to 40% reduced the number of rows without imagery to 445, and loosening it to 100% reduced it to 42.
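A minimal sketch of why the threshold matters so much, assuming search results are already cached per sample uid as plain STAC item dicts (the `results` layout and the function names are hypothetical). Because `eo:cloud_cover` is a scene-level property, a server-side filter such as pystac_client's `query={"eo:cloud_cover": {"lt": 10}}` behaves like the local filter below and can drop scenes that happen to be clear over the point itself. This sketch keeps items at or below the threshold, so 100% disables the filter.

```python
from collections.abc import Iterable, Mapping


def usable_items(items: Iterable[Mapping], max_cloud: float) -> list[Mapping]:
    """Keep items whose scene-level eo:cloud_cover is at or below max_cloud (percent).

    Items with no cloud cover property are treated as 100% cloudy.
    """
    return [
        it
        for it in items
        if it.get("properties", {}).get("eo:cloud_cover", 100.0) <= max_cloud
    ]


def count_missing(results: Mapping[str, list[Mapping]], max_cloud: float) -> int:
    """Number of sample uids left with no usable item at this threshold."""
    return sum(1 for items in results.values() if not usable_items(items, max_cloud))
```

Running `count_missing` over the same cached results at 10, 40 and 100 reproduces the kind of progression reported above without re-querying the catalog.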

Still strange that some items in the test set don’t have available images to make predictions with, but this amount of missing data is at least workable.

kwetstone · January 23, 2023, 4:04pm

Hi @jon.crall. In our experiments, we were able to get either Landsat or Sentinel-2 satellite data for every sample in the dataset. You can see our code for pulling satellite data in the benchmark blog post.

I believe we broadened our search by adjusting the date range and/or the size of the bounding box around the point of interest.
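One way to broaden the search, sketched under assumptions: `search_fn` is a hypothetical wrapper around your STAC query, and the window lengths and box sizes are illustrative defaults, not the values used in the benchmark. It widens the time window first and only then grows the box, so the returned imagery stays as close to the sample point as possible, and it reports which setting succeeded.

```python
from datetime import date, timedelta
from typing import Callable, Optional, Sequence

# Hypothetical search wrapper: (lat, lon, box_size_m, start, end) -> list of items.
SearchFn = Callable[[float, float, float, date, date], list]


def search_with_fallback(
    search_fn: SearchFn,
    lat: float,
    lon: float,
    sample_date: date,
    windows_days: Sequence[int] = (15, 30, 60, 120),
    box_sizes_m: Sequence[float] = (240.0, 1000.0, 2000.0),
) -> tuple[list, Optional[tuple[int, float]]]:
    """Try wider time windows, then larger boxes; return the first non-empty result.

    The second element records (window_days, box_size_m) that produced imagery,
    or None if every combination came back empty.
    """
    for size in box_sizes_m:
        for days in windows_days:
            start = sample_date - timedelta(days=days)
            items = search_fn(lat, lon, size, start, sample_date)
            if items:
                return items, (days, size)
    return [], None
```

Recording the returned `(days, size)` per sample makes it easy to see afterwards how many samples needed a wider search.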

Feel free to follow up with other questions, and good luck!

astra92293 · January 26, 2023, 11:43am

Hi! The rules restrict Landsat imagery to Landsat 8 and Landsat 9, but your benchmark also gathers data from Landsat 7. Does that mean we can use Landsat 7 when data from Landsat 8-9 are missing?
For example, sample evep (44.847993, -93.476318, 2013-01-04) doesn’t have any items for a date range ending on or before 2013-01-04.

There have been many Landsat missions since the original launch in 1972. The competition data only goes back to 2013, so participants should only use Landsat 8 and Landsat 9. Participants may not use any previous Landsat missions. Landsat 8 and Landsat 9 satellites are out of phase with one another, so that between the two each point on the Earth is revisited every 8 days. The data collected by Landsat 9 is very similar to Landsat 8.

kwetstone · January 26, 2023, 3:11pm

@astra92293 good question! Thank you for catching that in the benchmark.

Per the rules, participants can only use Landsat 8 and Landsat 9. This should cover the full dataset. We’ll make sure to update the benchmark so it is not confusing.

Have you found any cases where Landsat 8-9 are missing, but Landsat 7 is available?
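To check for such cases in cached search results, a sketch like the following can help. It assumes items are plain STAC dicts (for example from pystac's `Item.to_dict()`) and that the `platform` property reads `landsat-7`, `landsat-8`, `landsat-9`, or `Sentinel-2A`/`Sentinel-2B`, as on Planetary Computer; both that naming convention and the helper names are assumptions.

```python
ALLOWED_LANDSAT = {"landsat-8", "landsat-9"}


def platform_of(item: dict) -> str:
    """Lower-cased platform name of a STAC item dict ('' if missing)."""
    return str(item.get("properties", {}).get("platform", "")).lower()


def is_allowed(item: dict) -> bool:
    """Landsat 8/9 or any Sentinel-2 satellite counts as allowed imagery."""
    p = platform_of(item)
    return p in ALLOWED_LANDSAT or p.startswith("sentinel-2")


def only_landsat7(results: dict[str, list[dict]]) -> list[str]:
    """Sample uids with at least one Landsat 7 item but no allowed item."""
    return sorted(
        uid
        for uid, items in results.items()
        if not any(is_allowed(it) for it in items)
        and any(platform_of(it) == "landsat-7" for it in items)
    )
```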

astra92293 · January 27, 2023, 6:00pm

Yes, for example uid “evep”, plus 204 more samples.
I used the code below to find them. My request may be wrong; if so, please correct me.

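```python
import planetary_computer
from pystac_client import Client

# Assumption: the Planetary Computer STAC API, as used in the benchmark
catalog = Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)
# row: one metadata sample; get_bounding_box / get_date_range: benchmark helpers
bbox_of_interest = get_bounding_box(row["latitude"], row["longitude"])
time_of_interest = get_date_range(row["date"], time_buffer_days=30)

search = catalog.search(
    collections=["landsat-8-c2-l2", "landsat-9-c2-l2", "sentinel-2-l2a"],
    bbox=bbox_of_interest,
    datetime=time_of_interest,
    max_items=5,
)
items = search.item_collection()
```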

astra92293 · January 27, 2023, 6:04pm

abcx, acmt, afjf, agxe, ahya, alxv, ascv, axqb, aydg, ayfy, beyu, bfgd, bgao, bgci, bgwz, bils, bjij, bmdf, bsku, buux, caqq, cfoq, cgdm, cimy, cizy, cnsa, cont, cqge, cxks, dezc, dhhf, djsq, dstc, dvpi, dzbx, eamn, egmo, egxa, eozn, epqz, eqdu, etfx, evep, fqbd, fryq, fsue, ftdj, fwbt, fwjw, gdxr, gfmu, glkd, gmap, gmzg, gtem, guny, hbyx, hhfq, hiqg, hkau, howw, hrnw, hrsu, hsul, htnz, hwip, hwqb, hzfs, igat, ilvp, imgl, imhs, imsv, innr, iorz, ipir, ipsj, irdp, ivqa, jalu, jmsk, johw, jpte, jrbg, jubi, jxtp, jzaw, kkse, knrx, kocd, kosj, kpoa, kptf, ksbu, kwfa, kwua, leit, lnth, lwjc, lwpx, mkig, mnkc, mrce, mrco, mucf, muds, mugo, mvox, mwou, myzt, napd, nbps, noeo, nxog, ocgc, ofzs, oqcg, owaj, paev, pahl, paps, pavo, pbfb, pceh, pcyq, pdqu, pfly, pfsh, pmqr, ppuv, prgf, pwpz, pyhs, qadg, qcnd, qgrf, qmeq, qrhs, rctp, rczc, rfuu, rgbz, rkwk, rqvt, rsos, rwbd, rwkd, ryao, rzbb, sbib, scbd, seke, sfek, sgtc, snsp, solv, souo, spid, spwi, ssjw, sxxb, sygs, tdfz, tfnk, tipv, tjhn, trsz, ucby, ujjc, unjm, uoib, urlk, usoo, utzk, uutm, uxdg, vcho, vfgn, vnkl, vojx, wavz, wexl, wglp, wgoz, wgxq, wmvf, wndy, wrhn, wrxx, wtcf, wtlv, wybi, wzaz, xdun, xgwa, xuuk, yhol, yjde, yrnv, yxsg, zfcj, zfwc, zhcn, zmuh, zmvh, zuob

system123 · January 29, 2023, 7:58pm

I have the same problem. Using the L8, L9 and S2 collections, there are 75 scenes with no data in the 15-day window before the test observation. For the training dataset, data was available for all dates.

For example, for scene “ahya” only 2 Landsat 7 scenes are available in the 30-day window before the observation date.

system123 · January 30, 2023, 7:56am

There are data points in the test set that were acquired before the launch of Landsat 8 (11 Feb 2013), Landsat 9 (27 Sept 2021) and Sentinel-2 (23 June 2015). Thus, according to the competition rules, there are no valid datasets for these observations. Also, the launch date seldom equals the date that data actually becomes available, as there is usually a commissioning period of a few months.
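The launch-date argument can be checked per sample with a small helper. The dates are the ones quoted above (Sentinel-2 here means the first Sentinel-2 satellite); `commissioning_days` is a hypothetical margin for the post-launch commissioning period, 0 by default because the actual delay varies by mission.

```python
from datetime import date, timedelta

# Launch dates as quoted in the post above.
LAUNCH_DATES = {
    "landsat-8": date(2013, 2, 11),
    "landsat-9": date(2021, 9, 27),
    "sentinel-2": date(2015, 6, 23),
}


def sensors_available(sample_date: date, commissioning_days: int = 0) -> list[str]:
    """Sensors launched (plus an optional commissioning margin) on or before sample_date."""
    return sorted(
        name
        for name, launched in LAUNCH_DATES.items()
        if launched + timedelta(days=commissioning_days) <= sample_date
    )
```

For the evep sample (2013-01-04) this returns an empty list, which is consistent with there being no valid imagery under the original rules.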

There are 81 scenes (some test, some train) that do not have any data available. I realized I had previously downloaded Landsat 7 data, following the baseline example notebook “How to Predict Harmful Algal Blooms Using LightGBM and Satellite Imagery” (DrivenData Labs), but this is not allowed under the competition rules. This also invalidates the baseline results, and those of anyone who used the data download method described there.

The troublesome scenes are:
['abcx', 'afjf', 'ahya', 'bgci', 'bgwz', 'bils', 'bmdf', 'cfoq',
'cizy', 'cnsa', 'cqge', 'dezc', 'djsq', 'dstc', 'dvpi', 'eamn',
'egmo', 'evep', 'fqbd', 'fryq', 'fsue', 'fwbt', 'gdxr', 'glkd',
'gmap', 'gtem', 'guny', 'hiqg', 'howw', 'hrsu', 'hwqb', 'hzfs',
'imsv', 'ipir', 'jalu', 'jubi', 'kocd', 'kosj', 'ksbu', 'kwua',
'lnth', 'lwjc', 'mrco', 'mugo', 'mwou', 'nbps', 'ocgc', 'oqcg',
'owaj', 'paev', 'pbfb', 'pceh', 'pdqu', 'pfly', 'pfsh', 'ppuv',
'prgf', 'pyhs', 'qadg', 'qgrf', 'rctp', 'rgbz', 'rsos', 'rwbd',
'rwkd', 'seke', 'sgtc', 'sygs', 'tipv', 'trsz', 'ujjc', 'uoib',
'urlk', 'vfgn', 'wexl', 'wgoz', 'wgxq', 'wrxx', 'wtlv', 'yjde',
'zmuh']

kwetstone · February 2, 2023, 4:42pm

Thank you all for identifying this issue and letting us know! We are looking into it and will send an update shortly!

system123 · February 6, 2023, 10:50am

Are there any updates on this, and will there be an extension to the competition if the issue is confirmed?

kwetstone · February 6, 2023, 3:20pm

We just posted an announcement that Landsat 7 data is now allowed only for samples from January and February of 2013. For more recent samples, any Landsat data must be from either Landsat 8 or Landsat 9. Since this affects a very small percentage of competition data, there will not be an extension to the competition. For more details see the problem description page.
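The updated rule can be encoded as a per-sample filter. This is a sketch only: the platform strings (`landsat-7`, `landsat-8`, `landsat-9`, `Sentinel-2A`, ...) follow the Planetary Computer naming convention and are an assumption, and the function names are hypothetical.

```python
from datetime import date


def allowed_landsat(sample_date: date) -> set[str]:
    """Landsat platforms usable for a sample under the updated rule (platform names assumed)."""
    allowed = {"landsat-8", "landsat-9"}
    # Landsat 7 is permitted only for samples from January and February 2013.
    if sample_date.year == 2013 and sample_date.month in (1, 2):
        allowed.add("landsat-7")
    return allowed


def keep_item(platform: str, sample_date: date) -> bool:
    """Non-Landsat items (e.g. Sentinel-2) pass through; Landsat items must come from an allowed mission."""
    p = platform.lower()
    if p.startswith("landsat"):
        return p in allowed_landsat(sample_date)
    return True
```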

Thank you all for your hard work and for investigating the availability of satellite data for all samples! We appreciate your time and detailed analysis, and are excited to see what you submit in the competition.
