Understanding MAIAC geolocation

I am trying to wrap my head around the available data for this competition, so I have been doing a bit of exploration. One thing I’ve done is visualize the values in train_label.csv on a map, which helps me see where PM2.5 is high or low.

I am now playing around with the MAIAC files, and I am not sure I completely understand their structure. There are a few things I can’t work out:

  1. What are the “layers” that are mentioned at the top of the metadata?
  2. When I browse the 1x1 km data: where is this kilometer tile located? I can’t find this in the file’s metadata. All I could find were the Horizontal and Vertical Tile IDs (it took me some time to work out how to read those from the MODIS website).

I’ve tried following the instructions on the problem description page, but I was unable to infer the location of the satellite imagery.

Thanks for your guidance!
Regards, Eyas

@eyastaifour

  1. I’m not sure which layers you’re referring to. It could refer to the subdatasets - each file contains multiple datasets - or it might refer to the number of orbit overpasses, the third dimension of the dataset. Could you provide more information?
  2. There is an attribute called StructMetadata.0. Within it are fields called UpperLeftPointMtrs and LowerRightMtrs which give you the coordinates of the upper left and lower right corners in meters on the sinusoidal grid. Using these, you can interpolate between the coordinates to get the coordinate of each individual grid cell.

Here is a slightly outdated script that does the above and then converts the coordinates to WGS84: http://hdfeos.org/zoo/MORE/LPDAAC/MCD/MCD19A2.A2010010.h25v06.006.2018047103710.hdf.py
We’ll be releasing an updated one soon.

Here is a snippet using pyhdf to turn the metadata into a dictionary:

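import ast

from pyhdf.SD import SD, SDC

maiac_fp = "example.hdf"  # path to file here
hdf = SD(maiac_fp, SDC.READ)

# construct grid metadata from text blob
gridmeta = hdf.attributes()["StructMetadata.0"]
gridmeta = dict([x.split("=", 1) for x in gridmeta.split() if "=" in x])

# convert values that look like Python literals (numbers, tuples, quoted
# strings); leave everything else as raw text
for key, val in gridmeta.items():
    try:
        gridmeta[key] = ast.literal_eval(val)
    except (ValueError, SyntaxError):
        pass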
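
To make the interpolation step concrete, here is a minimal sketch of going from the UpperLeftPointMtrs and LowerRightMtrs corners to the longitude/latitude of a single cell center. It is not the updated script mentioned above. The function name cell_center_lonlat is hypothetical, the sphere radius is the one I understand the MODIS sinusoidal grid to use, the inverse projection is the plain spherical formula, and the corner values in the usage lines are illustrative rather than read from a competition file.

```python
import math

# Assumption: radius (in meters) of the sphere behind the MODIS sinusoidal grid.
MODIS_SPHERE_RADIUS_M = 6371007.181


def cell_center_lonlat(row, col, upper_left, lower_right, n_rows, n_cols):
    """Return (lon, lat) in degrees for the center of grid cell (row, col).

    upper_left and lower_right are (x, y) corner coordinates in meters on
    the sinusoidal grid, e.g. UpperLeftPointMtrs and LowerRightMtrs.
    """
    x_min, y_max = upper_left
    x_max, y_min = lower_right

    # Linear interpolation between the corners gives the cell size.
    cell_width = (x_max - x_min) / n_cols
    cell_height = (y_max - y_min) / n_rows

    # Center of the cell: half a cell in from its upper-left corner.
    x = x_min + (col + 0.5) * cell_width
    y = y_max - (row + 0.5) * cell_height

    # Inverse sinusoidal projection on a sphere.
    lat_rad = y / MODIS_SPHERE_RADIUS_M
    lon_rad = x / (MODIS_SPHERE_RADIUS_M * math.cos(lat_rad))
    return math.degrees(lon_rad), math.degrees(lat_rad)


# Illustrative corner values only (not taken from a real file).
upper_left = (7783653.637667, 3335851.559)
lower_right = (8895604.157333, 2223901.039333)
lon, lat = cell_center_lonlat(0, 0, upper_left, lower_right, 1200, 1200)
print(lon, lat)
```

For whole arrays you would probably vectorize this with numpy or hand the projection to a library such as pyproj; the single-cell version just keeps the arithmetic visible.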

Wow, thanks @cszc. I had originally used gdal to access the HDF files, and then started exploring each individual function. gdal.Open.GetMetadata() did not return “StructMetadata.0” or any of its contents.
At the header of the file, it mentioned the 3 Additional Layers… Here’s also the output of gdalinfo:

Driver: HDF4/Hierarchical Data Format Release 4
Files: train/maiac/2018/20180201T060000_maiac_dl_0.hdf
Size is 512, 512
Coordinate System is `'
Metadata:
ADDITIONALLAYERS=3
ALGORITHMPACKAGEACCEPTANCEDATE=TBD
… (truncated) …

That link from the HDF EOS zoo you’ve shared is amazing. I am now understanding the file structure a bit more. Where can I learn more about the meaning behind the “channels” for each data field?
In other words, this line:
data = data3D[0,:,:].astype(np.double)
What data lives at indices 0, 1, 2 (of axis 0)? For example, what’s in data3D[1,:,:]?

Thanks!
Eyas

@eyastaifour hmm, sorry. Not sure what that ADDITIONALLAYERS var is referring to. I can be more helpful with your second question.

TL;DR that first dimension or “channel” refers to the orbit overpass. More info about that in this thread: Understanding MAIAC data - #2 by cszc
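
As a rough illustration of working with that first dimension, here is a sketch that averages a (overpass, row, column) array over the overpasses while ignoring fill values. The function name mean_over_overpasses is hypothetical, and the fill value and scale factor in the usage lines are assumptions; check the attributes of the dataset you actually read before relying on them.

```python
import numpy as np


def mean_over_overpasses(data3d, fill_value, scale_factor=1.0):
    """Average a (n_overpasses, rows, cols) array along axis 0.

    Cells equal to fill_value are ignored. Pixels with no valid overpass
    come back as NaN.
    """
    masked = np.ma.masked_equal(data3d, fill_value)
    mean2d = masked.mean(axis=0) * scale_factor
    return np.ma.filled(mean2d.astype(float), np.nan)


# Tiny synthetic example: 2 overpasses over a 2x2 grid.
FILL = -28672  # assumed fill value
data3d = np.array(
    [
        [[100, FILL], [300, FILL]],  # overpass 0
        [[200, 400], [FILL, FILL]],  # overpass 1
    ]
)
aod = mean_over_overpasses(data3d, fill_value=FILL, scale_factor=0.001)
print(aod)
```

Averaging is only one option. You could equally keep a single overpass with data3D[i,:,:], or take the first valid value per pixel.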

Brilliant, thank you @cszc - the information you’ve just shared is extremely helpful.
I have one more question if you don’t mind, just to validate my understanding: when I look at a 1x1 km dataset (which is represented as a 1200x1200 array), each pixel = 1x1 km. Is that correct?

Thanks!
Eyas

@eyastaifour yep! The array is 1200x1200 pixels and each pixel is 1x1 km, as stated in the Collection and Granule section on the product page.
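
If you want to check the pixel size for a particular file, one option is to divide the tile extent by the number of cells. This is a small sketch, not an official recipe: the function name cell_size_m is hypothetical and the corner values are illustrative. On the sinusoidal grid the nominal 1 km cell should come out slightly under 1000 m.

```python
def cell_size_m(upper_left, lower_right, n_rows, n_cols):
    """Return (width, height) of one grid cell in meters.

    upper_left and lower_right are (x, y) corners in meters, e.g. the
    UpperLeftPointMtrs and LowerRightMtrs values from StructMetadata.0.
    """
    x_min, y_max = upper_left
    x_max, y_min = lower_right
    return (x_max - x_min) / n_cols, (y_max - y_min) / n_rows


# Illustrative corner values only (not taken from a real file).
width, height = cell_size_m(
    (7783653.637667, 3335851.559),
    (8895604.157333, 2223901.039333),
    1200,
    1200,
)
print(width, height)
```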