
How to read hdf files from virtual file system

Hi, I’ve made an Azure account to access the remote sensing data as demonstrated in the example notebooks. I’ve followed the MODIS example notebook to the letter (within Azure ML), and am hitting an error when trying to open the hdf file with rioxarray’s open_rasterio: ‘filename.hdf not recognized as a supported file format.’ I’ve had little success diagnosing this online; one thread mentions this is an issue when accessing hdf files through a virtual file system.

Are there any ideas as to how to fix this issue? It has been plaguing my evening…


My first guess would be that the environment doesn’t have GDAL (Geospatial Data Abstraction Library) installed. The MODIS example notebook worked for me after installing it.
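
For anyone landing here with the same error, a quick way to test this guess is to check which geospatial packages the notebook kernel can import, and whether the GDAL build behind rasterio includes an HDF4 driver. This is only a rough sketch, not something from the example notebook: the helper names `find_missing` and `hdf4_driver_available` are made up for illustration, and it assumes the MODIS files are HDF4, which GDAL lists under the driver name `HDF4`.

```python
import importlib.util


def find_missing(modules=("osgeo", "rasterio", "rioxarray")):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def hdf4_driver_available():
    """Return True/False if rasterio can list its GDAL drivers, else None."""
    if importlib.util.find_spec("rasterio") is None:
        return None
    try:
        import rasterio
    except ImportError:
        # rasterio is present but its native GDAL library failed to load
        return None
    with rasterio.Env() as env:
        # env.drivers() maps GDAL driver short names to long names
        return "HDF4" in env.drivers()


print("Missing packages:", find_missing())
print("HDF4 driver available:", hdf4_driver_available())
```

If `hdf4_driver_available()` returns `False`, rasterio is installed but its GDAL build has no HDF4 support, which is consistent with the "not recognized as a supported file format" message. `None` means rasterio itself could not be imported.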


You’re right on, that is the issue. Unfortunately, installing GDAL has been a complete nightmare. It requires proj >= 6.0 as a dependency, which I am trying to build from source with no success. I’ve never really had to use ‘make’ before, so I’m a real noob, but is it very convoluted building things from source using Azure ML?

What are you actually using to run the notebook?


@smokey - my teammate was able to get rioxarray to work on Windows 10 by following the rasterio installation instructions in its documentation. The only difference was that he had to combine the two separate pip install commands into one: pip install GDAL-2.4.1-cp38-cp38-win_amd64.whl rasterio-1.2.10-cp38-cp38-win_amd64.whl

I’m still working to get rasterio and GDAL to play nicely on Mac / Linux, and it sounds like you’re in a similar boat… Nonetheless, I thought I’d give you some hope that it is possible, and that you’re not alone in the struggle lol. I’ll let you know if I get something figured out if you do the same for me :wink:
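
One thing that tends to trip people up with prebuilt wheels like the two above is that the tags in the filename (cp38 and win_amd64 here) have to match the Python version and platform you are installing into. Below is a simplified sketch of that check. The function names are hypothetical, and it only handles plain single-tag filenames, so it is not a substitute for pip's own compatibility logic.

```python
import sys
import sysconfig


def parse_wheel_tags(filename):
    """Split 'name-version-pytag-abitag-platform.whl' into its three tags."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    # The last three dash-separated fields are the Python, ABI and platform tags
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def current_tags():
    """Approximate the CPython and platform tags of the running interpreter."""
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag


def wheel_matches(filename, python_tag=None, platform_tag=None):
    """Check a wheel filename against the given tags (default: this interpreter)."""
    here_python, here_platform = current_tags()
    python_tag = python_tag or here_python
    platform_tag = platform_tag or here_platform
    wheel_python, _abi, wheel_platform = parse_wheel_tags(filename)
    return wheel_python == python_tag and wheel_platform == platform_tag


print(parse_wheel_tags("GDAL-2.4.1-cp38-cp38-win_amd64.whl"))
print(wheel_matches("GDAL-2.4.1-cp38-cp38-win_amd64.whl"))
```

Calling `wheel_matches` with no extra arguments compares against whatever interpreter runs the script, so it should only return `True` for those two wheels on 64-bit Windows with Python 3.8.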

Ahh, in the end I abandoned the whole virtual file system approach and just ran it on my local Windows 10 machine, which circumvented all the aforementioned issues. I vaguely recall doing something similar to your teammate; it definitely involved downloading a certain version of GDAL, then rasterio… Did you ever figure your issues out?

@smokey sorry, I’ve been busy with work and haven’t had the time to look into this too much.

I was able to get everything working in a Docker container :slight_smile: Here’s the Dockerfile for you and any other gents/lasses trying to get everything installed and configured. This Dockerfile gets everything installed for MODIS and HRRR. Good luck and Godspeed.

```
# Miniconda base image; conda-forge supplies GDAL and its native dependencies
FROM continuumio/miniconda3

# Geospatial stack from conda-forge
RUN conda install -c conda-forge cartopy && \
    conda install -c conda-forge cfgrib && \
    conda install -c conda-forge gdal

# Build rasterio from source against the conda-installed GDAL
RUN apt-get update && \
    apt-get -y install build-essential && \
    export GDAL_DATA=`gdal-config --datadir` && \
    GDAL_CONFIG=`which gdal-config` pip install rasterio --no-binary rasterio

# Install the remaining Python requirements
COPY ./requirements.txt /tmp/
RUN python3 -m pip install --disable-pip-version-check --no-cache-dir -r /tmp/requirements.txt && \
    rm -f /tmp/requirements.txt
```