Phase 2 unzip doesn't work

Phase 2 data conversion doesn't seem to be working. Here is the command and output:
mmf_convert_hm --zip_file=XjiOc5ycDBRRNwbhRlgH.zip --password=XXXX --bypass_checksum 1
Data folder is /home/vvm/.cache/torch/mmf/data
Zip path is XjiOc5ycDBRRNwbhRlgH.zip
Copying XjiOc5ycDBRRNwbhRlgH.zip
Unzipping XjiOc5ycDBRRNwbhRlgH.zip
Extracting the zip can take time. Sit back and relax.
replace /home/vvm/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images/data/README.md? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
Traceback (most recent call last):
  ...
  File "mmf/mmf_cli/hm_convert.py", line 34, in assert_files
    ), f"{file} doesn't exist in {folder}"
AssertionError: dev.jsonl doesn't exist in /XXX/YYY/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images

Any help would be appreciated. Thank you!

Are you using the updated password (visible on the data download page)?

Yes, I am using the latest password.

So then you can unzip, just not using mmf?

It looks to me like mmf is expecting a file naming schema that isn’t there (we updated the file names for phase 2). For example, dev.jsonl doesn’t exist anymore. It’s dev_seen.jsonl and dev_unseen.jsonl. If this is the case, it probably requires a change to the mmf codebase (cc @douwekiela)
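If you want to confirm the mismatch yourself, you can list the annotation files in the extracted folder (path taken from the unzip prompt above; your exact location may differ):
ls ~/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images/data/*.jsonl
With the phase 2 archive you should see dev_seen.jsonl and dev_unseen.jsonl there, but no dev.jsonl.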

Is it possible to configure a custom data directory via an environment variable, instead of the default $HOME/.cache/… location?

Correct, the PR for phase 2 hasn’t been landed in mmf yet. Should be done tomorrow!

Thanks for confirming, @douwekiela!

Thanks for the follow-up, guys. I thought I had failed to read between the lines :).
I will try to update and re-run tomorrow!

PR is up at: https://github.com/facebookresearch/mmf/pull/595 which you can patch until it lands.
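If you have mmf installed from a local clone, one way to try the PR before it lands is to fetch it into a branch (branch name here is just an example):
cd mmf   # your local clone of facebookresearch/mmf
git fetch origin pull/595/head:hm-phase2
git checkout hm-phase2
pip install -e .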


Nice! Thanks @amanpreet

Thanks! In the meantime, is there a way to point to a custom data directory other than ~/.cache/torch/mmf/data/datasets/hateful_memes when running the scripts? Is there an example of that?
du -sh ~/.cache/torch/mmf/data/datasets/hateful_memes
37G /home/xxx/.cache/torch/mmf/data/datasets/hateful_memes
It's pretty big.

Yes, while extracting, pass the --mmf_data_folder=<your_dir> option to the mmf_convert_hm command. Then, when running commands, set the MMF_DATA_DIR=<your_dir> environment variable to tell MMF where your data directory is.
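For example (the directory and zip name below are just placeholders; adjust them to your setup, and use whichever config you are running, e.g. the MMBT baseline from the MMF repo):
mmf_convert_hm --zip_file=<your_zip>.zip --password=XXXX --bypass_checksum 1 --mmf_data_folder=/data/hateful_memes
MMF_DATA_DIR=/data/hateful_memes mmf_run config=projects/hateful_memes/configs/mmbt/defaults.yaml model=mmbt dataset=hateful_memes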

For MMF-specific questions, please open an issue on the MMF repo for a faster response in the future.


Pulled updates from master. Convert is working now. Thanks a ton!

There is one other issue.
The runner now always loads the datasets:
mmf.trainers.mmf_trainer: Loading datasets
whereas before it was picking them up from the cache correctly. So it downloads and extracts the 10GB file every time now.

Sorry, my bad. Wrong terminal. Once the environment variable is updated, it works as expected.
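For anyone else who hits this: MMF_DATA_DIR has to be set in the shell that actually launches the run, e.g. export MMF_DATA_DIR=/data/hateful_memes (path is just an example) before calling mmf_run.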