Phase 2 data conversion doesn't seem to be working
mmf_convert_hm --zip_file=XjiOc5ycDBRRNwbhRlgH.zip --password=XXXX --bypass_checksum 1
Data folder is /home/vvm/.cache/torch/mmf/data
Zip path is XjiOc5ycDBRRNwbhRlgH.zip
Copying XjiOc5ycDBRRNwbhRlgH.zip
Unzipping XjiOc5ycDBRRNwbhRlgH.zip
Extracting the zip can take time. Sit back and relax.
replace /home/vvm/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images/data/README.md? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
Traceback (most recent call last):
…
mmf/mmf_cli/hm_convert.py", line 34, in assert_files
), f"{file} doesn’t exist in {folder}"
AssertionError: dev.jsonl doesn’t exist in /XXX/YYY/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images
It looks to me like mmf is expecting a file naming scheme that no longer exists (we updated the file names for phase 2). For example, dev.jsonl doesn't exist anymore; it's now dev_seen.jsonl and dev_unseen.jsonl. If that's the case, it probably requires a change to the mmf codebase (cc @douwekiela)
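For reference, a minimal sketch of what a phase-2-aware check could look like. The file list and the standalone-function structure here are my assumptions, not the actual implementation of assert_files in mmf_cli/hm_convert.py:

```python
import os

# Assumed phase 2 file list -- the real expected names in
# mmf_cli/hm_convert.py may differ.
PHASE_TWO_FILES = [
    "img",
    "train.jsonl",
    "dev_seen.jsonl",
    "dev_unseen.jsonl",
    "test_seen.jsonl",
    "test_unseen.jsonl",
]


def assert_files(folder, expected_files=PHASE_TWO_FILES):
    """Fail early with a clear message if any expected split is missing."""
    for file in expected_files:
        full_path = os.path.join(folder, file)
        assert os.path.exists(full_path), f"{file} doesn't exist in {folder}"
```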
Thanks! In the meantime, is there a way to point to a custom location for the data directory, other than ~/.cache/torch/mmf/data/datasets/hateful_memes, when running the scripts? Is there an example of that?
du -sh ~/.cache/torch/mmf/data/datasets/hateful_memes
37G /home/xxx/.cache/torch/mmf/data/datasets/hateful_memes
It's pretty big.
Yes, while extracting, pass the --mmf_data_folder=<your_dir> option to the mmf_convert_hm command. Then, when running the commands, set the MMF_DATA_DIR=<your_dir> environment variable to tell MMF where your data directory is.
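For example, something along these lines; the --mmf_data_folder flag and MMF_DATA_DIR variable are as described above, while the zip name, directory, config path, model, and run_type are placeholders you should replace with whatever you are actually running:

```bash
# Extract into a custom data folder instead of ~/.cache/torch/mmf/data
mmf_convert_hm --zip_file=<downloaded_zip>.zip --password=XXXX \
    --mmf_data_folder=/path/to/custom_dir

# Point MMF at that location when running a model
MMF_DATA_DIR=/path/to/custom_dir mmf_run \
    config=projects/hateful_memes/configs/mmbt/defaults.yaml \
    model=mmbt dataset=hateful_memes run_type=train_val
```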
For MMF-specific questions, please open an issue on MMF for a faster response in the future.
There is one other issue.
The runner now always loads the dataset:
mmf.trainers.mmf_trainer: Loading datasets
whereas before it was looking it up from the cache correctly. So it now downloads and extracts the 10GB file every time.