Can I do this in TensorFlow?

Is TensorFlow allowed or not? Do I have to use PyTorch?

Also, if TensorFlow is OK, can anyone point me to some examples to get started creating multi-modal DL models…

Yes, you can! You can use any framework you want. I don’t know of any multi-modal libraries for TF, though.

@douwekiela I am trying to build a generator for image and text data, but I am failing to do so.

I have a generator that yields inputs of shape ((5, 224, 224, 3), (5, 70)) for [img, text] and labels of shape (5, 2).
When I fit with Keras, does fit treat this as (X=[(5, 224, 224, 3), (5, 70)], y=(5, 2), epochs=…)?

How do I get the values of the Keras input layers? Can you shed some light?
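
For anyone landing here with the same question, here is a minimal sketch of how a two-input Keras model can be fit from such a generator. It is not a reference implementation. The shapes come from the post above; everything else is an assumption made for illustration: the random stand-in data, the vocabulary size `VOCAB`, reading the 70 columns as token ids, the input names "image" and "text", and the tiny layers. The generator yields `(inputs, labels)`, where `inputs` is a tuple with one array per model input, in the same order as the model's `inputs`. A dict keyed by the input names should work as well.

```python
import numpy as np
import tensorflow as tf

BATCH, SEQ_LEN = 5, 70   # shapes taken from the post
VOCAB = 1000             # assumed vocabulary size, purely illustrative


def batch_generator():
    """Yield ((images, tokens), labels) forever, using random stand-in data."""
    rng = np.random.default_rng(0)
    while True:
        images = rng.random((BATCH, 224, 224, 3), dtype=np.float32)
        tokens = rng.integers(0, VOCAB, size=(BATCH, SEQ_LEN)).astype(np.int32)
        labels = np.eye(2, dtype=np.float32)[rng.integers(0, 2, size=BATCH)]
        # One tuple of inputs (one array per Input layer), then the targets.
        yield (images, tokens), labels


# Two Input layers; the names are hypothetical.
image_in = tf.keras.Input(shape=(224, 224, 3), name="image")
text_in = tf.keras.Input(shape=(SEQ_LEN,), dtype="int32", name="text")

# Deliberately tiny branches so the sketch runs quickly.
x_img = tf.keras.layers.Conv2D(8, 3, strides=4, activation="relu")(image_in)
x_img = tf.keras.layers.GlobalAveragePooling2D()(x_img)

x_txt = tf.keras.layers.Embedding(VOCAB, 16)(text_in)
x_txt = tf.keras.layers.GlobalAveragePooling1D()(x_txt)

merged = tf.keras.layers.Concatenate()([x_img, x_txt])
out = tf.keras.layers.Dense(2, activation="softmax")(merged)

model = tf.keras.Model(inputs=[image_in, text_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy")

# An endless generator needs steps_per_epoch so Keras knows when an epoch ends.
history = model.fit(batch_generator(), steps_per_epoch=2, epochs=1, verbose=0)

# The Input layers are symbolic placeholders and hold no data themselves.
# To look at the values being fed in, pull a batch from the generator.
(images, tokens), labels = next(batch_generator())
print(images.shape, tokens.shape, labels.shape)
print(len(model.inputs), "symbolic inputs")

# With in-memory arrays, a list with one array per input is the usual form.
preds = model.predict([images, tokens], verbose=0)
```

So the intuition in the question is roughly right: each yielded batch is split into the inputs and the targets, and the inputs are then matched to the model's Input layers by position (or by name when a dict is used).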