How long does it take to train the U-net baseline from the benchmark on 1 GPU?

benab.nasser · December 6, 2021, 5:26pm

Hi all,

I am wondering how long it takes to train the U-net baseline on a single GPU (e.g., time per epoch).

Thank you,
Best,
Nasser

motoki · December 9, 2021, 10:06am

It is quite fast. On Colab Pro, it takes only 15 minutes per epoch.

benab.nasser · December 9, 2021, 5:02pm

Nice, thank you for the info!!

ajijohn · December 31, 2021, 1:58am

Any tips? Mine is taking a long time per epoch.

Kushal_Gandhi · January 16, 2022, 2:48pm

Mine is taking almost 60 minutes per epoch. Can you please confirm yours?

ajijohn · January 16, 2022, 5:24pm

@Kushal_Gandhi, can you tell me whether you are using a dedicated GPU, Google Colab, or Google Colab Pro?

If you are not using a GPU, training will usually take much longer.

On Google Colab Pro, with the files in the session (i.e., not loaded from a personal Drive), an epoch takes 5-7 minutes. That was with the benchmark code, a resnet50 backbone, and a batch size of 10. Also, look over the discussion in "Submission Jobs failing because of CUDA out of memory"; the important takeaway is to have the code change that @Max_Schaefer suggested in place.
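A minimal sketch of staging the data on the Colab session's local disk before training, assuming Google Drive is already mounted. The helper name `stage_locally` and both paths are hypothetical placeholders, not part of the benchmark code.

```python
import shutil
from pathlib import Path


def stage_locally(src: str, dst: str) -> Path:
    """Copy a dataset directory to local disk once; later calls reuse the copy.

    Hypothetical helper: src/dst are placeholders for your own layout.
    """
    src_path, dst_path = Path(src), Path(dst)
    if not dst_path.exists():
        # One bulk copy up front, so per-batch reads hit local disk instead of Drive
        shutil.copytree(src_path, dst_path)
    return dst_path


# Hypothetical paths: adjust to wherever your copy of the data lives
# data_dir = stage_locally("/content/drive/MyDrive/cloud_data", "/content/cloud_data")
```

The copy costs some time once per session, but it avoids paying Drive latency on every batch of every epoch.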
On a dedicated GPU (one with 24 GB), I again see under 5 minutes per epoch with a resnet50 backbone and a batch size of 10. The lower the batch size, the less memory it uses, so it is worth keeping the batch size in check if you are using a larger GPU box.
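As a rough illustration of why batch size drives memory, the input tensor alone grows linearly with it. This sketch assumes 512×512 chips with four float32 bands (B02, B03, B04 and B08, the bands read in the Keras code later in this thread). Activations and gradients inside the network scale with batch size the same way but are much larger, so this is only a lower bound, not a memory estimate for the model.

```python
def input_batch_bytes(batch_size: int, height: int = 512, width: int = 512,
                      bands: int = 4, bytes_per_value: int = 4) -> int:
    """Bytes needed just to hold one input batch (float32 = 4 bytes per value)."""
    return batch_size * height * width * bands * bytes_per_value


for bs in (4, 10, 20):
    # 512 * 512 * 4 bands * 4 bytes = 4 MiB per chip, so memory scales linearly
    print(f"batch size {bs}: {input_batch_bytes(bs) / 2**20:.0f} MiB")
# batch size 4: 16 MiB
# batch size 10: 40 MiB
# batch size 20: 80 MiB
```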

Hope this helps.

Kushal_Gandhi · January 18, 2022, 9:29am

Actually, I am using Keras. I made a simple 5-6 layer model to check whether my data-fetching code works. Even with that simple model, it takes an hour per epoch. I believe the issue is my data-fetching code, but I do not know how to solve it. My code is below. Please help me if there is another way to fetch data in Keras.

I am using the Microsoft Planetary Computer Hub for training.
I have not downloaded the data. Can you also confirm the size of the provided dataset?

import math

import numpy as np
import rasterio
from tensorflow import keras

# Input band columns in the x DataFrame (4 bands -> 4 channels, not 3)
BANDS = ["B02_path", "B03_path", "B04_path", "B08_path"]


class image_process(keras.utils.Sequence):
    def __init__(self, x, y, batch_size):
        super().__init__()
        self.x = x  # DataFrame with one path column per band
        self.y = y  # DataFrame with a "label_path" column, same index as x
        self.batch_size = batch_size

    def __len__(self):
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        # Positional slicing of the rows belonging to batch idx
        batch_x = self.x.iloc[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y.iloc[idx * self.batch_size:(idx + 1) * self.batch_size]
        n = len(batch_x)

        # Preallocate once; np.append in a loop copies the whole array every time
        ch_main_x = np.empty((n, 512, 512, len(BANDS)), dtype=np.float32)
        ch_main_y = np.empty((n, 512, 512, 1), dtype=np.float32)

        for j, i in enumerate(batch_x.index):
            for b, col in enumerate(BANDS):
                # "with" closes each file; read(1) returns a (512, 512) array
                with rasterio.open(batch_x.loc[i, col]) as src:
                    ch_main_x[j, :, :, b] = src.read(1)
            with rasterio.open(batch_y.loc[i, "label_path"]) as src:
                ch_main_y[j, :, :, 0] = src.read(1)

        return ch_main_x, ch_main_y
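Reading the five GeoTIFFs per chip one after another is I/O-bound, so one option is to issue the reads concurrently with a thread pool. This is a hypothetical helper, not part of the benchmark: `read_fn` stands in for whatever reads a single file into an array (for example a small wrapper around `rasterio.open(...).read(1)`), and `max_workers=8` is an arbitrary starting value to tune.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def read_all(paths: Sequence[str], read_fn: Callable[[str], T],
             max_workers: int = 8) -> List[T]:
    """Read many files concurrently; results come back in the same order as paths."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with paths
        return list(pool.map(read_fn, paths))


# Usage with a stand-in reader (len); replace it with a real raster reader
print(read_all(["a.tif", "bb.tif", "ccc.tif"], len))  # [5, 6, 7]
```

Inside `__getitem__`, you could collect every band and label path for the batch into one list, call `read_all` once, and copy the results into the preallocated arrays. How much this helps depends on the storage; it tends to matter most when files sit on network or blob storage rather than local disk.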

ajijohn · February 1, 2022, 6:51am

@Kushal_Gandhi, apologies for the delay. I suspect the hardware is shared, so that might be the problem.
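One way to tell a slow data pipeline apart from slow or shared hardware is to time the loader on its own, with no model involved. The helper below is hypothetical and works with any object that has `__len__` and `__getitem__`, such as the `image_process` Sequence above. If fetching batches alone already accounts for most of the per-epoch time, the bottleneck is reading data rather than the GPU.

```python
import time
from typing import Any, Sequence


def time_loader(loader: Sequence[Any], n_batches: int = 5) -> float:
    """Average seconds per batch spent only on fetching data (no training step)."""
    n = min(n_batches, len(loader))
    if n == 0:
        raise ValueError("loader is empty")
    start = time.perf_counter()
    for idx in range(n):
        _ = loader[idx]  # fetch and discard the batch
    return (time.perf_counter() - start) / n


# Multiply by len(loader) for a rough per-epoch data-loading estimate:
# est_epoch_s = time_loader(train_seq) * len(train_seq)
```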

Related topics

Topic	Replies	Views	Activity
Slow training on Microsoft Planetary Hub with keras	0	257	January 16, 2022
Colab Pro job running slow On Cloud N	4	476	January 6, 2022
IMPORTANT: How to speed up Inference & Queue time On Cloud N	6	651	January 5, 2022
Tools for this Competition? Hateful Memes	3	970	May 25, 2020
Compute resources for the competition? Mapping Disaster Risk from Aerial Imagery	2	1007	November 30, 2019