Hi there, I really enjoyed the tutorial; it provided a very good starting point. I was wondering if anyone could enlighten me on some of the choices that were made when adapting the ResNet50 model for our dataset.
In particular, we chose to replace the final fc layer with the following layers:
import torch.nn as nn

model.fc = nn.Sequential(
    nn.Linear(2048, 100),   # dense layer takes the 2048-dim input and outputs 100 dims
    nn.ReLU(inplace=True),  # ReLU activation introduces non-linearity
    nn.Dropout(0.1),        # common technique to mitigate overfitting
    nn.Linear(100, 8),      # final dense layer outputs 8 dims, one per target class
)
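For reference, here is the full context as I understand it; a minimal sketch, assuming the tutorial loads the backbone from torchvision (the weights argument is my assumption, adjust to match the tutorial):

import torch.nn as nn
from torchvision import models

# Load a pretrained ResNet50 backbone
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Replace the original 2048 -> 1000 ImageNet classifier with the custom head above
model.fc = nn.Sequential(
    nn.Linear(2048, 100),
    nn.ReLU(inplace=True),
    nn.Dropout(0.1),
    nn.Linear(100, 8),
)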
Why choose 100 units in the first layer? Why not 1000, or 50?
How common is it to add ReLU and Dropout layers when adapting something like a ResNet?
If we were to freeze the pretrained weights for transfer learning, would we still want these new layers at the end of the network? For concreteness, by freezing I mean something like the sketch below.
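This is my own sketch, not code from the tutorial: freeze the backbone first, then swap in the new head, so only the new layers receive gradient updates.

import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze every pretrained parameter so the backbone is not updated
for param in model.parameters():
    param.requires_grad = False

# Layers created after freezing default to requires_grad=True, so the new head still trains
model.fc = nn.Sequential(
    nn.Linear(2048, 100),
    nn.ReLU(inplace=True),
    nn.Dropout(0.1),
    nn.Linear(100, 8),
)

(You would then pass only the trainable parameters to the optimizer.)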
And if I wanted to use a different model, would it also require adding layers like these at the end for transfer learning?
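For example, if I understand torchvision correctly, VGG16 keeps its head under model.classifier rather than model.fc, so the replacement would look more like this (a hypothetical adaptation on my part, not from the tutorial):

import torch.nn as nn
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# VGG16's head is an nn.Sequential named `classifier`; its last layer is 4096 -> 1000
model.classifier[6] = nn.Linear(4096, 8)  # swap in an 8-class output layer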