I used the dataset prepared by Pavel Pleskov, in which the images were reduced to 512 px on the wide side. He used PIL with ANTIALIAS interpolation.
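For readers who want to reproduce this kind of preprocessing, here is a minimal sketch, not Pavel's actual script. The function names are hypothetical, and whether images smaller than 512 were upscaled is not stated above, so this version simply scales every image. In current Pillow releases `Image.ANTIALIAS` no longer exists; `Image.LANCZOS` is the same filter under its current name.

```python
def wide_side_size(width, height, wide_side=512):
    """Target (width, height) with the longer side equal to wide_side."""
    scale = wide_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_wide_side(src_path, dst_path, wide_side=512):
    """Resize one image file with Pillow's Lanczos filter (hypothetical helper)."""
    # Imported here so the size helper above works without Pillow installed.
    from PIL import Image

    with Image.open(src_path) as img:
        new_size = wide_side_size(img.width, img.height, wide_side)
        # Image.ANTIALIAS was an alias of Image.LANCZOS and was removed in Pillow 10.
        img.resize(new_size, Image.LANCZOS).save(dst_path)
```

The aspect ratio is preserved because both sides are multiplied by the same scale factor.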
What did not work in this competition:
- All other ways of reading images (opencv, jpeg4py, dali), except the one Pavel used. I got my first result with the dali data loader and was very happy, but then I struggled to reproduce my local score on the LB.
- Sampling. I tried sampling rare classes more often and empty images less often; the score got worse. I also tried to speed up training by throwing away the 70% easiest samples, where the loss is already almost 0; that did not work either.
- Imagenet-style RandomCrop, which is the default in torchvision and dali. I racked my brain trying to choose parameters that would give the same scale as at test time. In the end I switched to Albumentations and everything worked out.
- Small resolution. For a very long time (a couple of days) I experimented with input image sizes around 224 and could not get the loss below 0.0040. Then I increased the size and everything worked out.
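To make the sample-dropping experiment from the sampling item concrete, here is a minimal sketch. It assumes per-sample losses from a previous epoch are already available as a plain list; the function name and the way ties are broken are illustrative, not the code that was actually run.

```python
def keep_hardest(losses, drop_fraction=0.7):
    """Return sorted indices of the samples kept after dropping the easiest ones.

    "Easiest" means lowest loss; drop_fraction=0.7 mirrors the 70% mentioned above.
    """
    n_drop = int(len(losses) * drop_fraction)
    # Indices ordered from lowest loss (easiest) to highest loss (hardest).
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[n_drop:])


# Made-up losses for four samples; the two lowest-loss samples are dropped.
kept = keep_hardest([0.0, 0.5, 0.001, 0.2], drop_fraction=0.5)
print(kept)
```

The returned indices could then be used to build a subset of the training set for the next epoch.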
What worked, and what my solution looks like:
- swsl_resnext50, wsl_resnext101d8. The first convolution and the first BN are frozen during all stages of training.
- pytorch-lightning, apex O1, distributed
- WarmUp, CosineDecay, initLR 0.005, SGD, WD 0.0001, 6-8 GPUs, Batch 256 per GPU
- Loss / metric: torch.nn.MultiLabelSoftMarginLoss
- Progressive increase in size during training. Wide side resize: 256 -> 320 -> 480
- During training: resize so that the wide side equals ResizeCrop, then RandomCrop with size ResizeCrop / 1.14. The crop is not square but rectangular, with the proportions of the original image. During inference: resize to ResizeCrop and that’s it.
- Augmentations: flip, contrast, brightness, with the default parameters from Albumentations
- TTA: flip
- Averaging within one series: gmean
- Averaging of TTA predictions and of models: gmean
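The crop geometry and the gmean averaging from the list above can be sketched in plain Python. This is only an illustration under assumptions: the helper names, the rounding, and the `eps` clamp are not taken from the actual training code, and it assumes the averaged values are per-class probabilities.

```python
import math


def resized_size(width, height, wide_side):
    """Scale (width, height) so that the longer side equals wide_side."""
    scale = wide_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def train_crop_size(width, height, resize_crop, ratio=1.14):
    """Size of the training crop: the resized image shrunk by `ratio`.

    Dividing both sides by the same ratio keeps the original aspect ratio,
    so the crop is rectangular rather than square.
    """
    resized_w, resized_h = resized_size(width, height, resize_crop)
    return int(resized_w / ratio), int(resized_h / ratio)


def gmean(probabilities, eps=1e-12):
    """Geometric mean of probabilities, computed in log space."""
    logs = [math.log(max(p, eps)) for p in probabilities]
    return math.exp(sum(logs) / len(logs))


def gmean_per_class(predictions):
    """Per-class geometric mean over several prediction vectors."""
    return [gmean(column) for column in zip(*predictions)]


# Progressive resizing stages from the list above, for a 1024x768 source image.
for resize_crop in (256, 320, 480):
    print(resize_crop, train_crop_size(1024, 768, resize_crop))

# Two TTA views (original and flipped) of one image, three classes, made-up numbers.
print(gmean_per_class([[0.9, 0.1, 0.4], [0.8, 0.2, 0.4]]))
```

The same `gmean_per_class` call would cover all three averaging steps: over the frames of one series, over TTA views, and over models. Compared with an arithmetic mean, a geometric mean is pulled down more strongly by a single low prediction.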