2nd place solution

#20
by levilain35 - opened

Hi everyone,

Thanks to Data-Driven Science and huggingface for hosting this interesting competition.

The code for the solution is on GitHub: https://github.com/BenjaminDug/shipdetectionchal

My approach consists of 4 important steps:

  • Preparing input data for models
  • Training 2 yolov8 models at different image resolutions: 640 and 1280
  • Post-processing at inference
  • Ensembling 2 models: my yolov8 model and an open-source yolov5 model trained on DOTA.

1 - Pipeline

We have pictures of different sizes. If we resize all images to the same size, we lose a lot of information about very small ships. So I used a sliding window to cut the big pictures into smaller tiles, with an overlap of 50% between tiles. I never changed this parameter, so I may have missed some small ships with this value. I also noticed that a lot of images contain only background and only some contain ships, so the data is imbalanced.
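As an illustration of the tiling step, here is a minimal sketch of how tile coordinates with 50% overlap could be computed. The function names (`tile_starts`, `sliding_windows`) and the choice to add a last tile flush with the image border are my assumptions for this example, not necessarily what the repository does.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles covering [0, length) along one axis."""
    if length <= tile:
        return [0]  # the image is smaller than one tile
    stride = max(1, round(tile * (1 - overlap)))
    starts = list(range(0, length - tile + 1, stride))
    # Assumption: add a final tile flush with the border so nothing is left out.
    if starts[-1] != length - tile:
        starts.append(length - tile)
    return starts


def sliding_windows(width, height, tile=640, overlap=0.5):
    """Return (x0, y0, x1, y1) for every tile of a width x height image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]
```

Each returned box can be used to crop the image, and its (x0, y0) corner is the offset needed later to move tile predictions back into full-image coordinates.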

2 - Training yolov8

I only have an RTX 2070 at home, so I could only train a yolov8s on 640x640 tiles with a batch size of 8. That batch size is really at the limit because of the batch norm used in the yolo architecture...

I used the default parameters for training yolo. My only change was to add an albumentations augmentation in the augment.py file of ultralytics: RandomFog(p=0.2). It is a good augmentation for aerial images.

3 - Postprocessing

I first used SAHI (https://github.com/obss/sahi) for prediction. It handles the sliding window for me when predicting on big images.

With this setup I had 0.836 LB.

I then decided to use my own sliding window in order to add TTA to my yolo predictions. I used an overlap of 80% between windows and kept all bounding boxes predicted on each tile at inference. Then I applied NMS, and without retraining I had 0.8749 LB.
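To make the merge step concrete, here is a small sketch of shifting tile predictions back to full-image coordinates and running a greedy NMS over them. The box format `(x0, y0, x1, y1, score)`, the helper names and the 0.5 IoU threshold are assumptions for illustration; the repository may use a library NMS and different settings.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1, ...) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def to_global(box, tile_x0, tile_y0):
    """Shift a tile-local box by the tile's top-left corner."""
    x0, y0, x1, y1, score = box
    return (x0 + tile_x0, y0 + tile_y0, x1 + tile_x0, y1 + tile_y0, score)


def nms(boxes, iou_thr=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thr for k in kept):
            kept.append(box)
    return kept


# The same ship seen in two overlapping tiles, plus one other detection.
tile_a = [to_global((300, 100, 400, 200, 0.9), 0, 0)]
tile_b = [to_global((172, 100, 272, 200, 0.8), 128, 0),
          to_global((400, 400, 450, 450, 0.7), 128, 0)]
merged = nms(tile_a + tile_b)
```

With a large overlap between windows, the same ship is predicted several times, so the NMS step is what collapses those duplicates into one box.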

I continued to look at my errors: at a resolution of 640x640, a lot of ships are bigger than my tiles and my yolo is not able to catch them.

So I decided to train a bigger yolov8 on bigger tiles: yolov8l on 1280x1280 tiles. I don't have the hardware for that, so I used a T4 on Kaggle, but I could only fit a batch size of 4... That is not enough at all, and I could only train for 12 hours at a time. It is not simple to train for 12 hours, save the model, reload it and go for another 12 hours, but it is free :). So I was not able to train properly in these conditions. After 24h of poor-quality training, I decided to ensemble this model with the other one and I obtained 0.885 LB (I used this solution as my second candidate submission for the private leaderboard).
The notebook for training a yolov8l on 1280x1280 tiles can be found here: https://www.kaggle.com/code/benjamin35/ship-detection-chal

pipeline_training.png

4 - Ensembling 2 models

I found an open-source yolov5 model trained on DOTA: https://github.com/hukaixuan19970627/yolov5_obb. According to the repository, the model was trained last year.

It works on tiles with a resolution of 1024x1024. It predicts rotated bounding boxes, so I had to modify the code to produce axis-aligned rectangular bounding boxes for our problem. I ran this model with my sliding window on 1024x1024 tiles and I had 0.9269 LB.
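The conversion from a rotated box to a plain rectangle can be sketched as below. I assume here that the rotated box is given as centre, width, height and an angle in radians; the actual output format of yolov5_obb may differ (for example four corner points), in which case only `corners_to_hbb` is needed. Both function names are hypothetical.

```python
import math


def obb_to_corners(cx, cy, w, h, angle_rad):
    """Four corners of a box of size w x h rotated by angle_rad around (cx, cy)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]


def corners_to_hbb(corners):
    """Smallest axis-aligned rectangle (x0, y0, x1, y1) enclosing the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


# A 40x20 box rotated by 90 degrees becomes a 20x40 rectangle.
hbb = corners_to_hbb(obb_to_corners(50, 50, 40, 20, math.pi / 2))
```

Note that the enclosing rectangle of a diagonal ship is larger than the ship itself, which is the expected behaviour for axis-aligned boxes.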

I decided to ensemble only my yolov8s trained on 640x640 with this model and I obtained 0.9460 LB. This is my first candidate submission for the private leaderboard. The private leaderboard shows that I overfitted a little.
I decided to drop the 1280x1280 model because its training process was very bad and I didn't really trust it.

postproc_ensemble_vf.png

Congratulations on winning second place! Thanks a lot for sharing your solution.
