Less is More - Adapting the YOLOv8 Network for Multi-Spectral Human Presence Detection
Abstract
Institutions involved in Search and Rescue (SAR) operations have found innovative ways of enhancing the efficiency of their efforts by using drones equipped with thermal cameras to better locate individuals in distress. One drawback of these methods is the need for a human operator to navigate the drones and visually inspect the imagery they provide in order to detect and pinpoint the presence of humans. Given that current drones are capable of self-navigation, we believe that the efficiency of these operations would be significantly enhanced by automating the manual task of human presence detection. Several previous studies have attempted to enhance object detection algorithms with multi-spectral imagery, but their approaches have usually added significant complexity to their baseline models, making them slower. Based on recent developments in the YOLO model family, we believe that better detection results can be reached without hindering computational efficiency. In this thesis, we have extended the YOLOv8 model to support multi-spectral imagery without introducing radical changes to its architecture. We have done so by adopting an early feature fusion strategy for obtaining our multi-spectral input, changing the kernel size of the first convolution block, and upgrading the network's up-scaling blocks, which yields better small object detection capabilities, a property highly relevant for SAR operations. Additionally, we have converted three popular 4-channel datasets to a format compatible with our proposed model. Several experiments have been conducted to validate our results; the first showed an increase of more than 22% in the mAP50-95 metric over the baseline model. We have also compared our solution with a previous work: our model reached a value of 0.656 in the mAP50-95 metric, an improvement of more than 10% over the previous work.
Finally, we tested the real-time capabilities of our proposed solution and found that the proposed changes had only minor effects on the inference speed. To encourage future work and give back to the community, we have made the code base (https://github.com/frnc96/ms-yolov8), datasets (https://huggingface.co/datasets/Frencis), and main models (https://huggingface.co/Frencis) publicly available.
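
To illustrate what an early feature fusion of this kind can look like, the sketch below stacks an RGB frame and a spatially aligned thermal frame into a single 4-channel array before it would be passed to the network. It is a minimal illustration and not the code from the repository above: the function name `early_fuse`, the assumption of 8-bit inputs, the scaling to [0, 1], and the channel-first output layout are all assumptions made for this example.

```python
import numpy as np


def early_fuse(rgb: np.ndarray, thermal: np.ndarray) -> np.ndarray:
    """Fuse RGB (H, W, 3) and thermal (H, W) frames into a (4, H, W) array.

    Hypothetical helper: assumes both frames are 8-bit and already
    spatially aligned (same height and width).
    """
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if thermal.shape != rgb.shape[:2]:
        raise ValueError("thermal must have shape (H, W) matching rgb")

    # Append the thermal band as a fourth channel: (H, W, 3) + (H, W, 1).
    fused = np.concatenate([rgb, thermal[..., np.newaxis]], axis=2)

    # Scale 8-bit values to [0, 1] (an assumed normalization).
    fused = fused.astype(np.float32) / 255.0

    # Move channels first, the layout convolutional networks commonly expect.
    return np.transpose(fused, (2, 0, 1))


if __name__ == "__main__":
    rgb_frame = np.zeros((480, 640, 3), dtype=np.uint8)
    thermal_frame = np.full((480, 640), 255, dtype=np.uint8)
    print(early_fuse(rgb_frame, thermal_frame).shape)
```

Because the fusion happens at the input, the only part of the detector that has to change for this step is the first convolution, which must accept four input channels instead of three; the rest of the network sees ordinary feature maps.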