Assessing generalisability of deep learning-based polyp detection and segmentation methods through a computer vision challenge
Ali, Sharib; Ghatwary, Noha; Jha, Debesh; Isik-Polat, Ece; Polat, Gorkem; Yang, Cheng; Li, Wuyang; Galdran, Adrian; Ballester, Miguel Angel Gonzalez; Thambawita, Vajira L B; Hicks, Steven; Poudel, Sahadev; Lee, Sang-Woong; Jin, Ziyi; Gan, Tianyuan; Yu, Chenghui; Yan, JiangPeng; Yeo, Doyeob; Lee, Hyunseok; Tomar, Nikhil Kumar; Haitham, Mahmood; Ahmed, Amr; Riegler, Michael Alexander; Daul, Christian; Halvorsen, Pål; Rittscher, Jens; Salem, Osama E.; Lamarque, Dominique; Cannizzaro, Renato; Realdon, Stefano; de Lange, Thomas; East, James E
Peer reviewed, Journal article
Published version
Permanent link
https://hdl.handle.net/11250/3114190
Publication date
2024
Original version
10.1038/s41598-024-52063-x
Abstract
Polyps are well‑known cancer precursors identified by colonoscopy. However, variability in their
size, appearance, and location makes the detection of polyps challenging. Moreover, colonoscopy
surveillance and removal of polyps are highly operator‑dependent procedures and occur in a highly
complex organ topology. Both the missed detection rate and the rate of incomplete removal of colonic
polyps are high. To assist in clinical procedures and reduce miss rates, automated methods for detecting
and segmenting polyps using machine learning have been developed in recent years. However, the major
drawback of most of these methods is their limited ability to generalise to out‑of‑sample unseen datasets
from different centres, populations, modalities, and acquisition systems. To test this hypothesis
rigorously, we, together with expert gastroenterologists, curated a multi‑centre and multi‑population
dataset acquired from six different colonoscopy systems and challenged the computational expert
teams to develop robust automated detection and segmentation methods in a crowd‑sourced
Endoscopic computer vision challenge. This work puts forward rigorous generalisability tests and
assesses the usability of devised deep learning methods in dynamic and actual clinical colonoscopy
procedures. We analyse the results of four top performing teams for the detection task and five top
performing teams for the segmentation task. Our analyses demonstrate that the top‑ranking teams
concentrated mainly on accuracy over the real‑time performance required for clinical applicability.
We further dissect the devised methods and provide an experiment‑based hypothesis that reveals
the need for improved generalisability to tackle diversity present in multi‑centre datasets and routine
clinical procedures.