Exploring Diffusion Models and Self-Supervised ViT's in Generating and Indexing Synthetic Data to Enhance Polyp Segmentation
Abstract
Polyps in the gastrointestinal (GI) tract are known precursors to colorectal cancer (CRC), making detection and removal essential for cancer prevention. Despite the effectiveness of screening procedures like colonoscopies, high miss rates of up to 28% call for computer-aided detection (CAD) systems based on deep learning to increase detection rates. Scarcity of labeled medical data as a result of strict privacy laws, however, serves as a major bottleneck in the development of detection and segmentation models.

This thesis explores the efficacy of using diffusion models and self-supervised vision transformers to generate polyp images and complex embeddings, respectively, in an effort to enhance polyp segmentation performance. By performing indexing based on embeddings for both real and synthetic data, we create synthetically infused datasets for training segmentation models. Generated data is split into categories based on similarity and variation within the embedded representations compared to real samples.

We showcase the ability of conditional latent diffusion models to generate highly realistic synthetic polyps and GI tract images. We further display the applications of using self-supervised vision transformers to create rich representations of images capturing deep features and semantics within polyps and the GI tract, enabling us to create enhanced datasets containing infused generated images based on characteristics of their embeddings. In our experiments, segmentation models trained on synthetically enriched datasets show increased performance in-domain, and significant increases out-of-domain, compared to using only real data.
Additionally, we identify that generated polyp images with embeddings displaying similar characteristics to real samples improve segmentation performance the most.

In conclusion, the integration of deep learning concepts such as generative models and self-supervised vision transformers has the capacity to significantly improve detection and segmentation performance of polyps in the GI tract.