Conditional Deep Generative Models for Generating Synthetic Electrocardiograms
Abstract
The use of artificial intelligence (AI)-based diagnostic tools to assist healthcare professionals has increased significantly in recent years. However, AI is heavily data-driven: when data are scarce, and especially when they are imbalanced, models are prone to bias and overfitting. Data scarcity is a major problem in healthcare, and newer privacy rules have made collecting and sharing data even more challenging. Synthetic data has emerged as an alternative that mitigates this problem. In this study, we aim to develop tools for generating synthetic data, and we have carried out extensive research on state-of-the-art models. Using electrocardiograms (ECGs) as a case study, we developed conditional deep generative models based on generative adversarial networks (GANs) and denoising diffusion probabilistic models to generate 10-second, 12-lead ECGs from a given input condition such as a desired heart rate. We trained the proposed models on two datasets (PTB-XL and INT99+GENSUS). We evaluated the ECG signals generated by the normal and conditional models with several methods on both datasets, and the results show that the signals generated by the GAN models are more realistic. The data generated by the best conditional GAN model yields Fréchet Inception Distance (FID) scores of 3.36 and 4.73 on the PTB-XL and INT99+GNSUS datasets, while the diffusion model yields FID scores of 16.65 and 10.97, respectively (lower FID is better). Similarly, a machine learning (ML) classifier trained to distinguish real samples from synthetic ones is less accurate on samples generated by the conditional GAN (accuracy: 79%, 87%) than on samples generated by the conditional diffusion model (accuracy: 92%, 92%) in both datasets.
Power spectrum analysis shows that the signal strength of ECGs generated by the conditional GAN model is almost identical to that of real signals, while the signal strength of ECGs generated by the diffusion model is lower than that of real signals, which indicates that the diffusion model does not generate the different waves correctly. Most importantly, ECG parameters extracted with an AI model confirmed that both the GAN and the diffusion models generate signals close to the given conditional information, and that the parameters extracted from ECGs generated by the conditional GANs match the specified input conditions more accurately. We also had the synthetic ECGs reviewed by an expert cardiologist, who confirmed that the ECGs generated by our proposed models closely resemble the predetermined conditions. Of the two methodologies evaluated, the GAN models proved to be the superior approach. The data generated by our models is openly accessible to researchers. Furthermore, we have published all of our models as Python packages, which can be used to generate synthetic ECGs. In conclusion, our proposed models have demonstrated their capability to generate realistic ECG signals in accordance with specified conditions. They can therefore be used to create training datasets for developing AI-based diagnostic tools without incurring privacy-related issues, among other benefits.
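To make the FID comparison in the abstract concrete, the following is a minimal sketch of the Fréchet distance between two sets of feature vectors, each modeled as a Gaussian. It is an illustration under stated assumptions, not the evaluation code used in this study: the abstract does not say which feature extractor was used, so the function simply takes pre-extracted feature arrays, and the name `frechet_distance` and the random toy features are hypothetical.

```python
import numpy as np


def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to two feature sets.

    Both inputs are arrays of shape (n_samples, n_features). How the
    features are extracted from ECGs is outside this sketch (assumption).
    """
    mu_r = feats_real.mean(axis=0)
    mu_f = feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)

    # Tr((cov_r cov_f)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_f, which are real and non-negative for
    # covariance matrices; clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * tr_sqrt)


# Toy features standing in for real and synthetic ECG embeddings.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))
close = rng.normal(0.0, 1.0, size=(500, 8))  # same distribution as `real`
far = rng.normal(0.5, 1.5, size=(500, 8))    # shifted and wider

d_close = frechet_distance(real, close)
d_far = frechet_distance(real, far)
print(f"close: {d_close:.3f}, far: {d_far:.3f}")
```

A generator whose samples follow the real feature distribution more closely gets the smaller distance, which is the sense in which the lower GAN scores above indicate more realistic signals.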