Title Synthesizing electronic health records using improved generative adversarial networks
Authors 劉昭麟
Liu, Chao-Lin
Baowaly*, Mrinal Kanti
Lin, Chia-Ching
Chen, Kuan-Ta
Contributor Department of Computer Science
Keywords electronic health records (EHRs) ; synthetic data generation (SDG) ; generative adversarial networks (GANs) ; Wasserstein GAN with gradient penalty (WGAN-GP) ; boundary-seeking GAN (BGAN)
Date 2019-03
Upload time 5-Mar-2020 14:41:17 (UTC+8)
Abstract Objective: The aim of this study was to generate synthetic electronic health records (EHRs). The generated EHR data will be more realistic than those generated using the existing medical Generative Adversarial Network (medGAN) method. Materials and Methods: We modified medGAN to obtain two synthetic data generation models, designated as medical Wasserstein GAN with gradient penalty (medWGAN) and medical boundary-seeking GAN (medBGAN), and compared the results obtained using the three models. We used 2 databases: MIMIC-III and the National Health Insurance Research Database (NHIRD), Taiwan. First, we trained the models and generated synthetic EHRs by using all 3 models. We then analyzed and compared the models’ performance by using a few statistical methods (Kolmogorov–Smirnov test, dimension-wise probability for binary data, and dimension-wise average count for count data) and 2 machine learning tasks (association rule mining and prediction). Results: We conducted a comprehensive analysis and found our models were adequately efficient for generating synthetic EHR data. The proposed models outperformed medGAN in all cases, and among the 3 models, medBGAN performed the best. Discussion: The proposed models will be effective for generating realistic synthetic EHR data in the medical industry and related research, from the viewpoint of providing better services. Moreover, they will eliminate barriers including limited access to EHR data and thus accelerate research on medical informatics. Conclusion: The proposed models can adequately learn the data distribution of real EHRs and efficiently generate realistic synthetic EHRs. The results show the superiority of our models over the existing model.
Relation Journal of the American Medical Informatics Association, Vol.26, No.3, pp.228–241
Type article
DOI https://doi.org/10.1093/jamia/ocy142
dc.date.accessioned 5-Mar-2020 14:41:17 (UTC+8)
dc.date.available 5-Mar-2020 14:41:17 (UTC+8)
dc.date.issued (Upload time) 5-Mar-2020 14:41:17 (UTC+8)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/129121
dc.format.extent 2093445 bytes
dc.format.mimetype application/pdf