How to fake a medical record in order to mitigate privacy risks
In machine learning, generative adversarial networks (GANs) pit two artificial neural networks against each other: one, the generator, tries to fool the other, the discriminator, into accepting synthetic data as real. Beyond their science and engineering applications, GANs can generate utterly convincing “photographs” of people who do not exist.
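The contest between the two networks can be made concrete with a minimal sketch of the scores each side tries to improve. This is the textbook GAN formulation, not necessarily the variant used in the study; the function names and sample numbers are hypothetical, chosen only for illustration. The discriminator outputs a probability that a record is real, and each network is scored on how well it does against the other.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Cross-entropy loss the discriminator tries to minimize.

    d_real: discriminator outputs (probabilities in (0, 1)) for real records.
    d_fake: discriminator outputs for synthetic records.
    The loss is low when real records score near 1 and synthetic ones near 0.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Common "non-saturating" generator loss (an assumption of this sketch).

    The loss is low when the discriminator rates synthetic records as real.
    """
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs for a small batch of synthetic records.
fooled = [0.9, 0.8]   # discriminator mostly believes the fakes
caught = [0.1, 0.2]   # discriminator mostly rejects the fakes

print(generator_loss(fooled) < generator_loss(caught))
print(discriminator_loss([0.9, 0.9], [0.1, 0.1])
      < discriminator_loss([0.5, 0.5], [0.5, 0.5]))
```

In actual training, the two losses are minimized in alternation, so that each network's improvement makes the other's task harder.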
Patient privacy considerations preclude unrestricted, wide-scale use of electronic health records (EHRs) for biomedical or health services research. Simulated EHRs could help speed discovery.
In a study in the Journal of the American Medical Informatics Association, Chao Yan, Ziqi Zhang, Bradley Malin, and colleagues use GANs to generate “electronic health records” of patients who do not exist.
Using some 1 million de-identified EHRs as a training set, the team refined the training, design and statistical evaluation of GANs for EHR simulation. Evaluated against earlier learning models, their medical GAN more closely mimics real-world data while providing training-set patients a similar level of protection from prospective privacy attacks.
Yan and Zhang are Vanderbilt University computer science doctoral students working in the Health Data Science Center, founded and co-directed by Malin, professor of Biomedical Informatics, Biostatistics and Computer Science. The three were joined in the study by Diego Mesa, post-doctoral fellow in the department of Biomedical Informatics, and, from the Georgia Institute of Technology, Jimeng Sun.
The study was supported by the National Institutes of Health (HG006844, OD023196) and the National Science Foundation.
by Paul Govern, VUMC Reporter