Wait, you’re VAEGAN too? A project on generating and evaluating synthetic health data.



Euan Gardner


Senior Information Development Manager - NHS National Services Scotland

Abstract: Data is the new gold but gaining access to it is a difficult safe to crack. Due to this, there has been a push towards open data, but this solution is not perfect from a privacy or analytical perspective.

An increasingly popular way forward is the concept of synthetic data. The idea is that the data will be identical in terms of statistical qualities and usability but none of the persons within it are real, lessening privacy risks while maximising data utility.

The NHS NSS and The Data Lab have partnered on a comprehensive synthetic data project that will be outlined in the talk, with key information for anyone interested in data.

The talk will begin by outlining how the project aims to assess the performance of current synthetic data generation approaches, especially on complex data. The development of a comprehensive evaluation toolkit for static and dynamic synthetic data quality will then be discussed.

The talk will conclude with the methods being developed by NHS NSS and The Data Lab around the use of Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs) and VAEGAN hybrid systems for synthetic data generation. The novel method being developed for encoding the medical data will also be discussed, as this has ramifications for training the deep learning models.

Bio: I initially studied Psychology, where I fell in love with statistics. In my MSc I applied deep learning to EEG data for controlling a text-to-speech system. This led to teaching myself machine learning and programming.

I now work full time in a data science team in the NHS NSS, specialising in machine learning and data-driven solutions. My current focus is on synthetic data, modelling sequence data and providing insight into health-related issues.