Reproducible Analytical Pipelines for Health and Social Care Publications
Information Analyst - NHS National Services Scotland
Senior Information Analyst, NHS National Services Scotland
The Information Services Division (ISD) of the National Health Service Scotland produces approximately 200 health and social care publications each year. Most publications are produced using proprietary software such as SPSS, Business Objects and Microsoft Excel. Producing both the data and the reports is time- and labour-intensive, involving extensive manual formatting and checking as well as repeated transfers of data between software packages. The Transforming Publishing Programme aims to modernise the process of producing publications by creating a Reproducible Analytical Pipeline (RAP).
RAP combines the concept of reproducible research from academia with data science best practices. It aims to improve the quality, auditability and speed of publication production, and to support knowledge transfer in organisations with high staff turnover. The team has focused on one publication as a proof of concept, developing an R package to produce the Quarterly Hospital Standardised Mortality Ratios (HSMR) publication. We used Git and GitHub for version control and to facilitate collaborative working, such as peer review.
Results and Conclusions
The HSMR package includes each step of the publication production process: extraction from databases using SQL; data wrangling; generalised linear modelling and model validation; unit testing; and production of the final report using RMarkdown. This is the first official statistics publication in Scotland to have been produced using a Reproducible Analytical Pipeline. The new method of production has created substantial time savings and reduced the risk of manual error. We are now working with teams within ISD to extend the use of RAP across the organisation. We are currently developing a toolkit to help ISD analysts automate their reports, including an R style guide, GitHub best practice guidance and bespoke RMarkdown templates.
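To make the modelling step more concrete, the following is a minimal sketch in base R of how a standardised mortality ratio can be derived from a generalised linear model. It is not the HSMR package's code: the data are simulated, the column names (hospital, age, sex, died) are hypothetical, and the use of logistic regression with these particular covariates is an assumption made for illustration. The ratio itself is computed in the conventional way, as observed deaths divided by the deaths expected under the model.

```r
# Hypothetical illustration: simulated data, not ISD's actual extract or model.
set.seed(42)

n <- 2000
admissions <- data.frame(
  hospital = sample(c("A", "B", "C"), n, replace = TRUE),
  age      = round(runif(n, 18, 95)),
  sex      = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Simulate deaths with risk rising with age (an assumed relationship).
true_risk <- plogis(-6 + 0.06 * admissions$age)
admissions$died <- rbinom(n, size = 1, prob = true_risk)

# Fit a logistic regression, one kind of generalised linear model.
model <- glm(died ~ age + sex,
             family = binomial(link = "logit"),
             data = admissions)

# Predicted probability of death for each admission.
admissions$expected <- predict(model, type = "response")

# Observed and expected deaths per hospital, and their ratio.
observed <- tapply(admissions$died, admissions$hospital, sum)
expected <- tapply(admissions$expected, admissions$hospital, sum)
smr <- observed / expected

# A simple automated check in the spirit of unit testing: for a logistic
# model with an intercept, total expected deaths match total observed deaths.
stopifnot(isTRUE(all.equal(sum(observed), sum(expected), tolerance = 1e-6)))

print(round(smr, 2))
```

A ratio above 1 for a hospital indicates more deaths than the model predicts, and a ratio below 1 indicates fewer. In a packaged pipeline, a check like the final stopifnot() call would more usually live in a dedicated test suite rather than in the analysis script.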
Bio: Jack is a development team member in the Transforming Publishing Programme at NHS Scotland. His role predominantly focuses on using R to automate the creation of statistical reports and modernise the way in which they are presented. He is a recent Statistics graduate and is based in Glasgow.
Bio: David graduated from the University of Glasgow with an MSci in Statistics in 2015. Since then, he has been working as an analyst in the Information Services Division at NHS Scotland, publishing data on hospital mortality and, more generally, transforming how ISD releases its data, including redeveloping the back end with reproducible analytical pipelines.