Forensic Data Hackathon

  • Leverhulme Research Centre for Forensic Science, University of Dundee, Ewing Building, Smalls Lane, Dundee DD1 4EN

Free Event

Venue: Leverhulme Research Centre for Forensic Science, University of Dundee, Ewing Building, Smalls Lane, DD1 4EN

Time and Date: 10:00-14:00 (approx.), 11th-22nd March 2019. Final event and presentations on 22nd March.


High-quality forensic science, like many other areas of science, is underpinned by reliable data. We need it, for example, to calculate the likelihood of someone's DNA profile matching a crime scene sample rather than the DNA profile of a random person. Most areas of the field are underpinned by data, and many require more data to cover as many potential scenarios as possible. Equally, we need to use our existing data to challenge the different crime or event scenarios presented.

With this hackathon we are challenging data scientists, students, programmers and visualisation experts to help us come up with new, interesting and different interpretations of two existing datasets that we have available within the Leverhulme Research Centre for Forensic Science (LRCFS). There will be pizza!

The two datasets are:

• Ballistic analysis of high velocity air pistol pellets

• Chemical analysis of ignitable liquids from multiple different sources

The aim of the LRCFS team is to positively disrupt the forensic science community to improve how forensic evidence is processed, analysed, interpreted and ultimately presented in court. We will do this whilst being as open and transparent as possible, making all our data, methods and research fully open by default. One aspect is to create what we call "Ground Truth" datasets, where the data were collected under controlled conditions and can then be used to make reliable inferences about potential event scenarios. The ground truth data can then be compared to casework data to determine whether courtroom explanations are consistent with the data or not.

With this hackathon we are making two important datasets open for anyone to access and work with. We want to bring people from different backgrounds together to analyse the data in useful new ways for the benefit of the forensic science community, and ideally to have an impact on the criminal justice system.

How? We will run an online hackathon over the two weeks of DataFest 2019. On the last day, 22nd March, we will host an event at LRCFS where all participants present their interpretations of the datasets to the other teams and to members of the public.

Numbers are strictly limited so please make sure to register in time. Everyone who intends to participate either in the two-week hackathon itself or on the presentation day needs to register.

Details of how to participate will be posted on the LRCFS website:

The Datasets

Air pistol pellets: A dataset of approximately 600 shots fired by the same air pistol, one after the other. The pellets were then examined under an infinite focus microscope, which accounted for the curvature of the image, and photographed for the 'lands' and 'grooves' indicative of the rifling present in the pistol's barrel. The dataset comprises the images themselves, the processed data from the images describing the size and shape of the lands and grooves detected on each pellet, and the metadata recording how the pistol was fired and into what material.

The open questions are: can the land and groove information be used to differentiate the air pistol from an air rifle? Can compressed pellets with partial information be linked to the pellets from the pistol? As the firings were all sequential, can the data identify the order in which a random subset of pellets were fired?
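One way to score an attempt at the firing-order question is a rank correlation between the true and the predicted order of a subset of pellets. The sketch below is only an illustration: the shot numbers and the "wear score" are made-up values, not taken from the dataset, and the formula used assumes there are no ties.

```python
def to_ranks(values):
    """Convert values to ranks 1..n (smallest value gets rank 1). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def spearman_no_ties(ranks_a, ranks_b):
    """Spearman rank correlation for two tie-free rankings of equal length."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of equal length, at least 2 items")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical subset of four pellets: true shot numbers and a made-up
# score that a model believes increases with firing order.
true_shot_numbers = [412, 17, 230, 95]
predicted_scores = [0.9, 0.1, 0.4, 0.5]

rho = spearman_no_ties(to_ranks(true_shot_numbers), to_ranks(predicted_scores))
print(f"Spearman rank correlation: {rho:.2f}")
```

A value near 1 would mean the predicted order agrees with the true firing order, a value near 0 that it carries no ordering information, and a value near -1 that it is reversed.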

Ignitable liquids: Three samples of four different liquid types (petrol, diesel, lighter fluid, MPD) from up to 14 different suppliers were obtained and their chemical compositions analysed with gas chromatography-mass spectrometry (GC-MS). The raw data are a series of peaks (corresponding to separate compounds) with their retention times and peak areas. The peak areas are proportional to the amount of each of the compounds in the liquids, and need to be standardised to an internal standard for comparison between runs. The retention times are approximately the same for a given compound in different runs, but not identical.
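The two preprocessing steps described above (standardising peak areas to an internal standard, and pairing peaks whose retention times are close but not identical) might look roughly like the following. This is a minimal sketch with made-up numbers: the retention time of the internal standard, the tolerance of 0.05 and the `(retention_time, area)` tuple layout are all assumptions, not properties of the LRCFS data.

```python
def normalise_to_internal_standard(peaks, standard_rt, tolerance=0.05):
    """Divide every peak area by the area of the internal standard peak.

    peaks: list of (retention_time, area) tuples for one run.
    standard_rt: expected retention time of the internal standard (assumed known).
    """
    candidates = [p for p in peaks if abs(p[0] - standard_rt) <= tolerance]
    if not candidates:
        raise ValueError("internal standard peak not found within tolerance")
    # If several peaks qualify, take the one closest to the expected time.
    standard_area = min(candidates, key=lambda p: abs(p[0] - standard_rt))[1]
    return [(rt, area / standard_area) for rt, area in peaks]


def match_peaks(run_a, run_b, tolerance=0.05):
    """Pair each peak in run_a with the nearest unused peak in run_b."""
    unused = list(run_b)
    pairs = []
    for peak_a in run_a:
        close = [p for p in unused if abs(p[0] - peak_a[0]) <= tolerance]
        if close:
            best = min(close, key=lambda p: abs(p[0] - peak_a[0]))
            unused.remove(best)
            pairs.append((peak_a, best))
    return pairs


# Made-up runs of the same liquid; the internal standard elutes near 5.10.
run_1 = [(3.20, 1000.0), (5.10, 500.0), (7.85, 2000.0)]
run_2 = [(3.22, 1500.0), (5.12, 600.0), (7.83, 2400.0)]

norm_1 = normalise_to_internal_standard(run_1, standard_rt=5.10)
norm_2 = normalise_to_internal_standard(run_2, standard_rt=5.10)

for (rt_a, area_a), (rt_b, area_b) in match_peaks(norm_1, norm_2):
    print(f"{rt_a:.2f} ~ {rt_b:.2f}: {area_a:.2f} vs {area_b:.2f}")
```

Greedy nearest-peak matching is the simplest possible choice; crowded chromatograms may need a proper alignment method instead.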

The neat liquids have also been subjected to a serial dilution experiment, diluted by 10%, 25%, 50%, 75%, 90% and 95% of the original.

The open questions are: can the individual liquid types and sources be separated when undiluted and then when diluted? For example, if given a 90% diluted unknown sample how reliably can its source and type be identified?
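As a starting point for the dilution question, one could compare the shape of a sample's peak profile rather than its absolute peak areas. The sketch below assumes an idealised dilution in which every peak area shrinks by the same factor; in real data small peaks may drop below the detection limit, so this is a baseline rather than a solution. The reference profiles and the diluted sample are invented numbers, and peaks are assumed to be already matched across runs.

```python
import math


def relative_profile(areas):
    """Scale peak areas so they sum to 1, removing the overall concentration."""
    total = sum(areas)
    if total <= 0:
        raise ValueError("peak areas must have a positive sum")
    return [a / total for a in areas]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def classify(sample_areas, references):
    """Return the reference label whose profile is most similar to the sample."""
    profile = relative_profile(sample_areas)
    return max(
        references,
        key=lambda label: cosine_similarity(profile, relative_profile(references[label])),
    )


# Invented reference profiles over four matched peaks.
references = {
    "petrol": [50.0, 30.0, 15.0, 5.0],
    "diesel": [5.0, 15.0, 30.0, 50.0],
    "lighter fluid": [70.0, 25.0, 5.0, 0.0],
}

# Invented sample: roughly a tenth of the petrol areas, with a little noise.
diluted_sample = [5.2, 2.9, 1.4, 0.5]
print(classify(diluted_sample, references))
```

Cosine similarity is itself insensitive to overall scale, so the explicit normalisation mainly keeps the intermediate profiles easy to inspect and plot.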


For either one or both of the datasets choose one or more of the following tasks:

1. Prepare the data such that it can be read in programmatically. Whether that is via flat files like CSV or structured files like JSON is up to you.

2. Develop a model which can classify the sources given only the data.

2.1. Present the data classes in a visually impactful or interesting way.

3. Create a predictor which can make predictions of the source material (for the liquids) or the firing conditions (for the pellets) from sample input only and quantify their reliability/accuracy.
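To make the tasks above concrete, here is a minimal sketch that reads a small CSV table into structured records, writes them out as JSON, and estimates the accuracy of a very simple predictor by leave-one-out cross-validation. The column names (`sample_id`, `source`, `peak_1`, `peak_2`), the supplier labels and all values are hypothetical, and the nearest-neighbour rule is just a placeholder for whatever model a team chooses.

```python
import csv
import io
import json
import math

# Hypothetical flat-file layout; the real datasets may be organised differently.
CSV_TEXT = """sample_id,source,peak_1,peak_2
s1,supplier_A,1.0,0.2
s2,supplier_A,1.1,0.3
s3,supplier_A,0.9,0.25
s4,supplier_B,0.2,1.0
s5,supplier_B,0.3,1.2
s6,supplier_B,0.25,0.9
"""


def load_records(text):
    """Parse CSV text into a list of dicts with numeric feature vectors."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        features = [float(v) for k, v in row.items() if k.startswith("peak_")]
        records.append(
            {"id": row["sample_id"], "source": row["source"], "features": features}
        )
    return records


def nearest_label(features, training):
    """Predict the source of the closest training record (Euclidean distance)."""
    nearest = min(training, key=lambda r: math.dist(features, r["features"]))
    return nearest["source"]


def leave_one_out_accuracy(records):
    """Hold out each record in turn, predict it from the rest, and average."""
    correct = 0
    for i, held_out in enumerate(records):
        training = records[:i] + records[i + 1:]
        if nearest_label(held_out["features"], training) == held_out["source"]:
            correct += 1
    return correct / len(records)


records = load_records(CSV_TEXT)
as_json = json.dumps(records, indent=2)  # structured form of the same data
accuracy = leave_one_out_accuracy(records)
print(f"Leave-one-out accuracy: {accuracy:.2f}")
```

Leave-one-out is convenient when there are only a few samples per source; with replicate samples from the same supplier it may be fairer to hold out whole groups so that near-duplicates do not inflate the score.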

The Leverhulme Research Centre for Forensic Science is a £10 million research centre funded for 10 years by the Leverhulme Trust. The Centre has positive disruption at its very core, creating an interdisciplinary space where our researchers interact on a regular basis with people from across the criminal justice space. Our collaborators are global and are drawn from the senior judiciary, law enforcement, forensic practitioners, crime scene investigators, innovators and entrepreneurs. We facilitate honest conversations, articulating shared challenges and delivering collaborative solutions to support a fair and just criminal justice system.