As the world’s longest-established geological survey, the British Geological Survey has access to some of the most varied data of any organisation worldwide. However, more than a century’s worth of data comes at a price: how can we use datasets that were collected decades before digital computing, let alone machine learning, was invented?
Through applications such as our flagship iGeology app (https://www.bgs.ac.uk/iGeology) and mySoil (https://www.bgs.ac.uk/mySoil), we are able to bring existing data to the public in a useful yet interesting format. We are also keen to extract further value from existing datasets through modern analysis techniques, ideally by allowing any member of the public to interact with our data in a fun yet meaningful way.
One of our more interesting yet underused datasets consists of several thousand high-quality images of fossils. These images are labelled by fossil type and cover the entire BGS collection of physical specimens. While useful for our geologists, these images see little use beyond illustrating textbooks. The quality of these images, and the fact that they are labelled, makes them a good candidate for training a ‘fossil classifier’. Such a classifier would be used to create a publicly available app which could classify any fossil found by the user.
We are therefore running a Kaggle-style challenge in the form of a hackathon. We will provide the fully labelled dataset, and challenge teams to classify the images as accurately as possible. All teams will be required to present their final model, explain how it works and how the final design was reached. Furthermore, they will be required to run it on a test set to determine performance. While model accuracy is a major consideration, credit will also be given for innovation, scalability and ingenuity.
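As a rough illustration of how a model's predictions might be scored against the test set, here is a minimal Python sketch. The function names and the example fossil labels are hypothetical, chosen only for illustration, and the scoring actually used on the day may differ.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of test images whose predicted fossil type matches its label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("test set is empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_class_accuracy(true_labels, predicted_labels):
    """Accuracy broken down by true fossil type, to expose weak classes."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical labels, for illustration only.
true = ["ammonite", "ammonite", "trilobite", "brachiopod"]
pred = ["ammonite", "trilobite", "trilobite", "brachiopod"]

print(accuracy(true, pred))
print(per_class_accuracy(true, pred))
```

Reporting accuracy per fossil type alongside the overall figure helps show whether a model performs well across the collection or only on the most common classes.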
In order to build a working (and hopefully accurate) model in just one day, teams will have to balance the availability of compute resources, training time and members’ existing skill sets. Collaboration between teams will also be encouraged, since certain skills will probably be in demand. It would be fun, for example, to see teams trading GPU time for time with a field specialist. Although this will be a Kaggle-type challenge, collaboration will also be rewarded in the final rankings.
The hackathon will be held at the BGS’s office in the Lyell Centre, part of the Heriot-Watt University campus, on 14/03/2018. Our offices can support approximately 50 attendees, but we are able to accommodate more using Heriot-Watt facilities if demand requires. While we expect participation to come largely from academia and public sector employees (being on a university campus), we welcome any teams with an interest in data science, regardless of background. Prior experience is not required, as tuition and guidance will be available from the BGS data science team. Cluster computing and/or a GPU will be provided for model training. We guarantee that all attendees will leave with their own image classifier, which they can tailor for other datasets in the future, regardless of their prior experience of programming or image classification.
By extracting new value from an old and unusual dataset, we hope to demonstrate the potential of modern analysis techniques in places where they may be neglected. It is not uncommon for organisations of all shapes and sizes to acquire and sit on ‘useless’ data. This hackathon is intended to inspire others to find value in data that would otherwise go unused. Perhaps more importantly, our eventual goal of using the methods explored in the hackathon to build an app for the public will not only increase the accessibility of our data and methods but also expose cutting-edge image recognition techniques to a wider audience.