Training Course - Workshop
Charged Event - Fee £30
DataFest 2019 brings together local and international talent, industry, academia and enthusiasts who all share at least one interest: data! With organisations in every sector keen to succeed at Data Driven Innovation, how can we be sure that our data, our raw material, is as good as it should be?
This training brings the ideas and benefits of test-driven development to data analysis. Using the open-source Python TDDA (test-driven data analysis) library, we'll work with data in CSV files, Pandas DataFrames and relational databases.
Part 1: Testing Data Processes and Pipelines
An introduction to reference tests and how to write them for various kinds of analytical processes operating on different data types. Topics will include:
Motivation for and introduction to testing
Special considerations for testing analytical software and processes
Testing and regenerating complex and partially variable outputs, with support for diff tools
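To give a flavour of what a reference test for partially variable output involves, here is a minimal, stdlib-only Python sketch. It assumes the output is plain text, that the variable lines (such as a run timestamp) can be recognised by a regular expression, and that a reference is only regenerated after a change has been reviewed. The function compare_with_reference is hypothetical; it is not part of the tdda library, which offers its own reference-test assertions.

```python
import difflib
import re
import tempfile
from pathlib import Path


def compare_with_reference(actual, ref_path, ignore_patterns=(), regenerate=False):
    """Compare `actual` text with a stored reference file (hypothetical helper).

    Lines matching any regex in `ignore_patterns` (e.g. timestamps) are
    dropped from both texts before comparing. Returns a list of unified-diff
    lines, which is empty when the outputs agree. With regenerate=True the
    reference is overwritten with `actual` instead (after a reviewed change).
    """
    ref = Path(ref_path)
    if regenerate:
        ref.write_text(actual)
        return []
    compiled = [re.compile(p) for p in ignore_patterns]

    def keep(line):
        # Keep a line only if no ignore pattern matches anywhere in it.
        return not any(p.search(line) for p in compiled)

    expected_lines = [ln for ln in ref.read_text().splitlines() if keep(ln)]
    actual_lines = [ln for ln in actual.splitlines() if keep(ln)]
    return list(difflib.unified_diff(expected_lines, actual_lines,
                                     fromfile=str(ref), tofile='actual',
                                     lineterm=''))


# Usage: a report whose first line is a run timestamp that always varies.
with tempfile.TemporaryDirectory() as tmp:
    ref_file = Path(tmp) / 'report.txt'
    compare_with_reference('Generated: 2019-03-01 10:00\ntotal,42\n',
                           ref_file, regenerate=True)
    ignore = [r'^Generated: ']
    unchanged = compare_with_reference('Generated: 2019-03-02 09:15\ntotal,42\n',
                                       ref_file, ignore_patterns=ignore)
    changed = compare_with_reference('Generated: 2019-03-02 09:15\ntotal,43\n',
                                     ref_file, ignore_patterns=ignore)

print(unchanged)  # []
print('\n'.join(changed))  # unified diff: -total,42 / +total,43
```

Dropping whole lines is the bluntest way to handle variability; ignoring only the matching substring would be more precise, at the cost of a little more code.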
Part 2: Using AI to Generate Constraints from Data and Using Constraints to Detect Bad Data
Using constraints to verify data, including:
identification of unexpected changes, outliers, duplicates, missing and disallowed values
advanced string verification, including automatic generation of regular expressions to characterise patterns in text data using rexpy.
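rexpy's own algorithm is not described here, so the following Python sketch is only a simplified, hypothetical illustration of the general idea of characterising text with inferred regular expressions. Each character is mapped to a coarse class (digit, upper-case or lower-case ASCII letter, or an escaped literal), runs of the same class are collapsed, and the distinct anchored patterns are reported. The names char_class, pattern_for and characterise are made up for this example and are not rexpy's API.

```python
import re
from itertools import groupby

# Hypothetical coarse character classes; anything else is kept as a literal.
GENERAL_CLASSES = (r'\d', '[A-Z]', '[a-z]')


def char_class(ch):
    """Map one character to a coarse regex class, or to itself (escaped)."""
    if '0' <= ch <= '9':
        return r'\d'
    if 'A' <= ch <= 'Z':
        return '[A-Z]'
    if 'a' <= ch <= 'z':
        return '[a-z]'
    return re.escape(ch)


def pattern_for(text):
    """Build an anchored regex: each run of a general class becomes class+."""
    pieces = []
    for cls, run in groupby(char_class(ch) for ch in text):
        if cls in GENERAL_CLASSES:
            pieces.append(cls + '+')
        else:
            pieces.append(cls * len(list(run)))  # literals kept at their exact count
    return '^' + ''.join(pieces) + '$'


def characterise(strings):
    """Return the distinct patterns covering `strings`, in first-seen order."""
    return list(dict.fromkeys(pattern_for(s) for s in strings))


codes = ['AB123', 'XY9', 'QZ4401']
print(characterise(codes))  # ['^[A-Z]+\\d+$']

mixed = ['AB123', 'ab-12', 'EH1 3AA']
for pattern in characterise(mixed):
    print(pattern)
```

A string that later fails to match any recorded pattern would then be a candidate bad value.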
Crucially, we will show not only how constraints can be used to detect change and problems in data, but also how those constraints can be automatically generated using AI methods in the tdda library.
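As a rough illustration of the idea, and not the tdda library's API or its discovery method, the following stdlib-only Python sketch infers simple constraints from example records and then checks a new batch against them. The functions discover_constraints, verify_constraints and is_number, and the particular constraint kinds chosen (nullability, uniqueness, numeric range, allowed values), are hypothetical; the sketch uses only simple summary rules, which are not necessarily what tdda does.

```python
def is_number(v):
    """True for ints and floats, but not bools."""
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def discover_constraints(rows, max_categories=3):
    """Infer simple per-field constraints from example records (dicts)."""
    constraints = {}
    for field in rows[0]:
        values = [row.get(field) for row in rows]
        present = [v for v in values if v is not None]
        c = {'allow_null': len(present) < len(values),
             'unique': len(set(present)) == len(present)}
        if present and all(is_number(v) for v in present):
            c['min'], c['max'] = min(present), max(present)
        elif len(set(present)) <= max_categories:
            c['allowed'] = set(present)  # low-cardinality field: record its values
        constraints[field] = c
    return constraints


def verify_constraints(rows, constraints):
    """Check new records against discovered constraints; return (field, problem) pairs."""
    failures = []
    for field, c in constraints.items():
        values = [row.get(field) for row in rows]
        present = [v for v in values if v is not None]
        if not c['allow_null'] and len(present) < len(values):
            failures.append((field, 'missing values'))
        if c['unique'] and len(set(present)) < len(present):
            failures.append((field, 'duplicates'))
        if 'min' in c and any(not is_number(v) or not c['min'] <= v <= c['max']
                              for v in present):
            failures.append((field, 'out of range or non-numeric'))
        if 'allowed' in c and any(v not in c['allowed'] for v in present):
            failures.append((field, 'disallowed values'))
    return failures


training = [
    {'id': 1, 'age': 34, 'region': 'North'},
    {'id': 2, 'age': 51, 'region': 'South'},
    {'id': 3, 'age': 27, 'region': 'North'},
]
constraints = discover_constraints(training)

new_batch = [
    {'id': 4, 'age': 45, 'region': 'South'},
    {'id': 4, 'age': 230, 'region': 'Wset'},   # duplicate id, outlier, typo
    {'id': 5, 'age': None, 'region': 'North'},  # missing age
]
failures = verify_constraints(new_batch, constraints)
for field, problem in failures:
    print(field, problem)
```

In this toy run the new ids 4 and 5 are also flagged as out of range, because the maximum id seen during discovery was 3; this is why automatically discovered constraints are worth reviewing before relying on them.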
The methods and tools are applicable to structured data and data pipelines using any software, not just Python.
It is essential that attendees bring a laptop (Mac, Linux or Windows) with a working Python environment in which Pandas and NumPy are installed, as well as the TDDA library (tdda; available via pip from PyPI, and in source form on GitHub).
Detailed instructions on system configuration will be supplied to registered attendees before the session, as well as instructions on how to test the installation.
Help will be available at the venue during the 30 minutes before the workshop starts (from 13:30) for anyone unable to configure their environment.
What are my transport/parking options for getting to and from the event?
The venue is centrally located within Edinburgh's New Town.
It's a short walk from Waverley train station, and just a few minutes' walk from the tram line and many bus routes. From Edinburgh Airport, the venue is about 30 minutes by taxi or 45 minutes by tram.
The nearest long-stay car park is the Q-Park Omni, around a 10-minute walk from the venue.
There is a wheelchair-accessible drop-off point at the rear of the building, with a lift to the main building and a further lift inside to the meeting room. Please let the organiser know if you need access to this drop-off point and they can help with the details.
Tea, coffee and water will be available.
You are responsible for providing your own laptop with a working Python environment, as described in the prerequisites. No spare machines will be available on the day. You are more than welcome to work in pairs; however, no discount on the ticket price will be given.
Please be aware that photography and filming will take place during the event. The photos and video will be used to promote future events on the Stochastic Solutions, DataFest, and/or Data Lab websites and social media channels and on digital and/or print promotional materials.
How can I contact the organiser with any questions?
Mail us at training@StochasticSolutions.com.