30 Gigabyte Data Migration

Phill Luby
24th March 2020

Data migrations are complex tasks requiring careful co-ordination, and large volumes of data only compound that complexity.

Our client, Peritus Health Management Limited, had a large volume of clinical and non-clinical records stored on a private file sharing platform that they were decommissioning. Non-clinical records would go to Dropbox, but the clinical records needed to be cleaned, codified and imported into the clinical records system that we develop and manage for them.

A migration is especially difficult when the source data is semi-structured and too big to process quickly. The first part of the problem involved producing strict rules that codified the source data and reported exceptions. Careful analysis of the reports, followed by fixes to the source data, changes to the codification algorithm, and re-running of the process, continued for several iterations. Our client was directly involved in analysing the reports and correcting data, as it was their essential knowledge of their own information that allowed correct codification. At the same time, we were timing the import process to determine whether it needed to be incremental or could be done in one pass in our data centre. After weeks of validation and modification we were satisfied that the data could be loaded into the live platform.
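
As a rough illustration only, a rules-driven codification pass of this kind might look like the sketch below. The rule table, clinical codes and file names here are entirely hypothetical; the real rules were built around the client's own data:

```python
from typing import Optional

# Hypothetical rule table: each rule maps a recognisable filename
# pattern to a clinical code. The real rules and codes were the client's.
RULES = [
    ("audiometry", "AUD"),
    ("spirometry", "SPI"),
]

def codify(filename: str) -> Optional[str]:
    """Return a clinical code for a source file, or None if no rule matches."""
    name = filename.lower()
    for pattern, code in RULES:
        if pattern in name:
            return code
    return None

def run_pass(filenames: list[str]) -> tuple[dict[str, str], list[str]]:
    """Apply the rules strictly, collecting unmatched records for the exception report."""
    coded: dict[str, str] = {}
    exceptions: list[str] = []
    for name in filenames:
        code = codify(name)
        if code is None:
            exceptions.append(name)  # goes into the report for human review
        else:
            coded[name] = code
    return coded, exceptions

coded, exceptions = run_pass(["smith_audiometry_2019.pdf", "notes_misc.doc"])
print(f"coded {len(coded)} records, {len(exceptions)} exceptions for review")
```

Re-running a pass like this after each round of data fixes and rule changes shrinks the exception report until only correctly codified records remain.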

A major failing in this type of project is to assume the process is correct based solely on the checking of data cleansing and analysis reports. Human error is inevitable in a process of this scale and nature, so we expect some data to be incorrectly coded due to unstructured features going unidentified in the checking process. To defend against this problem, all records imported into the system are clearly marked with their source, and the original source data is put into storage for later recovery should it be needed.
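
Sketched with hypothetical field names (not the system's actual schema), that provenance marking might look like this:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ImportedRecord:
    patient_ref: str    # identifier within the clinical records system
    code: str           # clinical code assigned during codification
    source_system: str  # where the record originally came from
    source_path: str    # original location, so the archived file can be recovered
    imported_on: date   # when this migration pass ran

record = ImportedRecord(
    patient_ref="P-1042",
    code="AUD",
    source_system="legacy-file-share",
    source_path="/clinical/2019/smith_audiometry_2019.pdf",
    imported_on=date(2020, 3, 24),
)
```

Keeping the source reference alongside each record means a miscoded record can always be traced back to the archived original.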

On the day, the data was imported successfully and the clinical records system handled the information without issue. The source data sits safely in storage, where we can use it if needed.

Picture by Artur Rydzewski.
