03 May 2017

The Power of the Index

So last week I spent an excess of time thinking through a database modification. In our current system we have a semi-normal database that works but has a bunch of cruft in it. Some poorly named columns, a few incorrect data types and some significant domain abstraction leaks. So, we sat down and created a new domain model. The new model is all nice and clean and shiny. 

Two features we wanted to add to our database structure were versioning and source record tie-backs. That makes each insert pretty complicated. The following diagram is a simplified (and obscured) version of what I've been working with. 

Essentially we have source data tables with a few hundred thousand records and we need to insert them into our highly normalized system. In the real system there are actually multiple source tables representing different sources and types of person, but for this example I'm just including the bare essentials. 

So after playing around with various fancy tricks using CTE inserts I gave up and went to a simple in-line function. We're using Postgres for this experiment. The first test run used 378,755 records from two sources and had a total run time of 6 hours 32 minutes and 14.917 seconds. That is pretty pathetic. 

We put the queries into DataGrip and ran a few SQL Explains and after some tinkering we added six simple indexes to the system. We ended up with something like this;
The end result was amazing!

On the second run, using the same data set, we were able to get the total runtime down to 84.182 seconds. That is a world changing impact for so few indexes. 

ProTip: Before you abandon your experiment with the database, make sure you have appropriate indexes.