PostgreSQL fast load
Published on 29/07/2019
2 min read
I recently needed to do a data load into PostgreSQL (pg) on Azure for a project I'm working on. So in the time honoured tradition of doing the easiest thing first, I just ran through the records and built an insert statement and executed it. Record by record.
This worked fine for a handful of records. 50 records would import in less than a tenth of a second, no point in premature optimisation.
As these things go though, I needed to test with a larger dataset, 10MB instead of 7KB.
Then the duration went to 40 seconds, too much to bear for testing purposes.
I had known for a while that I would need to move to a more efficient import method. For pg, that seemed to be the COPY command.
Since I was doing this on Azure with the pg SaaS offering, with no access to the host, using the CLI or such seemed difficult.
There is the Azure Data Factory service which seemed like it would fit the bill, with the downside of having to manage another component, however it doesn't support pg as a sink, understandable I guess, they don't want to easily enable export from Azure to a competitor. Fair enough.
I was thinking of ways to do this, including spinning up a container with the pg CLI tools or a function with the CLI tools packaged.
Then, luckily before I actually started down the garden path, I thought to look to see if Npgsql, the .NET pg driver, supported the copy command natively and sure enough, it does, thanks devs, legends.
So, a quick refactor, literally 12 lines of actual code, 40 odd lines of tests and we're back to all green on tests again.
The improvement is amazing, from 40 odd seconds to 0.6 odd seconds