How not to change PostgreSQL column type

clarkbw | 98 points

My tests of running the ALTER varied from ~20 seconds to ~1 minute, depending on the change.

> Current CI/CD practices often make it very easy for software developers to commit and roll out database migrations to a production environment, only to find themselves in the middle of a production incident minutes later. While a staging deployment might help, it's not guaranteed to share the same characteristics as production (either due to the level of load or monetary constraints).

(neon.tech employee here)

This is where branching databases with production data helps quite a bit. Your CI/CD environment, and even staging, can exercise the schema changes against production-shaped data. When you build from a seed database, you can often miss this kind of issue because the seed lacks the characteristics of your production environment.

But the author rightly calls out how staging isn't even enough in the next paragraph:

> The problem is, therefore (and I will repeat myself), the scale of the amount of data being modified, overall congestion of the system, I/O capacity, and the target table's importance in the application design.

Your staging, even when branched from production, won't have the same load patterns as your production database. And that load, and the locks associated with it, will make the rollout behave differently.

This has me wondering whether you could match production patterns in staging by setting staging up to mirror the query patterns of production. Mirroring, like what's available from pg_cat, could put your staging under similar pressure.

And then this also made me think about how we're not capturing the timing of these schema changes. Unless a developer looks and sees that their schema change took 56 seconds to complete in CI, nobody knows that the change might have larger knock-on effects in production.
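
One low-tech way to capture it (a sketch; assumes the migration role is allowed to change logging settings, and the table name is made up):

    -- in psql, print each statement's elapsed time into the CI output
    \timing on

    -- and/or have the server log this session's statements with durations
    SET log_min_duration_statement = 0;
    SET log_statement = 'ddl';

    ALTER TABLE orders ALTER COLUMN id TYPE bigint;  -- the 56 seconds now shows up in the logs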

clarkbw | 12 days ago

Good post; you can tell the author has some burn marks from seeing sparks fly in production.

> Let's start with The Bad News - you cannot avoid rewriting the entire table, which will generate a significant amount of WAL files in the process. This is a given, and you must plan how to manage it.

There are some exceptions to this. The blog post focuses on changing a column from int to bigint, which indeed needs a full table rewrite, since each ID value now takes 8 bytes instead of 4. But:

> As an exception, when changing the type of an existing column, if the USING clause does not change the column contents and the old type is either binary coercible to the new type or an unconstrained domain over the new type, a table rewrite is not needed. [0]

So you could change from varchar to text, for example, or widen a length limit such as varchar(4) to varchar(20), and not incur a full table rewrite. (Note that shrinking the limit, e.g. varchar(20) to varchar(4), does rewrite the table, because every value has to be re-checked against the new limit.)
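
A quick way to see the difference yourself (hypothetical table/column names; the relfilenode only changes when the table is rewritten):

    SELECT pg_relation_filenode('orders');                  -- note the value

    ALTER TABLE orders ALTER COLUMN note TYPE text;          -- varchar -> text: binary coercible
    ALTER TABLE orders ALTER COLUMN code TYPE varchar(40);   -- widening varchar: metadata-only
    SELECT pg_relation_filenode('orders');                  -- unchanged: no rewrite

    ALTER TABLE orders ALTER COLUMN id TYPE bigint;          -- int -> bigint
    SELECT pg_relation_filenode('orders');                  -- changed: full table rewrite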

0: https://www.postgresql.org/docs/current/sql-altertable.html

jihadjihad | 12 days ago

"Ensure the column is NULLable and does not have a default value to avoid forcing a full table rewrite."

That one has been fixed since Postgres 11: https://brandur.org/postgres-default
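
Since Postgres 11, a non-volatile default is stored once in the catalog and applied lazily on read, so adding such a column is metadata-only. A quick sketch (made-up names):

    -- metadata-only since Postgres 11:
    ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'new';

    -- a volatile default still forces a full table rewrite:
    ALTER TABLE orders ADD COLUMN created_at timestamptz NOT NULL DEFAULT clock_timestamp();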

jbub | 12 days ago

Microsoft SQL Server has a similar issue, and a related one when adding a new column to a table, which I found out by accident when trying to deploy a migration that had worked fine in our non-production environments.

Adding the column as NOT NULL with a default is the absolute killer.

A workaround, when you still want a default, is a three-pass migration: first add the column as nullable with no default, then run an update to backfill the value on all rows, and finally alter it to NOT NULL and add the default constraint.

The difference in speed is surprising.
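
Roughly, the three passes look like this in T-SQL (table, column, and constraint names are made up):

    -- pass 1: add the column as nullable, no default (metadata-only)
    ALTER TABLE dbo.Orders ADD Status int NULL;

    -- pass 2: backfill existing rows (batch this in production to limit log growth and locking)
    UPDATE dbo.Orders SET Status = 0 WHERE Status IS NULL;

    -- pass 3: enforce NOT NULL and attach the default for future inserts
    ALTER TABLE dbo.Orders ALTER COLUMN Status int NOT NULL;
    ALTER TABLE dbo.Orders ADD CONSTRAINT DF_Orders_Status DEFAULT (0) FOR Status;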

junto | 12 days ago

Changing a PostgreSQL column type by just running the following command, without following the author's instructions, is very much an anti-pattern. I'm confused why people do this in the first place.

    ALTER TABLE table_name ALTER COLUMN column_name [SET DATA] TYPE new_data_type;

> you need to make sure the source system has enough disk space to hold the WAL files for a long enough time

If the asynchronous replication process buffers the change stream externally instead of relying on WAL retained on the source, that addresses this issue.
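
Either way, it's worth watching how much WAL a replication slot is pinning on the source while the migration runs; the standard catalog views cover it:

    SELECT slot_name,
           pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
    FROM pg_replication_slots;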

j-cheong | 12 days ago

Regarding "How to Safely Change a PostgreSQL Column Type" (new column, trigger, etc.): is there a particular reason PostgreSQL cannot set up the column migration itself out of the box? I have used PostgreSQL for many years, and it is always a bit cumbersome to do these column migrations manually, even though the process itself is quite mechanical.
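
For anyone who hasn't done it, the mechanical process in question looks roughly like this (a simplified sketch with made-up names, PG 11+ trigger syntax; it glosses over moving the PK, indexes, FKs, and sequence ownership):

    -- 1. add the new column
    ALTER TABLE orders ADD COLUMN id_new bigint;

    -- 2. keep it in sync on every write
    CREATE FUNCTION orders_sync_id() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
      NEW.id_new := NEW.id;
      RETURN NEW;
    END $$;

    CREATE TRIGGER orders_sync_id_trg
      BEFORE INSERT OR UPDATE ON orders
      FOR EACH ROW EXECUTE FUNCTION orders_sync_id();

    -- 3. backfill existing rows (in small batches in production)
    UPDATE orders SET id_new = id WHERE id_new IS NULL;

    -- 4. swap under a brief lock (assumes constraints were already moved over)
    BEGIN;
    ALTER TABLE orders DROP COLUMN id;
    ALTER TABLE orders RENAME COLUMN id_new TO id;
    COMMIT;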

miohtama | 12 days ago

This is pretty much how pt-online-schema-change from Percona works.

Good old MySQL days...

Thaxll | 12 days ago

I'm probably alone here, and this goes against the HN consensus, but as great a piece of tech as Postgres is, I've often found it to be a premature optimization to pick as the main DB for newer companies. If you don't know what your data models are, you're going to end up in worlds of pain constantly running schema migrations like the OP describes.

vrosas | 12 days ago