What is the problem
If you’re using a schema management tool like Django ORM or Alembic, you need to run the migration at some point. And it’s tempting to run the migration as part of application startup—when you’re using Docker, for instance, you’ll have an entrypoint that will first run the migration and then start the application.
But coupling migrations with server startup can lead to quite a few problems, from database corruption to downtime, especially as you start to scale and get more users.
If you always do schema upgrades as part of the application startup you also end up mentally coupling schema migrations and code upgrades. In particular, you’ll starting assuming that you only ever have new code running with the latest schema.
Why is that assumption a problem? From most to least common:
- Sometimes you need to rollback a broken code upgrade. If you assume you always have new code with a new schema, you can end up in a situation where your new code is broken, but you can’t easily rollback to older code because you’ve done an irreversible schema change.
- To minimize downtime on upgrades, you want to have a brief moment where both old and new versions of your application are running in parallel. If your schema migration breaks old code, you can’t do that.
- To catch bugs in new code, you might want to do a canary deploy. That is, upgrade only one or two of your many processes and see if they break.
What can you do
You want schema migration to happen separately than code upgrades:
- So you can run a single migration, without worries about parallelism.
- Optionally, to reduce downtime risks, so you can run old code with the new schema.
The first requirement can be addressed by using a migration tool that uses locking to ensure parallelism doesn’t happen, e.g. Flyway.
The second requirement requires more work, and so you might choose not to do it for applications with less onerous uptime requirements. Essentially, you want to structure your schema migrations to be purely additive, at least in the short term.
For example, instead of changing a column type, you add a new column, and use triggers to make sure the two variants stay in sync. Then once you’re sure you’ll never have to go back to the old code, you run a further schema migration (again, without having to upgrade the code) to clear out the no longer needed column.
The general process is:
- Migrate from schema S to schema S+1, with only additive changes.
- Over time upgrade some of your processes from application version V to V+1.
- Eventually everything is on V+1, and you don’t ever expect to rollback to V.
- Finally, migrate from schema S+1 to S+2, and now you can do destructive schema changes to anything that V+1 no longer uses.
And of course you still need to do some work to deal with database locks and the like. Here’s an overview for PostgreSQL, as one starting point.