Some migrations are loud. This one had to be invisible - because the database in question holds player wallets and live transactions, and the only acceptable amount of drama was none.
The challenge
A regulated iGaming and betting operator was running a high-throughput platform across several regional tenants, with a compliance profile spanning GDPR, PCI-DSS and AML obligations. Player wallets, transactions and gameplay data lived in a hybrid PostgreSQL estate: a managed distributed cluster of one coordinator plus eight workers carrying roughly 150 schemas, alongside a self-managed Patroni cluster on bare metal. In total the estate held tens of terabytes of data, with sustained high transaction rates on the busiest wallet and gaming workloads.
Two problems compounded each other. The database tier had hit a scaling wall - on the managed cluster, IOPS were coupled to CPU and RAM, so buying disk throughput meant overpaying for compute. And there was no centrally governed cloud foundation: each tenant’s infrastructure had grown organically, with no shared landing zone, no org-wide audit trail, and no consistent network segmentation.
The hard part is not moving one schema. It is moving 150+ of them, live, across tenants, while real money keeps flowing - and proving afterwards that not a single row drifted. A big-bang cutover concentrates all of that risk into one switch you cannot rehearse.
Our approach
The reframe: do not migrate the data first. Migrate the foundation first, then let the data flow into something that is already governed, observable and rehearsed.
Build the landing zone before touching the data
We stood up a multi-account AWS Organizations landing zone with Terraform, split across focused repositories - organization and identity, networking, audit, log archive, shared infrastructure, and per-tenant client infrastructure. Separate accounts for production, security, shared services and dev/test, with Service Control Policies and single sign-on through IAM Identity Center. A dedicated security account aggregated CloudTrail, GuardDuty, Config and Security Hub, and a write-once log-archive account became the immutable home for audit data. Migrating into a governed foundation is far cheaper than retrofitting governance onto a hot platform later.
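To make "policy guardrails" concrete: a Service Control Policy is a JSON document attached to an organizational unit or account. The sketch below builds one that denies disabling the audit services the security account depends on. The action names are real IAM actions, but this particular deny-list, the `Sid`, and the optional break-glass role are illustrative assumptions rather than the operator's actual policy; in a setup like this one the policy would live in Terraform, not be generated by hand.

```python
import json

# Real IAM action names; this particular deny-list is an illustrative assumption.
PROTECTED_ACTIONS = (
    "organizations:LeaveOrganization",
    "cloudtrail:StopLogging",
    "cloudtrail:DeleteTrail",
    "guardduty:DeleteDetector",
    "config:StopConfigurationRecorder",
    "config:DeleteConfigurationRecorder",
)


def guardrail_scp(actions=PROTECTED_ACTIONS, exempt_role_arn=None):
    """Build an SCP document that denies tampering with org-wide audit services."""
    statement = {
        "Sid": "DenyAuditTampering",
        "Effect": "Deny",
        "Action": sorted(actions),
        "Resource": "*",
    }
    if exempt_role_arn:
        # Optional, hypothetical break-glass role that is exempt from the deny.
        statement["Condition"] = {
            "ArnNotLike": {"aws:PrincipalArn": exempt_role_arn}
        }
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    print(json.dumps(guardrail_scp(), indent=2))
```

Because SCPs cap what any principal in a member account can do, a deny like this holds even against an account administrator, which is what makes it an org-wide guardrail rather than a per-team convention.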
Treat the network as the migration fabric
A Transit Gateway acted as the hub, wiring the new AWS environment to the legacy bare-metal site, the previous cloud, and each regional operator over Site-to-Site VPN, with Client VPN for engineers. That let data replicate during the migration without exposing anything to the public internet - which is the only posture a regulated operator can accept while wallets are in motion.
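A Transit Gateway route table forwards traffic by longest-prefix match across its attachments, and traffic with no matching route is dropped. The small Python model below illustrates why that suits a migration: replication traffic between known ranges reaches the right VPN or VPC attachment, and because this table has no default route, an address outside those ranges has no next hop at all. The CIDRs and attachment names are hypothetical.

```python
import ipaddress

# Hypothetical hub route table: destination CIDR -> TGW attachment name.
ROUTES = {
    "10.0.0.0/8": "aws-vpcs",
    "10.50.0.0/16": "bare-metal-vpn",
    "172.16.0.0/12": "previous-cloud-vpn",
    "192.168.10.0/24": "operator-eu-vpn",
}


def next_hop(dest_ip, routes=ROUTES):
    """Return the attachment chosen by longest-prefix match, or None if no route matches."""
    addr = ipaddress.ip_address(dest_ip)
    best_net, best_attachment = None, None
    for cidr, attachment in routes.items():
        net = ipaddress.ip_network(cidr)
        # The most specific (longest) matching prefix wins.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_attachment = net, attachment
    # No default route: unmatched destinations get no next hop.
    return best_attachment
```

For example, `next_hop("10.50.3.4")` returns `"bare-metal-vpn"` even though `10.0.0.0/8` also matches, while `next_hop("8.8.8.8")` returns `None`.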
Stage the cutover so it can be rehearsed
We designed the database move as two phases. Phase one was a lift-and-shift of the self-managed Patroni cluster into Multi-AZ RDS for PostgreSQL using AWS DMS and native logical replication. Phase two was the harder one: a gradual, schema-by-schema migration of the ~150-schema managed cluster. Logical replication keeps source and target in sync continuously, so each tenant’s cutover becomes a small, scheduled, reversible event rather than one high-stakes switch. RDS Multi-AZ on gp3/io2 volumes decoupled IOPS from instance size - the exact ceiling the old estate kept hitting.
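A schema-by-schema cutover maps naturally onto one publication and one subscription per schema, so each can be started, monitored and dropped independently. The helper below is a sketch that only generates the SQL; it assumes PostgreSQL 15 or later on the source (for `FOR TABLES IN SCHEMA`) and that the schema's tables already exist on the target, since logical replication does not copy DDL. The `pub_`/`sub_` naming is a hypothetical convention.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier: wrap in double quotes, double any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote a PostgreSQL string literal: wrap in single quotes, double any embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def replication_ddl(schema, source_conninfo):
    """Return (source_sql, target_sql) that replicate one schema via logical replication."""
    pub = quote_ident(f"pub_{schema}")
    sub = quote_ident(f"sub_{schema}")
    # Run on the source: publish every table in the schema.
    source_sql = f"CREATE PUBLICATION {pub} FOR TABLES IN SCHEMA {quote_ident(schema)};"
    # Run on the target: copy existing rows, then stream changes continuously.
    target_sql = (
        f"CREATE SUBSCRIPTION {sub} "
        f"CONNECTION {quote_literal(source_conninfo)} "
        f"PUBLICATION {pub} WITH (copy_data = true);"
    )
    return source_sql, target_sql


if __name__ == "__main__":
    for stmt in replication_ddl("tenant_a", "host=source.example dbname=app user=repl"):
        print(stmt)
```

Keeping one subscription per schema is what makes a single tenant's cutover reversible: if validation fails, only that subscription is paused or dropped while the others keep converging.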
Under the hood
The mechanics that made a quiet cutover possible:
- Inventory and group the schemas. Catalogue all 150+ schemas, map them to tenants, and order them by risk and traffic so the loudest wallet workloads move with the most rehearsal behind them.
- Establish continuous replication. Stand up logical replication from each source into Multi-AZ RDS and let it converge while the source stays live, so the target stays within seconds of the source.
- Re-architect the heavy tables in flight. Apply pg_partman and pg_cron for time-based partitioning on the largest tables, and push cold partitions to S3 and Glacier for cost-efficient long-term retention.
- Rehearse, then cut over per tenant. Validate row counts and integrity on the replica, then promote one tenant at a time during a quiet window, with the source held warm as a fallback.
- Fix what surfaces. Connection management moved from PgBouncer to RDS Proxy; along the way we diagnosed and fixed a connection leak that appeared when PgBouncer sat behind a Network Load Balancer.
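The inventory step amounts to a ranking problem. The sketch below groups schemas by tenant, ranks each tenant by its riskiest schema and then by total traffic, and splits the ranking into fixed-size waves so the quietest tenants go first and the loudest wallet workloads go last. The `risk` score, `tps` figures, wave size and schema names are hypothetical inputs, not data from this migration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    name: str
    tenant: str
    risk: int    # hypothetical score: 1 (low) .. 5 (wallet-critical)
    tps: float   # observed transactions per second


def plan_waves(schemas, wave_size=2):
    """Rank tenants from quietest to loudest and split them into cutover waves.

    A tenant ranks by its riskiest schema, then by total traffic (tenant name
    breaks ties), so the loudest wallet workloads land in the last wave.
    """
    by_tenant = {}
    for s in schemas:
        by_tenant.setdefault(s.tenant, []).append(s)

    def rank(tenant):
        members = by_tenant[tenant]
        return (max(m.risk for m in members), sum(m.tps for m in members), tenant)

    ranked = sorted(by_tenant, key=rank)
    return [ranked[i:i + wave_size] for i in range(0, len(ranked), wave_size)]
```

Ranking on the riskiest schema rather than the average keeps a tenant with one critical wallet schema from being scheduled early just because its other schemas are quiet.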
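The continuous-replication step only supports a scheduled cutover if replication lag is measured rather than assumed. A PostgreSQL LSN such as `16/B374D848` is a 64-bit WAL position written as two hex halves, so lag in bytes is a subtraction, the same calculation `pg_wal_lsn_diff()` performs server-side. The sketch assumes the two inputs come from `pg_current_wal_lsn()` on the source and the subscription slot's `confirmed_flush_lsn` in `pg_replication_slots`; the 1 MiB threshold is a hypothetical value.

```python
def lsn_to_int(lsn):
    """Convert a PostgreSQL LSN such as '16/B374D848' to an absolute WAL byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def lag_bytes(source_current_lsn, subscriber_flushed_lsn):
    """WAL bytes the subscriber has not yet confirmed (same result as pg_wal_lsn_diff)."""
    return lsn_to_int(source_current_lsn) - lsn_to_int(subscriber_flushed_lsn)


def ready_for_cutover(source_current_lsn, subscriber_flushed_lsn, max_lag_bytes=1 << 20):
    """Hypothetical gate: allow a tenant's cutover only once lag is under the threshold."""
    return lag_bytes(source_current_lsn, subscriber_flushed_lsn) <= max_lag_bytes
```

Sampling this gate repeatedly during the quiet window, rather than once, guards against promoting a tenant during a momentary dip in lag.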
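Time-based partitioning turns retention into date arithmetic: each partition covers a fixed range, and anything whose upper bound falls before the hot window becomes a candidate for export to S3 and Glacier. pg_partman creates the partitions and has its own retention settings; the sketch below only models the bound calculation, and the monthly interval and three-month hot window are assumptions.

```python
from datetime import date


def add_months(d, n):
    """First day of the month that is n months after d's month (n may be negative)."""
    m = d.year * 12 + (d.month - 1) + n
    return date(m // 12, m % 12 + 1, 1)


def monthly_partitions(start, end):
    """Yield (lower, upper) bounds of monthly range partitions covering [start, end)."""
    lower = start.replace(day=1)
    while lower < end:
        upper = add_months(lower, 1)
        yield lower, upper
        lower = upper


def cold_partitions(partitions, today, hot_months=3):
    """Partitions that end before the hot window starts are archive candidates."""
    cutoff = add_months(today.replace(day=1), -hot_months)
    return [(lo, hi) for lo, hi in partitions if hi <= cutoff]
```

With partitions from January to June 2024 and `today = date(2024, 6, 20)`, only the January and February partitions fall outside a three-month hot window.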
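Matching row counts alone cannot prove that no row drifted, because an updated balance leaves the count unchanged. A stricter check hashes each row and compares by primary key, reporting missing, extra and changed rows separately. The sketch assumes both sides are read through the same driver, so values normalize identically, and that the first column is the primary key; at tens of terabytes, the same comparison would be run per key range with hashing pushed into SQL rather than pulling every row into Python.

```python
import hashlib


def row_digest(row):
    """Stable SHA-256 of a row's values; repr keeps types distinct (1 vs '1')."""
    return hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest()


def diff_tables(source_rows, target_rows, key=lambda r: r[0]):
    """Compare two row sets keyed by primary key; report counts and drifted keys."""
    src = {key(r): row_digest(r) for r in source_rows}
    tgt = {key(r): row_digest(r) for r in target_rows}
    return {
        "source_count": len(src),
        "target_count": len(tgt),
        "missing": sorted(src.keys() - tgt.keys()),   # on source, absent on target
        "extra": sorted(tgt.keys() - src.keys()),     # on target, absent on source
        "changed": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }
```

A tenant is only promoted when `missing`, `extra` and `changed` are all empty; equal counts with a non-empty `changed` list is exactly the drift a count-only check would miss.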
All of it deployed through CI - GitHub Actions authenticating to AWS via OIDC, with no long-lived access keys anywhere in the pipeline - applying Terraform across every account. Datadog (installed via Ansible) gave the team RDS and PostgreSQL observability, and Cloudflare Zero Trust handled identity-aware access for engineers.
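The OIDC setup replaces stored keys with a role that trusts GitHub's token issuer and checks which repository and branch is asking. The sketch below builds that IAM role trust policy; the issuer host, `sts.amazonaws.com` audience and `repo:ORG/REPO:ref:refs/heads/BRANCH` subject format are GitHub's documented values, while the account ID and repository name are placeholders.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id, repo, branch="main"):
    """Trust policy letting one GitHub repo and branch assume a role via OIDC."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                # Only tokens minted for AWS STS are accepted.
                "StringEquals": {f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com"},
                # Scope the role to one repository and branch (StringLike also allows wildcards).
                "StringLike": {f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:ref:refs/heads/{branch}"},
            },
        }],
    }


if __name__ == "__main__":
    print(json.dumps(github_oidc_trust_policy("123456789012", "example-org/infra"), indent=2))
```

Each workflow run receives short-lived credentials from STS, so there is nothing long-lived in the pipeline to rotate or leak.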
The outcome
The estate moved onto modern, managed, fully-coded infrastructure - one tenant at a time, without the drama. Decoupling storage throughput from compute removed the structural ceiling the old platform kept hitting. The landing zone gives a regulated operator what its auditors expect by default: account isolation, centralized and immutable logging, and policy guardrails that apply org-wide rather than per team. The kind of migration that only looks quiet because the planning wasn’t.
Key takeaways
- Build the landing zone before you touch the data. Migrating into a governed foundation beats retrofitting governance onto a live platform.
- Logical replication beats a big-bang switch. For large multi-schema databases, continuous sync plus a schema-by-schema cutover makes every step rehearsable and reversible.
- Make the network the migration fabric. A Transit Gateway hub lets data replicate privately across clouds and sites with nothing exposed to the public internet.
- Decouple IOPS from instance size. On RDS, gp3/io2 lets you buy throughput without buying an oversized instance - often the real reason a Postgres estate feels too small.
- Authenticate CI to AWS with OIDC. Static keys in a pipeline are an avoidable liability, especially under PCI-DSS.