The Strangler Fig Pattern: Migrating Core Systems Without Breaking Production
Wrap the old, build the new, route traffic gradually, retire incrementally.
You need to replace your platform. It's slow, brittle, and blocking your team from shipping, but you can't just turn it off and rebuild. Customers are using it right now, product teams are actively developing on it, and one wrong move takes down everything.
This is the core platform engineering problem: you can't stop the car to fix the flat tire.
The strangler fig pattern offers a way out. Instead of a risky big-bang migration, you gradually wrap the old system, route traffic to new components as they're ready, and retire the old pieces incrementally. I've used this pattern for two very different types of migrations, monolith-to-microservices and infrastructure modernization, and the principles apply to both.
The Core Pattern
The strangler fig gets its name from a plant that grows around a host tree. Over time, the fig takes over the tree's structure until the original tree can be removed entirely. In software, this means:
- Build an adapter layer that sits between clients and your existing system
- Implement new functionality alongside the old system
- Gradually route traffic from old to new
- Remove old components once they're no longer needed
The key is that you're never in a broken state. At every step, the system works. If something goes wrong, you roll back at the adapter layer, not in your entire architecture.
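To make "roll back at the adapter layer" concrete, here's a minimal Python sketch of a routing facade. The `Facade` class, the route prefixes, and the backend names `legacy` and `new` are all hypothetical, made up for illustration. A real adapter layer would be a proxy or load balancer rather than a dictionary, but the shape of the idea is the same: migrating and rolling back are both one-entry changes to a route table.

```python
from dataclasses import dataclass, field


@dataclass
class Facade:
    """Toy adapter layer: maps a route prefix to the backend that serves it."""

    # Hypothetical route table. Anything not listed falls through to legacy.
    routes: dict = field(default_factory=dict)

    def backend_for(self, path: str) -> str:
        # Longest matching prefix wins, so "/orders/export" can be routed
        # differently from the rest of "/orders".
        matches = [prefix for prefix in self.routes if path.startswith(prefix)]
        if not matches:
            return "legacy"
        return self.routes[max(matches, key=len)]

    def migrate(self, prefix: str) -> None:
        """Point one slice of functionality at the new implementation."""
        self.routes[prefix] = "new"

    def rollback(self, prefix: str) -> None:
        """Undo a migration by removing the route; legacy serves it again."""
        self.routes.pop(prefix, None)


facade = Facade()
facade.migrate("/orders")
print(facade.backend_for("/orders/42"))  # served by the new component
facade.rollback("/orders")
print(facade.backend_for("/orders/42"))  # back on the old system
```

Nothing outside the route table changes during a rollback, which is why the rest of the architecture stays in a working state.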
Scenario 1: Monolith to Microservices
This is the classic strangler fig use case. You have a monolith that's become a bottleneck. Deploy times are slow, teams can't work independently, and the codebase is intimidating to new engineers. Worse, every feature you add to it deepens the problem.
How do you break out of the cycle? Wrap the old, build the new, route traffic gradually, retire incrementally.
The Approach:
Build a gateway or API facade that sits in front of both your monolith and your new services. This becomes your traffic router and your safety valve.
Behind the gateway you'll have two systems: the old and the new. For existing functionality, the key is to replace the old implementation with the new one while keeping the existing interface. That lets you move traffic from old to new without clients knowing anything changed.
- Pick a small, low-risk slice of functionality - Don't start with critical; start with easy and prove the pattern out. We're playing the long game here.
- Build the new service with an identical interface - Don't "fix" the service while you migrate. One thing at a time. We'll fix things with the next version.
- Deploy the new service next to the old one behind the gateway - We're hiding these things and using our gateway as our API's single point of entry.
- Validate in production - Staging is great and all but no matter how hard we try, it'll never be production until it is. Canary deployments are your friend.
- Gradually increase traffic - If metrics are looking good, start bumping up the traffic. If things are going sideways, roll back at the gateway.
- Retire the old code - Once traffic is 100% moved over and the service is stable, remove the old code entirely. You're never going to need to reference it, I promise.
After that, rinse and repeat. You'll be amazed at how much faster the next migration is and how quickly you get comfortable operating at the API gateway level.
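The "gradually increase traffic" step usually comes down to a percentage knob at the gateway. Here's one way that knob could work, as a sketch rather than a recommendation: hash a stable key (I'm assuming a user or session ID) into one of 100 buckets and send the low buckets to the new service. The function names and the `monolith`/`new` labels are placeholders.

```python
import hashlib


def bucket(request_key: str) -> int:
    """Map a stable key (e.g. a user or session ID) to a bucket in 0..99."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def choose_backend(request_key: str, new_traffic_percent: int) -> str:
    """Send roughly new_traffic_percent of keys to the new service."""
    if not 0 <= new_traffic_percent <= 100:
        raise ValueError("new_traffic_percent must be between 0 and 100")
    return "new" if bucket(request_key) < new_traffic_percent else "monolith"


# Same key, same answer: a given user does not flip between backends.
print(choose_backend("user-1234", 5) == choose_backend("user-1234", 5))
```

Because the bucket depends only on the key, anyone who lands on the new service at 5% stays there at 25%, and setting the percentage to 0 is the rollback.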
Something to keep in mind is the evolution of the API contract and backward compatibility. Your new microservices probably have a better data model than your monolith. But if you change the API contract, you break clients. You'll need adapter logic, data synchronization, and eventually a coordinated cutover for clients to adopt new contracts. API versioning is a big help here, as are forced-upgrade paths for things like mobile or desktop clients. You're going to carry some adapter logic, and for longer than you'd like, but it's not the end of the world. Ship the smallest slice that proves the path works, then iterate.
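Adapter logic for a contract tends to look something like the sketch below. Everything specific in it is invented for illustration: the `Customer` model, its field names, and the legacy response shape (string IDs, a single `name` field) are assumptions, not anyone's real API. The point is only that the new service keeps its better model internally while clients still see the old shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical richer model owned by the new service."""

    customer_id: int
    given_name: str
    family_name: str
    email: str


def to_legacy_response(customer: Customer) -> dict:
    """Render the new model in the shape existing clients already parse."""
    return {
        # Assumed legacy quirks: IDs are strings, the name is one field.
        "id": str(customer.customer_id),
        "name": f"{customer.given_name} {customer.family_name}",
        "email": customer.email,
    }


print(to_legacy_response(Customer(42, "Ada", "Lovelace", "ada@example.com")))
```

Keeping the translation in one small function also makes it easy to delete once clients have moved to a new contract version.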
Scenario 2: Infrastructure Modernization
Infrastructure migrations, like moving from EC2 to Kubernetes, migrating databases, or switching cloud providers, follow the same pattern but with different mechanics.
You can't build a "gateway" between EC2 and Kubernetes the same way you can between a monolith and microservices. But the principles still apply: wrap the old, build the new, route traffic gradually, retire incrementally.
The Approach:
The adapter layer here is your deployment and routing infrastructure. You're running both systems in parallel and controlling which handles production traffic.
Here's how it maps:
- Containerize one service at a time - Don't try to migrate everything at once. Pick a single, low-risk service. Something with good test coverage, clear boundaries, and not in the critical path.
- Deploy to the new infrastructure in parallel - Your service is now running in both places - EC2 and Kubernetes. Traffic is still going to EC2.
- Validate the new deployment - Does it start up correctly? Does it connect to dependencies? Does it handle traffic in a non-production environment? This is your "proof it works" step.
- Route a small percentage of production traffic - Use your load balancer or DNS to send 5% of traffic to the new infrastructure. Monitor everything. If metrics degrade, route back to EC2.
- Increase traffic gradually - 5% → 25% → 50% → 100%. At each step, validate that things are working. Your rollback is simple: change the traffic routing.
- Decommission the old infrastructure - Once the service has been stable on new infrastructure for a meaningful period (days or weeks, not hours), shut down the EC2 instance.
Move to the next service. Repeat until you're done.
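The ramp in the steps above (5% → 25% → 50% → 100%, with rollback as a routing change) is simple enough to write down as a tiny state machine. This is a sketch under assumptions: `next_traffic_percent` is a hypothetical helper, and the `healthy` flag stands in for whatever metric checks you actually run at each step.

```python
RAMP_STEPS = [5, 25, 50, 100]  # percentages taken from the steps above


def next_traffic_percent(current: int, healthy: bool) -> int:
    """Return the traffic share for the new infrastructure after one check."""
    if not healthy:
        # Rollback is just a routing change: everything returns to EC2.
        return 0
    for step in RAMP_STEPS:
        if step > current:
            return step
    return 100  # already fully migrated


percent = 0
for check_passed in [True, True, False]:
    percent = next_traffic_percent(percent, check_passed)
    print(percent)
```

In this run the share climbs to 5, then 25, then drops to 0 when the third check fails. Whether a failed check should drop to 0 or just step back one level is a judgment call; dropping to 0 is the more conservative choice.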
Cost modeling and operational complexity are the two big challenges here. Running both systems in parallel is expensive: you're paying for EC2 and Kubernetes at the same time. You need to prove the ROI through elasticity, resource efficiency, or developer velocity. Make the business case before you start, not halfway through. That means running performance tests, calculating resource requirements from historical traffic, and turning those requirements into cost estimates.
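A back-of-the-envelope version of that estimate might look like the sketch below. Every number in it is made up: the peak request rate, the per-replica throughput you'd get from performance tests, the 30% headroom, and the hourly price are placeholders to swap for your own data. The 730 hours per month is just 8,760 hours in a year divided by 12.

```python
import math


def replicas_needed(peak_rps: float, rps_per_replica: float,
                    headroom: float = 0.3) -> int:
    """Replicas required to serve peak traffic with spare capacity."""
    if rps_per_replica <= 0:
        raise ValueError("rps_per_replica must be positive")
    return math.ceil(peak_rps * (1 + headroom) / rps_per_replica)


def monthly_cost(replicas: int, hourly_cost_per_replica: float,
                 hours_per_month: int = 730) -> float:
    """Steady-state monthly cost of running the given number of replicas."""
    return replicas * hourly_cost_per_replica * hours_per_month


def overlap_cost(old_monthly: float, new_monthly: float,
                 overlap_months: int) -> float:
    """Total spend while both systems run in parallel."""
    return (old_monthly + new_monthly) * overlap_months


# Hypothetical inputs: 1,200 rps historical peak, 150 rps per replica in tests.
replicas = replicas_needed(peak_rps=1200, rps_per_replica=150)
print(replicas, round(monthly_cost(replicas, hourly_cost_per_replica=0.10), 2))
```

The overlap figure is the one that tends to surprise people, so it's worth putting in the business case explicitly rather than only comparing before and after.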
The other challenge I've hit is integrating with new deployment pipelines and monitoring systems. Every service migration becomes a mini-project because you're not just moving the service, you're adopting a whole new operational model.
The Framework: Making This Work
Whether you're migrating monoliths or infrastructure, these principles apply:
Start with the smallest possible slice
No ocean boiling. The goal of your first migration isn't to ship the most important thing, it's to prove the pattern works with minimal risk. Pick something small, low-traffic, and non-critical. Prove out the approach, get comfortable with it, and build up that muscle memory. Everything is easier once you start.
Build your adapter layer correctly
Your gateway, load balancer, or routing layer is the most critical piece of infrastructure in your migration. It must not become a bottleneck or a single point of failure.
This means:
- High availability built in from day one
- Observability for every routing decision
- Fast rollback capability
- No business logic creeping in (it's a router, not a service)
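For the observability point, one low-effort option is to emit a structured record for every routing decision so you can later answer "which backend served this request, and why?" The sketch below is an assumption about what such a record could contain; the field names and the `routing_decision_record` helper are hypothetical, and in practice you'd hand the string to whatever logging pipeline you already run.

```python
import json
import time
from typing import Optional


def routing_decision_record(path: str, backend: str,
                            new_traffic_percent: int, reason: str,
                            now: Optional[float] = None) -> str:
    """Build one JSON log line describing a single routing decision."""
    record = {
        "ts": time.time() if now is None else now,
        "event": "routing_decision",
        "path": path,
        "backend": backend,
        "new_traffic_percent": new_traffic_percent,
        "reason": reason,
    }
    # Sorted keys keep the lines stable and easy to diff or grep.
    return json.dumps(record, sort_keys=True)


print(routing_decision_record("/orders/42", "new", 25, "bucket below threshold"))
```

Note that the record only describes the decision. It doesn't make one, which keeps the router free of business logic.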
Maintain backward compatibility ruthlessly
Changing interfaces while migrating implementations is where teams get stuck. You think "we'll just update the clients at the same time" and then you discover 47 internal clients you didn't know about, plus three external integrations that can't change quickly.
Preserve interfaces. Even when it's painful. Even when the old API design makes you cringe. You can evolve the contract later, after the migration is done. This is an infrastructure project, not a product one.
Validate in production-like environments
Staging will not catch everything. You need real traffic, real data, and real dependencies. Use canary deployments, feature flags, and percentage-based routing to test in production with minimal risk.
Your goal is to discover problems when they affect 5% of traffic, not 100%.
Measure and monitor religiously
You need to know when something's wrong before your customers tell you. Track:
- Error rates and latency for migrated traffic vs. baseline
- Business metrics (conversion rates, successful transactions, etc.)
- Infrastructure metrics (CPU, memory, network)
If you can't measure it, you can't safely migrate it.
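Comparing migrated traffic against baseline can be reduced to a small, explicit check. The sketch below is one possible shape for it, not a standard: the `Window` type, the thresholds (0.5 percentage points of extra errors, 20% extra p95 latency), and the minimum sample size are all assumptions you'd tune for your own system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Aggregated metrics for one traffic slice over the same time window."""

    requests: int
    errors: int
    p95_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_is_healthy(baseline: Window, canary: Window,
                      max_error_rate_delta: float = 0.005,
                      max_latency_ratio: float = 1.2,
                      min_requests: int = 500) -> bool:
    """True only if the migrated slice is within tolerance of the baseline."""
    if canary.requests < min_requests:
        return False  # too little data to judge, so hold instead of promoting
    error_ok = canary.error_rate - baseline.error_rate <= max_error_rate_delta
    latency_ok = canary.p95_latency_ms <= baseline.p95_latency_ms * max_latency_ratio
    return error_ok and latency_ok


baseline = Window(requests=10_000, errors=20, p95_latency_ms=180.0)
print(canary_is_healthy(baseline, Window(1_000, 3, 190.0)))
print(canary_is_healthy(baseline, Window(1_000, 30, 190.0)))
```

The first canary passes and the second fails on error rate. Business metrics like conversion rate could be added as a third condition in the same style.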
Common Pitfalls
Trying to do too much at once
The pattern works because each step is small and reversible. When you try to migrate 10 services at once, or change the API contract during the migration, you lose that safety.
Ship the smallest slice that proves it works. Then iterate.
Underestimating backward compatibility work
Adapters, data synchronization, and contract preservation take time. More time than you think. Budget for it upfront.
Skipping the "prove it works" step
You need to validate each piece in isolation before moving to the next. It's tempting to rush, especially when leadership is asking "when will this be done?" But skipping validation means you're stacking unproven changes on top of each other.
When something breaks, and it will, you won't know which layer caused it.
Letting the adapter layer accumulate tech debt
Your gateway or routing layer is meant to be temporary infrastructure. Don't let it become a permanent dumping ground for business logic, workarounds, or feature flags you never clean up.
Set a deadline for retiring it. Treat it like scaffolding, not a foundation.
Running initiatives in parallel
I've made this mistake. Trying to run a major infrastructure migration at the same time as a major feature initiative stretches your team thin and increases risk. Both initiatives move slower, and when something breaks, you don't know which change caused it.
Sequence when you can. When you can't, be explicit with leadership about the cost.
When to Use This Pattern
The strangler fig pattern is right for you if:
- You have active production traffic you can't interrupt
- Your team needs to keep shipping features during the migration
- You can decompose the system incrementally
- You have time (this is not a fast approach)
It's NOT right if:
- You can afford downtime for a cutover
- The system is small enough for a clean rewrite in a reasonable timeframe
- You're starting from scratch (just build it right the first time)
- You need results in weeks, not months
The Bottom Line
Platform migrations fail when teams try to do everything at once. Big-bang cutovers are high-risk, high-stress, and politically expensive when they go wrong.
The strangler fig pattern succeeds because it breaks the problem down into small, reversible steps. At every point in the migration, your system works. If something goes wrong, you roll back one piece, not the entire architecture.
Whether you're breaking apart a monolith or modernizing infrastructure, the playbook is the same: wrap the old, build the new, route traffic gradually, and retire incrementally.
Start with the smallest slice that proves the pattern works. The rest will follow. What's the lowest-risk service or endpoint you could migrate this week?