Below is a concise explanation of how to manually fail over a Wide IP (for DNS load balancing) to another datacenter, along with some best practices. This assumes you have F5 BIG-IP DNS (formerly GTM) managing a Wide IP that references Virtual Servers (VIPs) in different DCs.
Typical GSLB (BIG-IP DNS) Setup
- Wide IP: Represents the fully qualified domain name (FQDN) you want to load balance globally (e.g., www.example.com).
- Pools / Pool Members: Each member is usually a Virtual Server (VIP) from one of your data centers.
- Monitors: BIG-IP DNS (GTM) uses health monitors to determine if a Virtual Server is “up” or “down” in each DC.
When BIG-IP DNS considers a Virtual Server “up,” it returns that server’s IP address in DNS responses. If it considers it “down,” it stops returning it, effectively failing over to another available datacenter.
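To make that selection rule concrete, here is a minimal Python model of how a Wide IP answer is built from monitor status. It is not F5 code: the `PoolMember` class, the `resolve_wide_ip` function, and the documentation-range addresses are hypothetical. Real BIG-IP DNS also applies a load balancing method (round robin, topology, and so on) to choose among the members that remain.

```python
from dataclasses import dataclass

@dataclass
class PoolMember:
    """Hypothetical stand-in for a Wide IP pool member (one DC's Virtual Server)."""
    name: str
    address: str
    monitor_up: bool = True  # latest health monitor result

def resolve_wide_ip(members: list[PoolMember]) -> list[str]:
    """Return the addresses eligible for a DNS answer: only members reported up."""
    return [m.address for m in members if m.monitor_up]

pool = [
    PoolMember("dc1_vs", "192.0.2.10"),
    PoolMember("dc2_vs", "198.51.100.10"),
]
print(resolve_wide_ip(pool))  # ['192.0.2.10', '198.51.100.10']
pool[0].monitor_up = False    # the monitor marks DC1 down
print(resolve_wide_ip(pool))  # ['198.51.100.10']
```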
Manual Failover Options
To force traffic from one DC (say DC1) to another (DC2), you can do any of the following:
1. Disable or Force Offline the Virtual Server in BIG-IP DNS
- Where: Go to DNS (GTM) → GSLB → Pools (or Wide IPs → Pools), find the Virtual Server (pool member) that points to DC1, and disable it.
- What Happens:
- BIG-IP DNS marks that Virtual Server as “disabled,” so it is no longer eligible for global load balancing.
- New DNS queries are directed to the remaining active datacenter (DC2).
- Graceful vs. Immediate:
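- Disable = stop handing out DC1’s address in new DNS answers. Clients that cached an earlier answer keep connecting to DC1 until its DNS TTL expires, so traffic drains gradually rather than stopping at once.
- Force Offline = take it out of service immediately; useful in an emergency if you need to cut off DC1 quickly. At the DNS layer, though, either action only changes new answers: neither can stop clients that already cached DC1’s address. To block those, act on the LTM in DC1 (option 2).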
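A minimal sketch of why a BIG-IP DNS disable only affects new answers. Assumptions: the `Member` and `CachingClient` classes and the 30-second TTL are hypothetical, and a client's cache is simplified to one answer that is honored exactly until its TTL expires (real resolvers may differ).

```python
from dataclasses import dataclass

@dataclass
class Member:
    """Hypothetical pool member: a DC's Virtual Server as seen by BIG-IP DNS."""
    name: str
    address: str
    enabled: bool = True     # administrative state set by the operator
    monitor_up: bool = True  # latest health monitor result

def answer(members: list[Member]) -> list[str]:
    """Simplified: only members that are both enabled and up are handed out."""
    return [m.address for m in members if m.enabled and m.monitor_up]

class CachingClient:
    """Hypothetical client/resolver that reuses its last answer until the TTL expires."""

    def __init__(self, ttl: int) -> None:
        self.ttl = ttl
        self.cached = None  # last address handed out, or None before the first lookup
        self.expires_at = 0

    def lookup(self, members: list[Member], now: int) -> str:
        if self.cached is None or now >= self.expires_at:
            addresses = answer(members)
            if not addresses:
                raise RuntimeError("no enabled, healthy members to answer with")
            self.cached = addresses[0]  # take the first address for simplicity
            self.expires_at = now + self.ttl
        return self.cached

pool = [Member("dc1_vs", "192.0.2.10"), Member("dc2_vs", "198.51.100.10")]
client = CachingClient(ttl=30)
print(client.lookup(pool, now=0))   # 192.0.2.10 (fresh answer)
pool[0].enabled = False             # operator disables DC1 in BIG-IP DNS at t=5
print(client.lookup(pool, now=10))  # 192.0.2.10 (still served from cache)
print(client.lookup(pool, now=30))  # 198.51.100.10 (TTL expired, new answer)
```

A brand-new client querying at t=10 would already receive DC2; only the client holding the cached answer keeps reaching DC1 until t=30.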
2. Disable the Virtual Server (or Force Its Pool Members Offline) in LTM (Local Traffic Manager)
- Where: On the BIG-IP LTM in DC1, disable the Virtual Server, or force its pool members offline so the Virtual Server becomes unavailable.
- What Happens:
- The LTM in DC1 stops accepting new connections.
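- Once the BIG-IP DNS monitor detects that the DC1 Virtual Server is unavailable, BIG-IP DNS automatically stops returning DC1 and answers with DC2. The shift happens after that detection, not at the moment you disable the Virtual Server.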
- Pros/Cons:
- Pros: Straightforward if you already manage local traffic states frequently.
- Cons: It cuts off everyone who reaches the VIP, not just DNS-driven traffic. If you only need to shift the DNS view of the service while the LTM stays in use (for example, for internal clients), disabling the Virtual Server may be overkill.
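Because this path relies on the health monitor, the DNS change is not instantaneous. A rough back-of-the-envelope model, under stated assumptions: BIG-IP DNS stops answering with DC1 at most `detection_s` seconds after the LTM Virtual Server goes down, and a client that cached DC1 just before that moment keeps it for one full TTL, during which its connections hit DC1's now-disabled VIP and fail. The function name and the sample values are hypothetical; actual timing depends on your monitor settings and on resolvers that may cache longer than the TTL.

```python
def worst_case_exposure(detection_s: float, ttl_s: float) -> float:
    """Rough upper bound, in seconds, on how long clients may still be sent to DC1.

    Simplified model (assumption): BIG-IP DNS notices the LTM change after at most
    `detection_s` seconds, and a client that cached DC1 just before that keeps the
    answer for a full TTL. Resolvers that ignore or extend TTLs are not modeled.
    """
    if detection_s < 0 or ttl_s < 0:
        raise ValueError("durations must be non-negative")
    return detection_s + ttl_s

# Hypothetical values: up to 30 s for the monitor to detect, 60 s DNS TTL
print(worst_case_exposure(30, 60))  # 90
```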
3. Remove the Virtual Server from the Wide IP Pool (Temporary or Permanent)
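- Where: In DNS → GSLB → Pools, open the pool the Wide IP uses, select the DC1 Virtual Server member, and Remove it. (If each DC has its own pool, remove the DC1 pool from the Wide IP’s Pools list instead.)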
- What Happens:
- That Virtual Server is no longer in the global load balancing pool. BIG-IP DNS returns only the remaining pool member (DC2).
- Use Case:
- Permanent removal if DC1 is decommissioned or no longer relevant.
- If it’s just for a brief failover, “Disable” or “Force Offline” is often more appropriate than completely removing it.
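The practical difference between disabling and removing is what survives for failback. In this hypothetical model the pool is a plain dict keyed by member name, and `ratio` stands in for any per-member settings you configured (such as a ratio or member order). Disabling flips a flag and keeps those settings; removing deletes them, so failback means re-creating the member by hand.

```python
# Hypothetical pool config: member name -> per-member settings
pool = {
    "dc1_vs": {"enabled": True, "ratio": 3},
    "dc2_vs": {"enabled": True, "ratio": 1},
}

def disable(pool: dict, name: str) -> None:
    """Temporary failover: keep the member and its settings, stop answering with it."""
    pool[name]["enabled"] = False

def enable(pool: dict, name: str) -> None:
    """Failback: the member returns with its original settings intact."""
    pool[name]["enabled"] = True

def remove(pool: dict, name: str) -> dict:
    """Permanent removal: the member and its settings are gone from the pool."""
    return pool.pop(name)

def active(pool: dict) -> list[str]:
    """Members currently eligible for DNS answers."""
    return [name for name, cfg in pool.items() if cfg["enabled"]]

disable(pool, "dc1_vs")
print(active(pool))      # ['dc2_vs']
enable(pool, "dc1_vs")   # failback restores ratio 3 automatically
print(pool["dc1_vs"])    # {'enabled': True, 'ratio': 3}
remove(pool, "dc1_vs")
print("dc1_vs" in pool)  # False: re-adding means re-entering its settings
```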
Which Method is “Best”?
- Planned Maintenance or Gradual Failover:
- Disable the DC1 Virtual Server within BIG-IP DNS or LTM. In BIG-IP DNS this removes DC1 from new DNS answers; in LTM it stops new connections to the VIP. Either way, existing sessions can terminate naturally.
- Once traffic is drained, you can perform maintenance in DC1.
- Emergency or Immediate Cutover:
- Force Offline the DC1 Virtual Server in BIG-IP DNS so new DNS answers stop pointing to DC1 at once. To also stop clients that still hold cached answers, take DC1 down at the LTM (disable the Virtual Server or force its pool members offline).
- New DNS queries shift to DC2 immediately, but clients already connected to DC1 might break unless you have an application-level mechanism to redirect them.
- Long-Term Removal:
- Remove the Virtual Server from the Wide IP if you don’t intend to use DC1 anymore. This is a permanent change.
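The guidance above reduces to two questions: will DC1 come back, and how urgent is the cutover? This hypothetical helper simply restates that decision; the function name and return strings are illustrative, not F5 terminology.

```python
def choose_failover_method(returning_to_dc1: bool, urgent: bool) -> str:
    """Restate this section's guidance as a decision (hypothetical helper)."""
    if not returning_to_dc1:
        return "remove"         # long-term: take DC1 out of the Wide IP pool
    if urgent:
        return "force_offline"  # emergency: cut DC1 off now, accept broken sessions
    return "disable"            # planned: stop new answers/connections, let sessions drain

print(choose_failover_method(returning_to_dc1=True, urgent=False))  # disable
```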
Practical Tips and Considerations
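- DNS TTL: Remember that DNS responses have a Time to Live. Disabling or forcing offline in BIG-IP DNS stops DC1 from appearing in new DNS answers, but clients with a cached answer may keep connecting to DC1 until that TTL expires, and some resolvers or applications cache even longer. For planned failovers, consider lowering the TTL that BIG-IP DNS returns for this Wide IP at least one old TTL period in advance, so cached answers expire quickly.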
- Monitoring: Always confirm that the Virtual Server has truly drained. You can check active connections in LTM or monitor logs to ensure traffic is no longer flowing to DC1.
- Data Synchronization: If your application is stateful, ensure data is replicated or available in DC2. Shifting traffic at the DNS layer doesn’t automatically replicate session or database state.
- Fallback/Failback: When DC1 is ready again, re-enable the Virtual Server wherever you disabled it (BIG-IP DNS, LTM, or both), or re-add it to the pool if you removed it. This returns DC1 to the global pool, allowing traffic distribution to resume per your load balancing methods.
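Confirming the drain can be automated as a polling loop. A minimal sketch, assuming you supply a `get_active_connections` callable that reads DC1's current connection count from wherever you monitor it (LTM statistics, your metrics system, and so on). The function, its parameters, and the default threshold, poll, and timeout values are hypothetical; the clock and sleep functions are injectable only so the loop can be tested without real waiting.

```python
import time
from typing import Callable

def wait_for_drain(
    get_active_connections: Callable[[], int],
    threshold: int = 0,
    poll_s: float = 10.0,
    timeout_s: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until DC1's active connections fall to `threshold` or `timeout_s` passes.

    Returns True if traffic drained, False if the timeout was reached first.
    """
    deadline = clock() + timeout_s
    while True:
        if get_active_connections() <= threshold:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)

# Usage with fake readings that fall to zero on the fourth poll
counts = iter([120, 40, 3, 0])
print(wait_for_drain(lambda: next(counts), poll_s=0))  # True
```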
Example: Graceful Failover to DC2
- Disable DC1 Virtual Server at the BIG-IP DNS Level:
- Navigate: DNS → GSLB → Wide IPs → Pools
- Locate DC1 Virtual Server and choose “Disable.”
- New DNS queries will be answered with DC2 only.
- Wait for Draining:
- Monitor active sessions or logs to see traffic levels on DC1 fall off.
- Remember DNS TTL means some clients may continue using DC1’s IP for a while.
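- Take DC1 Down at the LTM If Needed:
- If you must ensure absolutely no new connections reach DC1 (including from clients with cached DNS), disable the DC1 Virtual Server on its LTM or force its pool members offline. Acting in BIG-IP DNS alone cannot stop clients that already hold DC1’s address.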
- Perform Maintenance in DC1:
- Patch, reboot, upgrade, etc.
- Re-enable DC1 (Optional, if you’re returning service there):
- Set the Virtual Server back to “Enabled” (and re-enable any LTM objects you disabled or forced offline) to resume normal load balancing.
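The graceful failover example above can be scripted as an ordered runbook. This is a hypothetical skeleton: each step is a callable you would implement against your own tooling (the GUI steps above, tmsh, or the iControl REST API), and `drain_wait_s` is assumed to be at least the TTL BIG-IP DNS returns, so clients with cached answers have had a chance to move to DC2.

```python
from typing import Callable

def graceful_failover(
    disable_dns_member: Callable[[], None],
    drained_within: Callable[[float], bool],
    take_dc1_down_at_ltm: Callable[[], None],
    do_maintenance: Callable[[], None],
    reenable_dc1: Callable[[], None],
    drain_wait_s: float,
    return_to_dc1: bool = True,
) -> list[str]:
    """Run the graceful-failover steps in order and return a log of what was done."""
    log = []
    disable_dns_member()                # step 1: stop answering with DC1
    log.append("disabled DC1 in BIG-IP DNS")
    if not drained_within(drain_wait_s):  # step 2: wait for cached answers to expire
        take_dc1_down_at_ltm()          # step 3: only LTM action stops cached-DNS clients
        log.append("took DC1 down at LTM")
    do_maintenance()                    # step 4: patch, reboot, upgrade
    log.append("maintenance done")
    if return_to_dc1:
        reenable_dc1()                  # step 5: re-enable everything disabled above
        log.append("re-enabled DC1")
    return log

steps = graceful_failover(
    disable_dns_member=lambda: None,
    drained_within=lambda s: False,     # pretend traffic did not fully drain
    take_dc1_down_at_ltm=lambda: None,
    do_maintenance=lambda: None,
    reenable_dc1=lambda: None,
    drain_wait_s=300,
)
print(steps)
```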
Conclusion
Manually failing over a Wide IP to another DC typically involves disabling or forcing offline the Virtual Server in either BIG-IP DNS (GTM) or LTM. The choice depends on how quickly you need to shift traffic, whether you want existing sessions to drain, and whether you’re doing a partial or full datacenter failover. Always keep DNS TTL considerations in mind, monitor connection drain, and ensure stateful data is correctly handled if you want a truly seamless user experience.