Why wildcard app segments explode health checks, what the 6,000‑check guidance really means, and a practical migration runbook you can apply today.
How ZPA Health Checks Actually Work
When Health Reporting is enabled, each App Connector proactively checks application reachability based on your app segment definition:
- FQDN: The connector resolves DNS and checks every IP returned.
- Static IP: The connector checks that exact IP.
- TCP ports: 3‑way handshake tests per port.
- UDP ports: ICMP probe first; TCP‑based check if ICMP fails.
For example, consider an application segment whose applications are app1.cordero.me and the static IP 192.168.10.10, with the following settings:
- TCP ports: 443
- UDP ports: 5000
If app1.cordero.me resolves to 192.168.20.11 and 192.168.20.12, the App Connectors for this application perform reachability checks to:
- 192.168.20.11:443
- 192.168.20.12:443
- 192.168.10.10:443
- 192.168.20.11:5000
- 192.168.20.12:5000
- 192.168.10.10:5000
Total: 6 checks — reasonable. Problems arise when wildcards and huge port lists multiply these checks.
Selection note: If at least one connector in a group reports “Up” for an app, ZPA can select any connector in that group to broker the session (with ZPA stickiness ~30–90 min depending on flow/policy).
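To make the fan-out concrete, here is a minimal Python sketch that approximates what a connector does for the example segment above: resolve the FQDN, merge in the static IP, and attempt a TCP handshake against every IP/port combination. This is an illustration only, not ZPA's actual health-check implementation; the ICMP-first behavior for UDP ports is simplified to a TCP attempt, and the hostnames and IPs are the hypothetical values from the example.

```python
import socket

# Hypothetical segment definition from the example above.
FQDNS = ["app1.cordero.me"]
STATIC_IPS = ["192.168.10.10"]
TCP_PORTS = [443]
UDP_PORTS = [5000]  # ZPA probes UDP apps via ICMP first, then TCP; modeled here as TCP only.

def resolve(fqdn):
    """Return every IPv4 address the name resolves to (the DNS fan-out)."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(fqdn, None, socket.AF_INET)}
    except socket.gaierror:
        return set()

def tcp_reachable(ip, port, timeout=2.0):
    """Approximate one health check: does a TCP 3-way handshake complete?"""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

targets = set(STATIC_IPS)
for name in FQDNS:
    targets |= resolve(name)

checks = [(ip, port) for ip in sorted(targets) for port in TCP_PORTS + UDP_PORTS]
print(f"{len(checks)} checks this cycle")  # 3 IPs x 2 ports = 6 for the example
for ip, port in checks:
    print(ip, port, "Up" if tcp_reachable(ip, port) else "Down")
```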
The 6,000‑Check Rule
Connectors throttle at ~20 health checks per second. If a connector is responsible for around 6,000 checks, the cycle takes about 300 seconds (5 minutes) to complete. At 20,000 checks, a cycle can stretch beyond 15 minutes, which delays accurate health reporting and consumes connector resources unnecessarily.
Back‑of‑napkin math
Checks per connector per cycle ≈ (# of FQDNs × avg # of IPs returned by DNS + # of static IPs) × (# of TCP ports + # of UDP ports)
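The formula is easy to turn into a quick calculator. The sketch below (plain Python, with made-up segment sizes) shows how a tight segment stays trivially cheap while a wildcard-style segment blows straight past the 6,000-check budget.

```python
CHECK_RATE = 20  # approximate health checks per second per App Connector

def checks_per_cycle(fqdns, avg_ips_per_fqdn, static_ips, tcp_ports, udp_ports):
    """Back-of-napkin estimate from the formula above."""
    return (fqdns * avg_ips_per_fqdn + static_ips) * (tcp_ports + udp_ports)

def cycle_minutes(checks):
    return checks / CHECK_RATE / 60

# Tight segment: 1 FQDN x 2 IPs + 1 static IP, one TCP + one UDP port -> 6 checks (~0.3 s)
tight = checks_per_cycle(1, 2, 1, 1, 1)

# Wildcard-style segment: 300 FQDNs x 2 IPs, 10 TCP + 2 UDP ports -> 7,200 checks (~6 min per cycle)
wild = checks_per_cycle(300, 2, 0, 10, 2)

print(tight, wild, f"{cycle_minutes(wild):.0f} min per cycle")
```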
Why Wildcards Hurt Efficiency
- Every connector probes everything when a wildcard is assigned to all groups.
- DNS fan‑out multiplies checks (one name → many IPs).
- Unpredictable brokering: many connectors report “Up,” so any may be chosen.
- Connector capacity is finite (concurrency + health‑check budget).
- Troubleshooting gets murky (“which connector/DC is brokering right now?”).
Avoid Double Load Balancing (F5 GTM + ZPA)
Pointing ZPA at an F5 GTM VIP introduces two independent load‑balancing algorithms (GTM and ZPA). Their stickiness timers and failover logic don’t coordinate, causing inefficiency and odd edge cases. ZPA already provides global brokering and stickiness.
Best Practices to Fix It
- Start with Top Applications: Use ZPA Analytics → Top Applications by Bandwidth and prioritize the biggest hitters.
- Create tight app segments: Use explicit FQDNs and only the ports the app truly needs (e.g., 443, 1521, 3389). Avoid global VIP names that hide the load balancer.
- Pin by location (principle of locality): DC1 apps → DC1 connector group; DC2 apps → DC2 group. If DC2 is failover, keep it out of steady‑state assignments.
- Keep a temporary safety net: A narrow wildcard tied to one small connector group for short‑term migration/troubleshooting. Retire it once stable.
- Right‑size health reporting: Use shorter intervals (e.g., 60–120s) for critical apps and longer ones for low‑priority apps. Split GUI/API/RDP into separate segments if helpful (see the sketch after this list).
- Watch the two dials: Per‑connector concurrent sessions and health‑check queue/latency. Add connectors or reduce assignments if a group runs hot.
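One way to reason about interval settings: a connector cannot refresh an app's status faster than it can work through its check queue, so a short interval on an overloaded connector buys you nothing. A rough sketch of that interaction, assuming the ~20 checks/second throttle and that refresh is gated by whichever is slower, the configured interval or the cycle time (an assumption for illustration, not a documented ZPA formula):

```python
def effective_refresh_seconds(configured_interval_s, checks_per_cycle, rate=20):
    """Health status can refresh no faster than the slower of the configured
    interval and the time needed to work through the whole check queue
    (illustrative assumption, not a documented ZPA formula)."""
    cycle_s = checks_per_cycle / rate
    return max(configured_interval_s, cycle_s)

print(effective_refresh_seconds(60, 6_000))  # 300.0 -> the 60 s setting is moot
print(effective_refresh_seconds(60, 1_200))  # 60    -> interval actually honored
```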
Migration Runbook (step‑by‑step)
- Inventory & prioritize: Export Top Apps by Bandwidth. For each: FQDNs, ports, home DC, LB/DR model.
- Map to DCs: If app is active in DC1 with DR in DC2, treat DC1 as primary for ZPA routing.
- Define specific segments: Replace broad wildcards with app‑specific FQDNs and ports.
- Assign to the right connector group: Only the home DC group in steady state.
- Order policies: Put specific segments above any legacy wildcard so they always match first.
- Plan DR cleanly: If the app also exists in DC2, create a second segment for DC2 but keep it disabled until failover.
- Cut over in waves: Start with top 5 apps, then next 10–20. Validate success rate, connector load, and latency.
- Observe & right‑size: Add connectors or reduce scope if a group approaches limits.
- Retire the wildcard: Disable the safety‑net segment once specific coverage is complete.
Before vs After
- Before: wildcard everywhere, all ports, assigned to every connector group. Symptoms: high connector CPU, long health‑check cycles, unpredictable brokering.
- After: specific segments pinned locally, so DC1 connectors run only DC1 app checks. Benefits: fewer checks, stable stickiness, predictable routing, easy troubleshooting.
FAQ
Q: If my wildcard allows “all ports except 53,” will ZPA test all those ports?
Yes. The connector will attempt reachability on every port you include in the app segment. That’s why broad port ranges are costly. Put only the real ports in the segment definition.
Q: Can I keep F5 GTM in front of apps?
You can for non‑ZPA clients, but don’t point ZPA at GTM VIPs. ZPA already provides brokering and stickiness; adding GTM creates two competing LB layers.
Q: What’s a healthy target?
Keep each connector to ~6,000 health checks per cycle or fewer, and ensure connector concurrency stays comfortably below its limits. Scale out by adding connectors to the group if needed.
Appendix: App Inventory Template
Use this table to drive the migration. One row per app or major component.
| App / Owner | FQDN(s) | Ports (TCP/UDP) | Home DC | ZPA Connector Group | Health Interval | DR / Failover Notes |
|---|---|---|---|---|---|---|
| ERP | erp.dc1.corp.local | TCP 443, TCP 1521 | DC1 | CG‑DC1‑Core | 60s | Enable DC2 segment during DR only |
| RDP Farm | rdp.dc2.corp.local | TCP 3389 | DC2 | CG‑DC2‑Core | 120s | GUI only; no UDP 3389 required |
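If you keep this inventory as a CSV, a short script can roll the estimated health-check load up per connector group and flag any group that would exceed the ~6,000‑check budget. The sketch below assumes two resolved IPs per FQDN (measure your real DNS fan‑out), semicolon‑separated FQDNs and comma‑separated ports in their columns, and a hypothetical export file name; the column names match the template above.

```python
import csv
from collections import defaultdict

AVG_IPS_PER_FQDN = 2   # assumption; replace with your measured DNS fan-out
CHECK_BUDGET = 6_000   # per connector per cycle (~5 min at 20 checks/s)

def row_checks(row):
    """Estimated health checks generated by one inventory row."""
    fqdns = [f for f in row["FQDN(s)"].split(";") if f.strip()]
    ports = [p for p in row["Ports (TCP/UDP)"].split(",") if p.strip()]
    return len(fqdns) * AVG_IPS_PER_FQDN * len(ports)

load = defaultdict(int)
with open("app_inventory.csv", newline="") as f:   # hypothetical export of the table above
    for row in csv.DictReader(f):
        load[row["ZPA Connector Group"]] += row_checks(row)

for group, checks in sorted(load.items(), key=lambda kv: -kv[1]):
    flag = "OVER BUDGET" if checks > CHECK_BUDGET else "ok"
    print(f"{group}: ~{checks} checks per cycle ({flag})")
```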