When deploying Zscaler Private Access (ZPA), it’s tempting to take shortcuts during early rollouts: one giant wildcard app segment (*.corp.local), all TCP/UDP ports open, assigned to every App Connector group. It works… but it’s a silent killer for performance, scalability, and troubleshooting.
Here’s why that approach causes headaches and how to fix it.
How ZPA Health Checks Actually Work
When Health Reporting is enabled, each App Connector proactively checks application reachability based on your app segment definition:
- FQDN → the connector resolves DNS and probes every IP returned.
- Static IP → the connector probes just that IP.
- TCP ports → TCP 3-way handshake test per port.
- UDP ports → ICMP probe first; TCP fallback if ICMP fails.
Example
Suppose you define an app segment for app1.cordero.me with the following settings:
- TCP ports: 443
- UDP ports: 5000
If app1.cordero.me resolves to 192.168.20.11 and 192.168.20.12, then each App Connector will check:
- 192.168.20.11:443
- 192.168.20.12:443
- 192.168.20.11:5000
- 192.168.20.12:5000
That’s 4 checks per cycle — very manageable.
Problems arise when wildcards or broad port ranges multiply this check count into the thousands.
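A minimal Python sketch of how that count comes about, assuming the segment and DNS answers above; the dictionary layout and helper function are illustrative, not the ZPA API:

```python
# Illustrative segment definition and stubbed DNS answers; this is NOT the ZPA API,
# just a way to count probes. A real connector resolves the FQDN with its local resolver.
segment = {"fqdns": ["app1.cordero.me"], "tcp_ports": [443], "udp_ports": [5000]}
dns = {"app1.cordero.me": ["192.168.20.11", "192.168.20.12"]}

def enumerate_checks(segment, dns):
    """List the (ip, port, proto) probes one App Connector runs per health-check cycle."""
    checks = []
    for fqdn in segment["fqdns"]:
        for ip in dns[fqdn]:                       # every resolved IP gets probed
            for port in segment["tcp_ports"]:
                checks.append((ip, port, "tcp"))   # TCP 3-way handshake test
            for port in segment["udp_ports"]:
                checks.append((ip, port, "udp"))   # ICMP first, TCP fallback if ICMP fails
    return checks

checks = enumerate_checks(segment, dns)
print(len(checks), "checks per cycle")             # 2 IPs x (1 TCP + 1 UDP) = 4
```

Swap in a wildcard that matches hundreds of FQDNs and a broad port list, and the same loop produces thousands of checks per cycle.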
Health Reporting Modes (and the “30 minutes”)
- Continuous → The connector always probes the defined ports. (Not allowed for wildcards or segments with more than 10 ports.)
- On Access → The connector starts probing when a user connects, then continues for up to 30 minutes after the last session. After that, the app’s health shows as Unknown.
- None → No probes at all. ZPA assumes the app is always reachable.
In all cases, the list of ports you configure drives which probes run. If you define “all ports except 53,” the connector will attempt them all, even if the user only ever uses 443.
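As a rough illustration of the On Access window described above (a toy model, not connector code; the function and variable names are made up for this sketch):

```python
from datetime import datetime, timedelta

ON_ACCESS_WINDOW = timedelta(minutes=30)  # probing continues this long after the last session

def reported_health(last_session_end, last_probe_ok, now=None):
    """Toy model of what an On Access segment's health would display."""
    now = now or datetime.now()
    if last_session_end is None or now - last_session_end > ON_ACCESS_WINDOW:
        return "Unknown"              # no session in the last 30 minutes -> probes have stopped
    return "Up" if last_probe_ok else "Down"

now = datetime.now()
print(reported_health(now - timedelta(minutes=10), last_probe_ok=True))   # Up
print(reported_health(now - timedelta(minutes=45), last_probe_ok=True))   # Unknown
```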
The 6,000-Check Rule
App Connectors throttle at ~20 health checks per second.
- With ~6,000 checks, a cycle takes about 300 seconds (5 minutes).
- With ~20,000 checks, a cycle can stretch to 15 minutes or longer, delaying accurate health reporting and wasting connector capacity.
Quick math:
A segment defined as “all ports except 53” effectively tells the connector to probe 65,534 ports per IP — instantly blowing past the 6,000-check guidance.
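The arithmetic behind those figures, as a quick sketch (the ~20 checks/second throttle is the value quoted above):

```python
CHECKS_PER_SECOND = 20          # approximate per-connector health-check throttle

def cycle_seconds(num_checks, rate=CHECKS_PER_SECOND):
    """Rough time for one connector to work through a full health-check cycle."""
    return num_checks / rate

print(cycle_seconds(6_000))     # 300.0 seconds (~5 minutes)
print(cycle_seconds(20_000))    # 1000.0 seconds (~17 minutes)

# "All ports except 53" on a single IP:
ports_per_ip = 65_535 - 1       # 65,534 ports to probe
print(cycle_seconds(ports_per_ip) / 60)  # ~54.6 minutes for just one IP
```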
Why Wildcards Hurt Efficiency
- Every connector probes every port for every FQDN when wildcards are assigned to all groups; see the fan-out sketch after this list.
- DNS fan-out multiplies the number of checks (one hostname → many IPs).
- ZPA can pick any connector in the group that reports “Up,” leading to unpredictable brokering.
- Connector capacity is finite (both concurrent sessions and health-check budget).
- Troubleshooting becomes murky (“which connector/DC is brokering this session?”).
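A back-of-the-envelope sketch of that fan-out; every number below is a hypothetical placeholder you would replace with your own inventory:

```python
# Hypothetical environment sized to show the multiplication, not real measurements.
fqdns_behind_wildcard = 200   # hostnames matched by *.corp.local
avg_ips_per_fqdn = 2          # DNS fan-out per hostname
ports_on_segment = 10         # TCP + UDP ports defined on the segment
connectors_assigned = 8       # connectors across every assigned group

checks_per_connector = fqdns_behind_wildcard * avg_ips_per_fqdn * ports_on_segment
total_probes = checks_per_connector * connectors_assigned

print(checks_per_connector)   # 4,000 checks per connector per cycle
print(total_probes)           # 32,000 probes hitting application servers per cycle
```

Even before a single connector hits the 6,000-check guidance, the aggregate probe traffic lands on the same application servers from every connector in every assigned group.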
Avoid Double Load Balancing (F5 GTM + ZPA)
Pointing ZPA at an F5 GTM VIP introduces two independent load-balancing algorithms (GTM and ZPA). Their stickiness timers and failover logic don’t coordinate, creating inefficiency and odd failures.
Best practice: Point ZPA app segments directly to the DC-local FQDN/IP and pin them to the connector group in that DC. Keep GTM for non-ZPA clients if needed, but bypass it for ZPA.
Best Practices to Fix It
- Start with Top Applications: Use ZPA Analytics → Top Applications by Bandwidth to prioritize the biggest hitters.
- Create tight app segments: Use explicit FQDNs and only the ports the app actually needs (e.g., 443, 1521, 3389). Avoid global VIP hostnames that hide load balancing. (See the segment lint sketch after this list.)
- Pin by location (principle of locality): DC1 apps → DC1 connector group; DC2 apps → DC2 connector group. If DC2 is just DR, leave it out of steady state.
- Keep a temporary safety net: A narrow wildcard tied to one small connector group for short-term migration/troubleshooting. Retire it once stable.
- Right-size health reporting: Shorter intervals (60–120s) for critical apps, longer for low-priority ones. Split GUI/API/RDP segments if needed.
- Watch the two dials: per-connector concurrent sessions and per-connector health-check cycle time. Add connectors or reduce scope if a group runs hot.
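To make “tight and pinned” concrete, here is a minimal sketch that lints candidate segment definitions against the guidance above. The dictionaries are illustrative planning records, not the ZPA API schema, and the field names are assumptions:

```python
# Illustrative segment records and a simple lint against the anti-patterns in this post.
segments = [
    {"name": "ERP", "fqdns": ["erp.dc1.corp.local"], "tcp_ports": [443, 1521],
     "udp_ports": [], "connector_groups": ["CG-DC1-Core"]},
    {"name": "Legacy catch-all", "fqdns": ["*.corp.local"],
     "tcp_ports": list(range(1, 65536)), "udp_ports": [],
     "connector_groups": ["CG-DC1-Core", "CG-DC2-Core"]},
]

def lint_segment(seg):
    """Flag wildcards, wide port ranges, and multi-DC assignment."""
    issues = []
    if any(f.startswith("*") for f in seg["fqdns"]):
        issues.append("wildcard FQDN")
    if len(seg["tcp_ports"]) + len(seg["udp_ports"]) > 10:
        issues.append("more than 10 ports (Continuous health reporting not allowed)")
    if len(seg["connector_groups"]) > 1:
        issues.append("assigned to multiple connector groups (pin to the home DC)")
    return issues or ["OK"]

for seg in segments:
    print(f'{seg["name"]}: {", ".join(lint_segment(seg))}')
```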
Migration Runbook
- Inventory & prioritize: Export Top Apps by Bandwidth. For each app, record FQDNs, ports, home DC, and failover model.
- Map to DCs: Assign apps only to their home DC connector group.
- Define specific segments: Replace wildcards with app-specific FQDNs and ports.
- Assign cleanly: Use only the connector group in the app’s home DC.
- Order policies: Put specific app segments above the legacy wildcard.
- Plan DR: If an app also lives in DC2, create a separate DC2 segment, but keep it disabled until failover.
- Cut over in waves: Top 5 apps first, then the next 10–20. Validate health at each wave (see the wave-planning sketch below).
- Observe & adjust: Right-size connector groups.
- Retire the wildcard: Disable the catch-all segment once migration is complete.
Pro tip: Keep a short-lived rollback wildcard segment disabled. If a cutover breaks, toggle it on, fix, then try again.
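A small sketch of the wave planning in the cut-over step, assuming an inventory exported along the lines of the appendix template; app names and bandwidth figures are placeholders:

```python
# Placeholder inventory rows; in practice, export Top Applications by Bandwidth
# and load the rows from CSV.
inventory = [
    {"app": "ERP", "bandwidth_gb": 420},
    {"app": "RDP Farm", "bandwidth_gb": 310},
    {"app": "CRM", "bandwidth_gb": 150},
    {"app": "Intranet Wiki", "bandwidth_gb": 40},
]

def plan_waves(apps, first_wave=5, next_wave=20):
    """Rank by bandwidth, then split into cutover waves: top 5, then the next 10-20."""
    ranked = sorted(apps, key=lambda a: a["bandwidth_gb"], reverse=True)
    return ranked[:first_wave], ranked[first_wave:first_wave + next_wave]

wave1, wave2 = plan_waves(inventory)
print("Wave 1:", [a["app"] for a in wave1])
print("Wave 2:", [a["app"] for a in wave2])
```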
Before vs After
Before: Wildcard Everywhere
High connector CPU, long health-check cycles, unpredictable brokering.
After: Specific & Local
Fewer checks, stable stickiness, predictable routing, easier troubleshooting.
FAQ
Q: If my wildcard allows “all ports except 53,” will ZPA test all those ports?
Yes. The connector probes every port you include in the app segment, regardless of which port the user actually needs.
Q: Can I keep F5 GTM in front of apps?
For non-ZPA clients, yes. For ZPA, no: avoid double load balancing.
Q: What’s a healthy target?
Keep each connector to ~6,000 checks per cycle or fewer and ensure session concurrency stays comfortably below limits. Scale out by adding connectors.
Appendix: App Inventory Template
| App / Owner | FQDN(s) | Ports (TCP/UDP) | Home DC | Connector Group | Health Interval | DR / Failover Notes |
|---|---|---|---|---|---|---|
| ERP | erp.dc1.corp.local | TCP 443, 1521 | DC1 | CG-DC1-Core | 60s | Enable DC2 segment during DR only |
| RDP Farm | rdp.dc2.corp.local | TCP 3389 | DC2 | CG-DC2-Core | 120s | GUI only; no UDP 3389 required |