From MPLS to SSE/SASE: Understanding the Loss of Network Control

The Internet Visibility Gap: Understanding Accountability Constraints in Cloud Service Delivery

Executive Summary

As organizations rapidly migrate to cloud-based services, a fundamental shift in network architecture is creating a critical visibility and accountability gap. Traditional private MPLS networks provided end-to-end service level agreements (SLAs), clear ownership boundaries, and deterministic troubleshooting paths. The move to public internet-based cloud services—especially internet-delivered security stacks such as SSE/SASE—reduces these guarantees, leaving organizations with limited diagnostic authority and fewer enforceable remedies when performance degrades.

This document examines the technical and operational implications of this architectural shift, focusing on the loss of network visibility, the limitations of current troubleshooting methodologies, and the contrast between public internet delivery and private connectivity models that restore clearer accountability boundaries.

The Traditional Network Model: Accountability Through Ownership

MPLS Architecture and SLA Guarantees

In traditional enterprise networks, MPLS circuits provided:

  • End-to-end SLAs covering uptime, latency, jitter, and packet loss (as defined by the provider contract)
  • Single provider accountability for the managed path between endpoints
  • Private infrastructure where the provider maintains visibility into intermediate hops
  • Clear demarcation points establishing where customer responsibility ends and provider responsibility begins
  • More predictable routing with fewer unknown transit networks in the delivery path

Example Scenario: A 23-hospital health system with dual MPLS providers and diverse paths. When issues occurred, troubleshooting followed a binary model: either the problem existed within the customer’s network infrastructure or within the provider-managed MPLS network. The provider had visibility into their segment, access to intermediate devices, and contractual obligations tied to defined performance metrics.

The Troubleshooting Advantage

When a performance issue occurred on MPLS:

  1. Clear SLA metrics identified when thresholds were breached
  2. Provider had monitoring and diagnostic access to their managed segment
  3. Ticket escalation paths were defined and contractually bounded
  4. Root cause analysis could often identify the responsible segment with high confidence
  5. Remediation timelines were driven by contractual commitments and penalties

The Internet-Based Cloud Model: The Accountability Void

Architecture of Modern Cloud Services

Services like Microsoft 365, internet-delivered security platforms (SSE/SASE), and other SaaS applications route traffic across the public internet, introducing:

  • Multiple autonomous systems (AS) with independent policies and peering constraints
  • Dynamic routing that can change based on BGP decisions, congestion, and peering agreements
  • Tier 1, 2, and 3 ISP hand-offs where packets traverse multiple autonomous systems
  • No enforceable end-to-end SLA across the entire multi-AS path (beyond what your own ISP contracts cover)
  • Limited visibility into intermediate hops and routing decisions outside the networks you directly contract with

Critical Reality: Once packets leave your directly contracted provider network and traverse the broader public internet, you typically have:

  • No direct contractual relationship with intermediate carriers
  • No guaranteed ability to open support tickets with every transit provider involved
  • No enforceable end-to-end SLA guarantees for latency, jitter, or packet loss across the full path
  • Limited insight into routing decisions made by other autonomous systems
  • An indirect escalation path where remediation is often “best effort” and timeline outcomes are uncertain

The SSE/SASE Use Case (Cloud Security as an Example)

SSE/SASE security stacks (SWG, CASB, ZTNA, DLP, firewall-as-a-service) commonly operate as cloud enforcement points that user or site traffic must reach before accessing SaaS applications or internal resources. Consider the path complexity:

End User / Branch → Local ISP → Multiple Transit/Peering Networks →
Security Vendor Point of Presence (PoP / Enforcement Point) →
[Return/Forward Path May Differ] →
Destination (SaaS Application or Internal Resource)

Each segment introduces variables:

  • Peering relationship quality and capacity
  • Congestion during peak hours
  • Route instability and BGP policy shifts
  • Provider-specific traffic shaping behaviors
  • Geographic routing inefficiencies or suboptimal PoP selection

When thousands of users experience degraded performance through an internet-delivered security stack, you often cannot directly:

  • Engage every intermediate ISP for investigation (you are not their customer)
  • Access routing tables or traffic statistics from all transit providers
  • Enforce remediation on networks you do not contract with
  • Guarantee the path will remain consistent across subsequent connections
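
To make the multi-AS problem concrete, the sketch below samples the forward path to an enforcement point and lists the autonomous systems it crosses. It assumes a Linux traceroute that supports the -A option (AS lookups via routing registries), and the hostname is a hypothetical placeholder; it captures one direction at one moment, which is exactly the limitation described above.

```python
# Sketch: enumerate the autonomous systems between a vantage point and a
# destination (e.g., an SSE/SASE enforcement point). Assumes Linux traceroute
# with "-A" (AS path lookups); this only samples the forward path once.
import re
import subprocess

def as_path_sample(destination: str, max_hops: int = 30) -> list[str]:
    """Run traceroute once and return the distinct AS numbers seen, in path order."""
    result = subprocess.run(
        ["traceroute", "-A", "-q", "1", "-w", "2", "-m", str(max_hops), destination],
        capture_output=True, text=True, timeout=120,
    )
    seen: list[str] = []
    for match in re.finditer(r"\[AS(\d+)\]", result.stdout):
        asn = "AS" + match.group(1)
        if asn not in seen:          # keep first occurrence, preserve path order
            seen.append(asn)
    return seen

if __name__ == "__main__":
    # Hypothetical enforcement-point hostname; substitute your vendor's PoP address.
    path = as_path_sample("gateway.example-sse-vendor.com")
    print("Distinct ASes on this forward-path sample:", path)
    # Rough heuristic: everything between your ISP (usually first) and the
    # destination network (usually last) is transit you have no contract with.
    print("Likely uncontracted transit:", path[1:-1])
```

Even a handful of runs from different locations usually shows how quickly the set of transit ASes changes, which is why a single traceroute rarely settles an argument about where a problem lives.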

Current Troubleshooting Approaches and Their Limitations

MTR (My Traceroute) – Limited Diagnostic Value

MTR combines traceroute and ping functionality to show packet loss and latency per hop. However:

Limitations:

  • ICMP deprioritization: Many networks rate-limit or deprioritize ICMP, making results an unreliable indicator of actual user traffic impact
  • Asymmetric routing: Return paths differ from forward paths; MTR typically shows only one direction
  • No context for packet loss: A hop showing loss may be ICMP-specific and not reflect actual application traffic loss
  • No direct remediation authority: Identifying a problematic hop in a third-party AS often yields no direct path to a fix
  • Dynamic routing: The tested path can differ from production paths minutes later

Example: MTR shows 15% packet loss at hop 7 (an intermediate provider). You often have:

  • No relationship with that provider
  • No proof that production TCP/UDP flows experience the same loss
  • No guaranteed ability to influence routing to avoid that hop
  • No enforceable escalation mechanism to force remediation
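
Where teams do script around MTR, a minimal sketch like the one below (assuming an mtr build with --json support; the hostname is a hypothetical placeholder) can at least separate loss that persists to the destination from loss that appears only at intermediate hops, which is usually ICMP deprioritization rather than real application impact.

```python
# Sketch: run mtr in JSON report mode and flag which kind of loss each hop shows.
# Assumes mtr supports "--json"; field names follow mtr's JSON report output.
import json
import subprocess

def mtr_hubs(destination: str, cycles: int = 50) -> list[dict]:
    """Return the per-hop records ("hubs") from a single mtr JSON report."""
    result = subprocess.run(
        ["mtr", "--json", "-n", "-c", str(cycles), destination],
        capture_output=True, text=True, timeout=300,
    )
    return json.loads(result.stdout)["report"]["hubs"]

def interpret(hubs: list[dict]) -> None:
    final_loss = hubs[-1]["Loss%"] if hubs else 0.0
    for hop in hubs:
        if hop["Loss%"] <= 0:
            continue
        if final_loss > 0:
            note = "loss persists to the destination - worth escalating"
        else:
            note = "not seen at the final hop - likely ICMP rate limiting only"
        print(f'hop {hop["count"]:>2}  {hop["host"]:<40} loss {hop["Loss%"]:5.1f}%  ({note})')

if __name__ == "__main__":
    interpret(mtr_hubs("gateway.example-sse-vendor.com"))  # hypothetical PoP hostname
```

Even when a script points at a specific hop, the constraints above still apply: the hop may sit in a network you have no relationship with, and the path may look different an hour later.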

ThousandEyes – Visibility Without Authority

Distributed monitoring platforms can provide global visibility into ISP outages, BGP events, and performance degradation from multiple vantage points.

Limitations:

  • Detection without guaranteed resolution: Seeing elevated latency or loss doesn’t guarantee a corrective action path
  • Correlation challenges: An ISP incident may not map cleanly to your specific user paths or apps
  • Coverage gaps: Not every peering point or last-mile ISP is measured in every geography
  • No enforcement mechanism: Identifying the likely root network does not grant control over that network
  • Cost vs. value risk: Monitoring can confirm issues you may only be able to mitigate indirectly

Critical Question: If monitoring identifies an intermediate provider as the likely contributor to your slowness, what is your next action? You often cannot directly:

  • Open a ticket with that provider (not a customer)
  • Force your ISP to permanently avoid specific transit networks
  • Guarantee your traffic won’t route through that network tomorrow
  • Hold every segment accountable for end-to-end SLA outcomes

The Diagnostic Dead End

The typical troubleshooting workflow reveals the constraint:

  1. Users report slowness accessing apps through an internet-delivered security stack
  2. Run MTR from multiple locations – identifies a latency spike at an intermediate hop
  3. Check distributed monitoring – confirms elevated latency in that region/provider
  4. Contact your ISP – they confirm the issue appears downstream, outside their directly managed network
  5. Contact the security/SaaS vendor – they confirm their infrastructure is healthy and/or traffic arrives degraded upstream
  6. Result: Likely problem segment identified, but remediation is indirect, authority is limited, and resolution timelines are uncertain

The Private Cloud Alternative: Maintaining Visibility and Control

ExpressRoute (Azure) and Direct Connect (AWS)

Private connectivity can restore clearer accountability boundaries and more deterministic paths for defined workloads:

Architecture:

  • Dedicated circuits from your location to a colocation facility
  • Cross-connects to cloud provider infrastructure within the same facility
  • Reduced internet traversal for production traffic destined to supported cloud edges
  • Provider SLAs covering defined segments (as contractually defined)
  • More predictable routing with fewer unknown transit networks

Letter of Authorization/Connecting Facility Assignment (LOA-CFA):

When establishing ExpressRoute or Direct Connect, the cloud provider issues an LOA-CFA document containing:

  • Authorization for your carrier or the colocation facility to complete the cross-connect on your behalf
  • Specific port locations (cage, cabinet, patch panel identifiers)
  • Media type requirements (single-mode fiber, copper)
  • Technical specifications for the cross-connect
  • Circuit identifiers linking the physical connection to your virtual circuits

Restored Accountability Model

With private connectivity:

Clear ownership boundaries:

  • Customer responsibility: On-premises equipment to demarcation point
  • Carrier responsibility: Demarcation point to colocation facility/cross-connect segment (per contract)
  • Cloud provider responsibility: Cross-connect to regional cloud infrastructure (per provider SLA)

Troubleshooting advantages:

  • Each party has better diagnostic access to their segment
  • SLAs define performance metrics and remediation processes for covered segments
  • Support escalation paths are defined in contracts and provider support models
  • Root cause identification can follow clearer demarcation logic

Example scenario: Private cloud connectivity performance degrades. Within 30 minutes:

  1. Cloud monitoring indicates the issue aligns to a carrier circuit segment
  2. Ticket opened with the carrier per the connectivity support agreement
  3. Carrier confirms physical degradation in their infrastructure
  4. Remediation initiated with a committed support workflow
  5. Performance restored and root cause documentation provided

Hybrid Architecture Considerations

Organizations can implement hybrid models:

Private connectivity for:

  • Business-critical SaaS and cloud workloads that support private on-ramps
  • Internal applications and data center resources
  • Voice and video conferencing (latency-sensitive)
  • Healthcare/financial applications requiring compliance

Public internet for:

  • General web browsing
  • Non-critical SaaS applications
  • Guest/BYOD traffic

This approach maintains clearer control where it matters most while accepting best-effort limitations where appropriate.

The SSE/SASE Dilemma: Security at the Cost of Determinism

Architectural Challenge

Internet-delivered security architectures route user and site traffic through cloud enforcement points. In many common deployments, reaching those enforcement points involves best-effort internet transit for at least part of the path. While some organizations can reduce exposure to uncontrolled transit for specific on-ramps (e.g., via colocation ecosystems and private interconnect constructs), this does not eliminate best-effort segments for all users—especially remote/home ISP users and mobile users.

For organizations with MPLS networks:

  • Cloud security enforcement often requires an internet on-ramp before reaching provider PoPs
  • This creates a parallel delivery model: private connectivity for internal paths, best-effort internet for cloud security insertion and many SaaS paths
  • The network team loses deterministic control over a critical component of the application delivery chain

For distributed workforces:

  • Remote users connect via home ISPs of varying quality and policies
  • Traffic routes through multiple ISPs before reaching cloud security enforcement points
  • Each user location introduces unique routing paths and potential failure modes
  • Troubleshooting becomes location-specific with limited uniform remediation options

Private Peering Options for SSE/SASE Platforms

Many SSE/SASE vendors offer private connectivity alternatives that can restore MPLS-like characteristics for site-to-enforcement-point traffic. These options typically involve colocation facilities and direct interconnect models:

Common Private Connectivity Models:

  • Colocation fabric interconnects (Equinix Fabric, Megaport, PacketFabric, CoreSite) providing direct Layer 2/3 connections to security vendor PoPs
  • Cloud exchange platforms where enterprises and vendors meet at common peering points
  • Direct cross-connects within shared data center facilities
  • Vendor-specific programs (e.g., Zscaler Cloud Connector, Netskope Private Access via colocation partners) designed to bypass public internet for site connectivity

What Private Peering Restores:

  • Clearer SLA boundaries between customer circuit, colocation provider, and security vendor infrastructure
  • Reduced AS hops by eliminating best-effort internet transit for site-to-PoP segments
  • More predictable routing with deterministic paths over private infrastructure
  • Direct troubleshooting paths with each segment owner (circuit provider, colo facility, security vendor)
  • Performance consistency for office/branch locations with private connectivity

Example Architecture: An enterprise with 50 branch offices establishes MPLS circuits to a primary colocation facility. Within that facility, they deploy cross-connects via Equinix Fabric to their SSE/SASE vendor’s enforcement points. Branch-to-PoP traffic now flows over private infrastructure with clearer accountability:

  • Branch to colo: Carrier MPLS SLA
  • Colo cross-connect: Equinix SLA
  • PoP to cloud apps: Vendor SLA + destination provider paths

What Private Peering Does NOT Solve:

  • Remote/mobile workforce connectivity: Home ISP users and mobile workers still traverse best-effort internet paths to reach enforcement points—this is often 40-60% of enterprise users today
  • Last-mile variability: Even with private peering to PoPs, traffic destined to SaaS applications still traverses internet paths from the PoP to the destination
  • Geographic distribution challenges: Not all user locations can economically reach colocation facilities; distant users may still hairpin through internet paths
  • Cost and complexity: Private connectivity requires colocation presence, cross-connect fees, and increased circuit costs—prohibitive for some deployment scales
  • Operational overhead: Managing hybrid connectivity models (private for sites, internet for remote users) increases architecture complexity

The Distributed Workforce Reality: While private peering solves site connectivity challenges, in many modern enterprises 40-60% of users are remote, mobile, or work-from-home. These users remain subject to:

  • Home ISP quality variations (cable, DSL, fiber, 5G with vastly different characteristics)
  • Best-effort internet routing to reach enforcement points
  • No enforceable SLA coverage for their access paths
  • Geographic inconsistency (users in different regions experience different transit networks)

Result: Even with significant investment in private peering for sites, a substantial portion of your user base still experiences the internet visibility gap described throughout this document. This creates a two-tier troubleshooting reality: deterministic paths for sites, best-effort diagnostics for remote workers.

Cost-Benefit Considerations:

  • When private peering makes sense: Large site deployments, compliance-driven environments, latency-sensitive applications, predictable traffic patterns, and budgets that support colocation infrastructure
  • When internet delivery is acceptable: Distributed workforces, small/medium site counts, cost-constrained deployments, and applications tolerant of variable performance
  • Hybrid models: Many enterprises use private connectivity for headquarters and major branches while accepting internet delivery for smaller sites and remote users

The Hybrid Architecture Reality

Most modern enterprises operate hybrid connectivity models that combine private and public paths:

Typical deployment pattern:

  • Site traffic (30-50% of users): Private peering via colocation facilities to enforcement points, restoring deterministic performance and clearer SLA boundaries
  • Remote/mobile traffic (40-60% of users): Best-effort internet transit from home ISPs and mobile carriers, subject to the visibility gaps described throughout this document
  • SaaS-bound traffic (all users): Even after security inspection, traffic to SaaS destinations often traverses internet paths with limited visibility

This creates operational complexity:

  • Two troubleshooting methodologies: Deterministic root cause for site traffic, best-effort diagnostics for remote users
  • Inconsistent user experiences: Office users may have reliable performance while remote users experience variability
  • Communication challenges: Explaining to leadership why “the same application” performs differently for different user populations
  • Monitoring complexity: Tracking baselines and SLAs requires segmenting metrics by connectivity model
  • Incident management: Distinguishing between “actionable with authority” vs. “best-effort mitigation” incidents

The Support Escalation Problem

When users report performance issues:

What you can do:

  • Verify your contracted circuits are performing within SLA where applicable
  • Confirm DNS resolution and endpoint posture checks are functioning
  • Validate local network configurations, tunnel health, MTU correctness, and capacity headroom (a quick path-MTU probe sketch follows this list)
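
As referenced in the list above, a quick path-MTU probe can rule fragmentation issues in or out early. This sketch assumes Linux iputils ping (where -M do sets the DF bit; BSD/macOS use different flags), and the endpoint hostname is a hypothetical placeholder; treat the result as indicative, since a single lost probe will understate the MTU.

```python
# Sketch: binary-search the largest unfragmented ICMP payload toward an endpoint
# and report the implied path MTU (payload + 28 bytes of IP/ICMP headers).
# Assumes Linux iputils ping; "-M do" prohibits fragmentation (sets DF).
import subprocess

def df_ping_ok(host: str, payload: int) -> bool:
    """True if a single DF-bit ping with the given payload size gets a reply."""
    result = subprocess.run(
        ["ping", "-M", "do", "-s", str(payload), "-c", "1", "-W", "2", host],
        capture_output=True, text=True,
    )
    return result.returncode == 0

def probe_path_mtu(host: str, low: int = 1200, high: int = 1472) -> int:
    """Largest working payload, converted to an approximate path MTU (0 if none worked)."""
    best = 0
    while low <= high:
        mid = (low + high) // 2
        if df_ping_ok(host, mid):
            best, low = mid, mid + 1
        else:
            high = mid - 1
    return best + 28 if best else 0

if __name__ == "__main__":
    mtu = probe_path_mtu("gateway.example-sse-vendor.com")  # hypothetical tunnel/PoP endpoint
    print(f"Approximate path MTU: {mtu} bytes (1500 is typical; GRE/IPsec paths often sit near 1400-1460)")
```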

What you often cannot do directly:

  • Diagnose or remediate issues in every upstream transit network between users and the provider
  • Force routing changes across networks you do not control
  • Hold intermediate networks accountable to your end-to-end performance expectations
  • Guarantee consistent outcomes across different user geographies and ISPs

Vendor limitations:

  • They can confirm their infrastructure and PoPs are healthy
  • They can observe when traffic arrives degraded at their edges
  • They cannot fully control upstream internet routing decisions across third-party networks
  • They cannot provide enforceable SLAs covering customer ISP segments and uncontrolled internet transit

Cloud Security (SSE/SASE) Mitigations (practical levers you can actually pull):

  • Control PoP selection where possible: Prefer the nearest/most stable enforcement points per region and avoid unnecessary PoP drift that introduces new transit paths
  • Engineer egress diversity: If you have multiple DIA providers, ensure security-bound traffic can steer to the better-performing provider during incidents
  • Build tunnel redundancy and health criteria: Use redundant GRE/IPsec paths with explicit failover triggers (latency/jitter/loss thresholds, not just “tunnel up/down”)
  • Validate MTU and fragmentation behavior end-to-end: Many “mysterious slowness” incidents trace back to PMTUD/MTU issues across tunnels and intermediate networks
  • Baseline enforcement point performance by geography: Track DNS time, TLS handshake time, TTFB, and throughput so “internet variability” is measurable (see the timing sketch after this list)
  • Use application-layer synthetic testing: Measure real login flows and critical SaaS transactions through the security stack, not only ICMP/MTR outputs
  • Create an emergency bypass policy (pre-approved): For defined critical apps, establish a documented, security-approved temporary bypass with scope, logging, and rollback
  • Instrument endpoint and client behavior: Confirm forwarding mode, PAC/proxy behaviors, certificate inspection impacts, and endpoint resource constraints aren’t the root cause
  • Standardize an escalation evidence bundle: Include impacted regions, selected enforcement point, tunnel metrics, app timing breakdowns, and a known-good comparison path
  • Align with ISPs on routing influence options: Where supported, use provider features (communities / preferred peer routing / managed reroutes) to avoid chronic congestion points
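
The baselining and synthetic-testing items above lend themselves to lightweight scripting. The sketch below uses only the Python standard library and a hypothetical SaaS hostname; it records DNS, TCP connect, TLS handshake, and time-to-first-byte so per-geography baselines can be trended over time. It is a starting point, not a substitute for full application-layer transaction tests through the security stack.

```python
# Sketch: capture DNS / TCP connect / TLS handshake / TTFB timings for one HTTPS
# endpoint. Run periodically from each geography and access method, store the
# results, and compare against incident-time measurements.
import socket
import ssl
import time

def timing_baseline(host: str, path: str = "/", port: int = 443) -> dict:
    timings = {}

    t0 = time.perf_counter()
    addr = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)[0][4][0]
    timings["dns_ms"] = round((time.perf_counter() - t0) * 1000, 1)

    t0 = time.perf_counter()
    sock = socket.create_connection((addr, port), timeout=10)
    timings["tcp_connect_ms"] = round((time.perf_counter() - t0) * 1000, 1)

    t0 = time.perf_counter()
    tls = ssl.create_default_context().wrap_socket(sock, server_hostname=host)
    timings["tls_handshake_ms"] = round((time.perf_counter() - t0) * 1000, 1)

    t0 = time.perf_counter()
    tls.sendall(f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode())
    tls.recv(1)  # first response byte = time to first byte
    timings["ttfb_ms"] = round((time.perf_counter() - t0) * 1000, 1)

    tls.close()
    return timings

if __name__ == "__main__":
    print(timing_baseline("login.example-saas.com"))  # hypothetical SaaS login host
```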

Risk Acceptance vs. Risk Management

Deploying internet-delivered services requires accepting that:

  • Network performance is variable and can be influenced by factors outside your control
  • Troubleshooting authority is constrained to the segments you own or contractually cover
  • User experience will vary based on geography, last-mile ISP quality, and transit path selection
  • Enforceable end-to-end SLAs are limited when the path includes networks you do not contract with
  • Resolution timelines can be uncertain when upstream routing or peering issues occur

This isn’t a criticism of any single vendor—it is the operational reality of delivering applications and security over best-effort internet transit when private connectivity is not end-to-end.

Recommendations and Strategic Considerations

For Organizations Planning Cloud Migration

Conduct realistic assessments (measure the real path, not the ideal path):

  1. Map user-to-enforcement-point and user-to-app paths by region and access method (office, remote, mobile)
  2. Test from actual user vantage points using application-layer synthetic transactions, not only ICMP-based tools
  3. Establish baselines per geography (DNS time, TLS handshake, TTFB, throughput) before migrations
  4. Define “good enough” thresholds and document which parts are best-effort vs. contract-backed

Evaluate connectivity and egress models (design for steering, not hope):

  • Multi-provider DIA with deliberate egress policy and tested failover behavior
  • Private connectivity (ExpressRoute/Direct Connect, Equinix Fabric, direct peering) for workloads with strict performance needs and sufficient user concentration
  • Regional breakout strategies to reduce unnecessary internet AS hops
  • Hybrid architectures that apply deterministic connectivity where it matters most

Set realistic expectations:

  • Acknowledge authority limits to leadership and document as accepted risk where appropriate
  • Redefine SLAs to reflect the portions of the network you control and contractually cover
  • Adjust troubleshooting processes to focus on actionable segments and mitigations
  • Plan for “best-effort transit variability” as a valid root cause category

For Troubleshooting Teams

Focus efforts where you have control:

  • Your circuits and edge: Ensure they meet SLA, are properly sized, and have headroom
  • Local infrastructure: Eliminate internal bottlenecks and configuration errors
  • DNS and routing policies: Optimize resolvers, split-horizon behavior, and egress selection where possible
  • Endpoint health: Verify devices, posture, and client behaviors aren’t contributing to slowness

Accept diagnostic limitations (and collect better evidence):

  • MTR and traceroute indicate paths but rarely provide remediation authority
  • Distributed monitoring provides visibility but does not guarantee fixes
  • ISP support typically ends at their boundary; downstream remediation is indirect
  • Vendor support can confirm edge health but cannot control upstream transit networks

Develop response playbooks:

  • Document known-good baselines for comparison during incidents
  • Create stakeholder communication templates explaining best-effort transit limitations clearly
  • Establish escalation criteria distinguishing actionable issues from upstream variability (a simple classification sketch follows this list)
  • Maintain ISP relationships so you can influence routing where possible within contracted networks
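
One way to encode the escalation criteria referenced above is a per-segment comparison against documented baselines. The sketch below uses illustrative segment names and thresholds (both are assumptions, not prescriptions); its only job is to separate “a contracted segment breached its baseline” from “the degradation sits in uncontracted transit.”

```python
# Sketch: classify an incident as actionable (a contracted segment is degraded)
# versus best-effort (only uncontracted transit is degraded). Segment names,
# baselines, and the 1.5x threshold are illustrative.
from dataclasses import dataclass

@dataclass
class Measurement:
    segment: str        # e.g., "lan", "dia_circuit", "tunnel_to_pop", "internet_transit"
    contracted: bool    # do you (or your carrier) hold an SLA over this segment?
    baseline_ms: float  # known-good latency for this segment
    current_ms: float   # latency observed during the incident

def classify(measurements: list[Measurement], degraded_factor: float = 1.5) -> str:
    degraded = [m for m in measurements if m.current_ms > m.baseline_ms * degraded_factor]
    if not degraded:
        return "no segment breaches baseline: look at the endpoint or application layer"
    contracted = [m for m in degraded if m.contracted]
    if contracted:
        worst = max(contracted, key=lambda m: m.current_ms / m.baseline_ms)
        return f"actionable: open a ticket against contracted segment '{worst.segment}'"
    return "best-effort: degradation is in uncontracted transit; mitigate (egress steering, bypass policy) and document"

if __name__ == "__main__":
    print(classify([
        Measurement("lan", True, 1.0, 1.2),
        Measurement("dia_circuit", True, 8.0, 9.0),
        Measurement("internet_transit", False, 35.0, 140.0),
    ]))
```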

For Leadership and Decision Makers

Understand the trade-offs:

  • Cloud services provide scalability, reduced capital expenditure, and operational flexibility
  • Internet delivery introduces variability, indirect escalation, and limited end-to-end authority
  • This is not a failure of any vendor—it’s an inherent characteristic of multi-AS best-effort transit

Budget and resource implications:

  • Monitoring tools provide visibility but do not create remediation authority
  • Private connectivity restores clearer control but requires investment and operational planning
  • Support training must shift from “fix everything” to “optimize what we control and mitigate what we can”
  • User expectations must be managed; not all performance issues have deterministic fixes

Risk acceptance framework:

  • Document best-effort segments as known limitations in your service delivery model
  • Establish performance thresholds that account for geographic and ISP variability
  • Create incident procedures differentiating controllable vs. uncontrollable contributors
  • Communicate limitations to stakeholders before incidents occur

Conclusion

The migration to cloud-based services represents a fundamental architectural shift that extends beyond application hosting. Organizations are replacing private, controlled delivery paths with shared public infrastructure—and with that shift comes reduced determinism, limited authority over intermediate networks, and weaker end-to-end accountability models.

For internet-delivered services—including SSE/SASE and many SaaS platforms—enterprises must accept that:

  • End-to-end enforceable SLAs are limited when the path includes networks you don’t contract with
  • Troubleshooting can identify contributors you cannot directly remediate
  • Performance can vary based on factors outside your influence
  • “Best-effort transit variability” is a valid root cause category, even if it’s unsatisfying

Private connectivity options through colocation providers like Equinix can restore deterministic performance for site traffic, but the distributed workforce—now representing 40-60% of enterprise users—remains subject to best-effort internet constraints. This creates a two-tier operational reality that requires different troubleshooting approaches, monitoring strategies, and stakeholder expectations.

This isn’t an argument against cloud adoption—it’s a call for realistic assessment of what changes when you move from private connectivity models to internet delivery. Organizations that understand these constraints can:

  • Make informed decisions about which workloads fit which connectivity models
  • Set appropriate expectations with stakeholders and users
  • Invest in monitoring and telemetry that leads to actionable mitigations
  • Maintain private connectivity for truly critical applications where deterministic performance is required
  • Accept best-effort variability as an operational reality for distributed workforces

The visibility and accountability gap is real. Acknowledging it is the first step toward making strategic decisions that balance cloud benefits against the operational realities of internet-based delivery.