Executive Summary
As organizations rapidly migrate to cloud-based services, a fundamental shift in network architecture is creating a critical visibility and accountability gap. Traditional private MPLS networks provided end-to-end service level agreements (SLAs), clear ownership boundaries, and deterministic troubleshooting paths. The move to public internet-based cloud services—especially internet-delivered security stacks such as SSE/SASE—reduces these guarantees, leaving organizations with limited diagnostic authority and fewer enforceable remedies when performance degrades.
This document examines the technical and operational implications of this architectural shift, focusing on the loss of network visibility, the limitations of current troubleshooting methodologies, and the contrast between public internet delivery versus private connectivity models that restore clearer accountability boundaries.
The Traditional Network Model: Accountability Through Ownership
MPLS Architecture and SLA Guarantees
In traditional enterprise networks, MPLS circuits provided:
- End-to-end SLAs covering uptime, latency, jitter, and packet loss (as defined by the provider contract)
- Single provider accountability for the managed path between endpoints
- Private infrastructure where the provider maintains visibility into intermediate hops
- Clear demarcation points establishing where customer responsibility ends and provider responsibility begins
- More predictable routing with fewer unknown transit networks in the delivery path
The Troubleshooting Advantage
When a performance issue occurred on MPLS:
- Clear SLA metrics identified when thresholds were breached
- Provider had monitoring and diagnostic access to their managed segment
- Ticket escalation paths were defined and contractually bounded
- Root cause analysis could often identify the responsible segment with high confidence
- Remediation timelines were driven by contractual commitments and penalties
The Internet-Based Cloud Model: The Accountability Void
Architecture of Modern Cloud Services
Services like Microsoft 365, internet-delivered security platforms (SSE/SASE), and other SaaS applications route traffic across the public internet, introducing:
- Multiple autonomous systems (AS) with independent policies and peering constraints
- Dynamic routing that can change based on BGP decisions, congestion, and peering agreements
- Tier 1, 2, and 3 ISP hand-offs where packets traverse multiple autonomous systems
- No enforceable end-to-end SLA for latency, jitter, or packet loss across the multi-AS path (beyond what your own ISP contracts cover)
- Limited visibility into intermediate hops and routing decisions outside the networks you directly contract with
- No direct contractual relationship with intermediate carriers, and no guaranteed ability to open support tickets with every transit provider involved
- An indirect escalation path where remediation is often “best effort” and timelines are uncertain
The SSE/SASE Use Case (Cloud Security as an Example)
SSE/SASE security stacks (SWG, CASB, ZTNA, DLP, firewall-as-a-service) commonly operate as cloud enforcement points that user or site traffic must reach before accessing SaaS applications or internal resources. Consider the path complexity:
User or Branch Site →
Local ISP →
ISP Point of Presence (PoP) →
One or More Transit/Peering Networks →
Security Vendor Enforcement PoP →
Additional Internet Transit [Return/Forward Path May Differ] →
Destination (SaaS Application or Internal Resource)
Each segment introduces variables:
- Peering relationship quality and capacity
- Congestion during peak hours
- Route instability and BGP policy shifts
- Provider-specific traffic shaping behaviors
- Geographic routing inefficiencies or suboptimal PoP selection
When thousands of users experience degraded performance through an internet-delivered security stack, you often cannot directly:
- Engage every intermediate ISP for investigation (you are not their customer)
- Access routing tables or traffic statistics from all transit providers
- Enforce remediation on networks you do not contract with
- Guarantee the path will remain consistent across subsequent connections
Current Troubleshooting Approaches and Their Limitations
MTR (My Traceroute) – Limited Diagnostic Value
MTR combines traceroute and ping functionality to show packet loss and latency per hop. However:
Limitations:
- ICMP deprioritization: Many networks rate-limit or deprioritize ICMP, making results unreliable for user traffic impact
- Asymmetric routing: Return paths differ from forward paths; MTR typically shows only one direction
- No context for packet loss: A hop showing loss may be ICMP-specific and not reflect actual application traffic loss
- No direct remediation authority: Identifying a problematic hop in a third-party AS often yields no direct path to a fix
- Dynamic routing: The tested path can differ from production paths minutes later
Even when MTR does pinpoint a problematic hop, you typically have:
- No relationship with that provider
- No proof that production TCP/UDP flows experience the same loss
- No guaranteed ability to influence routing to avoid that hop
- No enforceable escalation mechanism to force remediation
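As an illustration of what MTR output can and cannot tell you, here is a minimal sketch assuming a Linux host with an mtr build that supports JSON report output (roughly version 0.87 and later; field names vary slightly between versions). It runs a report toward a placeholder hostname and flags hops with sustained ICMP loss. A flagged hop still carries every limitation listed above: the loss may be ICMP-specific, the path may change, and you likely have no authority over the network that owns the hop.

```python
import json
import subprocess

# Hypothetical target: replace with your enforcement-point or SaaS hostname.
TARGET = "gateway.example-sse-vendor.com"
LOSS_THRESHOLD_PCT = 5.0  # hops above this get flagged for the escalation evidence bundle

def run_mtr(target: str, cycles: int = 20) -> dict:
    """Run mtr in report mode with JSON output."""
    result = subprocess.run(
        ["mtr", "--report", "--report-cycles", str(cycles), "--json", target],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

def summarize(report: dict) -> None:
    # JSON layout is roughly report -> hubs (one entry per hop); key names vary by mtr version.
    hubs = report.get("report", {}).get("hubs", [])
    for hop in hubs:
        host = hop.get("host", "?")
        loss = float(hop.get("Loss%", 0.0))
        avg = float(hop.get("Avg", 0.0))
        flag = "  <-- ICMP loss (may not affect real TCP/UDP flows)" if loss >= LOSS_THRESHOLD_PCT else ""
        print(f"{hop.get('count', '?'):>3}  {host:<40} loss={loss:5.1f}%  avg={avg:7.1f} ms{flag}")

if __name__ == "__main__":
    summarize(run_mtr(TARGET))
```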
ThousandEyes – Visibility Without Authority
Distributed monitoring platforms can provide global visibility into ISP outages, BGP events, and performance degradation from multiple vantage points.
Limitations:
- Detection without guaranteed resolution: Seeing elevated latency or loss doesn’t guarantee a corrective action path
- Correlation challenges: An ISP incident may not map cleanly to your specific user paths or apps
- Coverage gaps: Not every peering point or last-mile ISP is measured in every geography
- No enforcement mechanism: Identifying the likely root network does not grant control over that network
- Cost vs. value risk: Monitoring can confirm issues you may only be able to mitigate indirectly
Even when the platform identifies the likely problem network, you typically cannot:
- Open a ticket with that provider (you are not their customer)
- Force your ISP to permanently avoid specific transit networks
- Guarantee your traffic won’t route through that network tomorrow
- Hold every segment accountable for end-to-end SLA outcomes
The Diagnostic Dead End
The typical troubleshooting workflow reveals the constraint:
- Users report slowness accessing apps through an internet-delivered security stack
- Run MTR from multiple locations – identifies a latency spike at an intermediate hop
- Check distributed monitoring – confirms elevated latency in that region/provider
- Contact your ISP – they confirm the issue appears downstream, outside their directly managed network
- Contact the security/SaaS vendor – they confirm their infrastructure is healthy and/or traffic arrives degraded upstream
- Result: Likely problem segment identified, but remediation is indirect, authority is limited, and resolution timelines are uncertain
The Private Cloud Alternative: Maintaining Visibility and Control
ExpressRoute (Azure) and Direct Connect (AWS)
Private connectivity can restore clearer accountability boundaries and more deterministic paths for defined workloads:
Architecture:
- Dedicated circuits from your location to a colocation facility
- Cross-connects to cloud provider infrastructure within the same facility
- Reduced internet traversal for production traffic destined to supported cloud edges
- Provider SLAs covering defined segments (as contractually defined)
- More predictable routing with fewer unknown transit networks
Letter of Authorization/Connecting Facility Assignment (LOA-CFA):
When establishing ExpressRoute or Direct Connect, the cloud provider issues an LOA-CFA document containing:
- Authorization for your provider to connect equipment on your behalf
- Specific port locations (cage, cabinet, patch panel identifiers)
- Media type requirements (single-mode fiber, copper)
- Technical specifications for the cross-connect
- Circuit identifiers linking the physical connection to your virtual circuits
Restored Accountability Model
With private connectivity:
Clear ownership boundaries:
- Customer responsibility: On-premises equipment to demarcation point
- Carrier responsibility: Demarcation point to colocation facility/cross-connect segment (per contract)
- Cloud provider responsibility: Cross-connect to regional cloud infrastructure (per provider SLA)
Troubleshooting advantages:
- Each party has better diagnostic access to their segment
- SLAs define performance metrics and remediation processes for covered segments
- Support escalation paths are defined in contracts and provider support models
- Root cause identification can follow clearer demarcation logic
Example incident flow:
- Cloud monitoring indicates the issue aligns to a carrier circuit segment
- Ticket opened with the carrier per the connectivity support agreement
- Carrier confirms physical degradation in their infrastructure
- Remediation initiated with a committed support workflow
- Performance restored and root cause documentation provided
Hybrid Architecture Considerations
Organizations can implement hybrid models:
Private connectivity for:
- Business-critical SaaS and cloud workloads that support private on-ramps
- Internal applications and data center resources
- Voice and video conferencing (latency-sensitive)
- Healthcare/financial applications requiring compliance
Public internet for:
- General web browsing
- Non-critical SaaS applications
- Guest/BYOD traffic
This approach maintains clearer control where it matters most while accepting best-effort limitations where they are tolerable.
The SSE/SASE Dilemma: Security at the Cost of Determinism
Architectural Challenge
Internet-delivered security architectures route user and site traffic through cloud enforcement points. In many common deployments, reaching those enforcement points involves best-effort internet transit for at least part of the path. While some organizations can reduce exposure to uncontrolled transit for specific on-ramps (e.g., via colocation ecosystems and private interconnect constructs), this does not eliminate best-effort segments for all users—especially remote/home ISP users and mobile users.
For organizations with MPLS networks:
- Cloud security enforcement often requires an internet on-ramp before reaching provider PoPs
- This creates a parallel delivery model: private connectivity for internal paths, best-effort internet for cloud security insertion and many SaaS paths
- The network team loses deterministic control over a critical component of the application delivery chain
For distributed workforces:
- Remote users connect via home ISPs of varying quality and policies
- Traffic routes through multiple ISPs before reaching cloud security enforcement points
- Each user location introduces unique routing paths and potential failure modes
- Troubleshooting becomes location-specific with limited uniform remediation options
Private Peering Options for SSE/SASE Platforms
Many SSE/SASE vendors offer private connectivity alternatives that can restore MPLS-like characteristics for site-to-enforcement-point traffic. These options typically involve colocation facilities and direct interconnect models:
Common Private Connectivity Models:
- Colocation fabric interconnects (Equinix Fabric, Megaport, PacketFabric, CoreSite) providing direct Layer 2/3 connections to security vendor PoPs
- Cloud exchange platforms where enterprises and vendors meet at common peering points
- Direct cross-connects within shared data center facilities
- Vendor-specific programs (e.g., Zscaler Cloud Connector, Netskope Private Access via colocation partners) designed to bypass public internet for site connectivity
What Private Peering Restores:
- Clearer SLA boundaries between customer circuit, colocation provider, and security vendor infrastructure
- Reduced AS hops by eliminating best-effort internet transit for site-to-PoP segments
- More predictable routing with deterministic paths over private infrastructure
- Direct troubleshooting paths with each segment owner (circuit provider, colo facility, security vendor)
- Performance consistency for office/branch locations with private connectivity
Example SLA segmentation for a privately connected branch:
- Branch to colo: Carrier MPLS SLA
- Colo cross-connect: Equinix SLA
- PoP to cloud apps: Vendor SLA + destination provider paths
What Private Peering Does NOT Solve:
- Remote/mobile workforce connectivity: Home ISP users and mobile workers still traverse best-effort internet paths to reach enforcement points—this is often 40-60% of enterprise users today
- Last-mile variability: Even with private peering to PoPs, traffic destined to SaaS applications still traverses internet paths from the PoP to the destination
- Geographic distribution challenges: Not all user locations can economically reach colocation facilities; distant users may still hairpin through internet paths
- Cost and complexity: Private connectivity requires colocation presence, cross-connect fees, and increased circuit costs—prohibitive for some deployment scales
- Operational overhead: Managing hybrid connectivity models (private for sites, internet for remote users) increases architecture complexity
Remote and mobile users in particular remain subject to:
- Home ISP quality variations (cable, DSL, fiber, 5G with vastly different characteristics)
- Best-effort internet routing to reach enforcement points
- No enforceable SLA coverage for their access paths
- Geographic inconsistency (users in different regions experience different transit networks)
Result: Even with significant investment in private peering for sites, a substantial portion of your user base still experiences the internet visibility gap described throughout this document. This creates a two-tier troubleshooting reality: deterministic paths for sites, best-effort diagnostics for remote workers.
Cost-Benefit Considerations:
- When private peering makes sense: Large site deployments, compliance-driven environments, latency-sensitive applications, predictable traffic patterns, and budgets that support colocation infrastructure
- When internet delivery is acceptable: Distributed workforces, small/medium site counts, cost-constrained deployments, and applications tolerant of variable performance
- Hybrid models: Many enterprises use private connectivity for headquarters and major branches while accepting internet delivery for smaller sites and remote users
The Hybrid Architecture Reality
Most modern enterprises operate hybrid connectivity models that combine private and public paths:
Typical deployment pattern:
- Site traffic (30-50% of users): Private peering via colocation facilities to enforcement points, restoring deterministic performance and clearer SLA boundaries
- Remote/mobile traffic (40-60% of users): Best-effort internet transit from home ISPs and mobile carriers, subject to the visibility gaps described throughout this document
- SaaS-bound traffic (all users): Even after security inspection, traffic to SaaS destinations often traverses internet paths with limited visibility
This creates operational complexity:
- Two troubleshooting methodologies: Deterministic root cause for site traffic, best-effort diagnostics for remote users
- Inconsistent user experiences: Office users may have reliable performance while remote users experience variability
- Communication challenges: Explaining to leadership why “the same application” performs differently for different user populations
- Monitoring complexity: Tracking baselines and SLAs requires segmenting metrics by connectivity model (see the sketch after this list)
- Incident management: Distinguishing between “actionable with authority” vs. “best-effort mitigation” incidents
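One practical answer to the monitoring-complexity point is to tag every synthetic measurement with the connectivity model it traversed and compute baselines per model, never in aggregate. A minimal sketch with entirely hypothetical sample values:

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

@dataclass
class SyntheticResult:
    app: str
    connectivity_model: str   # e.g. "private-peering" (site) vs "internet" (remote)
    region: str
    ttfb_ms: float

def baselines_by_model(results: list[SyntheticResult]) -> dict[tuple[str, str], float]:
    """Average TTFB per (app, connectivity model) so site and remote populations are never mixed."""
    buckets: dict[tuple[str, str], list[float]] = defaultdict(list)
    for r in results:
        buckets[(r.app, r.connectivity_model)].append(r.ttfb_ms)
    return {key: mean(values) for key, values in buckets.items()}

# Hypothetical sample data: the same application, very different experience per model.
samples = [
    SyntheticResult("crm", "private-peering", "us-east", 180.0),
    SyntheticResult("crm", "private-peering", "us-east", 195.0),
    SyntheticResult("crm", "internet", "us-east", 420.0),
    SyntheticResult("crm", "internet", "us-east", 510.0),
]
for (app, model), avg_ttfb in baselines_by_model(samples).items():
    print(f"{app:<6} {model:<16} avg TTFB = {avg_ttfb:.0f} ms")
```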
The Support Escalation Problem
When users report performance issues:
What you can do:
- Verify your contracted circuits are performing within SLA where applicable
- Confirm DNS resolution and endpoint posture flows are functioning
- Validate local network configurations, tunnel health, MTU correctness, and capacity headroom
What you often cannot do directly:
- Diagnose or remediate issues in every upstream transit network between users and the provider
- Force routing changes across networks you do not control
- Hold intermediate networks accountable to your end-to-end performance expectations
- Guarantee consistent outcomes across different user geographies and ISPs
Vendor limitations:
- They can confirm their infrastructure and PoPs are healthy
- They can observe when traffic arrives degraded at their edges
- They cannot fully control upstream internet routing decisions across third-party networks
- They cannot provide enforceable SLAs covering customer ISP segments and uncontrolled internet transit
Cloud Security (SSE/SASE) Mitigations (practical levers you can actually pull):
- Control PoP selection where possible: Prefer the nearest/most stable enforcement points per region and avoid unnecessary PoP drift that introduces new transit paths
- Engineer egress diversity: If you have multiple DIA providers, ensure security-bound traffic can steer to the better-performing provider during incidents
- Build tunnel redundancy and health criteria: Use redundant GRE/IPsec paths with explicit failover triggers (latency/jitter/loss thresholds, not just “tunnel up/down”); a health-check sketch follows this list
- Validate MTU and fragmentation behavior end-to-end: Many “mysterious slowness” incidents trace back to PMTUD/MTU issues across tunnels and intermediate networks
- Baseline enforcement point performance by geography: Track DNS time, TLS handshake time, TTFB, and throughput so “internet variability” is measurable
- Use application-layer synthetic testing: Measure real login flows and critical SaaS transactions through the security stack, not only ICMP/MTR outputs
- Create an emergency bypass policy (pre-approved): For defined critical apps, establish a documented, security-approved temporary bypass with scope, logging, and rollback
- Instrument endpoint and client behavior: Confirm forwarding mode, PAC/proxy behaviors, certificate inspection impacts, and endpoint resource constraints aren’t the root cause
- Standardize an escalation evidence bundle: Include impacted regions, selected enforcement point, tunnel metrics, app timing breakdowns, and a known-good comparison path
- Align with ISPs on routing influence options: Where supported, use provider features (communities / preferred peer routing / managed reroutes) to avoid chronic congestion points
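To make the tunnel-redundancy lever concrete, the sketch below probes the far end of a primary tunnel with ICMP and evaluates explicit loss/latency thresholds to decide whether a backup path should be preferred. The peer address, thresholds, and ping syntax (Linux) are assumptions; a production version would probe all tunnels continuously and drive routing or vendor tunnel configuration rather than printing a decision.

```python
import re
import subprocess
from dataclasses import dataclass

@dataclass
class ProbeResult:
    loss_pct: float
    avg_rtt_ms: float

# Hypothetical values: tune per region and per tunnel.
PRIMARY_TUNNEL_PEER = "203.0.113.10"   # far end of the primary GRE/IPsec tunnel (example address)
LOSS_FAILOVER_PCT = 2.0
RTT_FAILOVER_MS = 150.0

def probe(target: str, count: int = 20) -> ProbeResult:
    """ICMP probe via the system ping (Linux syntax); parse packet loss and average RTT."""
    out = subprocess.run(
        ["ping", "-c", str(count), "-i", "0.2", target],
        capture_output=True, text=True,
    ).stdout
    loss_match = re.search(r"(\d+(?:\.\d+)?)% packet loss", out)
    rtt_match = re.search(r"= [\d.]+/([\d.]+)/", out)  # rtt min/avg/max/mdev -> capture avg
    loss = float(loss_match.group(1)) if loss_match else 100.0
    avg = float(rtt_match.group(1)) if rtt_match else float("inf")
    return ProbeResult(loss_pct=loss, avg_rtt_ms=avg)

def should_fail_over(result: ProbeResult) -> bool:
    """Fail over on degradation, not only on 'tunnel down'."""
    return result.loss_pct >= LOSS_FAILOVER_PCT or result.avg_rtt_ms >= RTT_FAILOVER_MS

if __name__ == "__main__":
    health = probe(PRIMARY_TUNNEL_PEER)
    print(f"loss={health.loss_pct:.1f}%  avg_rtt={health.avg_rtt_ms:.1f} ms  "
          f"fail_over={should_fail_over(health)}")
```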
Risk Acceptance vs. Risk Management
Deploying internet-delivered services requires accepting that:
- Network performance is variable and can be influenced by factors outside your control
- Troubleshooting authority is constrained to the segments you own or contractually cover
- User experience will vary based on geography, last-mile ISP quality, and transit path selection
- Enforceable end-to-end SLAs are limited when the path includes networks you do not contract with
- Resolution timelines can be uncertain when upstream routing or peering issues occur
This isn’t a criticism of any single vendor—it is the operational reality of delivering applications and security over best-effort internet transit when private connectivity is not end-to-end.
Recommendations and Strategic Considerations
For Organizations Planning Cloud Migration
Conduct realistic assessments (measure the real path, not the ideal path):
- Map user-to-enforcement-point and user-to-app paths by region and access method (office, remote, mobile)
- Test from actual user vantage points using application-layer synthetic transactions, not only ICMP-based tools
- Establish baselines per geography (DNS time, TLS handshake, TTFB, throughput) before migrations; a timing sketch follows this list
- Define “good enough” thresholds and document which parts are best-effort vs. contract-backed
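To illustrate the baselining item above, here is a stdlib-only sketch that splits a single HTTPS request into DNS, TCP connect, TLS handshake, and time-to-first-byte components. The destination is a placeholder; in practice you would run this on a schedule from representative vantage points (office, remote, mobile) and store results by geography and connectivity model.

```python
import socket
import ssl
import time

def timed_fetch(host: str, path: str = "/") -> dict[str, float]:
    """Split one HTTPS GET into DNS, TCP connect, TLS handshake, and TTFB timings (milliseconds)."""
    timings: dict[str, float] = {}

    t0 = time.perf_counter()
    sockaddr = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)[0][4]
    timings["dns_ms"] = (time.perf_counter() - t0) * 1000

    t0 = time.perf_counter()
    sock = socket.create_connection((sockaddr[0], sockaddr[1]), timeout=10)
    timings["tcp_connect_ms"] = (time.perf_counter() - t0) * 1000

    t0 = time.perf_counter()
    tls = ssl.create_default_context().wrap_socket(sock, server_hostname=host)
    timings["tls_handshake_ms"] = (time.perf_counter() - t0) * 1000

    t0 = time.perf_counter()
    tls.sendall(f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode())
    tls.recv(1)  # first byte of the response headers
    timings["ttfb_ms"] = (time.perf_counter() - t0) * 1000

    tls.close()
    return timings

if __name__ == "__main__":
    # Placeholder destination; substitute an endpoint you are entitled to test.
    for metric, value in timed_fetch("www.example.com").items():
        print(f"{metric:>18}: {value:7.1f} ms")
```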
Evaluate connectivity and egress models (design for steering, not hope):
- Multi-provider DIA with deliberate egress policy and tested failover behavior
- Private connectivity (ExpressRoute/Direct Connect, Equinix Fabric, direct peering) for workloads with strict performance needs and sufficient user concentration
- Regional breakout strategies to reduce unnecessary internet AS hops
- Hybrid architectures that apply deterministic connectivity where it matters most
Set realistic expectations:
- Acknowledge authority limits to leadership and document as accepted risk where appropriate
- Redefine SLAs to reflect the portions of the network you control and contractually cover
- Adjust troubleshooting processes to focus on actionable segments and mitigations
- Plan for “best-effort transit variability” as a valid root cause category
For Troubleshooting Teams
Focus efforts where you have control:
- Your circuits and edge: Ensure they meet SLA, are properly sized, and have headroom
- Local infrastructure: Eliminate internal bottlenecks and configuration errors
- DNS and routing policies: Optimize resolvers, split-horizon behavior, and egress selection where possible
- Endpoint health: Verify devices, posture, and client behaviors aren’t contributing to slowness
Accept diagnostic limitations (and collect better evidence):
- MTR and traceroute indicate paths but rarely provide remediation authority
- Distributed monitoring provides visibility but does not guarantee fixes
- ISP support typically ends at their boundary; downstream remediation is indirect
- Vendor support can confirm edge health but cannot control upstream transit networks
Develop response playbooks:
- Document known-good baselines for comparison during incidents
- Create stakeholder communication templates explaining best-effort transit limitations clearly
- Establish escalation criteria distinguishing actionable issues from upstream variability; an evidence-bundle sketch follows this list
- Maintain ISP relationships to influence what routing you can within contracted networks
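One way to standardize the escalation evidence bundle described in the mitigations section is a fixed structure that can be serialized and attached to ISP or vendor tickets. Every field name and sample value below is an illustrative assumption; adapt it to your ticketing and monitoring tooling.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass
class EvidenceBundle:
    """Illustrative escalation evidence bundle; field names are placeholders, not a vendor schema."""
    incident_id: str
    impacted_regions: list[str]
    selected_enforcement_point: str
    tunnel_metrics: dict[str, float]          # e.g. loss % and average RTT per tunnel
    app_timing_breakdown: dict[str, float]    # e.g. dns_ms, tls_handshake_ms, ttfb_ms
    known_good_comparison_path: str           # description of a healthy baseline path
    collected_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

bundle = EvidenceBundle(
    incident_id="INC-0001",
    impacted_regions=["us-east", "eu-west"],
    selected_enforcement_point="sse-pop-ashburn",  # hypothetical PoP identifier
    tunnel_metrics={"primary_loss_pct": 3.2, "primary_avg_rtt_ms": 185.0},
    app_timing_breakdown={"dns_ms": 41.0, "tls_handshake_ms": 310.0, "ttfb_ms": 620.0},
    known_good_comparison_path="branch via private peering to same PoP, TTFB 190 ms",
)
print(json.dumps(asdict(bundle), indent=2))
```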
For Leadership and Decision Makers
Understand the trade-offs:
- Cloud services provide scalability, reduced capital expenditure, and operational flexibility
- Internet delivery introduces variability, indirect escalation, and limited end-to-end authority
- This is not a failure of any vendor—it’s an inherent characteristic of multi-AS best-effort transit
Budget and resource implications:
- Monitoring tools provide visibility but do not create remediation authority
- Private connectivity restores clearer control but requires investment and operational planning
- Support training must shift from “fix everything” to “optimize what we control and mitigate what we can”
- User expectations must be managed; not all performance issues have deterministic fixes
Risk acceptance framework:
- Document best-effort segments as known limitations in your service delivery model
- Establish performance thresholds that account for geographic and ISP variability
- Create incident procedures differentiating controllable vs. uncontrollable contributors
- Communicate limitations to stakeholders before incidents occur
Conclusion
The migration to cloud-based services represents a fundamental architectural shift that extends beyond application hosting. Organizations are replacing private, controlled delivery paths with shared public infrastructure—and with that shift comes reduced determinism, limited authority over intermediate networks, and weaker end-to-end accountability models.
For internet-delivered services—including SSE/SASE and many SaaS platforms—enterprises must accept that:
- End-to-end enforceable SLAs are limited when the path includes networks you don’t contract with
- Troubleshooting can identify contributors you cannot directly remediate
- Performance can vary based on factors outside your influence
- “Best-effort transit variability” is a valid root cause category, even if it’s unsatisfying
Private connectivity options through colocation providers like Equinix can restore deterministic performance for site traffic, but the distributed workforce—now representing 40-60% of enterprise users—remains subject to best-effort internet constraints. This creates a two-tier operational reality that requires different troubleshooting approaches, monitoring strategies, and stakeholder expectations.
This isn’t an argument against cloud adoption—it’s a call for realistic assessment of what changes when you move from private connectivity models to internet delivery. Organizations that understand these constraints can:
- Make informed decisions about which workloads fit which connectivity models
- Set appropriate expectations with stakeholders and users
- Invest in monitoring and telemetry that leads to actionable mitigations
- Maintain private connectivity for truly critical applications where deterministic performance is required
- Accept best-effort variability as an operational reality for distributed workforces
The visibility and accountability gap is real. Acknowledging it is the first step toward making strategic decisions that balance cloud benefits against the operational realities of internet-based delivery.