Design – Data Center Clos Architecture (Fabric Spine and Leaf)

What is Clos, and how does it relate to Cisco DC design?

The Leaf-and-Spine architecture (also known as a Clos Network) is a specific design used within data centers, and it’s an integral part of Cisco’s Data Center networking solutions. Due to its non-blocking design, the architecture offers high levels of redundancy, scalability, and predictable latency.

Here’s a quick breakdown of the key concepts involved:

Leaf switches: These are the access layer switches that your servers directly connect to. Each server typically has dual connections to two different leaf switches for redundancy.

Spine switches: These are the backbone of the network. Every leaf switch is interconnected with each spine switch, creating a fabric of connections. This architecture allows traffic to rapidly move around the network – from one leaf switch to another – via the spine.

Clos Network: A Clos network is a type of non-blocking, multistage switching architecture that increases network scalability and performance, designed by Charles Clos in 1952. The Leaf-and-Spine design is an implementation of the Clos Network.

This design is widely used in modern data centers for several reasons:

  • Reduced and predictable latency: Every leaf switch connects directly to every spine switch, so any two leaf switches are always the same distance apart (leaf to spine to leaf). Because packets traverse the same number of hops regardless of which servers are communicating, latency stays low and consistent.
  • Increased bandwidth: The aggregate bandwidth of this architecture increases as you add more spine switches, making it a highly scalable solution.
  • Enhanced fault tolerance and resilience: The high degree of interconnection provides multiple paths between leaf switches, improving the network’s resilience and fault tolerance. If a spine switch fails or a link goes down, traffic can quickly be rerouted over another spine.
  • Ease of management and operation: All paths between devices are symmetric, which makes load balancing, traffic engineering, and troubleshooting simpler.
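The properties in the list above can be checked on a toy model. The following is a minimal sketch, not a representation of any Cisco tool: the device names, the fabric size, and the helper functions are all hypothetical. It builds a fabric in which every leaf connects to every spine, lists the equal-length paths between two leaves, and shows what remains after one spine fails.

```python
def build_fabric(num_spines: int, num_leaves: int) -> dict[str, set[str]]:
    """Return an adjacency map in which every leaf links to every spine."""
    spines = [f"spine{i}" for i in range(1, num_spines + 1)]
    leaves = [f"leaf{i}" for i in range(1, num_leaves + 1)]
    links: dict[str, set[str]] = {name: set() for name in spines + leaves}
    for leaf in leaves:
        for spine in spines:
            links[leaf].add(spine)
            links[spine].add(leaf)
    return links


def leaf_to_leaf_paths(links, src, dst):
    """List every leaf -> spine -> leaf path between two leaves."""
    shared_spines = sorted(links[src] & links[dst])
    return [(src, spine, dst) for spine in shared_spines]


def fail_node(links, node):
    """Return a copy of the fabric with one device removed."""
    return {n: peers - {node} for n, peers in links.items() if n != node}


fabric = build_fabric(num_spines=4, num_leaves=6)
paths = leaf_to_leaf_paths(fabric, "leaf1", "leaf6")
print(len(paths))  # 4 equal-length paths, one per spine

degraded = fail_node(fabric, "spine2")
print(len(leaf_to_leaf_paths(degraded, "leaf1", "leaf6")))  # 3 paths remain
```

Adding a spine adds one more path (and one more uplink's worth of bandwidth) between every pair of leaves, which is the scaling behaviour described in the list.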

Cisco employs these concepts in its Data Center solutions, providing customers with a reliable, scalable, high-performance network infrastructure. As a part of this, it leverages advanced protocols and technologies such as VXLAN for network virtualization and EVPN for Layer 2 and Layer 3 connectivity, helping to optimize the functionality and efficiency of these architectures.

In essence, based on the Clos network design, the leaf-spine architecture is a modern, scalable, and reliable network topology that supports the high-bandwidth, low-latency needs of contemporary data centers. It underpins Cisco’s approach to data center networking and enables efficient network virtualization overlays like EVPN/VXLAN.



Underlay Network:

The underlay network refers to the physical infrastructure (routers, switches, cabling, etc.) that transports the actual network traffic. It is the foundational layer of your network that includes all the hardware and protocols needed for data transmission. This is what we typically think of when we imagine a network – a series of interconnected devices exchanging data with each other.

In the context of a data center, the underlay network would include the spine-and-leaf architecture, which enables the rapid and efficient transport of data packets between different devices.

Overlay Network:

On the other hand, an overlay network is a virtual network built on top of the underlay network. It allows for the creation of logical, software-based networks that can span the entire physical network. This abstraction layer simplifies the management of network resources and services and allows for greater flexibility and scalability.

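Virtual Extensible LAN (VXLAN) and Ethernet VPN (EVPN) are the technologies most often used to build such overlays: VXLAN provides the data-plane encapsulation (the tunnels), while EVPN provides the BGP-based control plane that advertises which endpoints sit behind which tunnel endpoint. Together they deliver advanced functionalities like network virtualization and multi-tenancy.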

Think of it this way – if the underlay network is like the physical roads connecting different cities (the devices), the overlay network is like the GPS that helps you navigate these roads most efficiently and can also create virtual pathways or tunnels (VXLAN) to reach distant cities more directly.

In short, the underlay network is the physical infrastructure and the protocols that make data transmission possible, while the overlay network is a virtual, software-defined layer that sits on top of it, allowing for advanced network functionalities and easier management.



IP Fabric Underlay Network

Three types of IP fabric designs can be implemented in a data center using Cisco devices:

1. 3-Stage IP Fabric: This model comprises two layers, the spine layer and the leaf layer. Every device in the leaf layer is connected to every device in the spine layer, creating a network fabric; the three stages are the leaf, spine, and leaf that traffic crosses on its way between two servers. Any devices capable of functioning as a spine or leaf, like Cisco Nexus Series switches, can be used in this structure. Because any two leaf devices are always separated by the same number of hops, latency is predictable, and the multiple equal paths improve fault tolerance and simplify management.

2. 5-Stage IP Fabric: Essentially two 3-stage IP fabrics combined, this design is used when a network has expanded and is distributed across different Points of Delivery (PODs) in a data center. Here, an additional layer, the super spine layer, is introduced to facilitate communication between the two 3-stage IP fabrics. High-density devices like the Cisco Nexus 9500 Series could serve as super spine devices in this model.

3. Collapsed Spine IP Fabric: Here, the function of the leaf layer is incorporated into the spine layer. This model could be used if you’re transitioning to an EVPN spine-and-leaf model or if your access or top-of-rack (TOR) devices are unable to function in a leaf layer due to incompatibility with EVPN-VXLAN. Cisco Nexus Series switches with adequate capacity could perform dual roles in this setup.
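To get a feel for how the 3-stage and 5-stage designs scale, some back-of-the-envelope arithmetic helps. The sketch below is illustrative only: the port counts and speeds are made-up assumptions rather than figures for any particular Nexus model, and the function names are hypothetical. It assumes each leaf uses one port on every spine, so the spine port count caps the number of leaves.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class LeafProfile:
    downlinks: int       # server-facing ports per leaf
    downlink_gbps: int
    uplinks: int         # spine-facing ports per leaf (one per spine)
    uplink_gbps: int

    def oversubscription(self) -> Fraction:
        """Server-facing bandwidth divided by spine-facing bandwidth."""
        return Fraction(self.downlinks * self.downlink_gbps,
                        self.uplinks * self.uplink_gbps)


def three_stage_server_ports(spine_ports: int, leaf: LeafProfile) -> int:
    """Each leaf consumes one port per spine, so spine ports cap the leaf count."""
    return spine_ports * leaf.downlinks


def five_stage_server_ports(pods: int, spine_ports: int,
                            super_spine_uplinks: int, leaf: LeafProfile) -> int:
    """Some spine ports are reserved for super spine uplinks in every POD."""
    leaves_per_pod = spine_ports - super_spine_uplinks
    return pods * leaves_per_pod * leaf.downlinks


leaf = LeafProfile(downlinks=48, downlink_gbps=25, uplinks=4, uplink_gbps=100)
print(leaf.oversubscription())                   # 3 (a 3:1 ratio)
print(three_stage_server_ports(32, leaf))        # 1536
print(five_stage_server_ports(2, 32, 4, leaf))   # 2688
```

With these assumed numbers, a single 3-stage fabric tops out at 1,536 server ports, while two PODs joined by a super spine layer reach 2,688, at the cost of giving up four spine ports per POD for the extra tier.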

Devices in these designs are interconnected via high-speed interfaces. These could be individual links or aggregated Ethernet interfaces, which provide benefits such as increased bandwidth and link-level redundancy. Cisco’s EtherChannel can be used to combine multiple physical Ethernet links into one logical link.

In these architectures, EBGP (External Border Gateway Protocol) is utilized as the routing protocol for its scalability and reliability. Each device is assigned a unique autonomous system number to support EBGP. The underlay network typically uses IPv4 for simplified setup and management, but the overlay network can support both IPv4 and IPv6 traffic.
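As a rough illustration of the per-device AS numbering described above, the sketch below hands out one private AS number per device and carves point-to-point /31 subnets for the leaf-to-spine links. The device names, the 10.0.0.0/24 pool, and the helper functions are assumptions made for this example; only the 4-byte private AS range starting at 4200000000 comes from RFC 6996.

```python
import ipaddress

PRIVATE_ASN_START = 4_200_000_000  # first 4-byte private ASN (RFC 6996)


def assign_asns(devices, start=PRIVATE_ASN_START):
    """Give every device its own AS number for EBGP underlay peering."""
    return {name: start + offset for offset, name in enumerate(devices)}


def assign_p2p_links(spines, leaves, pool="10.0.0.0/24"):
    """Allocate one /31 per leaf-spine link from an assumed address pool."""
    subnets = ipaddress.ip_network(pool).subnets(new_prefix=31)
    plan = {}
    for leaf in leaves:
        for spine in spines:
            net = next(subnets)  # raises StopIteration if the pool runs out
            plan[(spine, leaf)] = (f"{net[0]}/31", f"{net[1]}/31")
    return plan


spines = ["spine1", "spine2"]
leaves = ["leaf1", "leaf2", "leaf3"]

asns = assign_asns(spines + leaves)
links = assign_p2p_links(spines, leaves)

print(asns["spine1"], asns["leaf3"])
print(links[("spine1", "leaf1")])  # (spine side, leaf side)
```

A plan like this is typically fed into whatever templating or automation tool generates the actual device configuration.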

The framework also includes Micro Bidirectional Forwarding Detection (BFD) capability, which can quickly detect link failures on any member links in aggregated Ethernet bundles. Micro BFD sessions operate on the individual links of a link aggregation group (LAG) to improve failure detection time.
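The benefit of running BFD on each member link can be expressed with simple arithmetic: the detection time is roughly the negotiated interval multiplied by the detect multiplier. The sketch below assumes a 300 ms interval and a multiplier of 3 purely as example values, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


def detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Approximate BFD detection time: interval times detect multiplier."""
    return interval_ms * multiplier


@dataclass
class MemberLink:
    name: str
    bfd_session_up: bool  # state of the micro BFD session on this member


def usable_members(bundle):
    """Keep only the members whose own BFD session is still up."""
    return [member.name for member in bundle if member.bfd_session_up]


bundle = [
    MemberLink("Ethernet1/1", bfd_session_up=True),
    MemberLink("Ethernet1/2", bfd_session_up=False),  # failed member
]

print(detection_time_ms(300, 3))  # 900
print(usable_members(bundle))     # ['Ethernet1/1']
```

Because every member carries its own session, a single failed link is taken out of the bundle while the remaining members keep forwarding.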

For overlay peering, it’s standard to use IBGP. This is generally simpler because every device in the overlay shares a single AS number, so no per-device AS assignment is needed. However, EBGP can also be used for overlay peering depending on the specific network requirements and configurations.

To summarize, these principles can be applied to design Cisco networks that improve scalability, reliability, and performance. 



Network Virtualization Overlays

A network virtualization overlay is a method used to create virtual networks on top of a physical infrastructure known as the IP underlay network. This technology enables multiple tenant networks to exist concurrently on the same physical network, each with isolated traffic and separate control planes for improved security, privacy, and independent management.

In this context, a tenant refers to a distinct user community, such as a business unit, department, or application. Each tenant consists of groups of endpoints (devices, servers). These groups can communicate within their tenant network and interact with other tenants if the network policies allow it. A group is typically represented as a subnet (VLAN), and it communicates with external groups and endpoints via a Virtual Routing and Forwarding (VRF) instance.

In the overlay network, Ethernet bridging tables handle tenant-bridged frames, and IP routing tables process routed packets. Inter-VLAN routing is managed at the integrated routing and bridging (IRB) interfaces. Both the Ethernet and the IP tables are scoped to a virtual network, so each tenant’s entries stay separate.

To facilitate communication between end systems connected to different VXLAN Tunnel Endpoint (VTEP) devices, tenant packets are encapsulated and sent over an EVPN-signalled VXLAN tunnel to the corresponding remote VTEP devices. When these tunneled packets reach the remote VTEP devices, they are de-encapsulated and then forwarded to the remote end systems via the respective bridging or routing tables of the egress VTEP device.
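The encapsulation step can be made concrete by looking at the VXLAN header itself. Per RFC 7348, it is 8 bytes long: a flags byte with the "VNI valid" bit set, 24 reserved bits, the 24-bit VNI, and a final reserved byte, carried over UDP port 4789. The sketch below builds and parses only that header; the outer Ethernet, IP, and UDP headers a real VTEP adds are deliberately left out, and the function names are made up for illustration.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits)
    # Word 2: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Ingress VTEP: prepend the VXLAN header to the tenant frame."""
    return vxlan_header(vni) + inner_frame


def decapsulate(payload: bytes) -> tuple[int, bytes]:
    """Egress VTEP: strip the header and recover the VNI and tenant frame."""
    word1, word2 = struct.unpack("!II", payload[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return word2 >> 8, payload[8:]


frame = b"tenant ethernet frame"
packet = encapsulate(frame, vni=10100)
vni, inner = decapsulate(packet)
print(vni, inner == frame)  # 10100 True
```

The VNI recovered at the egress VTEP is what selects the bridging or routing table used to forward the frame to the remote end system.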

Cisco products, such as the Cisco Nexus Series switches and Cisco ACI, support such network virtualization overlays. With Cisco ACI, you can create multiple virtual networks (or VRFs) that are entirely isolated from each other, facilitating multi-tenancy in a network. This can enhance network efficiency, optimize resource utilization, and improve security by isolating network traffic.

Overlay Options

Here’s a brief overview:

1. IBGP for Overlays: Cisco supports IBGP for overlays. With Multiprotocol BGP (MP-BGP) support, Cisco devices can exchange Layer 2 and Layer 3 reachability information for VXLAN-based EVPN services.

2. Bridged Overlay: Cisco supports bridged overlays using technologies such as Virtual Extensible LAN (VXLAN).

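3. Centrally Routed Bridging Overlay: In this design, both bridging and routing functions are performed at a central point such as the spine layer. On Cisco NX-OS VXLAN EVPN fabrics, the closest equivalent is a centralized gateway, where a designated set of switches hosts the VLAN gateways. Cisco Application Centric Infrastructure (ACI), by contrast, places its gateways at the leaf layer.

4. Edge-Routed Bridging Overlay: Here, bridging and routing functions are performed at the leaf layer. This matches the distributed anycast gateway in NX-OS VXLAN EVPN fabrics and the pervasive gateway in the Cisco ACI framework.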

5. Collapsed Spine Overlay: This is typically supported in smaller environments where the spine and leaf architecture are collapsed into a single tier.

6. IRB Addressing Models in Bridging Overlays: Integrated Routing and Bridging (IRB) is a technique that Cisco devices can use to route traffic between VLANs in a bridging overlay.

7. Routed Overlay using EVPN Type 5 Routes: This is supported by Cisco devices, allowing for IP prefix advertisement in EVPN networks.

8. MAC-VRF Instances for Multitenancy in Network Virtualization Overlays: Cisco supports this, enabling Layer 2 separation between tenants in an overlay network.

1. IBGP for Overlays

Internal BGP (IBGP) is used to exchange reachability information across an IP network within a single autonomous system. With the Multiprotocol BGP extensions (MP-BGP, here running as MP-IBGP), it carries the EVPN address family, allowing VTEP devices to exchange reachability information, a critical function for establishing inter-VTEP VXLAN tunnels and utilizing them for overlay connectivity services.

In this architecture, spine and leaf devices use their loopback addresses to establish peering within a single autonomous system. Spine devices function as a route reflector cluster, and leaf devices operate as route reflector clients. While route reflectors minimize the number of necessary peering connections, they also represent potential single points of failure. Redundancy and backup route reflectors should be considered for a resilient network design.

The route reflector cluster is located at the spine layer in the referenced design. In a Cisco environment, devices with high processing power, such as the Cisco Nexus 9000 series switches, would be well-suited to act as spines and handle the additional control-plane load (BGP sessions and route updates) from route reflector clients in the network virtualization overlay.

Utilizing this architecture with Cisco equipment can provide a more scalable, efficient network design by reducing the number of direct BGP peerings needed, saving resources, and simplifying network management.
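The saving in peerings is easy to quantify. The sketch below compares a full IBGP mesh with a design in which spines act as route reflectors; it assumes the reflectors are fully meshed among themselves and that every leaf peers with every spine, and the fabric size (4 spines, 40 leaves) is just an example.

```python
def full_mesh_sessions(routers: int) -> int:
    """IBGP without route reflectors: every router peers with every other."""
    return routers * (routers - 1) // 2


def route_reflector_sessions(spines: int, leaves: int) -> int:
    """Each leaf (client) peers with each spine (reflector); spines mesh."""
    client_sessions = spines * leaves
    reflector_mesh = full_mesh_sessions(spines)
    return client_sessions + reflector_mesh


print(full_mesh_sessions(4 + 40))       # 946
print(route_reflector_sessions(4, 40))  # 166
```

Peering every leaf with more than one spine is also what addresses the single-point-of-failure concern: losing one reflector leaves the other sessions in place.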

2. Bridged Overlay

In a Bridged Overlay model, Ethernet VLANs are extended between leaf devices through VXLAN tunnels. This design is suitable for data center networks requiring Ethernet connectivity between leaf devices but not routing between the VLANs. Consequently, the spine devices mainly provide essential underlay and overlay connectivity for the leaf devices without performing routing or gateway services typically seen with other overlay methods.

Leaf devices create VTEPs to connect to other leaf devices. These tunnels enable the leaf devices to send VLAN traffic to other leaf devices and Ethernet-connected end systems in the data center. The simplicity of this overlay service makes it appealing for operators seeking an uncomplicated approach to integrating EVPN/VXLAN with their existing Ethernet-based data center.
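A simplified view of the forwarding decision on a leaf in a bridged overlay might look like the sketch below. It is a toy model under stated assumptions: the class, its fields, the VLAN-to-VNI mapping, and the addresses are all invented for illustration, and the remote MAC entries are assumed to have been learned already (in practice through EVPN).

```python
from dataclasses import dataclass, field


@dataclass
class LeafVtep:
    vtep_ip: str
    vlan_to_vni: dict[int, int]
    # (vni, mac) -> local port name
    local_macs: dict[tuple[int, str], str] = field(default_factory=dict)
    # (vni, mac) -> IP address of the remote VTEP behind which the MAC sits
    remote_macs: dict[tuple[int, str], str] = field(default_factory=dict)

    def forward(self, vlan: int, dst_mac: str) -> tuple[str, str]:
        """Decide how to bridge a frame; no routing between VLANs happens."""
        vni = self.vlan_to_vni[vlan]
        key = (vni, dst_mac)
        if key in self.local_macs:
            return ("local", self.local_macs[key])
        if key in self.remote_macs:
            return ("vxlan", self.remote_macs[key])
        return ("flood", str(vni))  # unknown destination within this VNI


leaf1 = LeafVtep(vtep_ip="192.0.2.11", vlan_to_vni={10: 10010})
leaf1.local_macs[(10010, "00:00:5e:00:53:01")] = "Ethernet1/1"
leaf1.remote_macs[(10010, "00:00:5e:00:53:02")] = "192.0.2.12"

print(leaf1.forward(10, "00:00:5e:00:53:01"))  # ('local', 'Ethernet1/1')
print(leaf1.forward(10, "00:00:5e:00:53:02"))  # ('vxlan', '192.0.2.12')
print(leaf1.forward(10, "00:00:5e:00:53:03"))  # ('flood', '10010')
```

Note that the spine never appears in this logic: it only carries the encapsulated packets between the two VTEP addresses.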

However, it’s important to note that routing can be added to a bridged overlay if required. This can be done by attaching an external routing device, such as a Cisco ISR or ASR router, to the EVPN/VXLAN fabric. Alternatively, you could opt for another overlay type that incorporates routing, such as an edge-routed bridging overlay, a centrally routed bridging overlay, or a routed overlay.

This bridged overlay model can be implemented using Cisco’s Nexus series switches as leaf devices. Nexus switches support VXLAN and can create VTEPs for establishing the VXLAN tunnels required in this model. This can provide a simple and effective way to virtualize the network, extending VLANs across the network and providing high levels of scalability.

3. Centrally Routed Bridging Overlay

In a Centrally Routed Bridging Overlay, routing occurs at a central gateway of the data center network (for instance, the spine layer), rather than at the VTEP device where the end systems are connected (such as the leaf layer). This overlay model can be used when you need routed traffic to go through a centralized gateway or when your edge VTEP devices lack the required routing capabilities.

Traffic originating from Ethernet-connected end systems is forwarded either to local end systems or to end systems connected to remote VTEP devices. An Integrated Routing and Bridging (IRB) interface at each spine device helps route traffic between the Ethernet virtual networks.

This overlay service model enables easy aggregation of a collection of VLANs into the same overlay virtual network. This design, when implemented with Cisco’s solutions, supports a variety of VLAN-aware Ethernet service model configurations in the data center:

  • Default instance VLAN-aware
  • Virtual switch VLAN-aware
  • MAC-VRF instance VLAN-aware

All these configurations offer a high degree of flexibility and scalability, allowing you to create virtual networks that suit a wide range of needs while maintaining high levels of performance and reliability.

4. Edge-Routed Bridging Overlay

In the Edge-Routed Bridging Overlay model, the Integrated Routing and Bridging (IRB) interfaces are moved to leaf device VTEPs at the edge of the overlay network to bring IP routing closer to the end systems. This model is only possible on specific switches due to the unique ASIC capabilities required to support bridging, routing, and EVPN/VXLAN in one device. For instance, some of Cisco’s Nexus series models could serve as leaf devices in an edge-routed bridging overlay.

In this model, the spine devices are solely configured to manage IP traffic, eliminating the need to extend the bridging overlays to the spine devices. This setup also enables faster server-to-server, intra-data center traffic (east-west traffic) where the end systems are connected to the same leaf device VTEP. As a result, routing occurs much closer to the end systems than with centrally routed bridging overlays.
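The difference between the centrally routed and edge-routed models shows up clearly when you trace where inter-VLAN traffic has to go. The function below is a deliberately simplified sketch: the model labels "crb" and "erb", the device names, and the single-spine path are assumptions for illustration, not a description of real forwarding behaviour on any platform.

```python
def inter_vlan_path(model: str, src_leaf: str, dst_leaf: str,
                    spine: str = "spine1") -> list[str]:
    """Return the devices that routed (inter-VLAN) traffic passes through."""
    if model == "crb":
        # The IRB interface lives on the spine, so traffic must reach it
        # even when both end systems hang off the same leaf.
        return [src_leaf, spine, dst_leaf]
    if model == "erb":
        # The IRB interface lives on the leaf, so same-leaf traffic is
        # routed locally and never touches the spine.
        if src_leaf == dst_leaf:
            return [src_leaf]
        return [src_leaf, spine, dst_leaf]
    raise ValueError(f"unknown overlay model: {model}")


print(inter_vlan_path("crb", "leaf1", "leaf1"))  # ['leaf1', 'spine1', 'leaf1']
print(inter_vlan_path("erb", "leaf1", "leaf1"))  # ['leaf1']
print(inter_vlan_path("erb", "leaf1", "leaf2"))  # ['leaf1', 'spine1', 'leaf2']
```

In the edge-routed case the spine still appears for leaf-to-leaf traffic, but only as an IP transit hop for the encapsulated packets; the routing decision itself has already been made on the ingress leaf.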

5. Collapsed Spine Overlay

The Collapsed Spine Overlay design eliminates the need for leaf layer devices, allowing a data center operator to implement a simplified two-layer topology with spine devices acting as both leaf and spine devices. Given the absence of a leaf layer, VTEPs and IRB interfaces are configured directly on the spine devices, simplifying the overall network design and operation.

However, it’s important to note that this overlay model is not suitable for every data center operator or network design due to its limitations in scale and network design flexibility.

6. IRB Addressing Models in Bridging Overlays

Two distinct models address IRB interfaces in bridging overlays: unique-per-EVI and non-unique-per-EVI. Unique-per-EVI assigns a unique IP address to each IRB interface per Ethernet VPN instance (EVI), making it suitable for multi-tenancy environments where each tenant requires a separate IP address space. Conversely, the non-unique-per-EVI model allows the same IP address to be used across multiple EVIs, making it ideal for more straightforward scenarios where unique IP addresses for each tenant are unnecessary. Bear in mind that the choice of an IRB addressing model will depend on your network’s requirements and constraints.

7. Routed Overlay using EVPN Type 5 Routes

A routed overlay enables Layer 3 connectivity (IP Routing) between different Layer 2 domains (VXLANs) in the data center. This is achieved using EVPN Type 5 routes, providing a scalable and reliable way to extend IP networks across many devices in the data center network.

This configuration aligns with the needs of modern cloud-based applications and services, which predominantly utilize IP for communication. It does not extend VLANs across the data center network and does not require VLAN or bridge-domain configuration on the spine devices.
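Conceptually, each EVPN Type 5 (IP prefix) route tells a VTEP "this prefix is reachable through that remote VTEP, in this tenant's Layer 3 VNI". The sketch below models a few such routes and performs a longest-prefix-match lookup. The dataclass holds only a small subset of the real route's fields, and the RD values, prefixes, VTEP addresses, and VNI are invented for the example.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IpPrefixRoute:
    rd: str                        # route distinguisher of the advertiser
    prefix: ipaddress.IPv4Network  # the advertised IP prefix
    next_hop_vtep: str             # remote VTEP to tunnel traffic to
    l3_vni: int                    # VNI identifying the tenant routing table


def lookup(routes, destination: str):
    """Longest-prefix match over the routes learned for one tenant."""
    address = ipaddress.ip_address(destination)
    matches = [route for route in routes if address in route.prefix]
    return max(matches, key=lambda route: route.prefix.prefixlen, default=None)


tenant_routes = [
    IpPrefixRoute("192.0.2.1:1", ipaddress.ip_network("10.1.0.0/16"),
                  "192.0.2.1", 50001),
    IpPrefixRoute("192.0.2.2:1", ipaddress.ip_network("10.1.20.0/24"),
                  "192.0.2.2", 50001),
]

print(lookup(tenant_routes, "10.1.20.5").next_hop_vtep)  # 192.0.2.2
print(lookup(tenant_routes, "10.1.30.5").next_hop_vtep)  # 192.0.2.1
print(lookup(tenant_routes, "172.16.0.1"))               # None
```

No MAC addresses or VLANs appear anywhere in this lookup, which is why a routed overlay needs no VLAN or bridge-domain configuration on the spines.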

8. MAC-VRF Instances for Multitenancy in Network Virtualization Overlays

Cisco NX-OS has analogous concepts, albeit with differing terminology. A MAC-VRF in EVPN/VXLAN terminology is a per-tenant Layer 2 forwarding table; its closest NX-OS counterpart is a VLAN mapped to a Layer 2 VNI, which acts as a separate EVPN instance. The term “VRF” (Virtual Routing and Forwarding) in NX-OS refers to the Layer 3 counterpart (an IP-VRF): a separate logical routing and forwarding domain, typically paired with a Layer 3 VNI. Used together, they keep both the bridged and the routed traffic of each tenant separate.

This creates a certain level of multi-tenancy, where multiple virtual networks can exist on the same physical infrastructure but operate independently. This is crucial in data center networks where numerous customers or applications may need to be separated for security or operational reasons.
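One practical consequence of this separation is that tenants must not share segment identifiers. The check below is a small hypothetical helper, with invented tenant names and VLAN-to-VNI mappings, that flags any VNI claimed by more than one tenant in a planning spreadsheet or inventory.

```python
def vni_conflicts(tenants: dict[str, dict[int, int]]) -> dict[int, set[str]]:
    """Return every VNI that more than one tenant maps a VLAN to."""
    owners: dict[int, set[str]] = {}
    for tenant, vlan_to_vni in tenants.items():
        for vni in vlan_to_vni.values():
            owners.setdefault(vni, set()).add(tenant)
    return {vni: names for vni, names in owners.items() if len(names) > 1}


plan = {
    "finance": {10: 10010, 20: 10020},
    "engineering": {10: 20010, 30: 10020},  # 10020 collides with finance
}

print(vni_conflicts(plan))  # VNI 10020 is claimed by both tenants
```

Both tenants can reuse VLAN 10 without any problem because each maps it to a different VNI; only the shared VNI would break the isolation.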


More to come…..