Monday, April 23, 2018

Building Low Latency Data Center Switching Fabric

What is a Switching Fabric?
A data center fabric is a system of switches and servers, together with the interconnections between them, that can be represented as a woven fabric. Because of the tightly woven connections between nodes (all devices in a fabric are referred to as nodes), data center fabrics are often perceived as complex, but it is actually the tightness of the weave that makes the technology inherently elegant. A data center fabric allows for a flattened architecture in which any server node can connect to any other server node, and any switch node can connect to any server node. This flattened architecture is key to the agility of fabrics.

What are the trends in Switching Fabrics?
In earlier days, data center architecture followed a three-tier design running Spanning Tree Protocol or Layer 3 routing across the switches. The biggest problem with this architecture was that only a single path was selected and the rest of the bandwidth was wasted across the network. All data traffic takes the best path from the routing table until that path becomes congested, at which point packets are dropped. This fabric was not enough to handle traffic growth with predictability, and a shift was required.

Clos networks simplified the existing complex topology by introducing the names SPINE and LEAF in modern data center switching topologies. Data center networks are comprised of top-of-rack switches and core switches. The top-of-rack (ToR) switches are the leaf switches, and they are attached to the core switches, which represent the spine. The leaf switches are not connected to each other, and spine switches connect only to the leaf switches (or to an upstream core device). In this spine-leaf architecture, the number of uplinks from each leaf switch equals the number of spine switches. Similarly, the number of downlinks from each spine equals the number of leaf switches. The total number of connections is the number of leaf switches multiplied by the number of spine switches. If you have 4 spines and 8 leafs, you need 4 x 8 = 32 connections.
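The link-count rule above is simple enough to capture in a few lines. This is just a sketch of the arithmetic described in the text; the function name is illustrative.

```python
def fabric_links(spines: int, leafs: int) -> int:
    """Total inter-switch links in a spine-leaf fabric.

    Every leaf has one uplink to every spine, so the fabric
    needs spines * leafs links in total.
    """
    return spines * leafs

# The example from the text: 4 spines and 8 leafs.
print(fabric_links(4, 8))  # 32 connections
```

Doubling the spine count doubles both the cross-sectional bandwidth and the cabling, which is why fabric sizing is usually a bandwidth-versus-cost trade-off.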

How Is Latency Improved By Changing Data Center Switching?
All of us are aware that Layer 2 switches are responsible for transporting data at the data link layer and perform error checking on each transmitted and received frame. The older generation of switches used in the data center performs store-and-forward switching. In store-and-forward switching, the entire frame has to be received first, and only then is it forwarded. The switch stores the entire frame and does the CRC calculation before it forwards. If no CRC error is present, the switch forwards the frame; otherwise it drops it.

In case of cut-through switching, when the switch receives the frame it looks at the first 6 bytes, reads the destination MAC address, determines the outgoing interface, and forwards the frame. All error checking is then done by the receiving device, in contrast to store-and-forward switching, where each transmitting switch verifies the frame itself.
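The latency gap between the two modes comes down to how much of the frame must be serialized onto the wire before the switch can act. A minimal back-of-the-envelope sketch, assuming a full-size 1500-byte frame on a 10 Gb/s link and ignoring lookup and switching overhead:

```python
def store_and_forward_delay(frame_bytes: int, link_bps: float) -> float:
    """Store-and-forward: the whole frame must arrive before forwarding."""
    return frame_bytes * 8 / link_bps

def cut_through_delay(header_bytes: int, link_bps: float) -> float:
    """Cut-through: forwarding starts after the destination MAC
    (the first 6 bytes of the frame) has been read."""
    return header_bytes * 8 / link_bps

LINK = 10e9  # 10 Gb/s link speed (assumed for illustration)

sf = store_and_forward_delay(1500, LINK)  # 1.2 microseconds per hop
ct = cut_through_delay(6, LINK)           # 4.8 nanoseconds per hop
print(f"store-and-forward: {sf * 1e6:.2f} us, cut-through: {ct * 1e9:.1f} ns")
```

The per-hop saving looks dramatic, but at modern link speeds it is small in absolute terms, which supports the conclusion later in the post that the difference between the two modes has become negligible.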

Improving Latency By Converting the NIC to a SmartNIC
Traditionally, TCP/IP protocol processing has been performed in software by the end system’s CPU. Under a high packet load, the CPU gets busy with this processing, which unnecessarily increases host latency. This latency is incurred inside the host and is invisible to most measurements, so no one pays attention to it. But with the help of a SmartNIC, also known as an intelligent server adapter, we can offload protocol and network processing onto the NIC. SmartNICs are widely used in cloud data center servers to boost performance by offloading work from the CPU to the NIC. Traditional NICs only support checksum and segmentation offload, but here we want to offload the entire complex server-based networking data plane, including SDN tunnel termination endpoints. The SmartNIC has to be open and programmable; if it is not, it becomes a fixed function that is difficult for the SDN controller to control and program. Initially, packets of a flow are handled by the host, but as soon as the flow is detected it is offloaded to the SmartNIC. Typically, a SmartNIC includes larger memory on-chip or on the SmartNIC board to hold a much larger number of flows.
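The host-first, offload-later pattern described above can be sketched as a small model. Everything here is illustrative: the class name, the threshold, and the flow-table representation are assumptions, not any particular vendor's API.

```python
OFFLOAD_THRESHOLD = 3  # packets seen on the host before offloading (assumed)

class SmartNicModel:
    """Toy model of SmartNIC flow offload: the host CPU handles the
    first packets of each flow; once a flow is detected, its entry is
    pushed into the NIC flow table so later packets bypass the host."""

    def __init__(self) -> None:
        self.nic_flow_table = set()   # flows handled in NIC hardware
        self.host_counters = {}       # per-flow packet counts on the host

    def receive(self, flow_id: str) -> str:
        if flow_id in self.nic_flow_table:
            return "nic"              # fast path: no host CPU involved
        count = self.host_counters.get(flow_id, 0) + 1
        self.host_counters[flow_id] = count
        if count >= OFFLOAD_THRESHOLD:
            self.nic_flow_table.add(flow_id)  # offload the detected flow
        return "host"                 # slow path through the host stack

nic = SmartNicModel()
print([nic.receive("flow-1") for _ in range(5)])
# → ['host', 'host', 'host', 'nic', 'nic']
```

The large on-board memory mentioned in the text matters precisely because this flow table must hold every offloaded flow; if it fills up, new flows fall back to the slow path through the host.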

From the above comparison, we can conclude that when building low latency data centers, switching latency is one of the parameters to consider, but in today’s world the latency difference between store-and-forward and cut-through switching is negligible.

The Clos fabric’s contribution, however, cannot be neglected, because it gives predictability and helps utilize all the available paths compared to the three-tier architecture. Apart from these, network latency also has to be considered.

Intelligent Ethernet NICs offload protocol processing from the application CPU thereby eliminating software performance bottlenecks, minimizing CPU utilization, and greatly reducing the host component of end-to-end latency.
