I am often part of discussions about layer 2 loops, but I rarely hear engineers talk about the layer 3 routing loop problem. I noticed this on a recent visit to a company where all the engineers knew why to use Spanning Tree Protocol, but none of them knew why to use split horizon and route poisoning.
In layer 3 networks there is a chance of routing loops; split horizon, hold-down timers, and route poisoning are the techniques that help prevent them. Figure 1 below shows the converged network. Let’s assume the 10.4.0.0 network fails. Router C forwards the update to router B, and router B forwards the update to router A and back to router C as well. Router C therefore receives the same kind of update that it generated itself, and it might think that it is learning about the 10.4.0.0 network from B when, in fact, 10.4.0.0 is directly attached to router C. This situation can arise in smaller networks too.
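To make the problem concrete, here is a minimal Python sketch of what can happen between routers B and C once 10.4.0.0 has failed and no loop prevention is in place. The starting hop count, the `INFINITY = 16` limit (the value RIP uses), and the function name `count_to_infinity` are assumptions for illustration only; the sketch does not reproduce the exact topology of Figure 1.

```python
INFINITY = 16  # RIP's "unreachable" hop count; assumed here for illustration


def count_to_infinity(b_metric=1, max_rounds=50):
    """Simulate B and C re-advertising a dead route to each other.

    C has lost its directly attached network, but B still advertises
    the route it originally learned from C.
    Returns one (c_metric, b_metric) pair per exchange round.
    """
    history = []
    for _ in range(max_rounds):
        # B advertises to C; C believes B has a valid path and installs it.
        c_metric = min(b_metric + 1, INFINITY)
        # C advertises back to B; C is B's next hop, so B accepts the update.
        b_metric = min(c_metric + 1, INFINITY)
        history.append((c_metric, b_metric))
        if c_metric >= INFINITY and b_metric >= INFINITY:
            break  # both routers finally agree the network is unreachable
    return history


rounds = count_to_infinity()
for c, b in rounds:
    print(f"C: 10.4.0.0 at {c} hops via B | B: 10.4.0.0 at {b} hops via C")
```

Each round the two routers push the metric a little higher, and packets for 10.4.0.0 bounce between them until the metric reaches the unreachable value. The techniques described next exist to avoid this slow "counting" behavior.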
The first workaround is the split horizon technique, which says not to send an update out of the interface on which it was received. In other words, a route is advertised on every interface except the one on which it was learned.
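The rule can be sketched in a few lines of Python. This is only an illustration under assumed names: the `advertise` function, the interface labels `to_A` and `to_C`, and the table layout are hypothetical and not taken from any router implementation.

```python
def advertise(routing_table, interfaces):
    """Build one update per interface, applying split horizon.

    routing_table maps prefix -> (metric, learned_on_interface).
    A route is left out of the update sent on the interface it was
    learned on; the receiver is expected to add its own hop.
    """
    updates = {}
    for iface in interfaces:
        updates[iface] = {
            prefix: metric
            for prefix, (metric, learned_on) in routing_table.items()
            if learned_on != iface
        }
    return updates


# Router B's hypothetical table: 10.4.0.0 was learned from C on "to_C".
table_b = {
    "10.4.0.0": (1, "to_C"),
    "10.1.0.0": (1, "to_A"),
}
updates = advertise(table_b, ["to_A", "to_C"])
print(updates)
```

With this filter in place, B never tells C about 10.4.0.0, so C cannot mistake B for an alternative path to its own network.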
Next is route poisoning. When a router detects that a link is down, it sends an update for the affected route to its neighbors with the metric set to the maximum, which marks the route as unreachable. In this case, the receiving router can send the information back out of the same interface on which it received it, again with the route metric set to the maximum. This is definitely a violation of the split horizon rule, but it helps the router understand that the particular network is down or inaccessible, which actually helps routing converge. Now 10.4.0.0 is a poisoned route: it has the maximum metric assigned because the route is not reachable. When the neighbor sends the route back to the originator, this is called poison reverse.
What does route poisoning do?
1. It sets the hop count to the unreachable value as soon as the failed network is detected.
2. The route remains poisoned until the hold-down timer expires.
3. The hold-down timer depends on the routing protocol; each protocol has a different hold-down timer.
4. Traffic flows in one direction only.
5. If the route does not come back up before the hold-down period expires, it is removed from the routing table and added to the garbage table.
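The following Python sketch illustrates route poisoning and poison reverse together. It is a simplified model, not a protocol implementation: `INFINITY = 16` is assumed as the unreachable metric (the value RIP uses), and the function names, interface labels, and table layout are hypothetical.

```python
INFINITY = 16  # assumed "unreachable" metric, as in RIP


def poison(routing_table, prefix):
    """Mark a failed route as unreachable instead of silently dropping it."""
    _, learned_on = routing_table[prefix]
    routing_table[prefix] = (INFINITY, learned_on)


def advertise_with_poison_reverse(routing_table, iface):
    """Build the update for one interface.

    Routes learned on this interface are sent back with the unreachable
    metric (poison reverse) rather than omitted (plain split horizon).
    """
    return {
        prefix: (INFINITY if learned_on == iface else metric)
        for prefix, (metric, learned_on) in routing_table.items()
    }


# Router C: 10.4.0.0 is directly attached (learned_on is None) and fails.
table_c = {"10.4.0.0": (0, None)}
poison(table_c, "10.4.0.0")
update_to_b = advertise_with_poison_reverse(table_c, "to_B")

# Router B learned 10.4.0.0 from C on "to_C", so it echoes the route
# back to C as unreachable while still advertising it normally to A.
table_b = {"10.4.0.0": (1, "to_C")}
update_to_c = advertise_with_poison_reverse(table_b, "to_C")
update_to_a = advertise_with_poison_reverse(table_b, "to_A")
print(update_to_b, update_to_c, update_to_a)
```

The design point is that an explicit "unreachable" advertisement is stronger than silence: C is told outright that B offers no path to 10.4.0.0, instead of having to wait for a timeout.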
The last one is hold-down timers. What do hold-down timers do?
1. A router receives an update from a neighbor indicating that a network that was previously accessible is no longer accessible.
2. The receiving router marks that route as possibly down and starts the hold-down timer.
3. If an update with a better metric for that network is received from any neighboring router during the hold-down period, the network is reinstated and the hold-down timer is removed.
4. If an update from any other neighbor is received during the hold-down period with the same or a worse metric for that network, that update is ignored. Thus, more time is allowed for the information about the change to propagate.
5. Routers still forward packets to destination networks that are marked as possibly down. This allows the router to overcome any issues associated with intermittent connectivity. If the destination network truly is unavailable and the packets are forwarded, black hole routing is created and lasts until the hold-down timer expires. (This is a very important point.) This could be the reason administrators look to reduce the hold-down timers in order to reduce the convergence time. If the network is not stable, however, these timers generate a lot of messages.
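The hold-down steps above can be sketched as a small Python class. Everything here is an assumption for illustration: the class name `HoldDownRoute`, the 180-second default, and the injectable `clock` parameter (used so the example can run without waiting) are not taken from any specific routing protocol implementation.

```python
import time


class HoldDownRoute:
    """One route entry that follows the hold-down steps listed above."""

    def __init__(self, metric, hold_down_seconds=180.0, clock=time.monotonic):
        self.metric = metric                        # originally recorded metric
        self.hold_down_seconds = hold_down_seconds  # assumed value
        self.clock = clock
        self.possibly_down_since = None  # None means no hold-down is running

    def in_hold_down(self):
        if self.possibly_down_since is None:
            return False
        return self.clock() - self.possibly_down_since < self.hold_down_seconds

    def mark_possibly_down(self):
        """Steps 1-2: a neighbor reports the network as inaccessible."""
        self.possibly_down_since = self.clock()

    def receive_update(self, metric):
        """Steps 3-4: return True if the update is accepted."""
        if self.in_hold_down():
            if metric < self.metric:
                # Better metric: reinstate the route and remove the timer.
                self.metric = metric
                self.possibly_down_since = None
                return True
            return False  # same or worse metric is ignored during hold-down
        # No hold-down running (or it has expired): accept normally.
        self.metric = metric
        self.possibly_down_since = None
        return True


# A fake clock lets the example step through time without sleeping.
now = [0.0]
route = HoldDownRoute(metric=3, clock=lambda: now[0])
route.mark_possibly_down()
now[0] = 10.0
ignored = route.receive_update(4)     # worse metric during hold-down
reinstated = route.receive_update(2)  # better metric during hold-down
print(ignored, reinstated, route.metric)
```

Note that the sketch only models which updates are accepted. It does not model step 5: packets are still forwarded while the route is possibly down, which is where the black hole risk comes from.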
Section 2.2.2 of RFC 1058 explicitly says that “Split horizon with poisoned reverse will prevent any routing loops that involve only two gateways. However, it is still possible to end up with patterns in which three gateways are engaged in mutual deception.” This could definitely be the case in broadcast or multi-access networks.