D-Link DFL-800-AV-12 User Manual - Page 291
High Availability Mechanisms
Chapter 11. High Availability

11.2. High Availability Mechanisms

D-Link HA provides a redundant, state-synchronized hardware configuration. The state of the active unit, such as the connection table and other vital information, is continuously copied to the inactive unit via the sync interface. When cluster failover occurs, the inactive unit knows which connections are active, and traffic can continue to flow.

The inactive system detects that the active system is no longer operational when it no longer detects sufficient Cluster Heartbeats. Heartbeats are sent over the sync interface as well as all other interfaces.

NetDefendOS sends 5 heartbeats per second from the active system and when three heartbeats are missed (that is to say, after 0.6 seconds) a failover will be initiated. By sending heartbeats over all interfaces, the inactive unit gets an overall view of the active unit's health. Even if the sync interface is deliberately disconnected, failover may not result if the inactive unit receives enough heartbeats from other interfaces via a shared switch. However, the sync interface sends twice as many heartbeats as any of the normal interfaces. The administrator can disable heartbeat sending on any of the interfaces.

Heartbeats are not sent at shorter intervals because delays of that length may occur during normal operation. An operation such as opening a file could result in a delay long enough to cause the inactive system to go active even though the other system is still active.

Cluster heartbeats have the following characteristics:

• The source IP is the interface address of the sending firewall.

• The destination IP is the shared IP address.

• The IP TTL is always 255. If NetDefendOS receives a cluster heartbeat with any other TTL, it is assumed that the packet has traversed a router, and hence cannot be trusted.

• It is a UDP packet, sent from port 999 to port 999.

• The destination MAC address is the Ethernet multicast address corresponding to the shared hardware address. In other words, 11-00-00-C1-4A-nn. Link-level multicasts are used in preference to normal unicast packets for security: with unicast packets, a local attacker could fool switches into routing heartbeats somewhere else so that the inactive system never receives them.

The time for failover is typically about one second, which means that clients may experience a failover as a slight burst of packet loss. In the case of TCP, the failover time is well within the range of normal retransmit timeouts, so TCP will retransmit the lost packets within a very short space of time and continue communication. UDP does not retransmit lost packets since it is inherently an unreliable protocol.

Both master and slave know about the shared IP address. ARP queries for the shared IP address, or any other IP address published via the ARP configuration section or through Proxy ARP, are answered by the active system.

The hardware address used for the shared IP address and other published addresses is not related to the actual hardware addresses of the interfaces. Instead, the MAC address is constructed by NetDefendOS from the Cluster ID in the following form: 10-00-00-C1-4A-nn, where nn comes from combining the Cluster ID configured in the Advanced Settings section with the hardware bus/slot/port of the interface. The Cluster ID must be unique for each cluster in a network.

As the shared IP address always has the same hardware address, there is no latency in updating the ARP caches of units attached to the same LAN as the cluster when failover occurs.

When a cluster member discovers that its peer is not operational, it broadcasts gratuitous ARP queries on all interfaces using the shared hardware address as the sender address. This allows switches to re-learn within milliseconds where to send packets destined for the shared address.
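The heartbeat timing, the heartbeat characteristics and the two address forms described above can be tied together in a short Python sketch. It is illustrative only and is not NetDefendOS code: the names `Heartbeat`, `shared_mac`, `multicast_mac` and `looks_like_cluster_heartbeat` are hypothetical, and the final octet nn is taken as an input because the text does not say exactly how the Cluster ID and the bus/slot/port are combined. The sketch relies on the observation that the two addresses in the text (10-00-00-C1-4A-nn and 11-00-00-C1-4A-nn) differ only in the lowest bit of the first octet, which is the Ethernet group (multicast) bit.

```python
from dataclasses import dataclass

# Values taken from the text above.
HEARTBEAT_PORT = 999            # UDP source and destination port
HEARTBEAT_TTL = 255             # any other TTL suggests the packet crossed a router
HEARTBEATS_PER_SECOND = 5       # sent by the active system
MISSED_BEFORE_FAILOVER = 3      # missed heartbeats that initiate failover
SHARED_MAC_PREFIX = (0x10, 0x00, 0x00, 0xC1, 0x4A)


def failover_detection_time() -> float:
    """Seconds without heartbeats before failover is initiated."""
    return MISSED_BEFORE_FAILOVER / HEARTBEATS_PER_SECOND


def shared_mac(nn: int) -> str:
    """Build the shared hardware address 10-00-00-C1-4A-nn.

    Assumption: nn is supplied by the caller; its derivation from the
    Cluster ID and bus/slot/port is not modelled here.
    """
    if not 0 <= nn <= 0xFF:
        raise ValueError("nn must fit in one octet")
    return "-".join(f"{octet:02X}" for octet in (*SHARED_MAC_PREFIX, nn))


def multicast_mac(unicast: str) -> str:
    """Set the Ethernet group bit (lowest bit of the first octet)."""
    octets = [int(part, 16) for part in unicast.split("-")]
    octets[0] |= 0x01
    return "-".join(f"{octet:02X}" for octet in octets)


@dataclass
class Heartbeat:
    """Hypothetical view of the header fields listed in the text."""
    dst_ip: str
    dst_mac: str
    src_port: int
    dst_port: int
    ttl: int


def looks_like_cluster_heartbeat(hb: Heartbeat, shared_ip: str, nn: int) -> bool:
    """Check a packet against the listed heartbeat characteristics."""
    return (
        hb.ttl == HEARTBEAT_TTL
        and hb.src_port == HEARTBEAT_PORT
        and hb.dst_port == HEARTBEAT_PORT
        and hb.dst_ip == shared_ip
        and hb.dst_mac.upper() == multicast_mac(shared_mac(nn))
    )
```

For example, with an assumed final octet of 0x2A, `shared_mac(0x2A)` gives `10-00-00-C1-4A-2A` and `multicast_mac` turns it into `11-00-00-C1-4A-2A`. A heartbeat that arrives with a TTL of 254 fails the check, which mirrors the rule that a packet that has traversed a router cannot be trusted.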
The only delay in failover, therefore, is detecting that the active unit is down. ARP queries are also broadcast periodically to ensure that switches don't forget where to send