Monitor DCE health in a cluster

After you set up DCE in a high availability (HA) cluster environment, run a health monitor to verify that DCE is working. If a DCE node in the cluster fails, the embedded clients can reconnect to an alternate DCE node in the cluster and continue the user session. For this to happen, the NLB must first detect that the DCE node is no longer in service. You can do this by configuring an NLB health monitor for the DCE service.

The DCE service supports a “DCEHealthCheck” URL that the NLB can use for active TCP monitoring, to determine whether DCE responds to requests in a timely manner. The DCEHealthCheck monitor polls each DCE node on port 2939 at a fixed interval and takes a node offline when a check fails. Once the NLB detects a node failure, it stops routing new client connection requests to the failing DCE node. Clients with existing connections to the failing node may have to wait for a connection timeout. Alternatively, configure the NLB to reset existing client connections as soon as it detects the failure, so that clients request a new connection without waiting for a network timeout.

The following basic configuration settings are required:

  • Interval: 15 seconds (how often the monitor sends a request to the DCE node)
  • Timeout: 15 seconds (how long the monitor waits for a successful response before taking the failed node offline). The Interval and Timeout values are similar to DCE's internal timeouts.
  • Send String: GET /DCEHealthCheck HTTP/1.1\r\nConnection: close\r\n
  • Receive String: HTTP/1.1 200 OK
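
To check a node by hand before configuring the NLB, the settings above can be approximated with a short script. The following Python sketch is illustrative only and is not part of DCE or of any NLB product. The function and constant names are hypothetical. It assumes that port 2939 accepts plain (non-TLS) HTTP, and it appends one extra CRLF to the Send String to terminate the header block, on the assumption that the NLB does the same when it sends the request.

```python
import socket

# Values taken from the monitor settings above.
SEND_STRING = b"GET /DCEHealthCheck HTTP/1.1\r\nConnection: close\r\n"
RECEIVE_STRING = b"HTTP/1.1 200 OK"
DCE_PORT = 2939
TIMEOUT_SECONDS = 15.0


def response_is_healthy(response: bytes) -> bool:
    """Return True if the response begins with the expected Receive String."""
    return response.startswith(RECEIVE_STRING)


def probe_dce_node(host: str, port: int = DCE_PORT,
                   timeout: float = TIMEOUT_SECONDS) -> bool:
    """Send one health-check request (hypothetical helper).

    Any connection error or timeout counts as a failed check.
    Note: the timeout applies to each socket operation separately,
    not to the probe as a whole.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Assumption: one extra CRLF is needed to end the HTTP header block.
            sock.sendall(SEND_STRING + b"\r\n")
            received = b""
            # Read only until there are enough bytes to compare the status line.
            while len(received) < len(RECEIVE_STRING):
                chunk = sock.recv(1024)
                if not chunk:
                    break  # peer closed the connection early
                received += chunk
            return response_is_healthy(received)
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For example, calling probe_dce_node("dce-node-1.example.com") every 15 seconds would mimic the Interval setting. The host name here is a placeholder, not a real DCE node.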