Monitoring Your DMR Cluster

If there are problems with your event mesh, for example if DMR is configured incorrectly or if there are operational problems in the network, one or more brokers may report topology errors or report that DMR is operationally down.

Topology Errors and Troubleshooting

Topology errors are caused by problems in the configured or operational network topology.

Topology problems are expected while adding and removing nodes in a network, because configuration changes need time to propagate through the event mesh. However, if you follow the procedures in this documentation, no message loss is expected, and the topology problem should be temporary and clear by itself.

Some examples of topology errors are:

  • missing links—every node in a cluster must be connected by an internal link. Links must be bi-directional; that is, each link must be configured correctly on both nodes.
  • mismatched link span—a link is either internal (connecting two nodes within a cluster) or external (connecting gateway nodes in two different clusters). The configuration at both ends of the link must match.
  • missing channels—each link must have a control channel plus one data channel per message VPN.
  • node name mismatch—each node must correctly specify the name of the remote node connected by the link.
  • DMR is not correctly enabled—each participating VPN must opt in to DMR. In addition, for nodes connected by an internal link, the list of Message VPNs participating in DMR on each node must be the same.
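
As a rough illustration of the checks listed above, the following Python sketch validates a hand-written model of link configuration. It is not a broker API: the LinkConfig class, the find_topology_problems function, and the node and Message VPN names are all hypothetical, and the model covers only missing links, mismatched link span, and differing DMR Message VPN lists on internal links. In this simplified model, a node name mismatch shows up as a missing link, because the two ends no longer refer to each other.

```python
from dataclasses import dataclass


# Hypothetical in-memory model of one end of a link; not a broker API.
@dataclass(frozen=True)
class LinkConfig:
    local_node: str   # node on which this end of the link is configured
    remote_node: str  # remote node name configured on this end
    span: str         # "internal" or "external"


def find_topology_problems(links, dmr_vpns):
    """Return a list of problems found in the modeled configuration.

    links: iterable of LinkConfig, one per configured link end.
    dmr_vpns: dict mapping node name -> set of Message VPNs with DMR enabled.
    """
    problems = []
    by_ends = {(l.local_node, l.remote_node): l for l in links}
    for (local, remote), link in sorted(by_ends.items()):
        reverse = by_ends.get((remote, local))
        if reverse is None:
            # A link must be configured on both nodes.
            problems.append(f"{local}->{remote}: link not configured on {remote}")
            continue
        if local < remote:  # report each pair of ends only once
            if link.span != reverse.span:
                problems.append(f"{local}<->{remote}: mismatched link span")
            elif (link.span == "internal"
                  and dmr_vpns.get(local, set()) != dmr_vpns.get(remote, set())):
                problems.append(f"{local}<->{remote}: DMR Message VPN lists differ")
    return problems


# Example with made-up node and Message VPN names.
example_links = [
    LinkConfig("a", "b", "internal"),
    LinkConfig("b", "a", "internal"),
    LinkConfig("a", "c", "internal"),  # node c has no matching link back to a
]
example_vpns = {"a": {"v1"}, "b": {"v1", "v2"}}
for problem in find_topology_problems(example_links, example_vpns):
    print(problem)
```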

DMR can be operationally down for other reasons, such as:

  • misconfiguration in other areas, such as replication or redundancy
  • problems with Guaranteed Messaging (e.g., the message spool is down)
  • issues with cluster authentication
  • issues with cluster synchronization

If there is a topology error, or DMR is down for some other reason, subscription propagation and data forwarding throughout the network may not be working properly.

To troubleshoot issues with your event mesh, use the commands in the sections that follow to view the configuration and operational status of the objects in the mesh. Start by looking at the cluster information, then the links, then the channels. That is, start at the broadest scope (the cluster) and narrow your focus (first the link, then the channel) as needed.
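
The cluster, then link, then channel order can be sketched as a small Python helper that assembles the show commands described in the sections that follow. This is only a convenience for building command strings, under the assumption that you paste or send them to the CLI yourself; the function name and the example cluster, link, and Message VPN names are hypothetical.

```python
def drill_down_commands(cluster, link=None, vpn=None):
    """Build show commands from the broadest scope (cluster) to the narrowest (channel)."""
    commands = [f"show cluster {cluster} detail"]
    if link is not None:
        commands.append(f"show cluster {cluster} link {link} detail")
        if vpn is not None:
            commands.append(
                f"show cluster {cluster} link {link} channel message-vpn {vpn} detail"
            )
    return commands


# Hypothetical names; substitute the names used in your own event mesh.
for command in drill_down_commands("my-cluster", link="*", vpn="my-vpn"):
    print(command)
```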

Displaying Cluster Information

To display cluster information, enter the following command:

show cluster <cluster-name-pattern> [detail]

Where:

<cluster-name-pattern> displays clusters matching the pattern. The cluster pattern can be the cluster's full name, or part of its name with the wildcard character ? used to represent one character of the name, or the wildcard character * used to represent zero or more characters of the name. Entering only the wildcard character * for the name displays all clusters.

detail displays detailed information for the specified cluster.
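
To make the wildcard rules concrete, here is a minimal Python sketch of a matcher that treats ? as exactly one character and * as zero or more characters. It only models the rules stated above. The function name and the sample cluster names are hypothetical, and the case-sensitive comparison is an assumption rather than documented broker behavior.

```python
import re


def name_matches(pattern, name):
    """Return True if name matches pattern (? = one character, * = zero or more)."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.fullmatch("".join(parts), name, flags=re.DOTALL) is not None


# Hypothetical cluster names.
clusters = ["cluster-1", "cluster-10", "east"]
print([c for c in clusters if name_matches("cluster-?", c)])
print([c for c in clusters if name_matches("*", c)])
```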

Displaying Cluster Synchronization Information

When you enter the show cluster command, in addition to other information about the cluster, the event broker also displays whether cluster synchronization (cluster-sync) is complete or not.

Cluster synchronization is the mechanism by which event brokers that have just restarted, and therefore have no knowledge of the subscription needs of remote event brokers, learn of those subscriptions before providing local service.

If cluster synchronization cannot be achieved, service is blocked (on a per-Message VPN basis) until the issue that prevents the local event broker from learning its remote subscriptions is corrected. There is no timeout. Failure to achieve cluster synchronization in a timely manner is generally caused by an operational or configuration issue with another event broker in the network. For more information, see Troubleshooting Cluster Synchronization Issues.

Troubleshooting Cluster Synchronization Issues

This section helps you diagnose and resolve issues when your event broker fails to achieve cluster synchronization after a restart. Cluster synchronization problems typically manifest as event brokers that remain in a synchronizing state longer than expected.

When troubleshooting cluster synchronization issues, consider these key points:

  • If an event broker is unable to achieve cluster synchronization after restarting, it is likely related to its inability to communicate with another event broker.

  • If DMR was not operationally Up and free of topology errors before the restart, the event broker is likely to be blocked on cluster synchronization after the restart.

  • Cluster synchronization should never take a long time. It either completes promptly, or it does not complete at all until you intervene.

  • There are likely multiple sources for subscription state for a given router name, as all event brokers in a cluster learn of the subscription sets of all other event brokers.

    • This means that if event broker A needs the subscriptions of event broker B, and B is down, it may still get the subscriptions from a third event broker C.

    • An event broker's HA mate is another independent source of subscriptions. This is particularly relevant to the inactive event broker in an HA pair, which can only get its subscription state from its active HA mate.

  • Cluster synchronization applies to an event broker's own router name. In other words, if an event broker restarts it should be able to learn its own past subscription set from its neighbors or HA mate.
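
The idea of multiple subscription sources can be illustrated with a toy Python model. It is a conceptual sketch only and does not reflect how the broker implements cluster synchronization; the function name, the peer names, and the topic string are hypothetical.

```python
def learn_subscriptions(target, reachable_peers):
    """Return (source, subscriptions) for router name `target`.

    reachable_peers: dict mapping a reachable peer's name to the subscription
    sets it knows, keyed by router name. Peers that are down are simply absent.
    Returns (None, None) if no reachable peer knows the target's subscriptions.
    """
    for peer, known in reachable_peers.items():
        if target in known:
            return peer, known[target]
    return None, None


# Broker A needs the subscriptions of B. B is down (absent), but C is reachable
# and has already learned B's subscription set.
reachable = {"C": {"B": {"orders/>"}, "C": {"billing/>"}}}
source, subscriptions = learn_subscriptions("B", reachable)
print(source, subscriptions)
```

If no reachable peer knows the target's subscriptions, the function returns (None, None), which corresponds to the case where synchronization cannot complete until connectivity or configuration is fixed.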

Troubleshooting Steps

If your event broker remains in a synchronizing state longer than expected, perform the following steps to troubleshoot the issue:

  1. Verify network connectivity between all event brokers in the cluster.

  2. Check the DMR configuration. There may be a misconfiguration on either the local or the unreachable event broker (for example, a one-way cluster link may be configured).

  3. Verify that any unreachable event brokers exist in the network.

  4. Ensure all event brokers in the cluster are operational.

  5. Verify that all Message VPNs are properly configured on both the local and unreachable event broker (for example, DMR may be enabled on only one of the two event brokers).

  6. On external cluster links, check for mismatched DMR bridge configuration settings for a particular Message VPN (configuration may be missing, or an incorrect remote Message VPN may be configured).

  7. Check the cluster link status on neighboring event brokers.

  8. Verify that all gateway event brokers are correctly configured across the cluster.

  9. Ensure all required links are established.

When to Contact Support

If, after following these troubleshooting steps, your event broker still cannot complete cluster synchronization:

  1. Run the gather-diagnostics command to collect system information.

  2. Note any error messages or unusual behavior.

  3. Document the steps you've already taken to resolve the issue.

  4. Contact Solace support with this information.

Displaying Cluster Link Information

To display cluster link information, enter the following command:

show cluster <cluster-name-pattern> link <link-name-pattern> [detail | client-profile | queue | ssl]

Where:

<cluster-name-pattern> displays clusters matching the pattern.

<link-name-pattern> displays cluster links matching the pattern. The link pattern can be the link's full name, or part of its name with the wildcard character ? used to represent one character of the name, or the wildcard character * used to represent zero or more characters of the name. Entering only the wildcard character * for the name displays all cluster links.

detail displays detailed information for the specified link.

client-profile displays cluster link client profile information for the specified link.

queue displays cluster link queue information for the specified link.

ssl displays cluster link TLS/SSL information for the specified link.
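
In the syntax above, the bracketed alternatives separated by | mean that at most one of the keywords is given per command. The following Python sketch builds the command string and rejects anything outside that set. The helper name and the example cluster name are hypothetical, and the sketch only assembles text; it does not contact a broker.

```python
# Keywords taken from the command syntax above; at most one may be used.
LINK_VIEWS = ("detail", "client-profile", "queue", "ssl")


def link_command(cluster_pattern, link_pattern, view=None):
    """Build a 'show cluster ... link ...' command with an optional single view keyword."""
    if view is not None and view not in LINK_VIEWS:
        raise ValueError(f"view must be one of {LINK_VIEWS}, got {view!r}")
    command = f"show cluster {cluster_pattern} link {link_pattern}"
    return command if view is None else f"{command} {view}"


# Hypothetical cluster name; * matches all links in that cluster.
print(link_command("my-cluster", "*"))
print(link_command("my-cluster", "*", "ssl"))
```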

Displaying Cluster Channel Information

To display cluster link channel information, enter the following command:

show cluster <cluster-name-pattern> link <link-name-pattern> channel message-vpn <vpn-name> [detail]

Where:

<cluster-name-pattern> displays clusters matching the pattern.

<link-name-pattern> displays cluster links matching the pattern.

<vpn-name> displays channel information for the specified Message VPN.

detail displays detailed channel information for the specified link.

Displaying Message VPN Information

To display DMR information for a particular Message VPN, enter the following command:

show message-vpn <vpn-name> dynamic-message-routing [dmr-bridge <remote-node-name-pattern>]

Where:

<vpn-name> displays DMR information for the specified Message VPN.

<remote-node-name-pattern> displays information for DMR bridges that connect to the specified remote node.