Configuring Data Center Replication

The Solace replication facility provides a data center redundancy and disaster recovery solution for the Solace Message Platform.

The replication facility uses corresponding Message VPNs with active and standby replication states at separate replication sites to ensure that Guaranteed Messaging clients can continue to have service through a specified Message VPN should one data center become unavailable. When replication is implemented, Guaranteed messages received by durable endpoints in a Message VPN with an active replication state at one replication site are automatically propagated to corresponding durable endpoints in a duplicate Message VPN with a standby replication state at the other replication site. In addition, local and XA transactions that publish or consume replicated Guaranteed messages are automatically propagated to the standby replication site. If a service fail-over to one replication site occurs, clients can reconnect to the same Message VPN at a different replication site to continue to receive service, and any messages that were received, but not consumed, before the service interruption can be delivered to them.

Implementation Considerations

This section discusses design considerations to address before implementing data center replication.

Mixing of Message Types

The mixing of message types on an endpoint is not recommended. In particular, the mixing of the following types of messages on a single endpoint can create complications:

  • replicated and non-replicated messages
  • transacted and non-transacted messages

If replicated and non-replicated messages are mixed, the endpoint will hold different sets of messages on the active and standby sites. After a replication fail-over to the standby site, clients will find a different set of messages to consume in the endpoint, which is likely to be difficult for the client application to handle. Additionally, the non-replicated messages that remain in the endpoint on the previously active, now standby, site are not guaranteed to be preserved (since they were never intended to be replicated). They will eventually be cleaned from the endpoint on the standby router as newer replicated messages are consumed from that endpoint on the newly active site. This means that on a fail-back to the originally active site, these messages may or may not be present in the endpoint.

If an endpoint is subject to a mixture of transacted and non-transacted operations, delivery delays can occur, especially when synchronous transactions are used and the replication service becomes degraded. The issue is worse if the reject-msg-when-sync-ineligible option is enabled.

Topic Prefixes and Hierarchy

To prevent the complexities that occur when mixing message types on endpoints, it is recommended that a topic structure that classifies messages by type be used. A full classification, which provides many benefits, distinguishes the following:

  • Guaranteed messages that will
    • not be replicated
    • be synchronously replicated
    • be asynchronously replicated
  • Guaranteed messages that have a
    • Store and Forward forwarding mode
    • Cut Through forwarding mode
  • Direct messages
  • messages from a particular paired replication site (for example, the paired New York and New Jersey replication sites)
  • messages from different publishers

An example topic hierarchy that would accomplish all of these things would be to have the following topic prefixes:

<Rep_site_pair>/<pubId>/MODE_DIRECT/<app-topic>

<Rep_site_pair>/<pubId>/MODE_GM_CTP/REPL_NONE/<app-topic>

<Rep_site_pair>/<pubId>/MODE_GM_CTP/REPL_ASYNC/<app-topic>

<Rep_site_pair>/<pubId>/MODE_GM_CTP/REPL_SYNC/<app-topic>

<Rep_site_pair>/<pubId>/MODE_GM_SF/REPL_NONE/<app-topic>

<Rep_site_pair>/<pubId>/MODE_GM_SF/REPL_ASYNC/<app-topic>

<Rep_site_pair>/<pubId>/MODE_GM_SF/REPL_SYNC/<app-topic>

For deployments where short topics are preferred, the topics could be made less verbose. For example, <Rep_site_pair>/<pubId>/GM_SF/S/<app-topic>.
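As a rough illustration of the hierarchy above, the following Python sketch assembles topic strings from their components. The names `Mode`, `Repl`, and `build_topic` are hypothetical helpers invented for this example; they are not part of any Solace API, and the site pair and publisher ID values are made up.

```python
from enum import Enum


class Mode(Enum):
    DIRECT = "MODE_DIRECT"
    GM_CTP = "MODE_GM_CTP"  # Guaranteed, Cut Through forwarding mode
    GM_SF = "MODE_GM_SF"    # Guaranteed, Store and Forward forwarding mode


class Repl(Enum):
    NONE = "REPL_NONE"
    ASYNC = "REPL_ASYNC"
    SYNC = "REPL_SYNC"


def build_topic(site_pair, pub_id, mode, app_topic, repl=None):
    """Return '<Rep_site_pair>/<pubId>/<mode>[/<repl>]/<app-topic>'."""
    if mode is Mode.DIRECT:
        # Direct topics in the example hierarchy have no replication level.
        if repl is not None:
            raise ValueError("Direct topics carry no replication level")
        levels = [site_pair, pub_id, mode.value, app_topic]
    else:
        # Guaranteed topics always state how (or whether) they are replicated.
        if repl is None:
            raise ValueError("Guaranteed topics need a replication level")
        levels = [site_pair, pub_id, mode.value, repl.value, app_topic]
    return "/".join(levels)


print(build_topic("NY_NJ", "pub1", Mode.GM_SF, "orders/new", Repl.SYNC))
print(build_topic("NY_NJ", "pub1", Mode.DIRECT, "quotes"))
```

Centralizing topic construction in one helper like this makes it harder for a publisher to emit a Guaranteed message on a topic that lacks a replication level.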

Creating such a hierarchy provides the following benefits:

  1. It simplifies the configuration of replicated topic subscriptions. Only two subscriptions need to be added per replicated Message VPN to replicate all messages:

    solace(configure/message-vpn/replication)# create replicated-topic */*/MODE_GM*/REPL_ASYNC/>
    solace(...sage-vpn/replication/replicated-topic)# exit
    solace(configure/message-vpn/replication)# create replicated-topic */*/MODE_GM*/REPL_SYNC/>
    solace(...sage-vpn/replication/replicated-topic)# replication-mode sync
    solace(...sage-vpn/replication/replicated-topic)# exit

  2. It prevents the unintended promotion or demotion of a message because of topic matches (for example, a non-persistent message that is converted to a Direct message). If Direct message consumers subscribe only to */*/MODE_DIRECT topics, and Guaranteed message consumers subscribe only to */*/MODE_GM* topics, promotion and demotion are avoided.
  3. The paired replication site prefix allows for creation of bridged networks without forwarding loops.
  4. It prevents messages from being discarded due to incompatible forwarding modes. Only Cut Through Persistence-eligible publishers would publish to */*/MODE_GM_CTP topics, and clients publishing to these topics would not use session-based transactions.
  5. It is easy for publishers to make use of last value queues (LVQs) to determine their last published message by setting an LVQ’s subscription to <Rep_site_pair>/<pubId>/>.

The last four points offer benefits to all Solace-based solutions, not just those using replication. For non-replicated solutions, the paired replication site prefix would become a prefix specific to the virtual router.
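To see why two wildcard subscriptions are enough to cover every replicated topic in the example hierarchy, the following Python sketch models topic matching. It is a simplified model written for this example, not the router's implementation: it assumes that a trailing `*` in a level matches any non-empty level with that prefix, and that `>` as the final level matches one or more remaining levels. The function name `matches` and the sample topics are hypothetical.

```python
def matches(subscription: str, topic: str) -> bool:
    """Simplified wildcard match: trailing '*' per level, '>' as last level."""
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, sub in enumerate(sub_levels):
        if sub == ">" and i == len(sub_levels) - 1:
            # '>' must stand for at least one remaining topic level.
            return len(topic_levels) > i
        if i >= len(topic_levels):
            return False
        level = topic_levels[i]
        if sub.endswith("*"):
            # Prefix match within a single, non-empty level.
            if not level or not level.startswith(sub[:-1]):
                return False
        elif sub != level:
            return False
    return len(sub_levels) == len(topic_levels)


SYNC_SUB = "*/*/MODE_GM*/REPL_SYNC/>"
ASYNC_SUB = "*/*/MODE_GM*/REPL_ASYNC/>"

for topic in (
    "NY_NJ/pub1/MODE_GM_SF/REPL_SYNC/orders/new",
    "NY_NJ/pub1/MODE_GM_CTP/REPL_ASYNC/orders/new",
    "NY_NJ/pub1/MODE_GM_SF/REPL_NONE/orders/new",
    "NY_NJ/pub1/MODE_DIRECT/quotes",
):
    replicated = matches(SYNC_SUB, topic) or matches(ASYNC_SUB, topic)
    print(topic, "replicated" if replicated else "not replicated")
```

Under these assumptions, only the REPL_SYNC and REPL_ASYNC Guaranteed topics match a replicated-topic subscription, while REPL_NONE and Direct topics fall outside both patterns.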

Note: For applications that publish directly to queues rather than publishing to topics that are mapped to queues, the published messages can be replicated by configuring the queue’s special topic as a replicated topic. (The special topic, which is unique to each queue, is #P2P/QUE/<queue-name>.)

Network Considerations

When deploying replication, there must be sufficient network bandwidth to accommodate the published message rate for all replicated topics. Some additional overhead is needed if transactions are used. The replication queue can absorb message bursts that exceed the available bandwidth, but the link between the active and standby sites must be fast enough to keep up with the replication traffic over time. Compression can be enabled on the replication bridge connection, if necessary.
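The sizing reasoning above can be sketched as a back-of-the-envelope calculation. The following Python example is illustrative only: the function names, the 10% transaction overhead, and all message rates and sizes are assumptions chosen for the example, not measured or documented Solace figures.

```python
def required_mbps(msgs_per_sec, avg_msg_bytes, overhead_fraction=0.0):
    """Sustained link bandwidth (Mbit/s) needed for the replicated message rate."""
    bits_per_sec = msgs_per_sec * avg_msg_bytes * 8 * (1 + overhead_fraction)
    return bits_per_sec / 1_000_000


def burst_backlog_bytes(burst_mbps, link_mbps, burst_seconds):
    """Bytes the replication queue must absorb while a burst exceeds the link."""
    excess_mbps = max(0.0, burst_mbps - link_mbps)
    return excess_mbps * 1_000_000 / 8 * burst_seconds


# Hypothetical workload: 10,000 replicated messages per second of 1,000 bytes.
steady = required_mbps(10_000, 1_000)
with_tx = required_mbps(10_000, 1_000, overhead_fraction=0.10)  # assumed overhead
backlog = burst_backlog_bytes(burst_mbps=120, link_mbps=100, burst_seconds=30)

print(f"steady state: {steady:.1f} Mbit/s")
print(f"with assumed transaction overhead: {with_tx:.1f} Mbit/s")
print(f"backlog after a 30 s burst: {backlog / 1_000_000:.1f} MB")
```

If the sustained requirement approaches the link capacity, the backlog never drains between bursts, which is the situation the paragraph above warns against.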

If security between the replication sites is a concern, SSL encryption can be enabled on the replication bridge.

System Resources Used by Replication

The replication facility consumes some system resources when it is enabled on a Solace router because the router automatically creates the following objects:

  • one Message VPN bridge for the replication facility, plus one Message VPN bridge for each replicated Message VPN
  • one queue for each replicated Message VPN
  • one queue topic subscription for each replicated Message VPN

These system-created objects all have names that begin with the “#” character (for example, the replication bridge is #MSGVPN_REPLICATION_BRIDGE). Because these objects are required for the successful operation of the replication facility, users cannot delete or directly edit them.

  • When the Config-Sync facility is enabled, a router also automatically creates some objects that consume system resources. For information, refer to System Resources Used by Config-Sync.

If local or XA transactions are being replicated, additional transaction resources are used on both the active and standby sites to replicate the transactions.

Avoid Overbooking Resources

When using replication, you must ensure that the Message VPN-level and system-level resources used by one router do not exceed those that can be supported by the router at the other replication site. Consider, for example, a scenario where the appliance used at the primary site has a higher client connection capacity than the appliance at the backup site. In the event of a fail-over, some clients may not be able to connect to the backup site.
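A check of this kind can be sketched as a simple comparison of resource usage at one site against the limits at the other. The following Python example is a minimal sketch: the function name `overbooked`, the resource names, and all numbers are hypothetical, and in practice the values would have to come from each router's own configuration and capacity figures.

```python
def overbooked(active_usage: dict, standby_limits: dict) -> dict:
    """Return {resource: (used, limit)} where usage exceeds the other site's limit."""
    problems = {}
    for resource, used in active_usage.items():
        # A resource missing from the standby limits is treated as unsupported.
        limit = standby_limits.get(resource, 0)
        if used > limit:
            problems[resource] = (used, limit)
    return problems


# Hypothetical figures for a primary site and a smaller backup site.
primary_usage = {"client_connections": 9_000, "queues": 400, "transactions": 2_000}
backup_limits = {"client_connections": 6_000, "queues": 1_000, "transactions": 5_000}

for resource, (used, limit) in overbooked(primary_usage, backup_limits).items():
    print(f"{resource}: {used} in use, but the other site supports only {limit}")
```

Running the same comparison in both directions covers fail-back as well as fail-over.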