Replication Best Practices
This section discusses Solace's recommendations for configuring data center replication. Before you read these best practices, ensure that you understand the replication concepts described in Replication Overview. You should also understand how the use of transactions and replication modes affects the replication process; these concepts are discussed in detail in How Replication Works and Synchronous and Asynchronous Message Replication.
Do Not Mix Message Types
We recommend that you do not mix message types. In particular, mixing the following types of messages on a single endpoint can create complications:
- replicated and non-replicated messages
- transacted and non-transacted messages
If replicated and non-replicated messages are mixed, the endpoint will hold different sets of messages on the active and standby sites. In the event of a replication failover to the standby site, clients will have a different set of messages to consume from the endpoint, which is likely to be difficult for the client application to handle. Additionally, the non-replicated messages in the endpoint on the previously active (now standby) site are not guaranteed to be preserved, since they were never intended to be replicated; they will eventually be cleaned from the endpoint on the standby event broker as newer replicated messages are consumed from that endpoint on the newly active site. This means that on a fail-back to the originally active site, these messages may or may not be present in the endpoint.
If an endpoint is subject to a mixture of transacted and non-transacted operations, delivery delays can occur, especially when synchronous transactions are used and the replication service becomes degraded. The issue is worse if the reject-msg-when-sync-ineligible option is enabled.
Use a Replication-Specific Topic Hierarchy
To prevent the complexities that occur when you mix message types on an endpoint, we recommend that you use a topic structure that classifies messages by type. We recommend a topic hierarchy that classifies messages as follows:
- Guaranteed messages that will:
- Not be replicated
- Be synchronously replicated
- Be asynchronously replicated
- Guaranteed messages that have a Store and Forward forwarding mode
- Direct messages
- Messages from a particular paired replication site (for example, the paired New York and New Jersey replication sites)
- Messages from different publishers
For example, a topic hierarchy that accomplishes all of these things would use a topic prefix that encodes the paired replication site, the publisher, the forwarding mode, and the replication mode. For deployments where short topics are preferred, the topics can be made less verbose by abbreviating these prefix levels.
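As an illustration only (the level names and topic tails below are assumptions, chosen to be consistent with the CLI subscription examples later in this section, not prefixes mandated by Solace), such a hierarchy might use prefixes of the form site/publisher/mode/replication:

```
NYNJ/pub1/MODE_GM_SF/REPL_SYNC/orders/new      <- Guaranteed, synchronously replicated
NYNJ/pub1/MODE_GM_SF/REPL_ASYNC/stats/update   <- Guaranteed, asynchronously replicated
NYNJ/pub1/MODE_GM_SF/REPL_NONE/local/audit     <- Guaranteed, not replicated
NYNJ/pub1/MODE_DIRECT/prices/quote             <- Direct messages
```

Here NYNJ identifies the paired New York/New Jersey replication site, pub1 identifies the publisher, and MODE_GM_SF marks Guaranteed messages with the Store and Forward forwarding mode.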
Creating such a hierarchy provides the following benefits:
- It simplifies the configuration of replicated topic subscriptions—only two subscriptions need to be added per replicated Message VPN to replicate all messages:
```
solace(configure/message-vpn/replication)# create replicated-topic */*/MODE_GM_SF/REPL_ASYNC/>
solace(configure/message-vpn/replication)# create replicated-topic */*/MODE_GM_SF/REPL_SYNC/>
solace(...sage-vpn/replication/replicated-topic)# replication-mode sync
```
- It prevents the unintended promotion or demotion of a message because of topic matches (for example, a non-persistent message that is converted to a Direct message). If Direct message consumers only subscribe to */*/MODE_DIRECT topics, and Guaranteed message consumers only subscribe to */*/MODE_GM* topics, promotion and demotion are avoided.
- The paired replication site prefix allows for creation of bridged networks without forwarding loops.
- It is easy for publishers to make use of last value queues (LVQs) to determine their last published message by setting an LVQ's subscription to match their publisher-specific topic prefix.
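To see how the two replicated-topic subscriptions above cover the whole hierarchy, the following is a simplified sketch of Solace-style topic matching (a standalone `*` matches exactly one level and a trailing `>` matches one or more remaining levels; Solace's full rules, such as prefix wildcards like `abc*`, are omitted, and the topic names are hypothetical):

```python
def matches(subscription: str, topic: str) -> bool:
    """Simplified Solace-style subscription matching.

    '*' as a whole level matches exactly one topic level;
    '>' as the final level matches one or more remaining levels.
    """
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(sub_levels):
        if level == ">":
            # '>' is only valid as the last subscription level and
            # requires at least one topic level in that position.
            return i == len(sub_levels) - 1 and len(topic_levels) > i
        if i >= len(topic_levels):
            return False
        if level != "*" and level != topic_levels[i]:
            return False
    return len(topic_levels) == len(sub_levels)

# The two replication subscriptions from the CLI example cover every
# Guaranteed, replicated message, regardless of site or publisher:
print(matches("*/*/MODE_GM_SF/REPL_SYNC/>",
              "NYNJ/pub1/MODE_GM_SF/REPL_SYNC/orders/new"))   # True
print(matches("*/*/MODE_GM_SF/REPL_ASYNC/>",
              "NYNJ/pub1/MODE_GM_SF/REPL_SYNC/orders/new"))   # False
```

Because the first two levels are wildcards, the subscriptions are independent of the site and publisher levels, which is what keeps the configuration down to two replicated topics per Message VPN.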
The last three points offer benefits to all Solace-based solutions, not just those using replication. For non-replicated solutions, the paired replication site prefix would become a prefix specific to the virtual router.
For applications that publish directly to queues rather than publishing to topics that are mapped to queues, the published messages can be replicated by configuring the queue's unique topic as a replicated topic.
Ensure Sufficient Network Bandwidth
When you deploy replication, there must be sufficient network bandwidth to accommodate the published message rate for all replicated topics. Some additional overhead is needed if you are using transactions. The replication queue can absorb message bursts above the available bandwidth, but the network connection between the active and standby sites must be fast enough to keep up with the replication data over time. Compression can be enabled on the replication bridge connection, if necessary.
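As a rough sizing sketch (all numbers, including the overhead factor, are assumptions for illustration, not Solace recommendations), the sustained bandwidth needed for the replication link can be estimated from the replicated message rate and average message size:

```python
def required_mbps(msgs_per_sec: float, avg_msg_bytes: float,
                  overhead_factor: float = 1.1) -> float:
    """Estimate sustained replication-link bandwidth in megabits per second.

    overhead_factor is a hypothetical allowance for protocol and
    transaction overhead (transactions add extra replication traffic).
    """
    bits_per_sec = msgs_per_sec * avg_msg_bytes * 8 * overhead_factor
    return bits_per_sec / 1_000_000

# Example: 10,000 replicated msgs/s averaging 2 KB each
print(round(required_mbps(10_000, 2_048), 1))  # 180.2 (Mbps sustained)
```

Bursts above this rate are absorbed by the replication queue, so the estimate is for sustained throughput, not peaks.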
If security is a concern between the replication sites, SSL encryption can be enabled on the replication bridge.
Be Aware of System Resources Used by Replication
The replication facility consumes some system resources when it's enabled on an event broker because the event broker automatically creates the following objects:
- one Message VPN bridge for the replication facility, plus one Message VPN bridge for each replicated Message VPN
- one queue for each Replicated Message VPN
- one queue topic subscription for each Replicated Message VPN
These system-created objects all have names that begin with the # character (for example, the replication bridge is #MSGVPN_REPLICATION_BRIDGE). Because these objects are required for the successful operation of the replication facility, you can't delete or directly edit them.
When the Config-Sync facility is enabled, an event broker also automatically creates some objects that consume system resources. For information, refer to System Resources Used by Config-Sync.
If local or XA transactions are being replicated, additional transaction resources are used on both the active and standby sites to replicate the transactions.
Ensure Adequate Resources at Both Sites
When using replication, you must ensure that the Message VPN-level and system-level resources used by the event broker at one replication site do not exceed those that can be supported by the event broker at the other site. Consider, for example, a scenario where the event broker at the primary site has a higher client connection capacity than the event broker at the backup site. In the event of a failover, some clients may be unable to connect to the backup site.
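One way to sanity-check this (a hypothetical sketch, not a Solace tool; the resource names and figures are made up for illustration) is to compare the active site's peak usage against the standby broker's limits:

```python
def capacity_gaps(primary_peak_usage: dict, backup_limits: dict) -> dict:
    """Return resources where the backup broker's limit is below the
    primary site's peak usage (i.e. potential failover problems)."""
    return {
        resource: (used, backup_limits.get(resource, 0))
        for resource, used in primary_peak_usage.items()
        if backup_limits.get(resource, 0) < used
    }

# Hypothetical figures: the backup broker supports fewer client connections
primary = {"client-connections": 9_000, "queue-messages": 5_000_000}
backup = {"client-connections": 6_000, "queue-messages": 10_000_000}
print(capacity_gaps(primary, backup))
# {'client-connections': (9000, 6000)}
```

Any non-empty result indicates a resource that could be exhausted after a failover and should be addressed before relying on the backup site.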