Data Center Replication for Disaster Recovery
You can implement a disaster recovery (DR) solution for Solace PubSub+ event brokers using data center replication. Replication provides business continuity and allows mission-critical applications to continue to function during a major service outage at a data center.
To implement replication, Config-Sync must be enabled for each event broker in a replicated site. Config‑Sync provides automatic synchronization of Message VPN configuration parameters that must match between replicated event brokers. For more information, see Config-Sync. For redundant appliances that are handling Guaranteed Messaging, durable endpoint information such as queue and topic endpoints, topic-to-queue mappings, and queue options are automatically propagated whether Config-Sync is enabled or not.
When replication is enabled, guaranteed messages that are published to a Message VPN with an active replication state at one data center are automatically propagated to matching Message VPNs with a standby replication state at another data center. The replication data center is typically located in a separate geographic location. In addition, if the messages are part of a local or XA transaction, the transaction is propagated to the standby site and the transaction semantics are respected. For example, rolling back a transaction would roll it back on both sites. Preparing an XA transaction would prepare the transaction on both sites. In a scenario where a major service outage occurs for one replicated data center (that is, one replication site), a service failover to the operational mate replication site can be performed.
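The propagation behavior described above can be pictured with a small toy model. This is only an illustrative sketch and not how the event broker is implemented: the `ToySite` and `ToyReplicationGroup` classes, their fields, and their methods are hypothetical names invented for this example, and the model ignores networking, queues, and the synchronous versus asynchronous replication modes.

```python
from dataclasses import dataclass, field


@dataclass
class ToySite:
    """Hypothetical stand-in for a Message VPN at one replication site."""
    name: str
    state: str  # "active" or "standby"
    committed: list = field(default_factory=list)  # messages available to consumers
    pending: dict = field(default_factory=dict)    # open transactions: id -> messages


class ToyReplicationGroup:
    """Toy model of a pair of replication mates (assumption: exactly two sites)."""

    def __init__(self, active: ToySite, standby: ToySite):
        self.active = active
        self.standby = standby

    def _sites(self):
        return (self.active, self.standby)

    def publish(self, message, txn_id=None):
        # A guaranteed message published at the active site is also
        # propagated to the standby site, inside a transaction if one is given.
        for site in self._sites():
            if txn_id is None:
                site.committed.append(message)
            else:
                site.pending.setdefault(txn_id, []).append(message)

    def commit(self, txn_id):
        # Committing makes the transaction's messages available on both sites.
        for site in self._sites():
            site.committed.extend(site.pending.pop(txn_id, []))

    def rollback(self, txn_id):
        # Rolling back discards the transaction on both sites.
        for site in self._sites():
            site.pending.pop(txn_id, None)

    def fail_over(self):
        # Called explicitly by an operator in this model; nothing triggers it
        # automatically.
        self.active.state, self.standby.state = "standby", "active"
        self.active, self.standby = self.standby, self.active
```

In this model, a rolled-back transaction leaves no messages at either site, a committed one appears at both, and the standby site only takes over when `fail_over()` is called explicitly.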
A typical customer deployment model for replicated data center infrastructure is to have a pair of replication sites located some distance apart (perhaps 50 or 100 miles). These sites are considered replication mates, and are known collectively as a replication group. The main or primary site uses a high-availability (HA) pair of event brokers to protect against a local failure of an event broker or equipment in that site. The secondary or standby site may have a single event broker or an HA pair of event brokers. The primary site provides service unless it fails. If the primary site fails, service is failed over to the standby site. Once the primary site is restored, service can be failed back to the primary site. This model is illustrated in the following diagram:
The failover of a replication site is often an action that cannot be performed at the messaging layer only. Typically, there are servers, critical applications, and other infrastructure that must be switched as part of the failover. Therefore, the failover is a coordinated operation that must be performed by network administrators. It does not happen automatically.
Replication is not a replacement for HA event broker redundancy within a data center. Event broker redundancy provides automatic protection against a single event broker failure. Replication protects against more catastrophic events in the data center and requires manual intervention to effect a failover.
Message replay is not supported with replication. Messages written to the replay log on the active site may not be written to the replay log at the standby site.
When configuring a bi-directional Message VPN bridge in a Message VPN where replication is also enabled, avoid subscribing both ends of that bridge to the same topics if those topics are also configured for replication. This restriction also applies to overlapping wildcard subscriptions. In other words, it applies to any subscription that would match a message received over the bridge. If such topics exist, then following a replication failover, they can cause messages originally received over the bridge to be sent back over that bridge to the originating event broker. This results in message duplication on the originating event broker.
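One way to review a planned configuration for this restriction is to test a set of sample topics against the subscriptions on each end of the bridge and against the topics selected for replication. The following is a minimal sketch, not a Solace tool. The `matches` and `risky_topics` functions are hypothetical helpers, and the wildcard handling is a simplified assumption: a trailing `*` in a level is treated as a prefix match within that level, and `>` as the final level is treated as matching one or more remaining levels. Check the wildcard rules for your broker version before relying on it.

```python
def matches(subscription: str, topic: str) -> bool:
    """Return True if `topic` matches `subscription` under the simplified
    wildcard rules assumed in this sketch."""
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, sub in enumerate(sub_levels):
        # ">" as the last level matches one or more remaining topic levels.
        if sub == ">" and i == len(sub_levels) - 1:
            return len(topic_levels) > i
        if i >= len(topic_levels):
            return False
        level = topic_levels[i]
        if sub.endswith("*"):
            # A trailing "*" matches any level that starts with the given prefix.
            if not level.startswith(sub[:-1]):
                return False
        elif sub != level:
            return False
    # Without a trailing ">", both must have the same number of levels.
    return len(sub_levels) == len(topic_levels)


def risky_topics(sample_topics, end_a_subs, end_b_subs, replicated_subs):
    """List sample topics that both bridge ends subscribe to and that are
    also selected for replication."""
    def any_match(subs, topic):
        return any(matches(s, topic) for s in subs)

    return [
        t for t in sample_topics
        if any_match(end_a_subs, t)
        and any_match(end_b_subs, t)
        and any_match(replicated_subs, t)
    ]
```

Any topic returned by `risky_topics` is a candidate for duplication after a replication failover, so the subscriptions on one end of the bridge, or the topics selected for replication, would need to be narrowed. The check only covers the sample topics supplied; it does not prove that two wildcard subscriptions never overlap.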
For more details about using replication, see:
- How Replication Works
- Replication Best Practices
- Selecting Which Messages to Replicate
- Synchronous and Asynchronous Message Replication
- Replication Queue Full
- Switching Service Between Sites
- Deployment Options for Replication
- Using Replication with DMR
If you are using replication with PubSub+ Cloud, see Using Replication for Disaster Recovery of Event Broker Services.
For instructions for setting up replication, see Configuring Replication.
To learn how to check the status of replication, see Monitoring Replication.
For details about performing failovers and recovering from failovers, see Procedures for Switching Replication Service Between Sites.