Failing Back to a Restored Site After an Uncontrolled Failover
After the message VPN replication has become synchronous eligible, you can fail back to the restored site. This may be desirable if the primary site has higher capacity or capabilities than the backup site. However, failing back immediately is not recommended. We recommend waiting to fail back until all the messages that were replicated before the failover have been consumed.
The procedure for failing back is the same as for a planned outage (see Performing a Controlled Failover). However, the following considerations apply, especially if you are considering failing back shortly after having switched activity:
- Transactions that were in progress when the failover occurred are at risk of loss or duplication.
- If the message spool of the restored site could not be recovered, messages that were replicated before the failure and have not been consumed on the active site are lost.
If there is no hardware failure of the ADB and no loss of data on the external disk, the pre-failure state of the message spool can be recovered. Examples of failures that allow the pre-failure state to be recovered include:
- network isolation
- power failure
- temporary loss of connectivity to external disk
If the pre-failure message spool can be recovered and the replication queue on the now active site has not filled, then messages that were replicated before the failover are available and full replication behavior is restored. In a long-term failure where the replication queue fills, only messages and transactions that made it into the replication queue will be available on a fail back.
If there is a hardware failure or loss of data on the external disk, then the pre-failure message spool cannot be recovered and will be empty.
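The three outcomes described above can be summarized as a small decision function. This is only a sketch: `SpoolOutcome` and `restored_spool_outcome` are hypothetical names, and the two boolean inputs simply stand for the two conditions the text describes.

```python
from enum import Enum


class SpoolOutcome(Enum):
    # Hypothetical labels summarizing the cases described above.
    FULLY_RECOVERED = "pre-failover messages available; full replication restored"
    PARTIALLY_RECOVERED = "only messages that reached the replication queue are available"
    EMPTY = "pre-failure message spool cannot be recovered and is empty"


def restored_spool_outcome(hardware_or_disk_data_loss: bool,
                           replication_queue_filled: bool) -> SpoolOutcome:
    """Classify what the restored site's spool holds when you fail back."""
    if hardware_or_disk_data_loss:
        # Hardware failure of the ADB or loss of data on the external disk.
        return SpoolOutcome.EMPTY
    if replication_queue_filled:
        # Long-term failure: the replication queue on the active site filled.
        return SpoolOutcome.PARTIALLY_RECOVERED
    # e.g. network isolation, power failure, temporary loss of disk connectivity.
    return SpoolOutcome.FULLY_RECOVERED
```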
Transactions that were in progress at the time of the uncontrolled failover are not automatically synchronized after the standby site is restored. Assuming replication was never disabled, you can eliminate the risks of loss and duplication for synchronous transactions by repeating the following procedure for every endpoint before failing back to the formerly active site:
- Step 1: Verify if the Endpoint is Configured to Propagate Consumer Acknowledgments to the Replication Standby Site
- Step 2: Display the Internal IDs of Messages on the Formerly Active Site
- Step 3: Display the Internal IDs of Messages on the Active Site
- Step 4: Delete Messages on the Formerly Active Site
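The per-endpoint loop can be sketched as follows. `plan_failback_cleanup` and the two callables passed to it are hypothetical stand-ins for the manual checks in Steps 1 to 3; they are not part of any broker API.

```python
from typing import Callable, Dict, Iterable, List


def plan_failback_cleanup(
    endpoints: Iterable[str],
    ack_propagation_enabled: Callable[[str], bool],  # Step 1 check (hypothetical hook)
    find_deletable: Callable[[str], List[int]],      # Steps 2 and 3 comparison (hypothetical hook)
) -> Dict[str, List[int]]:
    """Return, per endpoint, the message Ids to delete in Step 4."""
    plan: Dict[str, List[int]] = {}
    for endpoint in endpoints:
        if not ack_propagation_enabled(endpoint):
            # Deduplication applies only to endpoints with Consumer Ack Propagation enabled.
            continue
        plan[endpoint] = find_deletable(endpoint)
    return plan
```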
Step 1: Verify if the Endpoint is Configured to Propagate Consumer Acknowledgments to the Replication Standby Site
For example:
NY_EventBroker1# show queue myQueue message-vpn Trading_VPN messages oldest
...
Consumer Ack Propagation : Yes
Move on to the next endpoint if Consumer Ack Propagation is not enabled. Deduplication applies only to endpoints with Consumer Ack Propagation enabled.
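If you script this check, the decision can be read from the CLI output. A minimal sketch, assuming the output contains a line of the form `Consumer Ack Propagation : <value>` as in the example above (spacing around the colon is treated as flexible); `ack_propagation_enabled` is a hypothetical helper.

```python
import re


def ack_propagation_enabled(show_queue_output: str) -> bool:
    """Return True if the output reports 'Consumer Ack Propagation : Yes'."""
    match = re.search(r"Consumer Ack Propagation\s*:\s*(\w+)", show_queue_output)
    # A missing field is treated as "not enabled", so the endpoint is skipped.
    return bool(match) and match.group(1).lower() == "yes"
```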
Step 2: Display the Internal IDs of Messages on the Formerly Active Site
For example:
NY_EventBroker1# show queue myQueue message-vpn Trading_VPN messages oldest
Name: myQueue

Message Id: 2852
Date spooled: Aug 22 2016 02:12:29 UTC
Replicated: yes
Replicated Mate Message Id: n/a

Message Id: 2853
Date spooled: Aug 22 2016 02:12:29 UTC
Replicated: yes
Replicated Mate Message Id: n/a

Message Id: 2901
Date spooled: Aug 22 2016 02:12:39 UTC
Replicated: yes
Replicated Mate Message Id: n/a

Message Id: 2903
Date spooled: Aug 22 2016 02:12:39 UTC
Replicated: yes
Replicated Mate Message Id: n/a

Message Id: 2944
Date spooled: Aug 22 2016 02:15:30 UTC
Replicated: yes
Replicated Mate Message Id: 2919

Message Id: 2945
Date spooled: Aug 22 2016 02:15:30 UTC
Replicated: yes
Replicated Mate Message Id: 2921
...
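To compare the two sites programmatically, the per-message fields shown for the formerly active site can be collected into records. This sketch assumes each message appears as `Message Id:`, `Replicated:` and `Replicated Mate Message Id:` fields in that order, as in the example output; the regular expression and the `SpooledMessage` and `parse_spooled_messages` names are hypothetical, not a documented output contract.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One message record; the lookbehind keeps "Replicated Mate Message Id:"
# from being mistaken for the start of a new record.
_RECORD = re.compile(
    r"(?<!Mate )Message Id:\s*(\d+)"                 # the message's own Id
    r".*?Replicated:\s*(yes|no)"                     # Replicated flag
    r".*?Replicated Mate Message Id:\s*(\d+|n/a)",   # mate Id, or n/a
    re.DOTALL,
)


@dataclass
class SpooledMessage:
    msg_id: int
    replicated: bool
    mate_id: Optional[int]  # None when the CLI shows n/a


def parse_spooled_messages(cli_output: str) -> List[SpooledMessage]:
    """Turn the formerly active site's message listing into records."""
    return [
        SpooledMessage(
            msg_id=int(msg_id),
            replicated=(replicated == "yes"),
            mate_id=None if mate == "n/a" else int(mate),
        )
        for msg_id, replicated, mate in _RECORD.findall(cli_output)
    ]
```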
Step 3: Display the Internal IDs of Messages on the Active Site
For example:
BOS_EventBroker# show queue myQueue message-vpn Trading_VPN messages oldest
Name: myQueue
Message Id   Replicated Mate Message Id   Sent
2840         2852                         yes
2841         2853                         yes
2919         n/a                          no
2921         n/a                          no
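The only thing Step 4 needs from the active site is the set of Replicated Mate Message Ids. A minimal sketch, assuming each table row reads `<Message Id> <Replicated Mate Message Id> <Sent>` as in the example above; `active_site_mate_ids` is a hypothetical helper.

```python
import re
from typing import Set

# One row of the assumed table layout: "<Message Id> <Mate Message Id> <Sent>".
_ROW = re.compile(r"(\d+)\s+(\d+|n/a)\s+(yes|no)\b")


def active_site_mate_ids(cli_output: str) -> Set[int]:
    """Collect every Replicated Mate Message Id on the active site, skipping n/a."""
    return {
        int(mate)
        for _msg_id, mate, _sent in _ROW.findall(cli_output)
        if mate != "n/a"
    }
```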
Step 4: Delete Messages on the Formerly Active Site
Delete messages on the formerly active site (NY_EventBroker1 in the example above) that match ALL of the following conditions:
- The message must have its Replicated flag set to yes.
- The message must have a Replicated Mate Message Id of n/a.
- The message's Id must not match any Replicated Mate Message Id on the active site (BOS_EventBroker in the example above).
In the example above, messages 2901 and 2903 can be deleted. Messages 2852 and 2853 cannot be deleted because their Ids exist in the Replicated Mate Message Id field of BOS_EventBroker. Messages 2944 and 2945 cannot be deleted because their Replicated Mate Message Id field is not n/a.
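The three deletion conditions can be expressed as a single filter. In this sketch, `deletable_message_ids` is a hypothetical helper, not a broker tool. It takes the formerly active site's messages as `(Message Id, Replicated, Replicated Mate Message Id)` tuples, with `None` standing for n/a, plus the set of Replicated Mate Message Ids seen on the active site, and it reproduces the result of the worked example.

```python
from typing import Iterable, List, Optional, Set, Tuple

# (Message Id, Replicated flag, Replicated Mate Message Id or None for n/a)
Record = Tuple[int, bool, Optional[int]]


def deletable_message_ids(formerly_active: Iterable[Record],
                          active_mate_ids: Set[int]) -> List[int]:
    """Return the Ids on the formerly active site that meet ALL three conditions."""
    return [
        msg_id
        for msg_id, replicated, mate_id in formerly_active
        if replicated                        # Replicated flag is yes
        and mate_id is None                  # Replicated Mate Message Id is n/a
        and msg_id not in active_mate_ids    # Id not referenced by the active site
    ]


# Data transcribed from the example output for NY_EventBroker1 and BOS_EventBroker.
ny_messages = [
    (2852, True, None), (2853, True, None),
    (2901, True, None), (2903, True, None),
    (2944, True, 2919), (2945, True, 2921),
]
bos_mate_ids = {2852, 2853}  # 2919 and 2921 show n/a on BOS_EventBroker

print(deletable_message_ids(ny_messages, bos_mate_ids))  # [2901, 2903]
```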
Enter the following ADMIN command to delete the messages:
solace(admin/message-spool)# delete-messages queue <queue-name> message <msg-id>
For example, to delete messages 2901 and 2903 from the example above:
NY_EventBroker1# admin
NY_EventBroker1(admin)# message-spool message-vpn Trading_VPN
NY_EventBroker1(admin/message-spool)# delete-messages queue myQueue message 2901
NY_EventBroker1(admin/message-spool)# delete-messages queue myQueue message 2903
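When many messages qualify, typing each command by hand is error-prone. A minimal sketch, assuming you paste the generated lines into a CLI session on the formerly active site yourself, builds the same command sequence shown above. `delete_commands` is a hypothetical helper and does not validate the Ids it is given.

```python
from typing import Iterable, List


def delete_commands(message_vpn: str, queue: str, msg_ids: Iterable[int]) -> List[str]:
    """Build the CLI lines for the admin/message-spool deletions."""
    lines = ["admin", f"message-spool message-vpn {message_vpn}"]
    lines += [f"delete-messages queue {queue} message {msg_id}" for msg_id in msg_ids]
    return lines


for line in delete_commands("Trading_VPN", "myQueue", [2901, 2903]):
    print(line)
```

Generating the list from the output of the filtering step keeps the deleted Ids identical to the ones you verified.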