Failing Back to a Restored Site After an Uncontrolled Failover
After replication for the Message VPN has become synchronous eligible, you can fail back to the restored site. This may be desirable if the primary site has higher capacity or capabilities than the backup site. However, we do not recommend failing back immediately. Wait until all the messages that were replicated before the failover have been consumed.
The procedure for failing back is the same as for a planned outage (see Performing a Controlled Failover). However, the following considerations apply, especially if you are considering failing back shortly after having switched activity:
- Transactions that were in progress when the failover occurred are at risk of loss or duplication.
- If the message spool of the restored site could not be recovered, messages that were replicated before the failure and have not yet been consumed on the active site are lost.
If there is no hardware failure of the ADB or loss of data on the external disk, then the pre-failure state of the message spool can be recovered. Examples of failures that allow the state to be recovered include:
- network isolation
- power failure
- temporary loss of connectivity to external disk
If the pre-failure message spool can be recovered and the replication queue on the now active site has not filled, then messages that were replicated before the failover are available and full replication behavior is restored. In a long-term failure where the replication queue fills, only the messages and transactions that made it into the replication queue will be available after a fail back.
If there is a hardware failure or loss of data on the external disk, then the pre-failure message spool cannot be recovered and will be empty.
Transactions that were in progress at the time of the uncontrolled failover will not be automatically synchronized after restoring the standby site. The risks of loss and duplication for synchronous transactions can be eliminated, assuming replication was never disabled, if you repeat the following procedure for every endpoint before failing back to the formerly active site.
- Step 1: Verify if the Endpoint is Configured to Propagate Consumer Acknowledgments to the Replication Standby Site
- Step 2: Display the Internal IDs of Messages on the Formerly Active Site
- Step 3: Display the Internal IDs of Messages on the Active Site
- Step 4: Delete Messages on the Formerly Active Site
Step 1: Verify if the Endpoint is Configured to Propagate Consumer Acknowledgments to the Replication Standby Site
For example:
NY_EventBroker1# show queue myQueue message-vpn Trading_VPN detail
...
Consumer Ack Propagation : Yes
Move on to the next endpoint if Consumer Ack Propagation is not enabled.
Deduplication applies only for endpoints with Consumer Ack Propagation enabled.
Step 2: Display the Internal IDs of Messages on the Formerly Active Site
For example:
NY_EventBroker1# show queue myQueue message-vpn Trading_VPN messages detail
Name: myQueue
Message Id: 2852
Date spooled: Aug 22 2016 02:12:29 UTC
Replicated: yes
Replicated Mate Message Id: n/a
Message Id: 2853
Date spooled: Aug 22 2016 02:12:29 UTC
Replicated: yes
Replicated Mate Message Id: n/a
Message Id: 2901
Date spooled: Aug 22 2016 02:12:39 UTC
Replicated: yes
Replicated Mate Message Id: n/a
Message Id: 2903
Date spooled: Aug 22 2016 02:12:39 UTC
Replicated: yes
Replicated Mate Message Id: n/a
Message Id: 2944
Date spooled: Aug 22 2016 02:15:30 UTC
Replicated: yes
Replicated Mate Message Id: 2919
Message Id: 2945
Date spooled: Aug 22 2016 02:15:30 UTC
Replicated: yes
Replicated Mate Message Id: 2921
...
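When a queue holds many messages, comparing the listings by eye is error-prone. The following is a minimal sketch, not a supported tool, of how the detail output could be turned into records for later comparison. It assumes the output has been captured as text in the layout shown in the example above (one field per line, each message starting with a Message Id line). The function name parse_messages_detail and the record keys are hypothetical.

```python
import re
from typing import Dict, List

# Matches only the three fields needed for the comparison in Step 4.
# Assumption: fields appear one per line as "Field Name: value".
FIELD = re.compile(
    r"^\s*(Message Id|Replicated|Replicated Mate Message Id)\s*:\s*(\S+)\s*$"
)


def parse_messages_detail(text: str) -> List[Dict[str, object]]:
    """Return one record per message found in the captured CLI text."""
    messages: List[Dict[str, object]] = []
    current = None
    for line in text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue  # skips the prompt, "Name:", "Date spooled:" and "..." lines
        key, value = match.groups()
        if key == "Message Id":
            # A "Message Id" line starts a new record.
            current = {"id": int(value), "replicated": False, "mate_id": None}
            messages.append(current)
        elif current is not None and key == "Replicated":
            current["replicated"] = value.lower() == "yes"
        elif current is not None and key == "Replicated Mate Message Id":
            # "n/a" is stored as None so it is easy to test for later.
            current["mate_id"] = None if value.lower() == "n/a" else int(value)
    return messages
```

If the output on your broker version is laid out differently, the regular expression would need to be adjusted accordingly.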
Step 3: Display the Internal IDs of Messages on the Active Site
For example:
BOS_EventBroker# show queue myQueue message-vpn Trading_VPN messages oldest
Name: myQueue
Message Id    Replicated Mate Message Id    Sent
2840          2852                          yes
2841          2853                          yes
2919          n/a                           no
2921          n/a                           no
Step 4: Delete Messages on the Formerly Active Site
Delete messages on the formerly active site (NY_EventBroker1 in the example above) that match ALL of the following conditions:
- The message must have a Replicated flag set to yes.
- The message must have a Replicated Mate Message Id of n/a.
- The message's Id must not match any Replicated Mate Message Id on the active site (BOS_EventBroker in the example above).
In the example above, messages 2901 and 2903 can be deleted.
Messages 2852 and 2853 cannot be deleted because their Ids exist in the Replicated Mate Message Id field of BOS_EventBroker.
Messages 2944 and 2945 cannot be deleted because the Replicated Mate Message Id field is not n/a.
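The three conditions above can be sketched as a small filter. This is an illustrative sketch only: the SpooledMessage type and the deletable_ids function are hypothetical names, and the data is typed in by hand from the example listings for NY_EventBroker1 and BOS_EventBroker.

```python
from typing import Iterable, List, NamedTuple, Optional


class SpooledMessage(NamedTuple):
    msg_id: int
    replicated: bool          # the "Replicated" flag
    mate_id: Optional[int]    # "Replicated Mate Message Id"; None stands for n/a


def deletable_ids(
    formerly_active: Iterable[SpooledMessage],
    active_mate_ids: Iterable[int],
) -> List[int]:
    """Return Ids on the formerly active site that meet all three conditions."""
    mates = set(active_mate_ids)
    return [
        m.msg_id
        for m in formerly_active
        if m.replicated                 # condition 1: Replicated is yes
        and m.mate_id is None           # condition 2: mate Id is n/a
        and m.msg_id not in mates       # condition 3: not referenced by the active site
    ]


# Messages on the formerly active site (NY_EventBroker1 in the example).
ny = [
    SpooledMessage(2852, True, None),
    SpooledMessage(2853, True, None),
    SpooledMessage(2901, True, None),
    SpooledMessage(2903, True, None),
    SpooledMessage(2944, True, 2919),
    SpooledMessage(2945, True, 2921),
]
# Replicated Mate Message Id values that are not n/a on the active site.
bos_mate_ids = [2852, 2853]

print(deletable_ids(ny, bos_mate_ids))  # [2901, 2903]
```

The result matches the manual comparison: only messages 2901 and 2903 satisfy all three conditions.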
Enter the following ADMIN command to delete the messages:
solace(admin/message-spool)# delete-messages queue <queue-name> message <msg-id>
For example, to delete messages 2901 and 2903 in the example above:
NY_EventBroker1# admin
NY_EventBroker1(admin)# message-spool message-vpn Trading_VPN
NY_EventBroker1(admin/message-spool)# delete-messages queue myQueue message 2901
NY_EventBroker1(admin/message-spool)# delete-messages queue myQueue message 2903
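When many messages must be deleted, the command lines can be generated rather than typed one by one. The sketch below only builds the text of the commands shown in the example above and does not connect to or run anything on a broker. The function name delete_commands is hypothetical, and the generated lines should be reviewed before being entered in a CLI session.

```python
from typing import Iterable, List


def delete_commands(vpn: str, queue: str, msg_ids: Iterable[int]) -> List[str]:
    """Build the CLI lines, in order, for deleting the given message Ids."""
    # Enter the admin message-spool context for the Message VPN first.
    lines = ["admin", f"message-spool message-vpn {vpn}"]
    # One delete-messages command per message Id.
    lines += [f"delete-messages queue {queue} message {i}" for i in msg_ids]
    return lines


for line in delete_commands("Trading_VPN", "myQueue", [2901, 2903]):
    print(line)
```

For the example values, this prints the same four commands as the session above, without the prompts.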