Failing Back to a Restored Site After an Uncontrolled Failover

After message VPN replication has become synchronous-eligible, you can fail back to the restored site. This may be desirable if the primary site has greater capacity or capabilities than the backup site. However, failing back immediately is not recommended; we recommend waiting until all the messages that were replicated before the failover have been consumed.

The procedure for failing back is the same as for a planned outage (see Performing a Controlled Failover). However, the following considerations apply, especially if you are considering failing back shortly after having switched activity:

  • Transactions that were in progress when the failover occurred are at risk of loss or duplication.
  • If the message spool of the restored site could not be recovered, messages that were replicated before the failure and have not been consumed on the active site are lost.

If there is no hardware failure of the ADB and no loss of data on the external disk, then the pre-failure state of the message spool can be recovered. Examples of failures that allow the state to be recovered include:

  • network isolation
  • power failure
  • temporary loss of connectivity to external disk

If the pre-failure message spool can be recovered and the replication queue on the now active site has not filled, then messages that were replicated before the failover are available and full replication behavior is restored. In a long-term failure where the replication queue fills, only the messages and transactions that made it into the replication queue will be available on fail back.

If there is a hardware failure or loss of data on the external disk, then the pre-failure message spool cannot be recovered and will be empty.
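The recovery outcomes described above can be summarized as a small decision function. This is a minimal sketch for illustration only: the enum `SpoolState`, the function `expected_spool_state`, and its boolean parameters are hypothetical names and do not correspond to any broker API or CLI output.

```python
from enum import Enum


class SpoolState(Enum):
    """Hypothetical labels for the restored site's message spool after recovery."""
    FULLY_RECOVERED = "pre-failure spool recovered; full replication behavior restored"
    PARTIALLY_RECOVERED = "only messages that made it into the replication queue are available"
    EMPTY = "pre-failure spool cannot be recovered and will be empty"


def expected_spool_state(hardware_failure: bool,
                         external_disk_data_lost: bool,
                         replication_queue_filled: bool) -> SpoolState:
    """Sketch of the recovery rules (all parameter names are assumptions).

    hardware_failure: the ADB on the restored site failed.
    external_disk_data_lost: data on the external disk was lost.
    replication_queue_filled: the replication queue on the now active site
        filled during a long-term failure.
    """
    # A hardware failure or loss of data on the external disk means the
    # pre-failure message spool cannot be recovered.
    if hardware_failure or external_disk_data_lost:
        return SpoolState.EMPTY
    # Otherwise (network isolation, power failure, temporary loss of
    # connectivity to the external disk) the spool is recoverable, but a
    # filled replication queue limits what is available on fail back.
    if replication_queue_filled:
        return SpoolState.PARTIALLY_RECOVERED
    return SpoolState.FULLY_RECOVERED


# A power failure resolved before the replication queue filled.
print(expected_spool_state(False, False, False).name)
```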

Transactions that were in progress at the time of the uncontrolled failover are not automatically synchronized after the standby site is restored. Assuming replication was never disabled, you can eliminate the risk of loss and duplication for synchronous transactions by repeating the following procedure for every endpoint before failing back to the formerly active site.

Step 1: Verify Whether the Endpoint is Configured to Propagate Consumer Acknowledgments to the Replication Standby Site

For example:

NY_EventBroker1# show queue myQueue message-vpn Trading_VPN detail
...
Consumer Ack Propagation             : Yes

If Consumer Ack Propagation is not enabled, move on to the next endpoint. This deduplication procedure applies only to endpoints with Consumer Ack Propagation enabled.
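If you capture the CLI output as text, the Step 1 check can be scripted. The sketch below assumes only that the output contains a line of the form `Consumer Ack Propagation : Yes`, as in the example, with variable spacing around the colon; the function name `consumer_ack_propagation_enabled` is hypothetical.

```python
import re


def consumer_ack_propagation_enabled(show_output: str) -> bool:
    """Return True if the captured output reports Consumer Ack Propagation as Yes.

    Assumes a line such as 'Consumer Ack Propagation             : Yes'.
    """
    match = re.search(r"^\s*Consumer Ack Propagation\s*:\s*(\w+)",
                      show_output, re.MULTILINE)
    if match is None:
        # Fail loudly rather than silently skipping the endpoint, so the
        # operator checks the configuration by hand.
        raise ValueError("Consumer Ack Propagation field not found in output")
    return match.group(1).lower() == "yes"
```

Raising an error when the field is missing is a deliberate choice: treating "not found" as "not enabled" would quietly skip an endpoint that may need deduplication.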

Step 2: Display the Internal IDs of Messages on the Formerly Active Site

For example:

NY_EventBroker1# show queue myQueue message-vpn Trading_VPN messages oldest

Name: myQueue

Message Id: 2852
  Date spooled:                 Aug 22 2016 02:12:29 UTC
  Replicated:                   yes
  Replicated Mate Message Id:   n/a

Message Id: 2853
  Date spooled:                 Aug 22 2016 02:12:29 UTC
  Replicated:                   yes
  Replicated Mate Message Id:   n/a
    
Message Id: 2901
  Date spooled:                 Aug 22 2016 02:12:39 UTC
  Replicated:                   yes
  Replicated Mate Message Id:   n/a

Message Id: 2903
  Date spooled:                 Aug 22 2016 02:12:39 UTC
  Replicated:                   yes
  Replicated Mate Message Id:   n/a

Message Id: 2944
  Date spooled:                 Aug 22 2016 02:15:30 UTC
  Replicated:                   yes
  Replicated Mate Message Id:   2919

Message Id: 2945
  Date spooled:                 Aug 22 2016 02:15:30 UTC
  Replicated:                   yes
  Replicated Mate Message Id:   2921
...
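The per-message blocks above contain the three fields that Step 4 relies on: the message Id, the Replicated flag, and the Replicated Mate Message Id. A minimal parsing sketch follows, assuming the block layout shown in the example (each block starts with `Message Id: <n>`; lines such as `Date spooled` are ignored). The `SpooledMessage` type and `parse_formerly_active` function are hypothetical helpers, not part of any Solace tooling.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpooledMessage:
    message_id: int
    replicated: bool
    mate_id: Optional[int]  # None when the field reads "n/a"


def parse_formerly_active(show_output: str) -> List[SpooledMessage]:
    """Parse the per-message blocks from the formerly active site (Step 2)."""
    messages: List[SpooledMessage] = []
    current: Optional[dict] = None
    for raw in show_output.splitlines():
        line = raw.strip()
        # A new block begins with "Message Id: <n>".
        if m := re.match(r"Message Id:\s*(\d+)$", line):
            current = {"message_id": int(m.group(1))}
            continue
        if current is None:
            continue
        # Check the longer "Replicated Mate Message Id:" label before "Replicated:".
        if m := re.match(r"Replicated Mate Message Id:\s*(\S+)$", line):
            value = m.group(1)
            current["mate_id"] = None if value == "n/a" else int(value)
        elif m := re.match(r"Replicated:\s*(\S+)$", line):
            current["replicated"] = m.group(1).lower() == "yes"
        # Emit the record once both fields of interest have been seen.
        if "replicated" in current and "mate_id" in current:
            messages.append(SpooledMessage(**current))
            current = None
    return messages
```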

Step 3: Display the Internal IDs of Messages on the Active Site

For example:

BOS_EventBroker# show queue myQueue message-vpn Trading_VPN messages oldest

Name: myQueue

Message Id                          Replicated Mate Message Id          Sent
2840                                2852                                 yes
2841                                2853                                 yes
2919                                n/a                                   no
2921                                n/a                                   no
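From the active site's output, Step 4 needs only the set of Replicated Mate Message Id values that are not `n/a`. The sketch below assumes the three-column tabular layout shown above (message Id, mate Id or `n/a`, sent flag); the function name `mate_ids_on_active` is hypothetical.

```python
from typing import Set


def mate_ids_on_active(show_output: str) -> Set[int]:
    """Collect Replicated Mate Message Id values from the active site (Step 3)."""
    mate_ids: Set[int] = set()
    for line in show_output.splitlines():
        fields = line.split()
        # Data rows have exactly three columns and begin with a numeric Id;
        # the prompt, the "Name:" line, and the header row do not.
        if len(fields) == 3 and fields[0].isdigit() and fields[1] != "n/a":
            mate_ids.add(int(fields[1]))
    return mate_ids
```

For the example output above, this yields the set {2852, 2853}.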

Step 4: Delete Messages on the Formerly Active Site

Delete messages on the formerly active site (NY_EventBroker1 in the example above) that match ALL of the following conditions:

  • The message must have a Replicated flag set to yes.
  • The message must have a Replicated Mate Message Id of n/a.
  • The message's Id must not match any Replicated Mate Message Id on the active site (BOS_EventBroker in the example above).

In the example above, messages 2901 and 2903 can be deleted.

Messages 2852 and 2853 cannot be deleted because their Ids exist in the Replicated Mate Message Id field of BOS_EventBroker.

Messages 2944 and 2945 cannot be deleted because their Replicated Mate Message Id field is not n/a.
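The three deletion conditions amount to a simple filter. The sketch below applies them to the values from the Step 2 and Step 3 examples; the tuple layout and the `deletable_ids` name are illustrative assumptions, and the result should still be checked by hand before deleting anything.

```python
from typing import Iterable, List, Optional, Set, Tuple

# (message_id, replicated, mate_id); mate_id is None when the field reads "n/a".
Message = Tuple[int, bool, Optional[int]]


def deletable_ids(formerly_active: Iterable[Message],
                  active_mate_ids: Set[int]) -> List[int]:
    """Return Ids on the formerly active site that meet all three conditions."""
    return [
        msg_id
        for msg_id, replicated, mate_id in formerly_active
        if replicated                         # Replicated flag is yes
        and mate_id is None                   # Replicated Mate Message Id is n/a
        and msg_id not in active_mate_ids     # Id not referenced on the active site
    ]


# Values from the NY_EventBroker1 (Step 2) and BOS_EventBroker (Step 3) examples.
ny_messages = [
    (2852, True, None),
    (2853, True, None),
    (2901, True, None),
    (2903, True, None),
    (2944, True, 2919),
    (2945, True, 2921),
]
bos_mate_ids = {2852, 2853}  # 2919 and 2921 show n/a on BOS_EventBroker

print(deletable_ids(ny_messages, bos_mate_ids))  # [2901, 2903]
```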

Enter the following ADMIN command to delete the messages:

solace(admin/message-spool)# delete-messages queue <queue-name> message <msg-id>

For example, to delete messages 2901 and 2903 in the example above:

NY_EventBroker1# admin
NY_EventBroker1(admin)# message-spool message-vpn Trading_VPN
NY_EventBroker1(admin/message-spool)# delete-messages queue myQueue message 2901
NY_EventBroker1(admin/message-spool)# delete-messages queue myQueue message 2903
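When many messages qualify, you may prefer to generate the command sequence rather than type each line. This sketch only builds the command text shown above, one `delete-messages` line per Id, for review before you enter it; it does not connect to the broker, and the `delete_commands` name is hypothetical.

```python
from typing import Iterable, List


def delete_commands(message_vpn: str, queue: str, msg_ids: Iterable[int]) -> List[str]:
    """Build the CLI lines used above to delete the given message Ids."""
    commands = ["admin", f"message-spool message-vpn {message_vpn}"]
    # The delete-messages command takes one message Id per invocation.
    commands += [f"delete-messages queue {queue} message {msg_id}" for msg_id in msg_ids]
    return commands


for line in delete_commands("Trading_VPN", "myQueue", [2901, 2903]):
    print(line)
```

This prints the same four commands entered in the example above, without the CLI prompts.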