Performing a Controlled Failover

A controlled replication failover switches clients from a Message VPN with an active replication state at one site to the corresponding Message VPN at the alternate site. The procedure consists of the steps described in this section.

For the failover to be truly controlled, the replication bridge from the replication site that will take over activity must remain bound to the other replication site’s replication queue. Before performing a controlled failover, make every effort to minimize the possibility of the replication bridge disconnecting. The procedure starts when the Message VPNs at the site giving up activity are configured with a replication standby state. If the replication bridge connection goes down after that point and before the replication queue has drained, the failover becomes an uncontrolled failover.

In the example scenario used in these steps, the clients from a single Message VPN (Trading_VPN) with an active replication state at one site (New York) are switched to the corresponding Message VPN with a standby replication state at the alternate site (Boston).

Although this simple example shows replication sites with only a single Message VPN, in real-world scenarios these steps must be performed through the CLI for each Message VPN involved in a replication site failover.

Step 1: Verify that the Replication Bridge is Bound to the Replication Queue

Verify that the replication bridge from the event broker with the replication-standby Message VPN that you want to make active is bound to the replication queue of the Message VPN with the active replication state.

This example assumes that NY_EventBroker1 and BOS_EventBroker are active for the Guaranteed Messaging-enabled virtual router at their respective sites.

New York Data Center

NY_EventBroker1> show message-vpn Trading_VPN replication
Flags Legend:
A - Admin State (U=Up, D=Down)
C - Config State (A=Active, S=Standby)
B - Local Bridge State (U=Up, Q=Queue Unbound, D=Down, -=N/A)
R - Remote Bridge State (U=Up, D=Down, -=N/A)
Q - Queue State (U=Up, D=Down, -=N/A)
S - Sync Replication Eligible (Y=Yes, N=No, -=N/A)
M - Reject Msg When Sync Ineligible (Y=Yes, N=No)
T - Transaction Replication Mode (A=Async, S=Sync, -=N/A)

Message VPN                      A C B R Q S M T
-------------------------------- - - - - - - - - -
Trading_VPN                      U A - U U - N A

NY_EventBroker1> 

Boston Data Center

BOS_EventBroker> show message-vpn Trading_VPN replication
Flags Legend:
A - Admin State (U=Up, D=Down)
C - Config State (A=Active, S=Standby)
B - Local Bridge State (U=Up, Q=Queue Unbound, D=Down, -=N/A)
R - Remote Bridge State (U=Up, D=Down, -=N/A)
Q - Queue State (U=Up, D=Down, -=N/A)
S - Sync Replication Eligible (Y=Yes, N=No, -=N/A)
M - Reject Msg When Sync Ineligible (Y=Yes, N=No)
T - Transaction Replication Mode (A=Async, S=Sync, -=N/A)

Message VPN                      A C B R Q S M T
-------------------------------- - - - - - - - - -
Trading_VPN                      U S U - - - N A

BOS_EventBroker> 
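The check in this step can also be scripted against captured CLI output. The following Python sketch is illustrative only: the function names are hypothetical, and the readiness rule (bridge up on the standby side, remote bridge and queue up on the active side) is an interpretation of the flags legend shown above rather than an official criterion.

```python
# Flag columns in the order printed by "show message-vpn <name> replication".
FLAG_NAMES = ["A", "C", "B", "R", "Q", "S", "M", "T"]


def parse_replication_flags(output, vpn_name):
    """Return {flag letter: value} for the row belonging to vpn_name."""
    for line in output.splitlines():
        fields = line.split()
        # The data row is the VPN name followed by exactly one value per flag.
        if len(fields) == len(FLAG_NAMES) + 1 and fields[0] == vpn_name:
            return dict(zip(FLAG_NAMES, fields[1:]))
    raise ValueError(f"no replication row found for {vpn_name}")


def ready_for_controlled_failover(active_flags, standby_flags):
    """Assumed rule: both VPNs admin-up, roles as expected, bridge bound."""
    return (
        active_flags["A"] == "U" and active_flags["C"] == "A"
        and active_flags["R"] == "U" and active_flags["Q"] == "U"
        and standby_flags["A"] == "U" and standby_flags["C"] == "S"
        and standby_flags["B"] == "U"  # "Q" here would mean Queue Unbound
    )


# Table portion of the sample output from this step.
NY_OUTPUT = """\
Message VPN                      A C B R Q S M T
-------------------------------- - - - - - - - - -
Trading_VPN                      U A - U U - N A
"""
BOS_OUTPUT = """\
Message VPN                      A C B R Q S M T
-------------------------------- - - - - - - - - -
Trading_VPN                      U S U - - - N A
"""

ny_flags = parse_replication_flags(NY_OUTPUT, "Trading_VPN")
bos_flags = parse_replication_flags(BOS_OUTPUT, "Trading_VPN")
print(ready_for_controlled_failover(ny_flags, bos_flags))
```

How the output is captured (for example, through an SSH session) is left to the caller and is outside the scope of this sketch.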

Step 2: Switch the Message VPN Replication State to Standby

Switch the currently replication-active Message VPN to standby.

New York Data Center

NY_EventBroker1(configure)# message-vpn Trading_VPN
NY_EventBroker1(configure/message-vpn)# replication state standby

Step 3: Wait for the Replication Queue in the Formerly Active Message VPN to Drain

Allow any messages or transactions that are in progress from the formerly replication-active Message VPN to arrive at the corresponding Message VPN on its replication mate. Allowing all messages and transactions to propagate to the standby Message VPN can prevent the loss of asynchronously replicated messages and transactions.

To determine if the replication queue for the Message VPN that was just changed from active to standby has drained, enter the show queue command for the Message VPN’s replication queue (named #MSGVPN_REPLICATION_DATA_QUEUE). When the output displays a value of 0 for "Current Messages Spooled", the queue has been drained.

New York Data Center

NY_EventBroker1(configure)# show queue #MSGVPN_REPLICATION_DATA_QUEUE message-vpn Trading_VPN 
Name                                 : #MSGVPN_REPLICATION_DATA_QUEUE
Message VPN                          : Trading_VPN
...
Current Messages Spooled             : 1
Current Spool Usage (MB)             : 0.0006
...

Do not configure the Message VPN on the other replication mate (BOS_EventBroker) as replication-active until "Current Messages Spooled" is 0 for the replication queue of the Message VPN that was just switched to replication standby.
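The wait for the queue to drain can be automated by polling the show queue output. The following Python sketch is a minimal illustration: fetch_output is a hypothetical caller-supplied function that returns the text of the show queue command (how it reaches the event broker is not specified here), and the polling interval and timeout are arbitrary assumptions.

```python
import re
import time


def parse_messages_spooled(show_queue_output):
    """Extract the integer value of the "Current Messages Spooled" field."""
    match = re.search(
        r"^Current Messages Spooled\s*:\s*(\d+)\s*$",
        show_queue_output,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("'Current Messages Spooled' not found in output")
    return int(match.group(1))


def wait_until_drained(fetch_output, poll_seconds=5.0, timeout_seconds=600.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until the queue reports 0 spooled messages.

    Returns True once the queue is drained, or False if the timeout
    expires first. fetch_output is a hypothetical callable returning the
    latest "show queue" text for the replication queue.
    """
    deadline = clock() + timeout_seconds
    while True:
        if parse_messages_spooled(fetch_output()) == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)


# Abbreviated sample taken from the output shown in this step.
SAMPLE_NOT_DRAINED = (
    "Name                                 : #MSGVPN_REPLICATION_DATA_QUEUE\n"
    "Message VPN                          : Trading_VPN\n"
    "Current Messages Spooled             : 1\n"
    "Current Spool Usage (MB)             : 0.0006\n"
)
print(parse_messages_spooled(SAMPLE_NOT_DRAINED))
```

A False return value means the queue did not drain within the timeout; in that case the standby Message VPN should not yet be made active.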

Step 4: Heuristically Commit or Roll Back Any In-Progress Transactions

If the Message VPN uses XA transactions, there may be prepared transactions on the formerly active site that must be heuristically committed or rolled back. Only prepared transactions need to be addressed; transactions in other states can be ignored. If you do not resolve the prepared transactions:

  • they continue to hold transaction resources, which reduces the transaction-handling capacity of both sites
  • in the event of a failback to the originally active site, duplicate message delivery or message loss may occur

For information on how to heuristically commit or roll back transactions, refer to Performing Heuristic Actions on Transactions.

It is important that you only perform the heuristic commit or heuristic rollback operations on the formerly active Message VPN.

Deciding whether to commit or roll back the transaction will depend on various factors. When looking at XA transactions, the end goal is to make sure that the transactions are treated consistently on all branches of the distributed transaction across both replication sites. Here are some guidelines for making this decision:

  • For prepared XA transactions that are controlled by a transaction manager in an application server, you should check the logs or state of the transaction manager for the XID of the prepared transaction to examine the other branches of the distributed transaction:
    • If all the other branches have been committed, you should heuristically commit the transaction
    • If any of the other branches have rolled back, you should heuristically roll back the transaction
  • For XA prepared transactions that are not controlled by a transaction manager, manually coordinate the distributed transaction so that all the branches of the distributed transaction are either committed or rolled back.
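The guidelines above can be expressed as a small decision function. The following Python sketch is illustrative only: the BranchState values and the "investigate" result are hypothetical names introduced here, and the states of the other branches are assumed to have been collected by hand from the transaction manager's logs.

```python
from enum import Enum


class BranchState(Enum):
    """Hypothetical labels for the state of another branch of the transaction."""
    COMMITTED = "committed"
    ROLLED_BACK = "rolled_back"
    UNKNOWN = "unknown"


def heuristic_action(other_branches):
    """Suggest an action for a prepared XA transaction on the formerly active site.

    Returns "rollback" if any other branch rolled back, "commit" if all
    other branches committed, and "investigate" when the available
    information does not settle the outcome.
    """
    states = list(other_branches)
    if any(state is BranchState.ROLLED_BACK for state in states):
        return "rollback"
    if states and all(state is BranchState.COMMITTED for state in states):
        return "commit"
    return "investigate"


print(heuristic_action([BranchState.COMMITTED, BranchState.COMMITTED]))
print(heuristic_action([BranchState.COMMITTED, BranchState.ROLLED_BACK]))
```

The "investigate" result covers the cases the guidelines leave to manual coordination, such as branches whose state is unknown or transactions that have no transaction manager.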

Showing Replicated Transactions

To show replicated transactions, enter the following User EXEC command:

solace> show transaction replicated

Example:

Solace # show transaction message-vpn blue_02 state PREPARED replicated 
Flags Legend
T - Transaction Type (X=XA L=Local)
S - Transaction State (A=Active S=Suspended I=Idle P=Prepared C=Complete)
R - Replicated (Y=Yes N=No)
XID                                                                   Messages
Message VPN                                   T S R Last State Change  Spooled
--------------------------------------------- - - - ----------------- --------
0021ABC4-00-01
blue_02                                       X P Y                1s        0
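To pick out the transactions that need a heuristic decision, the listing can be filtered for replicated XA transactions in the prepared state. The following Python sketch assumes the two-line record layout shown in the example above (an XID line followed by a detail line); the function names are hypothetical, and the parsing may need adjusting if the real output differs.

```python
def parse_transactions(output):
    """Parse the two-line records that follow the dashed separator line."""
    lines = [line for line in output.splitlines() if line.strip()]
    separator = next(
        (i for i, line in enumerate(lines) if line.startswith("---")), None
    )
    if separator is None:
        return []
    body = lines[separator + 1:]
    records = []
    # Records come in pairs: the XID line, then the detail line.
    for xid_line, detail_line in zip(body[0::2], body[1::2]):
        fields = detail_line.split()
        records.append({
            "xid": xid_line.strip(),
            "message_vpn": fields[0],
            "type": fields[1],        # X=XA, L=Local
            "state": fields[2],       # P=Prepared, and so on
            "replicated": fields[3],  # Y=Yes, N=No
            "messages_spooled": int(fields[-1]),
        })
    return records


def prepared_replicated_xa(records):
    """Return the XIDs of replicated XA transactions in the prepared state."""
    return [
        record["xid"] for record in records
        if record["type"] == "X"
        and record["state"] == "P"
        and record["replicated"] == "Y"
    ]


SAMPLE = """\
XID                                                                   Messages
Message VPN                                   T S R Last State Change  Spooled
--------------------------------------------- - - - ----------------- --------
0021ABC4-00-01
blue_02                                       X P Y                1s        0
"""

print(prepared_replicated_xa(parse_transactions(SAMPLE)))
```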

To show the details of in-progress replicated transactions, enter the following User EXEC command:

solace> show transaction message-vpn blue_02 state PREPARED replicated detail

Example:

Solace # show transaction replicated detail
XID:                       0021B028-00-01
Message VPN:               blue_02
Client:                    username/15848/#000c0001
Client Username:           default
Session:                   N/A
Idle Timeout:              0
Type:                      XA
State:                     PREPARED
Replicated:                Yes
Last State Change:         0d 0h 0m 0s
Messages:                  10
Messages Published:        0
Messages Consumed:         150
Publisher Messages:
Message Id           Topic
-------------------- -----------------------------------------------------------
Consumer Messages:
Message Id           Type  Endpoint Name
-------------------- ----- -----------------------------------------------------
3118727406           queue test
3118727407           queue test
3118727408           queue test
3118727409           queue test
3118727410           queue test
3118727411           queue test
3118727412           queue test
3118727413           queue test
3118727414           queue test
3118727415           queue test

To show the details of a particular transaction, enter the following User EXEC command:

solace> show transaction xid <xid> detail

Where:

<xid> specifies the XID of the transaction to be displayed.

Step 5: Make the Formerly Replication Standby Message VPN Replication Active

To restore service, switch the formerly replication-standby Message VPN to the active state.

Boston Data Center

BOS_EventBroker(configure)# message-vpn Trading_VPN
BOS_EventBroker(configure/message-vpn)# replication state active

At this point, clients should be able to reconnect to the Message VPN, and full replication service resumes.

Step 6: Delete the Heuristically Completed Transactions

If you previously heuristically completed any transactions, delete them to free up resources. Always delete the completed transactions on the formerly active site. Depending on the replication mode and the XA transaction manager, you may also have to delete completed transactions on the newly active site. After it connects to the newly active site, the XA transaction manager may automatically delete the heuristically completed transactions on the active Message VPN as it reconciles the XA transaction states. Allow this process to complete before deleting the completed transactions.

To delete a completed transaction, enter the following ADMIN command on the formerly active site:

solace(admin/message-spool)# delete-transaction xid <xid>

Where:

<xid> specifies the XID of the transaction to be deleted.

You should check both the standby and active Message VPNs for completed transactions.