Performing a Controlled Failover
To perform a controlled replication failover so that clients are switched over from a Message VPN with an active replication state at one site to the corresponding Message VPN at the alternate site, the following steps must be taken:
- Step 1: Verify that the Replication Bridge is Bound to the Replication Queue
- Step 2: Switch the Message VPN Replication State to Standby
- Step 3: Wait for the Replication Queue in the Formerly Active Message VPN to Drain
- Step 4: Heuristically Commit or Roll Back Any In-Progress Transactions
- Step 5: Make the Formerly Replication Standby Message VPN Replication Active
- Step 6: Delete the Heuristically Completed Transactions
For a failover to be truly controlled, the replication bridge connection from the replication site that will take over activity must be bound to the other replication site’s replication queue. Before performing a controlled failover, make every effort to minimize the possibility of the replication bridge disconnecting. Once the procedure has started (that is, once the Message VPNs at the site giving up activity have been set to the replication standby state), a loss of the replication bridge connection before the replication queue has drained turns the failover into an uncontrolled failover.
In the example scenario used in these steps, the clients from a single Message VPN (Trading_VPN) with an active replication state at one site (New York) are switched to the corresponding Message VPN with a standby replication state at the alternate site (Boston).
While this simple example only shows replication sites with a single Message VPN, in real-world scenarios these steps must be performed through the CLI for each Message VPN involved in a replication site failover.
Step 1: Verify that the Replication Bridge is Bound to the Replication Queue
Verify that the replication bridge from the event broker with the replication-standby Message VPN that you want to make active is bound to the replication queue of the Message VPN with the active replication state.
Here it is assumed that NY_EventBroker1 and BOS_EventBroker are active for the Guaranteed Messaging-enabled virtual router at their respective sites.
New York Data Center
NY_EventBroker1> show message-vpn Trading_VPN replication
Flags Legend:
A - Admin State (U=Up, D=Down)
C - Config State (A=Active, S=Standby)
B - Local Bridge State (U=Up, Q=Queue Unbound, D=Down, -=N/A)
R - Remote Bridge State (U=Up, D=Down, -=N/A)
Q - Queue State (U=Up, D=Down, -=N/A)
S - Sync Replication Eligible (Y=Yes, N=No, -=N/A)
M - Reject Msg When Sync Ineligible (Y=Yes, N=No)
T - Transaction Replication Mode (A=Async, S=Sync, -=N/A)

Message VPN                      A C B R Q S M T
-------------------------------- - - - - - - - - -
Trading_VPN                      U A - U U - N A
NY_EventBroker1>
Boston Data Center
BOS_EventBroker> show message-vpn Trading_VPN replication
Flags Legend:
A - Admin State (U=Up, D=Down)
C - Config State (A=Active, S=Standby)
B - Local Bridge State (U=Up, Q=Queue Unbound, D=Down, -=N/A)
R - Remote Bridge State (U=Up, D=Down, -=N/A)
Q - Queue State (U=Up, D=Down, -=N/A)
S - Sync Replication Eligible (Y=Yes, N=No, -=N/A)
M - Reject Msg When Sync Ineligible (Y=Yes, N=No)
T - Transaction Replication Mode (A=Async, S=Sync, -=N/A)

Message VPN                      A C B R Q S M T
-------------------------------- - - - - - - - - -
Trading_VPN                      U S U - - - N A
BOS_EventBroker>
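If the check in this step is scripted, the flag row can be read out of the captured CLI text. The following Python sketch is illustrative only: the function names and dictionary keys are hypothetical, and it assumes that a Local Bridge State of U on the standby side means the bridge is up and bound, since the legend lists Q (Queue Unbound) as a separate state.

```python
from typing import Dict

# Hypothetical key names for the eight flag columns, in the order A C B R Q S M T.
FLAG_NAMES = [
    "admin", "config", "local_bridge", "remote_bridge",
    "queue", "sync_eligible", "reject_when_ineligible", "txn_mode",
]


def parse_replication_row(output: str, vpn: str) -> Dict[str, str]:
    """Return the flag values for `vpn` from captured
    `show message-vpn <vpn> replication` output."""
    for line in output.splitlines():
        fields = line.split()
        # The data row is the VPN name followed by exactly eight flags.
        if len(fields) == 9 and fields[0] == vpn:
            return dict(zip(FLAG_NAMES, fields[1:]))
    raise ValueError(f"no replication row found for {vpn}")


def standby_bridge_is_bound(standby_flags: Dict[str, str]) -> bool:
    """Assumption: on the standby side, Config State S together with
    Local Bridge State U indicates the bridge is bound to the mate's queue."""
    return (standby_flags["config"] == "S"
            and standby_flags["local_bridge"] == "U")
```

Applied to the Boston output above, `standby_bridge_is_bound` would accept the row because its Config State is S and its Local Bridge State is U; a Q or D in the B column would cause it to return False.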
Step 2: Switch the Message VPN Replication State to Standby
Switch the currently replication-active Message VPN to standby.
New York Data Center
NY_EventBroker1(configure)# message-vpn Trading_VPN
NY_EventBroker1(configure/message-vpn)# replication state standby
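Because the state change has to be repeated for every Message VPN involved in the failover, it can help to generate the command pairs up front. This is a minimal sketch: the function name and the second VPN name in the usage note are hypothetical, and it only emits the two commands shown in this step. Each pair is assumed to be entered from the configure level, as in the example above.

```python
from typing import Dict, Iterable, List


def replication_state_commands(vpns: Iterable[str], state: str) -> Dict[str, List[str]]:
    """Map each Message VPN name to the two CLI lines that set its
    replication state, mirroring the commands shown in this procedure."""
    if state not in ("active", "standby"):
        raise ValueError("state must be 'active' or 'standby'")
    return {
        vpn: [f"message-vpn {vpn}", f"replication state {state}"]
        for vpn in vpns
    }
```

For example, `replication_state_commands(["Trading_VPN", "Risk_VPN"], "standby")` would produce one command pair per VPN for the site giving up activity (`Risk_VPN` is a made-up name for illustration).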
Step 3: Wait for the Replication Queue in the Formerly Active Message VPN to Drain
Allow any messages or transactions that are in progress from the formerly replication-active Message VPN to arrive at the corresponding Message VPN on its replication mate. Allowing all messages and transactions to propagate to the standby Message VPN can prevent the loss of asynchronously replicated messages and transactions.
To determine if the replication queue for the Message VPN that was just changed from active to standby has drained, enter the show queue command for the Message VPN’s replication queue (named #MSGVPN_REPLICATION_DATA_QUEUE). When the output displays a value of 0 for "Current Messages Spooled", the queue has been drained.
New York Data Center
NY_EventBroker1(configure)# show queue #MSGVPN_REPLICATION_DATA_QUEUE message-vpn Trading_VPN
Name                     : #MSGVPN_REPLICATION_DATA_QUEUE
Message VPN              : Trading_VPN
...
Current Messages Spooled : 1
Current Spool Usage (MB) : 0.0006
...
The system administrator should not configure the Message VPN in the other replication mate (BOS_EventBroker) as replication-active until "Current Messages Spooled" is 0 for the replication queue for the Message VPN that was just switched to replication standby.
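The drain check can be sketched as a small polling loop over captured `show queue` output. This is an illustration under stated assumptions, not a supported tool: `fetch_output` is a hypothetical caller-supplied function that returns the current CLI text (how it reaches the event broker is out of scope here), and the polling interval and retry limit are arbitrary defaults.

```python
import re
import time
from typing import Callable

SPOOLED_RE = re.compile(r"Current Messages Spooled\s*:\s*(\d+)")


def messages_spooled(show_queue_output: str) -> int:
    """Extract the 'Current Messages Spooled' value from show queue output."""
    match = SPOOLED_RE.search(show_queue_output)
    if match is None:
        raise ValueError("'Current Messages Spooled' not found in output")
    return int(match.group(1))


def wait_until_drained(fetch_output: Callable[[], str],
                       poll_seconds: float = 5.0,
                       max_polls: int = 120) -> bool:
    """Poll until the replication queue reports 0 spooled messages.

    Returns True once the queue is drained, or False if it is still
    non-empty after max_polls checks.
    """
    for _ in range(max_polls):
        if messages_spooled(fetch_output()) == 0:
            return True
        time.sleep(poll_seconds)
    return False
```

Only when `wait_until_drained` returns True would it be appropriate to continue to the later steps; a False result means the queue still holds messages and the standby site should not yet be made active.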
Step 4: Heuristically Commit or Roll Back Any In-Progress Transactions
If the Message VPN is using XA transactions, there may be some prepared transactions on the formerly active site that need to be heuristically committed or rolled back. Only prepared transactions have to be addressed. Transactions in other states can be ignored. If you do not deal with the prepared transactions:
- you will waste transaction resources that will reduce the transaction handling capacity of both sites
- in the event of a failback to the originally active site, duplicate message delivery or message loss may occur
For information on how to heuristically commit or roll back transactions, refer to Performing Heuristic Actions on Transactions.
It is important that you only perform the heuristic commit or heuristic rollback operations on the formerly active Message VPN.
Deciding whether to commit or roll back the transaction will depend on various factors. When looking at XA transactions, the end goal is to make sure that the transactions are treated consistently on all branches of the distributed transaction across both replication sites. Here are some guidelines for making this decision:
- For prepared XA transactions that are controlled by a transaction manager in an application server, you should check the logs or state of the transaction manager for the XID of the prepared transaction to examine the other branches of the distributed transaction:
- If all the other branches have been committed, you should heuristically commit the transaction
- If any of the other branches have rolled back, you should heuristically roll back the transaction
- For XA prepared transactions that are not controlled by a transaction manager, manually coordinate the distributed transaction so that all the branches of the distributed transaction are either committed or rolled back.
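The guidelines above for transactions under a transaction manager can be expressed as a small decision function. This is only a sketch of the rule as stated: the state strings `"committed"` and `"rolled_back"` are hypothetical labels for what is found in the transaction manager's logs, and any mix of states not covered by the guidelines is returned as `"investigate"` rather than guessed.

```python
from typing import Iterable


def heuristic_action(other_branch_states: Iterable[str]) -> str:
    """Suggest a heuristic action for a prepared XA transaction based on
    the states of the other branches of the same distributed transaction.

    Returns "rollback" if any other branch rolled back, "commit" if every
    other branch committed, and "investigate" otherwise.
    """
    states = [state.lower() for state in other_branch_states]
    if any(state == "rolled_back" for state in states):
        return "rollback"
    if states and all(state == "committed" for state in states):
        return "commit"
    # No branch information, or branches still in some other state.
    return "investigate"
```

The `"investigate"` result is deliberate: when the other branches are neither all committed nor partly rolled back, the source guidance does not prescribe an action, so the decision is left to the operator.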
Showing Replicated Transactions
To show replicated transactions, enter the following User EXEC command:
solace> show transaction replicated
Example:
solace# show transaction message-vpn blue_02 state PREPARED replicated
Flags Legend
T - Transaction Type (X=XA L=Local)
S - Transaction State (A=Active S=Suspended I=Idle P=Prepared C=Complete)
R - Replicated (Y=Yes N=No)

XID                                                                   Messages
  Message VPN                                 T S R Last State Change  Spooled
--------------------------------------------- - - - ----------------- --------
0021ABC4-00-01
  blue_02                                     X P Y                1s        0
To show the details of in-progress replicated transactions, enter the following User EXEC command:
solace> show transaction message-vpn blue_02 state PREPARED replicated detail
Example:
solace# show transaction replicated detail
XID: 0021B028-00-01
Message VPN: blue_02
Client: username/15848/#000c0001
Client Username: default
Session: N/A
Idle Timeout: 0
Type: XA
State: PREPARED
Replicated: Yes
Last State Change: 0d 0h 0m 0s
Messages: 10
Messages Published: 0
Messages Consumed: 150
Publisher Messages:
Message Id           Topic
-------------------- -----------------------------------------------------------
Consumer Messages:
Message Id           Type  Endpoint Name
-------------------- ----- -----------------------------------------------------
3118727406           queue test
3118727407           queue test
3118727408           queue test
3118727409           queue test
3118727410           queue test
3118727411           queue test
3118727412           queue test
3118727413           queue test
3118727414           queue test
3118727415           queue test
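Since only prepared transactions need heuristic handling, the detail output can be filtered for them. The sketch below is a hypothetical helper, not part of the CLI: it assumes each transaction in the captured text begins with an `XID:` field and carries `State:` and `Replicated:` fields as in the example above.

```python
import re
from typing import List


def prepared_replicated_xids(detail_output: str) -> List[str]:
    """Return the XIDs of transactions shown as PREPARED and replicated
    in captured `show transaction ... detail` output."""
    xids = []
    # Split the text so that each chunk starts at an "XID:" field.
    for block in re.split(r"(?=\bXID:)", detail_output):
        xid = re.search(r"\bXID:\s*(\S+)", block)
        # "State:" does not match "Last State Change:", which has no
        # colon directly after "State".
        state = re.search(r"\bState:\s*(\S+)", block)
        replicated = re.search(r"\bReplicated:\s*(\S+)", block)
        if not (xid and state and replicated):
            continue
        if state.group(1) == "PREPARED" and replicated.group(1) == "Yes":
            xids.append(xid.group(1))
    return xids
```

The resulting list gives the XIDs to look up individually and, where appropriate, to heuristically commit or roll back on the formerly active Message VPN.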
To show the details of a particular transaction, enter the following User EXEC command:
solace> show transaction xid <xid> detail
Where:
xid
specifies the XID of the transaction to be displayed.
Step 5: Make the Formerly Replication Standby Message VPN Replication Active
To restore service, switch the formerly replication-standby Message VPN to the active state.
Boston Data Center
BOS_EventBroker(configure)# message-vpn Trading_VPN
BOS_EventBroker(configure/message-vpn)# replication state active
At this point, clients should be able to reconnect to the Message VPN, and full replication service will resume.
Step 6: Delete the Heuristically Completed Transactions
If there are transactions that you previously heuristically completed, you should delete them to free up resources. You must always delete the completed transactions on the formerly active site. You may have to delete completed transactions on the newly active site, depending on the replication mode and the XA transaction manager. The XA transaction manager may automatically delete the heuristically completed transactions on the active Message VPN after it connects to the newly active site as it reconciles the XA transaction states. You should allow this process to complete before deleting the completed transactions.
To delete a completed transaction, enter the following ADMIN command on the formerly active site:
solace(admin/message-spool)# delete-transaction xid <xid>
Where:
xid
specifies the XID of the transaction to be deleted.
You should check both the standby and active Message VPNs for completed transactions.