High Availability in PubSub+ Cloud

PubSub+ Cloud event broker services can be deployed in high-availability (HA) redundancy groups. HA redundancy uses 1:1 event broker sparing to provide fault tolerance and increase overall service availability. If one event broker fails or is taken out of service, the other event broker automatically takes over and serves the clients that were connected to the out-of-service event broker. An HA activity failover causes a brief interruption of less than one minute. In comparison, Developer and standalone event broker services can experience outages of 15-30 minutes because they do not have HA redundancy.

To learn more about HA redundancy, see High Availability for Software Event Brokers.

HA Concepts

PubSub+ Cloud implements HA using an Active/Standby model with an arbiter node (the monitoring node) for split-brain detection. This requires three nodes, each running the event broker:

  • Primary node
  • Backup node
  • Monitoring node

The primary and backup nodes both run the software event broker under the messaging node role, while the monitoring node runs it under the monitoring node role. These roles are fixed by the configuration and never change. The HA group is fronted by a network load balancer that routes traffic to and from the active node in the HA group (either the primary or the backup).

When in operation, each messaging node assumes one of two activity states: Active or Standby. At any one time, one messaging node is active and the other is on standby.

With this model, a primary event broker provides messaging services to clients, while a backup event broker waits in standby mode; it provides service only if the primary event broker fails. A third event broker, the monitoring node, acts as a tie-breaker to prevent split-brain scenarios in which both the primary and backup event brokers would otherwise become active simultaneously.
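The tie-breaker role can be illustrated with a simple majority rule. The following is a minimal sketch, not the event broker's actual election logic: the function name `may_be_active` and the "reach at least two of three nodes" rule are assumptions made for illustration only.

```python
def may_be_active(sees_mate: bool, sees_monitor: bool) -> bool:
    """Return True if a messaging node can reach a majority of the HA group.

    Assumption (not taken from the product documentation): a messaging node
    may serve clients only while it, together with the group members it can
    still reach, makes up at least two of the three nodes.
    """
    members_in_view = 1 + int(sees_mate) + int(sees_monitor)  # 1 counts self
    return members_in_view >= 2


# Example partition: the primary has lost contact with both other nodes,
# while the backup can still reach the monitoring node.
primary_ok = may_be_active(sees_mate=False, sees_monitor=False)
backup_ok = may_be_active(sees_mate=False, sees_monitor=True)
print(primary_ok, backup_ok)
```

Under this assumed rule, an isolated messaging node cannot stay active on its own, so two active nodes cannot both serve clients while cut off from each other.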

During a failover, client connections are switched automatically from the primary node to the backup node.

A failover proceeds in the following sequence:

  1. The backup event broker takes over messaging activity.
  2. Once the failed primary event broker comes back online, it resynchronizes to match the currently active backup event broker.
  3. The primary event broker takes on the “Standby” role, or, if auto-revert is enabled, messaging activity automatically switches back to the primary event broker.
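The sequence above can be modeled as a small state machine. This is an illustrative sketch only: the class `HAGroup`, its fields, and its method names are hypothetical, and resynchronization is represented by a comment rather than modeled.

```python
from dataclasses import dataclass


@dataclass
class HAGroup:
    """Toy model of which messaging node is serving clients."""

    auto_revert: bool = False
    active: str = "primary"   # node currently providing service
    primary_up: bool = True

    def primary_fails(self) -> None:
        # Step 1: the backup takes over messaging activity.
        self.primary_up = False
        self.active = "backup"

    def primary_recovers(self) -> None:
        # Step 2: the primary comes back and resynchronizes (not modeled).
        self.primary_up = True
        # Step 3: the primary stays on standby unless auto-revert is enabled.
        if self.auto_revert:
            self.active = "primary"


group = HAGroup(auto_revert=False)
group.primary_fails()
group.primary_recovers()
print(group.active)  # the backup keeps serving when auto-revert is off
```

With `auto_revert=True`, the same two calls leave `active` set to `"primary"`.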

HA in Public and Private Clouds

To ensure that a high-availability group is adequately provisioned, its pods run on different worker nodes. Additionally, the pods can be spread over multiple availability zones (AZs) when available. The following diagram shows a Kubernetes cluster that has worker nodes in three availability zones. For each HA service, the Cloud-Agent schedules the primary pod in one AZ, the backup pod in a second AZ, and the monitoring pod in a third AZ. This ensures that pods for the same HA service do not run on the same hardware.
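The three-AZ placement described above can be sketched as a simple assignment. This is not how the Cloud-Agent or the Kubernetes scheduler is implemented; the function name `place_ha_pods` and the zone names are hypothetical, and the sketch deliberately covers only the case of three or more AZs.

```python
def place_ha_pods(zones: list[str]) -> dict[str, str]:
    """Assign each pod of one HA service to its own availability zone.

    Illustrative only: picks the first three distinct zones in the order
    given. Clusters with fewer than three zones are out of scope here.
    """
    distinct = list(dict.fromkeys(zones))  # de-duplicate, keep input order
    if len(distinct) < 3:
        raise ValueError("this sketch only covers three or more AZs")
    return {
        "primary": distinct[0],
        "backup": distinct[1],
        "monitoring": distinct[2],
    }


placement = place_ha_pods(["zone-a", "zone-b", "zone-c"])
print(placement)
```

Because every pod lands in a different zone, losing a single zone removes at most one member of the HA group.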

Similarly, when deploying an HA group in virtual private clouds such as AWS, there are two network topologies available.

  1. For regions with three or more AZs:

  2. For regions with two AZs:

Connecting to a Cloud HA Group

Typically, applications that use HA must provide a host list: one IP address for the primary node and another for the backup node. However, host lists do not work with third-party messaging APIs, so PubSub+ Cloud provides a single DNS name (behind a load balancer) for applications to use. This abstracts away the switchover between primary and backup in the event of a failure.
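Because the application sees only one DNS name, handling a failover on the client side can be as simple as retrying the same host until the load balancer routes to the newly active node. The following is a generic sketch, not a PubSub+ messaging API: `connect_with_retry`, the `connect` callable, and the retry settings are all hypothetical placeholders for whatever client library the application uses.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[str], T],
    host: str,
    attempts: int = 5,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect(host) repeatedly against the same single DNS name."""
    last_error: Optional[ConnectionError] = None
    for attempt in range(attempts):
        try:
            return connect(host)
        except ConnectionError as err:
            last_error = err
            if attempt < attempts - 1:
                sleep(delay_s)  # wait before retrying the same host
    raise ConnectionError(
        f"could not connect to {host} after {attempts} attempts"
    ) from last_error
```

The application never needs to know which messaging node is active; it only needs to keep retrying long enough to cover the failover interruption.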

HA and Service Types

The following service types deploy an HA redundancy group by default:

  • Professional (Standard account)
  • Enterprise (Enterprise account)

PubSub+ Cloud automates all of the configuration and setup when you create your event broker service. Once the event broker service is created, applications can connect using the DNS name shown on the connectivity tab in the console.

HA-Link Security

When a new enterprise event broker service is created, communication between the primary and backup event brokers is encrypted by default, including the HA mate link and Config-Sync. You can override the default HA mate-link encryption to plain text through the advanced options when you create a service (see Configuring High-Availability Mate-Link Encryption). Using plain text for the HA mate link may be useful if you require maximum performance and are willing to rely on the security restrictions of the VPC in your cloud provider or of your on-premises network. Config-Sync always remains encrypted.

If you have an existing event broker service without encryption, you can encrypt it, including its HA mate link and Config-Sync link, through the console or the REST API. In the console, you can easily distinguish encrypted services from unencrypted ones: when mate-link encryption is disabled, a warning icon is displayed on the event broker service's status screen. For more information, see Configuring High-Availability Mate-Link Encryption.

Viewing the Mate-Link Encryption Status

The status of the mate-link encryption is available in Cluster Manager and shown on the Status tab for the selected event broker service.

Modifying the HA Mate-Link Encryption Status

To modify the HA mate-link encryption status for an existing event broker service, perform these steps:

  1. Log in to the PubSub+ Cloud Console if you have not already done so. The URL to access the Cloud Console differs based on your authentication scheme. For more information, see Logging In to the PubSub+ Cloud Console.

  2. Select Cluster Manager from the navigation bar.

  3. Select the event broker service whose HA mate-link encryption status you want to modify. If the event broker service is not listed, make sure you have the right environment selected. For more information, see Selecting Environments.

  4. On the service page, select the Manage tab.
  5. On the Manage tab, click Advanced Options.

  6. In the Mate-Link Encryption pane, select Disable or Enable to modify the encryption.