Health Checks and Readiness Probes

This topic discusses the health check endpoints provided by Solace event brokers, their responses, and how to configure health check related settings.

Solace event brokers provide health check endpoints that serve multiple purposes:

  • Load Balancer Health Checks—In many operational environments, such as clouds and PaaS, load balancers use health checks to determine if traffic should be directed to a particular event broker.
  • Infrastructure Automation—In Kubernetes and other infrastructure automation environments, health checks are used to determine readiness and availability of services.

On a fresh installation or after a reload of the default configuration, the health check service is enabled at startup on software event brokers and disabled on appliance event brokers. Upgrading an appliance event broker or a software event broker messaging node from an earlier version does not change this setting: a service that was disabled stays disabled, and a service that was enabled stays enabled. Software event broker monitor nodes behave differently: in version 10.25.11 and earlier, the health check port was disabled by default on monitor nodes, and it is enabled when you upgrade to version 10.25.12 or later.

Software Event Broker Response to Load Balancer Health Checks

A software event broker responds to load balancer HTTP health check GETs on both /health-check/guaranteed-active and /health-check/direct-active in accordance with the following table.

Software Event Broker State  Endpoint                         Redundancy Configured                    Redundancy Not Configured
Active                       /health-check/guaranteed-active  200 if message spool is up; 503 if down  200 if message spool is up; 503 if down
Active                       /health-check/direct-active      200                                      200
Standby                      either endpoint                  503                                      N/A
Monitoring                   either endpoint                  503                                      N/A
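
The table above can be read as a small decision function. The following Python sketch is illustrative only: the function name and the lowercase state strings are assumptions made for this example and are not part of any Solace API. It returns None where the table shows N/A.

```python
GUARANTEED = "/health-check/guaranteed-active"
DIRECT = "/health-check/direct-active"


def software_broker_status(state, endpoint, redundancy_configured, spool_up=True):
    """Expected HTTP status for a software event broker, or None for N/A.

    `state` is one of "active", "standby", "monitoring" (hypothetical labels
    chosen for this sketch).
    """
    if endpoint not in (GUARANTEED, DIRECT):
        raise ValueError(f"unknown endpoint: {endpoint}")
    if state == "active":
        if endpoint == GUARANTEED:
            # Guaranteed messaging depends on the message spool.
            return 200 if spool_up else 503
        return 200
    if state in ("standby", "monitoring"):
        # These states only exist meaningfully when redundancy is configured.
        return 503 if redundancy_configured else None
    raise ValueError(f"unknown state: {state}")
```

For example, an active broker whose message spool is down would be expected to answer 503 on /health-check/guaranteed-active while still answering 200 on /health-check/direct-active.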

Appliance Event Broker Response to Load Balancer Health Checks

An appliance event broker responds to load balancer HTTP health check GETs on both /health-check/guaranteed-active and /health-check/direct-active in accordance with the table below.

The appliance event broker does not support hostlist high availability (HA) failover, so the /health-check/guaranteed-active health check is only useful across multiple HA appliance event broker pairs. The /health-check/guaranteed-active endpoint could also be used where clients use MQTT QoS 1 clean sessions and a load balancer distributes incoming HTTP requests across multiple appliance event brokers that are not running HA.

Appliance Event Broker State  Endpoint                         Redundancy Configured                               Redundancy Not Configured
                                                               Static IP  Primary/Backup IP                        Static IP  Primary/Backup IP
Active                        /health-check/guaranteed-active  503        200 if message spool is up; 503 if down  200        200
Active                        /health-check/direct-active      200        *200                                     200        200
Standby                       /health-check/guaranteed-active  503        N/A                                      N/A        N/A
Standby                       /health-check/direct-active      200        *200                                     N/A        N/A

* When using /health-check/direct-active, with redundancy configured, the primary and backup IP addresses are transferred to the active event broker.

The guaranteed health check API is only supported for an Active/Standby redundancy configuration.

Readiness Endpoint for Orchestration Environments

In addition to the /health-check/guaranteed-active and /health-check/direct-active load balancer health check endpoints, Solace event brokers provide a dedicated /health-check/readiness endpoint designed specifically for orchestration environments like Kubernetes. This endpoint determines if an event broker is ready for failover operations, which is particularly useful for Kubernetes readiness probes.

The readiness endpoint helps keep your services available during maintenance, such as moving event broker containers to new nodes or upgrading your system. The orchestration environment accomplishes this by making sure that at least two out of three event brokers in an HA group are ready at all times. For this to work, configure your orchestration environment to avoid operations that would bring the number of ready event brokers below two. In Kubernetes, this configuration is called a Pod Disruption Budget (PDB).
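
As an illustration of the two-out-of-three rule, the following Python sketch builds a Kubernetes PodDisruptionBudget manifest with minAvailable set to 2 and prints it as JSON, which Kubernetes accepts alongside YAML. The resource name and the pod label used here are hypothetical; substitute whatever your own deployment uses.

```python
import json


def pod_disruption_budget(name, match_labels, min_available=2):
    """Build a policy/v1 PodDisruptionBudget manifest as a plain dict."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            # Keep at least two of the three HA group members ready.
            "minAvailable": min_available,
            "selector": {"matchLabels": match_labels},
        },
    }


if __name__ == "__main__":
    # "solace-ha-pdb" and the "app" label are placeholder values.
    manifest = pod_disruption_budget("solace-ha-pdb", {"app": "solace-broker"})
    print(json.dumps(manifest, indent=2))
```

A budget only has an effect if the pods it selects also define a readiness probe, because Kubernetes counts ready pods when it decides whether a voluntary disruption is allowed.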

The readiness endpoint returns the following HTTP status codes:

  • 200—The event broker is ready

  • 503—The event broker is not ready

  • 400—The request is invalid (for example, when made on the appliance event broker's static interface)

To determine readiness, the endpoint checks a number of subsystems. After determining readiness, the endpoint returns a JSON response with the following format:

  • If the event broker is ready (HTTP status 200):

    {"ready":true}
  • If the event broker is not ready (HTTP status 503), it returns a response indicating which subsystems aren't ready. For example:

    {"ready":false,"unreadySubsystems":["config-sync","redundancy"]}
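
A client that consumes this response might interpret it along the following lines. This is a minimal sketch based only on the status codes and JSON shapes shown above; the function name is an assumption, and the handling of unexpected status codes is a design choice of this example rather than documented behavior.

```python
import json


def parse_readiness(status, body):
    """Return (ready, unready_subsystems) for a readiness response.

    Raises ValueError for HTTP 400 or any status other than 200 or 503.
    """
    if status == 400:
        raise ValueError("invalid readiness request (HTTP 400)")
    if status not in (200, 503):
        raise ValueError(f"unexpected HTTP status: {status}")
    payload = json.loads(body)
    # Treat the broker as ready only if both the status and the body agree.
    ready = status == 200 and payload.get("ready") is True
    return ready, list(payload.get("unreadySubsystems", []))
```

With the two sample bodies above, the function yields (True, []) for the 200 response and (False, ["config-sync", "redundancy"]) for the 503 response.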

Software Event Broker Health Check Listen-Port

By default, new installations of the software event broker listen for health checks on port 5550 and, if TLS/SSL is used, on port 5553. You can change the listen-port setting to another value using either the Solace Event Broker CLI or a configuration key.

The health check listen-port setting applies to all health check endpoints, including /health-check/guaranteed-active, /health-check/direct-active, and /health-check/readiness.
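
Because all endpoints share the same listen-port, a single helper can build the URL for any of them. The Python sketch below uses only the standard library. The function names are hypothetical, and it assumes that the TLS/SSL listen-port is reached with an https URL and that the default ports are still in effect unless a port is passed explicitly.

```python
import urllib.error
import urllib.request

ENDPOINTS = (
    "/health-check/guaranteed-active",
    "/health-check/direct-active",
    "/health-check/readiness",
)


def health_check_url(host, endpoint, port=None, tls=False):
    """Build a health check URL, defaulting to port 5550 (or 5553 with TLS)."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    if port is None:
        port = 5553 if tls else 5550
    scheme = "https" if tls else "http"
    return f"{scheme}://{host}:{port}{endpoint}"


def probe(url, timeout=2.0):
    """Return the HTTP status code, or None if the broker is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx, so 503 and 400 are reported here.
        return err.code
    except OSError:
        # Covers URLError, refused connections, and timeouts.
        return None
```

A caller would typically combine the two, for example probe(health_check_url("broker.example.com", "/health-check/readiness")), where the host name is a placeholder.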

Setting with CLI

To change the port setting, execute the health-check listen-port command:

solace(configure/service/health-check) # listen-port <port> [ssl]

To restore the listen-port to its default value, execute the no version of the previous command:

solace(configure/service/health-check) # no listen-port [ssl]

Where:

ssl specifies the TLS/SSL health check listen-port.

Setting with a Configuration Key

At creation and startup of a software event broker, the service/healthcheck/port or service/healthcheck/tlsport configuration key can be used to set a specific port to use as the health check listen-port.

For information on initializing software event broker containers with configuration keys, refer to Initializing a Software Event Broker Container.
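
Configuration keys are commonly passed to a container as environment variables. The Python sketch below formats the two keys named above as NAME=value pairs. It assumes that the slashes in a key name are replaced with underscores when the key is used as an environment variable; confirm this against Initializing a Software Event Broker Container before relying on it. The helper name and the port values are hypothetical.

```python
def config_key_env(key, value):
    """Format a configuration key as an environment variable assignment.

    Assumption: '/' in the key name becomes '_' in the variable name.
    """
    return f"{key.replace('/', '_')}={value}"


if __name__ == "__main__":
    # Example port values only; choose ports that suit your environment.
    for key, port in (("service/healthcheck/port", 5560),
                      ("service/healthcheck/tlsport", 5563)):
        print(config_key_env(key, port))
```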

Appliance Event Broker Health Check Listen-Port

On a fresh installation or when reloading the default configuration, health checking is disabled on the appliance event broker and must be manually enabled.

By default, the appliance event broker listens for health checks on port 5550 and, if TLS/SSL is used, on port 5553. You can change the listen-port setting to another value using the Solace Event Broker CLI.

The health check listen-port setting applies to all health check endpoints, including /health-check/guaranteed-active, /health-check/direct-active, and /health-check/readiness.

Setting with CLI

To change the port setting, first enable the health check service, then execute the health-check listen-port command:

solace(configure/service/health-check) # no shutdown
solace(configure/service/health-check) # listen-port <port> [ssl]

To restore the listen-port to its default value, execute the no version of the previous command:

solace(configure/service/health-check) # no listen-port [ssl]

Where:

ssl specifies the TLS/SSL health check listen-port.

How to Stop Listening to a Health Check

To stop health check listening, execute the shutdown command:

solace(configure/service/health-check) # shutdown [plain-text] [ssl]

To restore health check listening, execute the no shutdown command:

solace(configure/service/health-check) # no shutdown [plain-text] [ssl]

Where:

plain-text specifies the plain-text health check listen-port.

ssl specifies the TLS/SSL health check listen-port.

If you execute the shutdown or no shutdown command with no keywords, the event broker disables or enables both ports.