Running Event Mesh Health Checks

An Event Mesh Health Check (or just Health Check) validates that events can be exchanged between the event broker services linked in an event mesh. To run a Health Check on an event mesh in Mesh Manager, your user profile must have the Mission Control Viewer role. For more information about role requirements, see Considerations for Working with Event Meshes in PubSub+ Cloud.

For a Health Check to run, the operational status of the links must be Up. If a link's operational status is Down, you can't run a Health Check and may need to troubleshoot the issue on the event broker service. To do this, select the service in Cluster Manager and then open Broker Manager. For more information, see Troubleshooting Operational Links.
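To illustrate the precondition above, here is a minimal Python sketch. It assumes that you have already collected each link's operational status yourself (for example, by reading it from Mesh Manager) and that the Health Check is gated on every link in the mesh being Up. The Link type and both function names are hypothetical and are not part of any PubSub+ Cloud API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical record of one link between two event broker services."""

    service_a: str
    service_b: str
    operational_status: str  # "Up" or "Down", as shown in Mesh Manager


def links_blocking_health_check(links: list[Link]) -> list[Link]:
    """Return every link whose operational status is not Up."""
    return [link for link in links if link.operational_status != "Up"]


def can_run_health_check(links: list[Link]) -> bool:
    """A Health Check can run only when no link is blocking it."""
    return not links_blocking_health_check(links)


if __name__ == "__main__":
    mesh = [
        Link("svc-east", "svc-west", "Up"),
        Link("svc-west", "svc-eu", "Down"),
    ]
    print(can_run_health_check(mesh))  # False: svc-west <-> svc-eu is Down
    for link in links_blocking_health_check(mesh):
        print(f"Troubleshoot {link.service_a} <-> {link.service_b} in Broker Manager")
```

Any link the sketch reports is a candidate for the steps in Troubleshooting Operational Links.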

The Health Check uses an advanced pinger to check the connectivity between event broker services. A Health Check validates the following in an event mesh using one secure SMF connection on each event broker service:

  • Link status: Each direction of a link between two event broker services is pinged. This validates connectivity and records the time of each ping. All links are checked by sending a ping from each event broker service to every other event broker service in the event mesh.
  • Event status: The Health Check uses the Request-Reply pattern on a reserved topic, #insights/pinger/ping, to verify that topics are properly propagated and can be subscribed and published to. For this check, a temporary client username named mesh-validation-<session-id> is created and then removed afterward, where <session-id> is a string that identifies the Health Check session.
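The minimal Python sketch below models the scope of a Health Check. It assumes, as described above, that every event broker service pings every other service in the mesh, so n services produce n × (n − 1) directed pings. The function names and the session ID value are hypothetical; the sketch only illustrates the counting and naming rules and does not call PubSub+ Cloud.

```python
from itertools import permutations

# Reserved topic named in this section for the Request-Reply check.
RESERVED_PING_TOPIC = "#insights/pinger/ping"


def directed_pings(services: list[str]) -> list[tuple[str, str]]:
    """Every (source, destination) pair: each direction of each link is pinged."""
    return list(permutations(services, 2))


def temp_client_username(session_id: str) -> str:
    """Temporary client username created (and later removed) for one session."""
    return f"mesh-validation-{session_id}"


if __name__ == "__main__":
    services = ["svc-east", "svc-west", "svc-eu"]
    pings = directed_pings(services)
    print(len(pings))  # 6: three services give 3 * 2 directed pings
    print(temp_client_username("a1b2c3"))  # mesh-validation-a1b2c3
```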

We recommend that you run a Health Check if you have manually made changes to an event broker service or are experiencing issues with your event mesh.


Running Health Checks

There are two ways to run a Health Check in Mesh Manager: directly from the event mesh's card, or from the event mesh's details panel. Use these steps to run a Health Check:

  1. Log in to the PubSub+ Cloud Console if you have not done so yet. The URL to access the Cloud Console differs based on your authentication scheme. For more information, see Logging into the PubSub+ Cloud Console.

  2. Select Mesh Manager from the navigation bar.

  3. On the Mesh Manager: Event Meshes page, do one of the following on the card of an event mesh:

  • click Mesh Actions on the card, and then select Run Health Check.

  • click Mesh Actions on the card, select View Details, expand Latest Health Check on the Event Mesh Details panel, and then click Run Health Check.

    When the Health Check starts, the Event Mesh Health Check dialog appears and shows its progress. You can optionally click Return to Mesh to dismiss the dialog.

    If you keep the dialog open, you'll see the status of each test as it completes for each service. You can expand each event broker service to see the details of its tests, including the results for each endpoint.

If you click Return to Mesh before the Health Check completes and stay on the page, the Health Check continues to run in the background and the dialog updates as the validation completes. If you navigate to another page, you must reopen the results to check the status of the Health Check. For information about how to check the results of a Health Check, see Viewing the Status of a Health Check and Links in an Event Mesh.

Handling Failed Health Checks

An event mesh is considered unhealthy if at least one link between any of its event broker services fails the Health Check. The Health Check result reflects the event mesh as a whole, not the individual services. To learn how to view the status of a Health Check, see Viewing the Status of a Health Check and Links in an Event Mesh.
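The rule that one failed link makes the whole mesh unhealthy can be sketched in Python as follows. This is a minimal illustration that assumes you have per-direction ping results in hand and that a link counts as failed if either of its directions fails. The PingResult type and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PingResult:
    """Hypothetical result of pinging one direction of a link."""

    source: str
    destination: str
    succeeded: bool
    ping_time_ms: Optional[float] = None  # recorded time; None if the ping failed


def failed_links(results: list[PingResult]) -> set[frozenset[str]]:
    """Links (unordered service pairs) with at least one failed direction."""
    return {frozenset((r.source, r.destination)) for r in results if not r.succeeded}


def mesh_is_healthy(results: list[PingResult]) -> bool:
    """The mesh is unhealthy as soon as any single link fails."""
    return not failed_links(results)


if __name__ == "__main__":
    results = [
        PingResult("svc-east", "svc-west", True, 12.5),
        PingResult("svc-west", "svc-east", False),
    ]
    print(mesh_is_healthy(results))  # False
    print(failed_links(results))
```

Grouping directions into unordered pairs reports a link once, even when both of its directions fail.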

The Health Check results identify the failed link and provide information to help you diagnose the issue with it.

If a Health Check fails, the artifacts it created are sometimes not cleaned up as expected. After a failed Health Check, you may need to do the following clean-up:

  • The temporary client username mesh-validation-<session-id> might not be deleted. To resolve this, remove the temporary client username from each event broker service in your event mesh.
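To find leftover temporary usernames across a mesh, you could use a helper like the minimal Python sketch below. It assumes you have gathered each service's client usernames yourself, for example from Broker Manager, into a dictionary keyed by service name. The function name and the sample data are hypothetical. The sketch only reports candidates and does not delete anything.

```python
TEMP_USERNAME_PREFIX = "mesh-validation-"


def leftover_validation_usernames(
    usernames_by_service: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Map each service to the temporary Health Check usernames still present on it."""
    leftovers: dict[str, list[str]] = {}
    for service, usernames in usernames_by_service.items():
        stale = sorted(u for u in usernames if u.startswith(TEMP_USERNAME_PREFIX))
        if stale:
            leftovers[service] = stale
    return leftovers


if __name__ == "__main__":
    inventory = {
        "svc-east": ["default", "mesh-validation-a1b2c3"],
        "svc-west": ["default"],
    }
    print(leftover_validation_usernames(inventory))
    # {'svc-east': ['mesh-validation-a1b2c3']}
```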

During the event mesh beta, certain properties in the ACL Profile for your event broker service must be set to Allow. If you have set any of these properties to Disallow, the Health Check fails. See Configuring ACL Profile Properties When Using the Event Mesh.

Troubleshooting Operational Links

If the operational status of a link is Down, you can't run a Health Check. Use Broker Manager to identify the cause of the problem and restore the link.

To troubleshoot the link, open Broker Manager for one of the event broker services in the event mesh. You can do this either from Cluster Manager, by selecting the service, or directly from Mesh Manager.

For information about using Cluster Manager, see Viewing Event Broker Services.

Here, we'll show you how to access Broker Manager from Mesh Manager.

  1. Log in to the PubSub+ Cloud Console if you have not done so yet. The URL to access the Cloud Console differs based on your authentication scheme. For more information, see Logging into the PubSub+ Cloud Console.

  2. Select Mesh Manager from the navigation bar.
  3. In Mesh Manager, on the Mesh Manager: Event Meshes page, click Mesh Actions on your event mesh, and select View Details.
  4. On the Event Mesh Details card, click the event broker service in your event mesh that you want to troubleshoot.
  5. On the service card, click Service Actions and then select Manage Service.
  6. In Broker Manager, click Clustering and then troubleshoot the links from there. For example, you could select the External Links tab to check if a link is down.

For more information about using Broker Manager, see Using PubSub+ Broker Manager. Some common problems include the following:

Linking to an event broker service that previously existed
If you deleted the second-to-last event broker service in an event mesh, the links are sometimes not cleaned up, and you might need to manually remove the previous external links using Broker Manager.
One of the event broker services is in a Virtual Private Cloud/Virtual Network (VPC/VNet) or one uses a private endpoint
If one of the event broker services has a public endpoint while the other has a private endpoint, the initiating service must be the service with the private endpoint.
In either case, resolve the issue by switching the initiator so that it's the service with the private endpoint or the service connecting from a private region.
For more information, see Switch the Initiator for a Link on the Event Mesh.
Can't validate server certificates
If you are using server certificates instead of the default Solace server certificates, you must ensure that those server certificates are uploaded to each of the event broker services in your event mesh.
Links between event broker services fail when the services are in different Virtual Private Clouds/Virtual Networks (VPCs/VNets) or both use private endpoints in different VPCs/VNets.
IP connectivity between private regions (for example, Customer-Controlled Regions, such as VPCs/VNets) is the responsibility of your organization. We recommend that you verify the connectivity between the regions; without it, you can't create an event mesh.
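The initiator rule above can be written as a small Python function. This minimal sketch assumes that each service can be summarized by whether it uses a private endpoint or sits in a private region. The Service type and the function name are hypothetical. When both services are public, the rule described here doesn't constrain the choice. When both are private, IP connectivity between the regions has to be verified first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Service:
    """Hypothetical summary of an event broker service's connectivity."""

    name: str
    private: bool  # True if it uses a private endpoint or is in a private region (VPC/VNet)


def required_initiator(a: Service, b: Service) -> Optional[str]:
    """Name of the service that must initiate the link, or None if unconstrained."""
    if a.private and not b.private:
        return a.name
    if b.private and not a.private:
        return b.name
    # Both public: this rule doesn't constrain the choice.
    # Both private: first verify IP connectivity between the private regions.
    return None


if __name__ == "__main__":
    print(required_initiator(Service("svc-public", False), Service("svc-vpc", True)))
    # svc-vpc
```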

Configuring ACL Profile Properties When Using the Event Mesh

To use Mesh Manager successfully during its beta, the following properties in the Access Control List (ACL) Profile for each event broker service in the event mesh must be set to Allow.

  • Client Connect Default

  • Publish Default Action

  • Subscribe Default Action

These properties are set to Allow by default when the ACL Profiles are generated during service creation. If you set any of them to Disallow, Health Checks on your event mesh will fail. You can configure ACL Profiles on the Access Control tab in Broker Manager.
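Before you work through the steps below, you can check which properties need changing with a helper like the minimal Python sketch that follows. It assumes you have transcribed an ACL Profile's values by hand from the ACL Profiles table in Broker Manager. The dictionary keys mirror the property names listed above, and the function name is hypothetical.

```python
# Properties that must be Allow for Mesh Manager during the beta (from the list above).
REQUIRED_ALLOW = (
    "Client Connect Default",
    "Publish Default Action",
    "Subscribe Default Action",
)


def properties_to_fix(acl_profile: dict[str, str]) -> list[str]:
    """Return the required properties that are not set to Allow.

    Values are compared case-insensitively. A property missing from the
    input is reported too, so that it gets reviewed rather than assumed.
    """
    return [
        prop
        for prop in REQUIRED_ALLOW
        if acl_profile.get(prop, "").strip().lower() != "allow"
    ]


if __name__ == "__main__":
    profile = {
        "Client Connect Default": "Allow",
        "Publish Default Action": "Disallow",
        "Subscribe Default Action": "Allow",
    }
    print(properties_to_fix(profile))  # ['Publish Default Action']
```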

  1. Log in to the PubSub+ Cloud Console if you have not done so yet. The URL to access the Cloud Console differs based on your authentication scheme. For more information, see Logging into the PubSub+ Cloud Console.

  2. On the navigation bar, click Cluster Manager.
  3. On the Services page, select the card for the event broker service you want to configure, and then click Open Broker Manager.
  4. In Broker Manager, select Access Control.
  5. On the Access Control page, click the ACL Profiles tab.

    You can review the configuration of each property of an ACL Profile in the table. If the Client Connect Default, Publish Default Action, or Subscribe Default Action property is set to Disallow, you must change it to Allow.

  6. Select the ACL profile you want to change by clicking it. Note that you cannot change the properties for the #acl-profile profile.

  7. On the ACL Profiles page, click the tab for the property you want to change. For example, click Publish Topic to change the Publish Default Action property.

  8. Click Edit.
  9. Click the property field, select Allow, and then click Apply.
  10. Repeat steps 7 through 9 until you have set all the required properties to Allow.
  11. Click Back to return to the ACL Profiles page.