Running Event Mesh Health Checks
An Event Mesh Health Check (or just Health Check) validates each link by sending a ping from each event broker service (or event broker) to the other event broker services in the event mesh. To run the Health Checks, you require the Mesh Manager Viewer role assigned to your user profile. For more information about the role requirements, see Considerations for Working with Event Meshes in PubSub+ Cloud.
In Mesh Manager, you can validate the health of an event mesh using Health Check. For a Health Check to run, the operational status of a link must be Up. If the operational status is Down, you won't be able to run a Health Check and may need to troubleshoot the issue on the event broker. To do this, open your event broker service and go to PubSub+ Broker Manager. For more information, see Troubleshooting Operational Links.
The Health Check uses an advanced pinger to check connectivity between event broker services. A Health Check validates the following in an event mesh using one secure SMF connection on each event broker service:
- Link status: Each direction of a link between two event broker services is pinged. This validates connectivity, and the time of the ping is recorded.
- Event status: The Health Check uses the Request-Reply pattern on a reserved topic called #insights/pinger/ping to verify that topics are properly propagated, that is, that topics can be subscribed to and published to. A temporary client username named mesh-validation-<session-id> is created for the check and removed afterward, where <session-id> is a string value representing the Health Check session.
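The link status check described above can be illustrated with a minimal sketch. It is not how Mesh Manager is implemented. It only shows the idea of pinging each direction of every link and recording the time taken. The function check_links, the stand-in fake_ping, and the service names are hypothetical and exist only for this illustration.

```python
import time
from itertools import permutations
from typing import Callable, Dict, List, Tuple


def check_links(
    services: List[str],
    ping: Callable[[str, str], bool],
) -> Dict[Tuple[str, str], Tuple[bool, float]]:
    """Ping every ordered pair of services; record success and elapsed seconds."""
    results: Dict[Tuple[str, str], Tuple[bool, float]] = {}
    # permutations(..., 2) yields both (a, b) and (b, a), so each
    # direction of a link is checked separately.
    for source, target in permutations(services, 2):
        start = time.perf_counter()
        ok = ping(source, target)
        elapsed = time.perf_counter() - start
        results[(source, target)] = (ok, elapsed)
    return results


def fake_ping(source: str, target: str) -> bool:
    """Hypothetical stand-in for a real ping: one direction of one link fails."""
    return (source, target) != ("broker-b", "broker-c")


results = check_links(["broker-a", "broker-b", "broker-c"], fake_ping)
for (source, target), (ok, elapsed) in results.items():
    print(f"{source} -> {target}: {'pass' if ok else 'fail'} ({elapsed:.6f}s)")
```

With three services, the sketch performs six pings, one for each direction of each of the three links.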
We recommend that you run a Health Check if you have manually made changes to an event broker service or are experiencing issues with your event mesh.
For more information about:
- running a Health Check, see Running Health Checks
- handling failed Health Checks, see Handling Failed Health Checks
- troubleshooting links that are not up (have an operational state of Down), see Troubleshooting Operational Links
There are two ways to run a Health Check in Mesh Manager: from the card of the event mesh, or from the details view of the event mesh. Use these steps to run a Health Check:
Log in to the PubSub+ Cloud Console if you have not done so yet.
Select Mesh Manager from the navigation bar.
On the Mesh Manager: Event Meshes page, on the card of an event mesh, you can either:
click Mesh Actions on the card, and then select Run Health Check
click Mesh Actions on the card, select View Details, expand Latest Health Check, and then click Run Health Check
When you run a Health Check, the Event Mesh Health Check dialog appears.
As the Health Checks run on each event broker service, the progress appears in the Event Mesh Health Check dialog. You can click Return to Mesh to dismiss the dialog, or keep the dialog open to watch the progress of the Health Check.
If you keep the dialog open, you'll see the test status for each service as its tests complete. You can expand each event broker service to see the details of the test.
If you click Return to Mesh before the Health Check completes but stay on the page, the Health Check runs in the background and the dialog reappears when the test completes. If you navigate to another page, you must bring up the results to check the status of the Health Check. For information about how to check the results of a Health Check, see Viewing the Status of a Health Check and Links in an Event Mesh.
An event mesh is considered unhealthy if at least one link between any of the event broker services in your event mesh fails the Health Check test. It's important to note that the Health Check test reflects the event mesh itself and not the individual services. To understand how to view the status of a Health Check, see Viewing the Status of a Health Check and Links in an Event Mesh.
From the results, you can identify the failed link and find information that helps you diagnose the issue with that link.
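The rule that a single failed link makes the whole event mesh unhealthy can be sketched as follows. This is an illustration only, and the function summarize, the result format, and the service names are assumptions made for this example rather than anything Mesh Manager exposes.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # (source service, target service)


def summarize(link_results: Dict[Link, bool]) -> Tuple[bool, List[Link]]:
    """Return (mesh_is_healthy, failed_links) for per-direction link results."""
    failed = sorted(link for link, ok in link_results.items() if not ok)
    # The mesh is healthy only when no link direction failed.
    return (not failed, failed)


# Hypothetical results: one direction of one link failed.
link_results = {
    ("broker-a", "broker-b"): True,
    ("broker-b", "broker-a"): True,
    ("broker-b", "broker-c"): False,
    ("broker-c", "broker-b"): True,
}

healthy, failed_links = summarize(link_results)
print("healthy" if healthy else "unhealthy", failed_links)
```

The result describes the event mesh as a whole. The individual services may be running normally even when the mesh is reported as unhealthy.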
If a Health Check is not successful, artifacts created for the Health Check are sometimes not cleaned up as expected. The following clean-up may be required after a failed Health Check:
- The temporary client username mesh-validation-<session-id> might not be properly deleted. To resolve this, you must remove the temporary client username from each event broker service in your event mesh.
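If you have a list of the client usernames configured on each event broker service (for example, copied from PubSub+ Broker Manager), a small helper can point out which ones look like leftovers from a Health Check. This is a minimal sketch: the function leftover_usernames and the sample inventory are hypothetical, and the only detail taken from this page is the mesh-validation- prefix. The removal itself is still done on each event broker service.

```python
from typing import Dict, List

# Prefix of the temporary client username created by a Health Check.
PREFIX = "mesh-validation-"


def leftover_usernames(
    usernames_by_service: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Return, per service, the client usernames that start with PREFIX."""
    leftovers: Dict[str, List[str]] = {}
    for service, names in usernames_by_service.items():
        matches = [name for name in names if name.startswith(PREFIX)]
        if matches:
            leftovers[service] = matches
    return leftovers


# Hypothetical inventory of client usernames per event broker service.
inventory = {
    "broker-a": ["default", "mesh-validation-1234"],
    "broker-b": ["default", "orders-app"],
}

print(leftover_usernames(inventory))
```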
If the operational status of a link is down, you won't be able to run a Health Check. To help in identifying the cause of the problem, you need to use PubSub+ Broker Manager to identify and resolve the operational status of a link.
To troubleshoot the link, you can either go to Cluster Manager, select one of the event broker services in the event mesh you're troubleshooting, and then go to PubSub+ Broker Manager, or you can go directly from Mesh Manager.
For information about using Cluster Manager, see Viewing and Configuring Event Broker Services.
Here, we'll show you how to access PubSub+ Broker Manager from Mesh Manager.
- Log in to the PubSub+ Cloud Console if you have not done so yet.
- Select Mesh Manager from the navigation bar.
- In Mesh Manager, on the Mesh Manager: Event Meshes page, click Mesh Actions on your event mesh, and select View Details.
- On the Event Mesh Details card, click the event broker service in your event mesh that you want to troubleshoot.
- On the service card, click Service Actions and then select Manage Service.
- In PubSub+ Broker Manager, click Clustering and then troubleshoot the links from there. For example, you could select the External Links tab to check if a link is down.
For more information about using PubSub+ Broker Manager, see Solace PubSub+ Broker Manager. Some common problems that may occur:
- Linking to an event broker service that previously existed
- If you deleted the service that was the second-last event broker service in an event mesh, at times the links aren't cleaned up and you might need to manually remove the previous external links using PubSub+ Broker Manager.
- The event broker service is in a Virtual Private Cloud/Virtual Network (VPC/VNet)
- If a link connects an event broker service that is in a VPC/VNet to a service that isn't in a VPC/VNet, the event broker service in the VPC/VNet must be the initiating service. If it isn't, switch the initiator. For more information, see Switch the Initiator for a Link on the Event Mesh.
- Can't validate because server certificates don't match
- If you are using your own server certificates instead of the default Solace server certificates, you must ensure that those server certificates are uploaded to each of the event broker services in your event mesh.