Solace PubSub+ Troubleshooting
This topic describes how operators and developers can troubleshoot instances of Solace PubSub+ services.
A Solace PubSub+ Service Instance is backed by a Message VPN on one or more PubSub+ Software Event Brokers, depending on the plan you’ve chosen.
You can discover the Solace PubSub+ Software Event Broker backing a service instance from the Solace PubSub+ Credentials which become available by binding an app to the instance or creating a service key for the instance.
Operator Troubleshooting
The operator will have access to the logs of all the components of Solace PubSub+ for VMware Tanzu:
- Service Broker Logs
- Service Broker Agent Logs
- Solace PubSub+ Event Broker Logs
An installation of Solace PubSub+ for VMware Tanzu may be configured to use System Logging as a means of gathering all logs.
Accessing Logs
The following locations in Solace PubSub+ BOSH VMs will hold logs of interest that can help in troubleshooting issues:
- All VM job logs
- /var/vcap/sys/log
- All Solace PubSub+ Event Broker Logs
- /var/vcap/store/containers/pubsub/volumes/jail/logs
The service broker logs are collected in the management VM under /var/vcap/sys/log/broker-logs.
When the Management app is deployed in High Availability mode, the service broker logs are captured on both the primary and backup VMS so it does not matter from which you get the logs.
This means that you can use the bosh logs command to download the service broker logs along with the other logs from the management VM.
You can also watch service broker logs as follows:
Set your API endpoint to the Cloud Controller of your deployment.
$ cf api api.YOUR-SYSTEM-DOMAIN Setting api endpoint to api.YOUR-SYSTEM-DOMAIN... OK API endpoint: <span>https:</span>//api.YOUR-SYSTEM-DOMAIN (API version: 2.82.0) Not logged in. Use 'cf login' to log in.
Log in to your deployment and select an org and a space.
$ cf login
API endpoint: <span>https:</span>//api.YOUR-SYSTEM-DOMAIN
Email> user<span>@</span>example.com
Password>
```
1. Target the `solace` org and `solace-broker` space.
```
$ cf target -o solace -s solace-broker
api endpoint: https://api.YOUR-SYSTEM-DOMAIN
api version: 2.82.0
user: admin
org: solace
space: solace-broker
```
1. Discover the Solace PubSub+ Service Broker App name.
```
$ cf apps
Getting apps in org solace /space solace-broker as user@example.com...
OK
name requested state instances memory disk urls
solace-broker-1.2.0 started 1/1 1G 512M solace-broker.YOUR-SYSTEM-DOMAIN
OK
```
<p class="note"><strong>Note:</strong> Take note of the application name. In this case 'solace-broker-1.2.0'.</p>
3. To watch the logs of Solace PubSub+ Service Broker:
```
$ cf logs solace-broker-1.2.0 | tee saved_solace-broker-1.2.0.txt
Retrieving logs for app solace-broker-1.2.0 in org solace /space solace-broker as user@example.com...
....
```
### <a id='operator_diag'></a> Additional Diagnostics
In addition to regular logging, Solace PubSub+ Event Brokers tools can help gather diagnostics logs on demand.
For example, having identified a problem with a given Solace service instance, the operator can access the backing Solace PubSub+ Event Brokers to examine logs and gather diagnostics.
The backing Solace PubSub+ Event Brokers for a given service instance can be identified from the IPs used in the bindings or service keys, or mentioned in the service broker logs or message when an exception occurs.
The operator should look for the VM in the Solace PubSub+ bosh deployment with the matching IP.
Get additional diagnostics from "EnterpriseLarge/0" VM.
$ bosh ssh EnterpriseLarge/0 # sudo -i # /var/vcap/jobs/broker_agent/bin/gather_diagnostics.sh “`
Then you can look at the gathered logs under /var/vcap/store/diag
.
In the event that a process fails and creates a core dump, the core file can be found under /var/vcap/store/cores
. Core files are compressed using gzip.
Installation Issues
If you get the error:
Colocated job 'solace-bosh-dns-aliases' is already added to the instance group 'compute'
Then do the following:
- Log into your bosh environment
- Execute the command
bosh configs
- Look for runtime-configs with ‘solace’ in their name
- Delete the solace runtime-configs using the bosh
delete-config
command - Redeploy
Upgrade Issues
The following issues may arise while upgrading a tile:
Error |
Action Failed get_task: result: 1 of 1 drain scripts failed. Failed Jobs: containers. |
---|---|
Explanation | A pre-condition was not met. A high-availability (HA) group was not in a healthy state at the start of the upgrade. This check is done at the shutdown of each VM in an HA group. |
Possible Action | Ensure an HA Group is healthy. The user can log into the failing VM and look at the log file located at /var/vcap/sys/log/containers/drain.stdout.log to determine the underlying cause. Once an HA Group is healthy, an upgrade can be retried. |
Error |
Action Failed get_task: result: 1 of 1 post-start scripts failed. Failed Jobs: containers. |
---|---|
Explanation | The upgrade was stopped due to a detected failure to ensure services remain available. |
Possible Action | The user should log into the failing VM and look at the log file located at /var/vcap/sys/log/containers/post-start.stdout.log to determine the underlying cause. Once the issue is fixed, an upgrade can be retried. |
Error |
Process 'mariadb_ctrl' Execution failed |
---|---|
Explanation | Monit reports that MariaDB failed to start. This happens when using high-availability management and 2 or more MariaDB processes are shutdown at the same time. The cluster has lost quorum and MariaDB processes can’t restart automatically. |
Possible Action | The MariaDB cluster needs to be bootstrapped. This is done by running the bootstrap errand with the command: bosh -d solace_deployment run-errand bootstrap. The bootstrap errand is always safe to run as it is a noop when it is not required. If the bootstrap errand does not fix the MariaDB process, the user should look at logs under /var/vcap/sys/log/mysql/. |
Note: For Shared service plans, every service on a broker must be marked for upgrade before the upgrade can happen. This prevents one tenant from upgrading a router before the other tenants are ready.
Note: If some Solace PubSub+ Event Brokers have been upgraded and others have not, it is not possible to create new bindings or service keys with the non-upgraded services. This is because features such as authentication schemes might have changed with the upgrade, and would not be compatible with the services created before the upgrade.
Developer Troubleshooting
A developer using Solace PubSub+ for VMware Tanzu may encounter errors when using Cloud Foundry Command-Line Interface (cf CLI) to perform basic operations on a Solace PubSub+ for VMware Tanzu service instance.
In general, most errors are about these types:
- Reaching limits (plan limits, inventory fully used)
- Communication problems between the event broker and the VM inventory it manages
- Unexpected health state, such as a degraded, high-availability setup
While this list it not complete, it provides representative samples with explanations and possible resolutions. Some of the resolutions will require operator intervention.
Deployment limits reached for a given plan | |
---|---|
Operation |
cf create-service solace-pubsub enterprise-large |
Error |
Server error, status code: 502, error code: 10001, message: Service broker error: com.solace.cloudfoundry.servicebroker.exception.SolaceServiceException: No matching Solace Message VPNs available. |
Explanation | The service broker does not find any Solace Message VPNs in its inventory for the requested service plan. |
Possible Action(s) | The operator needs to increase the number of allocated Solace PubSub+ Event Brokers that support the given plan, in this case a enterprise-large. |
Invalid Parameter | |
---|---|
Operation |
cf update-service my-large-instance -c '{"some_option": "some_value" }' |
Error |
Server error, status code: 502, error code: 10001, message: Service broker error: Unrecognized parameter key some_option |
Explanation | The service broker does not recognize the parameter name. |
Possible Action(s) | Use the correct parameters. Please see Service Specific Parameters. |
Invalid Parameter: The required feature is not enabled (TCP Routes) | |
---|---|
Operation |
cf update-service my-large-instance -c '{ "mqtt_tcp_route_enabled" : "false" }' |
Error |
Server error, status code: 502, error code: 10001, message: Service broker error: The parameter mqtt_tcp_route_enabled is invalid given the current configuration. It requires [ TCP Routes Enabled ] |
Explanation | As indicated in the error, the given parameter is only valid when TCP Routes is enabled. |
Possible Action(s) | See Configuring TCP Routes. |
Invalid Parameter: The required feature is not enabled (LDAP) | |
---|---|
Operation |
cf update-service my-large-instance -c '{ "ldapGroupAdminReadOnly" : "cn=username1,ou=groups,dc=solace,dc=com" }' |
Error |
Server error, status code: 502, error code: 10001, message: Service broker error: The parameter ldapGroupAdminReadOnly is invalid given the current configuration. It requires [ LDAP Enabled, Management Access set to LDAP ] |
Explanation | As indicated in the error, the given parameter is only valid when LDAP is enabled and Management Access is set to LDAP. |
Possible Action(s) | See Configuring LDAP and Management Access to use LDAP. |
Communication failure: The Solace PubSub+ Event Brokers is not reachable. | |
---|---|
Operation |
cf delete-service -f my-large-instance |
Error |
cf service my-large-instance Service instance: my-large-instance Service: solace-pubsub Bound apps: Tags: Plan: enterprise-large Description: Solace PubSub+ Event Broker for real-time, multi-protocol data distribution Documentation url: http://docs.solace.com Dashboard: https://enterprise-large-0.YOUR-SYSTEM.DOMAIN/#/msg-vpns/djAwNQ==?token= YWJj.eyJhY2Nlc3NfdG9rZW4iOiJ2MDA1LWlaksjdlasdjas09dasdlkansdlakslZmRmOWFlNjM3ZGYwMDcifQ%3D%3D.eHl6 Last Operation Status: delete failed Message: com.solace.cloudfoundry.servicebroker.exception.SolaceServiceException: Unable to delete Service, the associated Message VPN v001 on 10.244.0.3 is not currently available Started: 2020-01-00T00:00:00Z Updated: 2020-01-00T00:00:00Z |
Explanation | The service broker cannot delete this instance because the Message VPN is flagged as unavailable. This happens when the backing Solace PubSub+ Event Broker is not reachable. |
Possible Action(s) | The operator should examine the solace deployment and see why VM 10.244.0.3 is not available. The service delete operation can be reattempted once the VM health is restored. |
HA service degradation. | |
---|---|
Operation |
cf bind-service my-app my-ha-instance |
Error |
MessageRouterException: Primary VMR 10.233.0.3 HA Group Status is degraded v001 for messageVpn v001 |
Explanation | The service broker will always reject an operation for an HA service when the HA status is degraded. A variety of reasons may be given in the message. Other operations on the same Message Router will fail as well. |
Possible Action(s) | The operator should examine the Solace deployment and see why VM 10.244.0.3 is not available. The CF operation can be reattempted once the VM health is restored and the HA status is not degraded. |
Orphaned Resource Policy | |
---|---|
Operation |
cf unbind-service my-app my-service-instance |
Error |
Server error, status code: 502, error code: 10001, message: Service instance test-shared: Service broker error: Operation canceled. Orphaned Endpoints Policy was violated. Endpoints owned by v002.cu000001 must first be deleted: Queues: [ someQ ] Topics Endpoints: [ someTPE ] |
Explanation | The service broker will reject unbinding when the client-username that was used by the application owns Endpoints such as Queues and Topic Endpoints while the current Orphaned Resource Policy is to Abort |
Possible Action(s) | Delete the Endpoints before unbinding. Alternatively you can adjust the Service Orphaned Resource Policy. |
Dashboard Troubleshooting
Dashboard 403 Error using VMware Tanzu SSO | |
---|---|
Operation | Accessing the service dashboard with Single Sign On |
Error |
On the dashboard’s webpage, 403: Forbiddenis shown. |
Explanation | The user’s permissions are determined through a call to the CF API controller with the given service and user’s access token. CF then tells the dashboard whether the user has read and manage permissions. |
Possible Action(s) | Verify that the account used to access the dashboard has permission to read/manage the service instance by verifying their
organization and space roles in CloudFoundry.
Note: Users with 'admin’ privileges still require an appropriate assigned role in the org and space to view and service instances and specifically require the SpaceDeveloper permission to manage the service instances. This is a limitation of VMware Tanzu/cloudfoundry UAA used in Single Sign On. If Single Sign On access for Solace PubSub+ was revoked through the Third Party Access pane on the VMware Tanzu User Management page, logout of the dashboard by clicking the user icon in the top right corner and clicking logout and retry accessing the dashboard through the dashboard url. |
Advanced Asset Management
In cases where a service or event broker can no longer be managed by regular managenent operations, the instance in question can be purged. This is done by accessing the Service Dashboard as an administrator at the solace-pubsub-broker-2.x.x app’s registered route. When a service is purged, its associated Cloud Foundry service and bindings are not deleted, and the underlying resources may or may not be freed. When an event broker is purged, the underlying deployment (either on demand or operator allocated) may not be deleted resulting in orphaned VMs requiring additional cleanup operations. Services or event brokers should only be purged as a last resort.