PubSub+ Insights Monitors for Datadog Reference

PubSub+ Insights uses Datadog monitors to send notifications when thresholds are crossed or specific events occur. Datadog monitors are part of the Datadog cloud application, which Solace uses as a central monitoring service to monitor event broker services (and event brokers) and to send notifications when certain events occur. The available monitors can observe specific events found in logs, metrics (statistics from an event broker service), or both (derived).

Monitors can be configured to trigger notifications of three severity types: Alerts, Warnings, and Information. The severity triggered depends on how the monitor is configured. Not all monitors are configured to trigger notifications of all severity types, and some do not trigger any notifications at all. Notifications can be based on specific events monitored in the logs, on metrics (statistics collected from event broker services), or on a combination of both. Alert and Warning notifications can be configured to be followed by an informational recovery notification when the monitor state returns to normal.

Three user roles can be assigned to users with an Insights subscription: Insights Advanced Viewer, Insights Advanced Editor, and Insights Advanced Manager.

  • The Insights Advanced Viewer user role provides the ability to see status views of the various PubSub+ Insights monitors.

  • The Insights Advanced Editor user role provides the ability to view and manage the PubSub+ Insights monitors using the Datadog Web Application.

  • The Insights Advanced Manager user role provides the ability to view and manage the PubSub+ Insights monitors using the Datadog Web Application. The Insights Advanced Manager can also manage Datadog API and APP keys, and Datadog Integrations. For more information, see Insights Advanced Manager Role.

The screenshot below shows an example of how a monitor status and history may appear in Datadog.

For more general information about monitors, see Datadog's help about Getting started with Monitors. For more detailed information, see their help about Alerting.

Note that a sample template monitor is provided so that you can create monitors to meet your specific needs. The logs you can use in this monitor are listed in a separate table following the table with monitor information. For more information, see Event log descriptions.

Below you will find the following:

  • Monitor descriptions—Descriptions of the available PubSub+ Insights monitors.

  • Event log descriptions —Descriptions of event logs used in some of the PubSub+ Insights monitors.

  • System Logs —Information about, and links to the system logs that you can use to create your own monitors.

Monitor descriptions

The table below lists the monitors available for you to use in your dashboards. For each monitor listed, the table summarizes the following information:

Monitor Name
The name of the monitor. This is the name you see in Datadog.
Default Severity Configured
The default severity configured for the monitor. Each monitor has one of the following default severity configurations:
  • Alert — The monitor is configured to only trigger when an Alert occurs.
  • Warning — The monitor is configured to only trigger when a Warning occurs.
  • Alert or Warning — The monitor is configured to trigger when a threshold is exceeded. The configured value is a measure of utilization (as a percentage) of the available capacity. The severity is based on these default thresholds:

    • Warning: 80%

    • Alert (Critical): 95%

    Note that these are sample levels in the monitors we provide. If you clone a monitor, you can customize these levels to meet your requirements. For example, you could set your Alert to be 90% in your cloned monitor.

  • Recovery — This monitor is triggered to indicate that it has recovered from a previous Alert or Warning.

  • No Default Severity — This monitor does not trigger by default. To use this monitor, clone it and then set thresholds for warnings and alerts that match your monitoring requirements.

Details
A description of the monitor, the predicted impact if an Alert or Warning is raised, and recommended actions.
Muted
Some monitors are muted (or silenced). You can choose to unmute them in the Datadog interface for your estate. For information about muting monitors see Muting PubSub+ Insights Monitors for Datadog.
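
As an illustration of the Alert or Warning severity described above, the following minimal Python sketch maps a utilization percentage to a severity using the default 80% and 95% levels, and shows how a cloned monitor might use a 90% Alert level instead. The class and function names are hypothetical and are not part of PubSub+ Insights or Datadog. The sketch assumes a strict "exceeded" comparison; the actual comparison depends on the query of each monitor. The returned dictionary mirrors the warning and critical threshold fields used in Datadog monitor definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Utilization thresholds, as percentages of the available capacity."""
    warning: float = 80.0   # default Warning level listed in this reference
    critical: float = 95.0  # default Alert (Critical) level listed in this reference


def classify(utilization_pct: float, t: Thresholds = Thresholds()) -> str:
    """Return the severity that a utilization reading would map to.

    Assumption: a threshold counts as breached only when it is exceeded.
    """
    if utilization_pct > t.critical:
        return "Alert"
    if utilization_pct > t.warning:
        return "Warning"
    return "OK"


def datadog_threshold_options(t: Thresholds) -> dict:
    """Build a thresholds object shaped like the one in a Datadog monitor definition."""
    return {"thresholds": {"warning": t.warning, "critical": t.critical}}


# Default levels, as provided with the sample monitors.
print(classify(85.0))
# Hypothetical cloned monitor with the Alert level lowered to 90%.
cloned = Thresholds(critical=90.0)
print(classify(92.0, cloned))
print(datadog_threshold_options(cloned))
```

With the default levels, a reading of 92% would be a Warning; in the cloned configuration the same reading is an Alert.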

 

Monitor Name Monitor Type Status Default Severity Configured Details
System Status

Bridge Down Observed

Status Active

Warning

The Message VPN Bridge (bridge) is down.

Predicted Impact: If the bridge remains down, messages can no longer be transmitted between the bridged event broker services or event brokers.

Recommended Actions: Verify that your service bridge configuration and the network connectivity of the bridged event broker services are correct. For mitigation recommendations, see VPN_BRIDGING_LINK_DOWN in the Solace PubSub+ Syslog Events Reference. Contact Solace for assistance if required.

Bridge Down Sustained

Status Active

Alert

The bridge has been down for a sustained period.

Predicted Impact: If the bridge remains down, messages can no longer be transmitted between the bridged event broker services.

Recommended Actions: Verify your service bridge configuration and the network connectivity of the bridged event broker services. For mitigation recommendations, see VPN_BRIDGING_LINK_DOWN in the Solace PubSub+ Syslog Events Reference. Contact Solace for assistance if required.

Cache Cluster Status Down Status Active Alert

All Cache instances in the cluster are DOWN.

Predicted Impact: Messages for this cluster are being lost and cache lookups are unavailable.

Recommended Actions: Recover the affected cache instance so that it's connected and operational. For mitigation recommendations, see the related VPN_SOLCACHE_* logs in the Solace PubSub+ Syslog Events Reference. Contact Solace for assistance if required.

Cache Instance Memory - Out of Sync Metric Active Warning

The cache instance memory utilization is not synchronized.

Predicted Impact: There is an elevated risk of message loss until the cache instance is recovered.

Recommended Actions: Check the memory utilization for each cache instance within the cache cluster. If the memory utilization difference persists, restart the cache instance that has been identified as out of sync. For mitigation recommendations, see the related VPN_SOLCACHE_* logs in the Solace PubSub+ Syslog Events Reference. Contact Solace for assistance if required.

Cache Instance Status Down Observed

Status Active

Warning

The cache instance is down.

Predicted Impact: There is an elevated risk of message loss until the cache instance is recovered.

Recommended Actions: Recover the affected cache instance so that it's connected and operational. For mitigation recommendations, see the related VPN_SOLCACHE_* logs in the Solace PubSub+ Syslog Events Reference. Contact Solace for assistance if required.

Cache Instance Status Down Sustained Status Active

Alert

The cache instance has been down for a sustained period.

Predicted Impact: There is an elevated risk of message loss until the cache instance is recovered.

Recommended Actions: Recover the affected cache instance so that it's connected and operational. For mitigation recommendations, see the related VPN_SOLCACHE_* logs in the Solace PubSub+ Syslog Events Reference. Contact Solace for assistance if required.

Guaranteed Messaging Standby Status Down Sustained Status Active

Alert

Persistent messaging (Guaranteed Messaging) redundancy has decreased for a sustained period.

Predicted Impact: There is an elevated risk for a service outage.

Recommended Actions: For mitigation recommendations, see the SYSTEM_AD_* logs in the Solace PubSub+ Syslog Events Reference. Contact Solace for assistance if required.

Guaranteed Messaging Service Interruption Observed Status Active

Alert

The event broker service's message spool disk has not been in the AD_Active state for the last minute. This means the message spool disk has not been bound to the primary event broker for the last minute. This may be due to normal event broker service activity.

Predicted Impact: There is an elevated risk that applications may not be able to send or receive Guaranteed Messages.

Recommended Actions: This alert is sent for informational purposes and no action is required. If the situation degrades, the Guaranteed Messaging Service Interruption Sustained monitor will trigger an alert, and you can take the actions described for that monitor.

Dead message queue unbound and with messages Metric Active

No Default Severity

A dead message queue (DMQ) has messages but no clients are connected to process them.

Predicted Impact: There is an elevated risk that if the queue fills up, messages are rejected.

Recommended Actions: Verify the queue connections status in Cluster Manager from the PubSub+ Cloud Console.

Guaranteed Messaging Service Interruption Sustained Status Active

Alert

The event broker service's message spool disk has not been in the AD_Active state for the last five minutes. This means the message spool disk has not been bound to the primary event broker for the last five minutes. The event broker service may be experiencing a Guaranteed Messaging service interruption.

Predicted Impact: There is an elevated risk that applications may not be able to send or receive Guaranteed Messages.

Recommended Actions: For mitigation recommendations, see the logs in the Solace PubSub+ Syslog Events Reference. For more information, contact Solace.

Guaranteed Messaging Standby Status Down Observed Status Active

Warning

Guaranteed messaging redundancy has decreased.

Predicted Impact: There is an elevated risk for a service outage.

Recommended Actions: For mitigation recommendations, see the SYSTEM_AD_* logs in the Solace PubSub+ Syslog Events Reference. For more information, contact Solace.

Message queue unbound and with messages Metric Active

No Default Severity

A message queue has messages, but no clients are connected (bound) to process them.

Predicted Impact: If the queue fills up, messages may be rejected.

Recommended Actions: Verify the queue connection status in Cluster Manager from the PubSub+ Cloud Console.

Topic endpoint unbound and with messages Metric Active

No Default Severity

A topic endpoint has messages but no clients are connected to process them.

Predicted Impact: If the topic endpoint fills up, messages may be rejected.

Recommended Actions: Verify the topic endpoint connection status in Cluster Manager from the PubSub+ Cloud Console.

System Message Spool - Unacknowledged Messages Metric Active

Alert or Warning

System Message Spool unacknowledged message utilization threshold breach. The number of delivered but not yet acknowledged Guaranteed Messages exceeds the monitor thresholds that are set.

Predicted Impact: New Guaranteed Messages may be rejected.

Recommended Actions: Identify the Guaranteed Messaging flows with a large number of unacknowledged messages and disconnect the associated clients.

System Message Spool - Message Count Metric Active

Alert or Warning

A Message Spool Message Count utilization threshold breach.

Predicted Impact: If this resource is 100% of capacity (resources exhausted), no new Guaranteed Messages will be accepted by the event broker.

Recommended Actions: Check the endpoints with the most messages enqueued. Contact the support or development team responsible for the client applications that consume messages from these endpoints.

Topic Endpoint
Message VPN Topic Endpoint - Storage Metric Inactive

Alert or Warning

Topic endpoint spool storage utilization threshold breach. The aggregate size of all messages enqueued to a specific topic endpoint exceeds the monitor thresholds. New messages published to this topic endpoint may be discarded by the endpoint.

Predicted Impact: If the Message VPN topic endpoint storage is utilized to 100% of capacity (exhausted resources), any new messages published to the specified topic may be discarded by the endpoint. If configured, the publisher is notified about the rejection (reject-message-to-sender-on-discard).

Recommended Actions: Reduce spool usage as necessary to ease demand or increase the limit on the topic endpoint.

Message VPN
Message VPN Connections - AMQP Metric Active

Alert or Warning

Message VPN AMQP connection utilization threshold breach. AMQP connection utilization exceeds the monitor thresholds.

Predicted Impact: If the number of AMQP connections reaches the maximum number of connections (100% of capacity), new AMQP connections will be rejected.

Recommended Actions: Scale up the class of the messaging service to allow more connections. Otherwise, if possible, contact the support or development team of the client applications connected to the event broker service to verify if the number of connections being used by those applications is reasonable.

Message Replication Status Sync Ineligible Rejecting Messages Sustained Status Active

Alert

For a sustained period, messages can no longer be delivered to the standby site as fast as they are being received by the active site.

Predicted Impact: All messages that require synchronous replication are downgraded to asynchronous replication.

Recommended Actions: Verify the replication configuration for the event broker service. For mitigation recommendations, see the VPN_REPLICATION_* logs in the Solace PubSub+ Syslog Events Reference. For more information, contact Solace.

Message VPN - Message Spool - Ingress Flow Metric Active

Alert or Warning

Message spool Ingress Flow utilization threshold breach.

Predicted Impact: If the ingress flow is utilized to 100% of capacity (resources are exhausted), publishers may not be able to send new Guaranteed Messages.

Recommended Actions: Remove any unused ingress flows from the Message VPN. Otherwise, reduce ingress flow use to ease demand, for example by disconnecting publishers that are not needed.

Message VPN - Message Spool - Storage Metric Inactive

Alert or Warning

This is actively monitored with the PubSub+ Insights Monitors for Datadog Reference monitor, which triggers notifications based on the thresholds set in the event broker service. The use of the PubSub+ Insights Monitors for Datadog Reference monitor provides more flexibility for different thresholds and allows every message spool to be monitored based on individual thresholds.

This monitor is not active and is available for use as a template or example. You can choose to clone the monitor and make it active to monitor message spools as a metric-based monitor.

The message spool utilization threshold for the Message VPN has been reached.

Predicted Impact: If the Message Spool is utilized to 100% of capacity (resources are exhausted), all Guaranteed (persistent) Messages are rejected on the affected event broker service.

Recommended Actions: Reduce spool usage as necessary to ease demand or scale up the class of the event broker service. Increase the message consumption rate or reduce the message publishing rate for Guaranteed Messaging. Contact Solace for assistance if necessary.

Message VPN - Message Spool - Endpoint Metric Active

Alert or Warning

Message Spool Endpoint utilization threshold breach. Endpoint utilization within a Message VPN exceeds the monitor thresholds.

Predicted Impact: If the endpoints within the Message VPN's message spool are utilized to 100% of capacity (resources are exhausted), queues or topic endpoints cannot be created.

Recommended Actions: Remove any unused endpoints from the Message VPN. Otherwise, reduce endpoint use to ease demand.

Message VPN Connections - MQTT Metric Active

Alert or Warning

Message VPN MQTT connection utilization threshold breach. MQTT connection utilization exceeds the monitor thresholds.

Predicted Impact: If the number of MQTT connections reaches the maximum number of connections (100% of capacity), new MQTT connections are rejected.

Recommended Actions: Scale up the class of the event broker service to allow more connections. Otherwise, if possible, contact the support or development team of the client applications connected to the event broker service to verify if the number of connections being used by those applications is reasonable.

Message VPN - Message Spool - Transacted Session Metric Active

Alert or Warning

Message Spool Transacted Session utilization breach. Transacted session utilization exceeds the monitor thresholds. 

Predicted Impact: If the Message Spool transacted session is utilized to 100% of capacity (resources are exhausted), new transacted sessions (or transacted connections to the event broker) cannot be created until the current sessions (connections) are closed.

Recommended Actions: Contact the support or development team for the client applications that establish the transacted sessions to ensure that transacted sessions are used efficiently. Update applications to close transacted sessions more efficiently, or otherwise reduce transacted session use to ease demand. Alternatively, consider upscaling the event broker service.

Message VPN - Message Spool - Egress Flow Metric Active

Alert or Warning

Message spool egress flow utilization threshold breach.

Predicted Impact: If the egress flow is utilized to 100% of capacity (resources are exhausted) for a queue or topic endpoint, consumers will not be able to bind or initiate a flow of messages from the persistent endpoint.

Recommended Actions: Remove any unused egress flows from the Message VPN. Otherwise, reduce egress flow use to ease demand, for example by disconnecting consumer applications that use Guaranteed Messaging and are not needed. Alternatively, upscale the event broker service.

Message VPN Connections - REST Incoming Metric Active

Alert or Warning

Message VPN REST incoming connection utilization threshold breach. Incoming REST connection utilization exceeds the monitor thresholds.

Predicted Impact: If the number of incoming REST connections reaches the maximum number of connections (100% of capacity), new incoming REST connections will be rejected.

Recommended Actions: Upscale the class of the event broker service to allow more connections.

Message Replication Status Sync Ineligible Rejecting Messages Observed Status Active

Warning

Messages can no longer be delivered to the standby site as fast as they are being received by the active site.

Predicted Impact: All messages that require synchronous replication are downgraded to asynchronous replication.

Recommended Actions: Verify the stability of the connection to the standby site and ensure there is sufficient bandwidth. For mitigation recommendations, see the SYSTEM_CFGSYNC_ logs in the Solace PubSub+ Syslog Events Reference, or contact Solace for more information.

Message VPN Connections - Total Metric Active

Alert or Warning

Message VPN total connection utilization threshold breach. Total connection utilization exceeds the monitor thresholds. If the number of connections reaches total max connections, new connections of any kind may be rejected.

Predicted Impact: If the number of connections reaches the maximum number of connections (100% of capacity), new connections of any kind are rejected.

Recommended Actions: Upscale the class of the event broker service to allow more connections.
If possible, contact the support or development team for the client applications that are connected to the event broker to verify that the number of connections being used by the client applications is reasonable.

Message VPN Connections - Web Metric Active

Alert or Warning

Message VPN Web connection utilization threshold breach. Web connection utilization exceeds the monitor thresholds. If the number of Web connections reaches maximum available, new Web connections may be rejected.

Predicted Impact: If the number of Web connections reaches the maximum number of connections (100% of capacity), new Web connections will be rejected.

Recommended Actions: Upscale the event broker service to allow more connections. Otherwise, if possible, contact the support or development team of the client applications connected to the event broker service to verify if the number of connections being used by those applications is reasonable.

Message VPN Subscriptions Metric Active

Alert or Warning

Message VPN subscription utilization threshold breach. Subscriptions utilization exceeds the monitor thresholds. Consumers may no longer be able to subscribe to additional topics.

Predicted Impact: If the Message VPN spool subscription is utilized to 100% of capacity (resources are exhausted), consumers cannot subscribe to additional topics.

Recommended Actions: Upscale the class of the event broker service to allow more subscriptions. Otherwise, determine which clients are most responsible for the large subscription count. Follow up with the support or development team of those client applications as necessary.

Message Replication Status Sync Ineligible Downgrading Messages Observed Status Active

Warning

Messages can no longer be delivered to the standby site as fast as they are being received by the active site.

Predicted Impact: All messages that require synchronous replication are downgraded to asynchronous replication.

Recommended Actions: Verify the stability of the connection to the standby site and ensure there is sufficient bandwidth. For mitigation recommendations, consult the related syslog event documentation. For more information, contact Solace.

Message Replication Status Down Sustained Status Active

Alert

Message VPN replication status has been down for a sustained period.

Predicted Impact: If the replication status remains down, messages and configuration may not be correctly replicated to the configured Disaster Recovery (DR) site.

Recommended Actions: Verify the event broker service's replication configuration. For mitigation recommendations, consult the related syslog event documentation. For more information, contact Solace.

Message Replication Status Down Observed Status Active

Warning

Message VPN replication status is down.

Predicted Impact: If the replication status remains down, messages and configuration may not be correctly replicated to the configured Disaster Recovery (DR) site.

Recommended Actions: Verify the event broker service's replication configuration. For mitigation recommendations, consult the related syslog event documentation. For more information, contact Solace.

Message Replication Down or Degraded Status Active

Alert

The Message VPN replication status is down or degraded in a Disaster Recovery (DR) configuration.

Predicted Impact: There is an elevated risk that messages and configuration may not be correctly replicated to the configured Disaster Recovery (DR) site. There is also a risk that the DR site's persistent message consistency with the Standby site has been compromised.

Recommended Actions: Check that the Disaster Recovery configuration is correct. For more information, see Addressing Degraded Service in the Solace documentation or contact Solace.

Message VPN Connections - SMF Metric Active

Alert or Warning

Message VPN SMF connection utilization threshold breach. SMF connection utilization exceeds the monitor thresholds. If the number of SMF connections reaches maximum available, new SMF connections may be rejected.

Predicted Impact: If the number of SMF connections reaches the maximum number of connections (100% of capacity), new SMF connections are rejected.

Recommended Actions: Upscale the class of the messaging service to allow more connections. Otherwise, if possible, contact the support or development team of the client applications connected to the event broker service to verify if the number of connections being used by those applications is reasonable.

Message VPN Connections - REST Outgoing Metric Active

Alert or Warning

Message VPN REST outgoing connection utilization threshold breach. Outgoing REST connection utilization exceeds the monitor thresholds.

Predicted Impact: If the number of outgoing REST connections reaches the maximum number of connections (100% of capacity), new outgoing REST connections can no longer be initiated by the event broker service.

Recommended Actions: Upscale the class of the event broker service to allow more connections.

Message VPN - Message Spool - Transaction Metric Active

Alert or Warning

Message spool Transaction utilization threshold breach. The transaction utilization exceeds the monitor thresholds.

Predicted Impact: It may not be possible to initiate new transactions in a transacted session until current transactions are committed.

Recommended Actions: Update applications to more efficiently close transactions or otherwise reduce transaction use to ease demand. Alternatively, upscale the class of the event broker service.

Cache Instance
Message VPN Cache Instance - Topics Metric Active

Alert or Warning

Cache topics utilization threshold breach. The number of unique topics cached by a cache instance exceeds the monitor thresholds.

Predicted Impact: If the Message VPN Cache Instance for topics is utilized to 100% of capacity (resources are exhausted), new unique topics cannot be cached by the Cache Instance.

Recommended Actions: The maximum number of cached topics allowed can be increased if the large number of topics is expected; otherwise, you can do one of the following to reduce the number of topics cached by a given Cache Instance:

  1. Reduce the number of topic subscriptions configured in the cache cluster.
  2. Subdivide the topic subscriptions and distribute them between multiple clusters.
Message VPN Cache Instance - Request Queue Metric Active

Alert or Warning

Cache request queue utilization threshold breach. The number of cache requests enqueued to a cache instance exceeds the monitor thresholds.

Predicted Impact: If the Message VPN Cache Instance's request queue is utilized to 100% of capacity (resources are exhausted), the cache instance may reject new lookup requests.

Recommended Actions: The maximum queue depth can be increased if large bursts of cache requests are expected and the request client applications can tolerate latencies in responding to the requests. Otherwise, reduce the request rate to the Cache Instance from the client applications. Contact Solace for assistance if necessary.

Message VPN Cache Instance - Memory Metric Active

Alert or Warning

Cache memory utilization threshold breach. Memory utilization by the cache instance exceeds the monitor thresholds. This may negatively affect cache operation.

Predicted Impact: If the Message VPN Cache Instance memory is utilized to 100% of capacity (resources are exhausted), cache operation may be affected and new messages may be rejected or, if configured, the cache instance may transition to the Down state.

Recommended Actions: The maximum amount of memory allowed for a Cache Instance can be increased if the large memory usage is expected and more is available on the host system on which the cache instance is running. Otherwise, you can do one of the following to reduce the memory used by the Cache Instance:

  1. Reduce the maximum number of individual topics cached.
  2. Reduce the number of messages cached per individual topic.
  3. Reduce the number of topics cached and/or redistribute the cached topic space amongst other cache clusters to ease the request rate experienced by this cache instance.
Message VPN Cache Instance - CPU Metric Active

Alert or Warning

Cache CPU utilization threshold breach. CPU utilization exceeds the monitor thresholds.

Predicted Impact: If the Message VPN Cache Instance CPU is utilized to 100% of capacity (resources are exhausted), the cache throughput of the instance may be affected. The cache host has reached its maximum processing capacity.

Recommended Actions: Contact Solace for assistance if necessary.

Cache Instance Status Lost Message Metric Active

Alert

Cache CPU utilization threshold breach. CPU utilization exceeds the monitor thresholds. The cache throughput of the instance may be affected. The cache host may be reaching its maximum processing capacity.

Predicted Impact: If the Message VPN Cache Instance CPU is utilized to 100% of capacity (resources are exhausted), the cache throughput of the instance may be affected. The cache host has reached its maximum processing capacity.

Recommended Actions: Contact Solace for assistance if necessary.

Client Username
Message VPN Client Username - Connections - SMF Metric Active

Alert or Warning

Client username SMF connection utilization threshold breach. The number of SMF client connections associated with the specified client username exceeds the monitor thresholds.

Predicted Impact: If the Message VPN total SMF connections via a client username reach their limit, new clients cannot connect with the specified client username using SMF connections.

Recommended Actions: The thresholds can be increased if a large number of clients connected to the event broker service is expected. Otherwise, contact the support or development team of the client applications connected to the event broker service to verify if the number of connections being used by those applications is reasonable.

Message VPN Client Username - Connections - Web Metric Active

Alert or Warning

Client username Web connection utilization threshold breach. The number of Web client connections associated with the specified client username exceeds the monitor thresholds.

Predicted Impact: If the Message VPN total Web connections via a client username reach their limit, new clients cannot connect with the specified client username using Web connections.

Recommended Actions: The thresholds can be increased if a large number of clients connected to the event broker service is expected. Otherwise, contact the support or development team of the client applications connected to the event broker service to verify if the number of connections being used by those applications is reasonable.

Message VPN Client Username - Connections - Total Metric Active

Alert or Warning

Client username total connection utilization threshold breach. The number of client connections associated with the specified client username exceeds the monitor thresholds.

Predicted Impact: If the Message VPN total connections via a client username reach their limit, new clients cannot connect with the specified client username using any type of connection.

Recommended Actions: The thresholds can be increased if a large number of clients connected to the event broker service is expected. Otherwise, if possible, contact the support or development team of the client applications connected to the event broker service to verify if the number of connections being used by those applications is reasonable.

Message VPN Client Username - Message Spool - Endpoint Metric Active

Alert or Warning

Client username message spool endpoint utilization threshold breach. The number of endpoints associated with a client username exceeds the monitor thresholds.

Predicted Impact: If the Message VPN total endpoints via a client username reach their limit, any clients associated with this client username cannot create persistent endpoints.

Recommended Actions: Remove any unused endpoints associated with the client username. Otherwise, reduce endpoint use to ease demand.

Queue
Message VPN Queue - Message Spool - DMR Queue Metric Active Alert or Warning

Queue spool storage utilization threshold breach for a DMR Queue. The aggregate size of all messages enqueued to a specific queue endpoint exceeds the monitor thresholds.

Predicted Impact: The endpoint may discard new messages published to the queue. If configured, the publisher may be notified about the rejection (reject-message-to-sender-on-discard).

Recommended Actions: Investigate this queue to confirm that the required consumers are configured correctly and consuming at the desired rate. For mitigation recommendations, see the VPN_AD_MSG_SPOOL_HIGH and VPN_AD_MSG_SPOOL_QUOTA_EXCEED logs in the Solace PubSub+ Syslog Events Reference. Contact Solace for assistance if required.

Message VPN Queue - Message Spool - Queue Metric Active Alert or Warning

Queue spool storage utilization threshold breach for Telemetry Queue and Non-Reserved Queue (configurable queues which are not used internally by the Solace broker). The aggregate size of all messages enqueued to a specific queue endpoint exceeds the monitor thresholds.

Predicted Impact: The endpoint may discard new messages published to the queue. If configured, the publisher may be notified about the rejection (reject-message-to-sender-on-discard).

Recommended Actions: Investigate this queue to confirm that the required consumers are configured correctly and consuming at the desired rate. For mitigation recommendations, see the VPN_AD_MSG_SPOOL_HIGH and VPN_AD_MSG_SPOOL_QUOTA_EXCEED logs in the Solace PubSub+ Syslog Events Reference. Contact Solace for assistance if required.

Message VPN Queue - Message Spool - Replication Queue Metric Active Alert or Warning

Queue spool storage utilization threshold breach for Replication Queue. The aggregate size of all messages enqueued to a specific queue endpoint exceeds the monitor thresholds.

Predicted Impact: The endpoint may discard new messages published to the queue. If configured, the publisher may be notified about the rejection (reject-message-to-sender-on-discard).

Recommended Actions: Investigate this queue to confirm that the required consumers are configured correctly and consuming at the desired rate. For mitigation recommendations, see the VPN_AD_MSG_SPOOL_HIGH and VPN_AD_MSG_SPOOL_QUOTA_EXCEED logs in the Solace PubSub+ Syslog Events Reference. Contact Solace for assistance if required.
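The metric monitors above compare a utilization value against a Warning threshold and an Alert threshold. The following is a minimal sketch of that comparison for queue spool usage. It assumes utilization is expressed as a percentage of the queue's spool quota; the class name, function names, and threshold values are hypothetical and are not taken from the actual monitor definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical threshold values, expressed as a percentage of the quota.
    warning_pct: float
    alert_pct: float


def spool_utilization_pct(spooled_bytes: int, quota_bytes: int) -> float:
    """Return spool usage as a percentage of the configured quota."""
    if quota_bytes <= 0:
        raise ValueError("quota_bytes must be positive")
    return 100.0 * spooled_bytes / quota_bytes


def classify(utilization_pct: float, thresholds: Thresholds) -> str:
    """Map a utilization value to a severity; a value must exceed a threshold to trigger it."""
    if utilization_pct > thresholds.alert_pct:
        return "Alert"
    if utilization_pct > thresholds.warning_pct:
        return "Warning"
    return "OK"


example = Thresholds(warning_pct=60.0, alert_pct=80.0)
severity = classify(spool_utilization_pct(850, 1000), example)
```

The same comparison applies to the connection and endpoint count monitors if the count is divided by its configured limit instead of a spool quota.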

Event Broker Logs

System Log Alert: VPN_AD_MSG_SPOOL_QUOTA_EXCEED Log Active Alert

The message spool limit of a Message VPN, queue, or topic endpoint has been exceeded. No more messages can be added to the message spool.

The Alert indicates whether a queue or a topic endpoint was breached.

Impacts, Recommended Actions, and Message Formatting: See VPN_AD_MSG_SPOOL_QUOTA_EXCEED in the Solace PubSub+ Syslog Events Reference.

System Log Alert: VPN_AD_MSG_SPOOL_HIGH Log Active Warning

The message spool threshold of a Message VPN, queue, or topic endpoint has been reached. The email notification includes the name of the queue or topic endpoint and the message spool that experienced the problem. This log-based monitor provides more flexibility: it lets you monitor different thresholds and allows every topic endpoint to be monitored based on its individual thresholds.

Impacts, Recommended Actions, and Message Formatting: See VPN_AD_MSG_SPOOL_HIGH in the Solace PubSub+ Syslog Events Reference.

System Log Alert: VPN_AD_MSG_SPOOL_HIGH_CLEAR Log Active Recovery

The Message Spool threshold breach has been cleared. This can be for a Message VPN, a Queue, or a Topic Endpoint that initially caused the Message Spool threshold breach that triggered the System Log Alert: VPN_AD_MSG_SPOOL_HIGH monitor.

Impacts, Recommended Actions, and Message Formatting: See VPN_AD_MSG_SPOOL_HIGH_CLEAR in the Solace PubSub+ Syslog Events Reference.

System Log Alert: VPN_CLIENT_USERNAME_CONNECTIONS_EXCEEDED Log Active Alert

A client username for an event broker service has exceeded its connection limit. Existing connections remain intact; however, no new connections can be made.

Impacts, Recommended Actions, and Message Formatting: See VPN_CLIENT_USERNAME_CONNECTIONS_EXCEEDED in the Solace PubSub+ Syslog Events Reference.

System Log Alert: VPN_CLIENT_USERNAME_CONNECTIONS_HIGH Log Active Warning

A client username for an event broker service is close to its connection limit. If the limit is reached, existing connections remain intact; however, no new connections can be made.

Impacts, Recommended Actions, and Message Formatting: See VPN_CLIENT_USERNAME_CONNECTIONS_HIGH in the Solace PubSub+ Syslog Events Reference.

System Log Alert: VPN_CLIENT_USERNAME_CONNECTIONS_HIGH_CLEAR Log Active Recovery

A client username for an event broker service has recovered from a high connection count. New connections can now be made using the client username.

Impacts, Recommended Actions, and Message Formatting: See VPN_CLIENT_USERNAME_CONNECTIONS_HIGH_CLEAR in the Solace PubSub+ Syslog Events Reference.

System Log Alert: VPN_SOLCACHE_SUBSCRIBE_FAIL Log Active Alert

The SolCache failed to add a subscription.

Impacts, Recommended Actions, and Message Formatting: See VPN_SOLCACHE_SUBSCRIBE_FAIL in the Solace PubSub+ Syslog Events Reference.

System Log Alert: VPN_SOLCACHE_CONFIG_SYNC_FAIL Log Active Alert

The SolCache config sync has failed.

Impacts, Recommended Actions, and Message Formatting: See VPN_SOLCACHE_CONFIG_SYNC_FAIL in the Solace PubSub+ Syslog Events Reference.

System Log Alert: VPN_SOLCACHE_CLUSTER_SYNC_FAIL Log Active Alert

The SolCache cluster config sync has failed.

Impacts, Recommended Actions, and Message Formatting: See VPN_SOLCACHE_CLUSTER_SYNC_FAIL in the Solace PubSub+ Syslog Events Reference.

System Log Alert: VPN_BRIDGING_SUBSCRIPTION_ADD_FAILED Log Active Alert

The Message VPN bridge failed to add a subscription.

Impacts, Recommended Actions, and Message Formatting: See VPN_BRIDGING_SUBSCRIPTION_ADD_FAILED in the Solace PubSub+ Syslog Events Reference.

System Log Alert: VPN_BRIDGING_BRIDGE_STALLED Log Active Alert

The Message VPN bridge has stalled.

Impacts, Recommended Actions, and Message Formatting: See VPN_BRIDGING_BRIDGE_STALLED in the Solace PubSub+ Syslog Events Reference.

System Log Alert: VPN_AD_BIND_COUNT_HIGH Log Active Warning

The persistent endpoint bind count threshold has been breached.

Impacts, Recommended Actions, and Message Formatting: See VPN_AD_BIND_COUNT_HIGH in the Solace PubSub+ Syslog Events Reference.
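Several of the log monitors above come in pairs, where a `_HIGH` event raises a Warning and the matching `_HIGH_CLEAR` event signals Recovery. The sketch below shows one way to track which warnings are still open from a stream of events. The severity mapping is copied from the entries above; the `open_warnings` helper and the `(event_name, subject)` tuple shape are assumptions for illustration, not the way Datadog evaluates these monitors.

```python
from typing import Iterable, Set, Tuple

# Notification severity of each log-based monitor, as listed in this section.
LOG_MONITOR_SEVERITY = {
    "VPN_AD_MSG_SPOOL_QUOTA_EXCEED": "Alert",
    "VPN_AD_MSG_SPOOL_HIGH": "Warning",
    "VPN_AD_MSG_SPOOL_HIGH_CLEAR": "Recovery",
    "VPN_CLIENT_USERNAME_CONNECTIONS_EXCEEDED": "Alert",
    "VPN_CLIENT_USERNAME_CONNECTIONS_HIGH": "Warning",
    "VPN_CLIENT_USERNAME_CONNECTIONS_HIGH_CLEAR": "Recovery",
    "VPN_SOLCACHE_SUBSCRIBE_FAIL": "Alert",
    "VPN_SOLCACHE_CONFIG_SYNC_FAIL": "Alert",
    "VPN_SOLCACHE_CLUSTER_SYNC_FAIL": "Alert",
    "VPN_BRIDGING_SUBSCRIPTION_ADD_FAILED": "Alert",
    "VPN_BRIDGING_BRIDGE_STALLED": "Alert",
    "VPN_AD_BIND_COUNT_HIGH": "Warning",
}


def open_warnings(events: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Return the (high_event, subject) pairs that have not been cleared yet.

    `events` is a hypothetical stream of (event_name, subject) tuples in
    arrival order, where subject names the queue, endpoint, or client username.
    """
    still_open: Set[Tuple[str, str]] = set()
    for name, subject in events:
        if name.endswith("_HIGH_CLEAR"):
            # Strip the "_CLEAR" suffix to find the warning this event resolves.
            still_open.discard((name[: -len("_CLEAR")], subject))
        elif name.endswith("_HIGH"):
            still_open.add((name, subject))
    return still_open


remaining = open_warnings([
    ("VPN_AD_MSG_SPOOL_HIGH", "q1"),
    ("VPN_AD_MSG_SPOOL_HIGH", "q2"),
    ("VPN_AD_MSG_SPOOL_HIGH_CLEAR", "q1"),
])
```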

Kafka Metric Monitors

Kafka Bridge Receiver Down Observed Metric Alert

The Kafka bridge receiver was observed in a down state.

Predicted Impact: If the Kafka bridge receiver remains down, messages can no longer be transmitted between your Solace event broker services and Kafka brokers.

Recommended Actions: Verify your event broker service Kafka bridge configuration and the network connectivity of the bridged messaging services. For mitigation recommendations, see VPN_KAFKA_RECEIVER_DOWN in the Solace PubSub+ Syslog Events Reference. Contact Solace for assistance if required.

Kafka Bridge Receiver Down Sustained Metric Alert

The Kafka bridge receiver has been down for a sustained period.

Predicted Impact: If the Kafka bridge receiver remains down, messages can no longer be transmitted between your Solace event broker services and Kafka brokers.

Recommended Actions: Verify your event broker service Kafka bridge configuration and the network connectivity of the bridged messaging services. For mitigation recommendations, see VPN_KAFKA_RECEIVER_DOWN in the Solace PubSub+ Syslog Events Reference. Contact Solace for assistance if required.

Kafka Bridge Sender Down Observed Metric Alert

The Kafka bridge sender was observed in a down state.

Predicted Impact: If the Kafka bridge sender remains down, messages can no longer be transmitted between your Solace event broker services and Kafka brokers.

Recommended Actions: Verify your event broker service Kafka bridge configuration and the network connectivity of the bridged messaging services. For mitigation recommendations, see VPN_KAFKA_SENDER_DOWN in the Solace PubSub+ Syslog Events Reference. Contact Solace for assistance if required.

Kafka Bridge Sender Down Sustained Metric Alert

The Kafka bridge sender has been down for a sustained period.

Predicted Impact: If the Kafka bridge sender remains down, messages can no longer be transmitted between your Solace event broker services and Kafka brokers.

Recommended Actions: Verify your event broker service Kafka bridge configuration and the network connectivity of the bridged messaging services. For mitigation recommendations, see VPN_KAFKA_SENDER_DOWN in the Solace PubSub+ Syslog Events Reference. Contact Solace for assistance if required.
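The difference between the Observed and Sustained Kafka monitors can be pictured as two checks over the same series of state samples. The following is a minimal sketch only. The sample representation (`True` for up, `False` for down), the function names, and the `min_samples` window are assumptions for illustration and do not reflect how the monitors are actually defined in Datadog.

```python
from typing import Sequence


def down_observed(samples: Sequence[bool]) -> bool:
    """True if at least one sample in the window reports the bridge as down."""
    return any(not is_up for is_up in samples)


def down_sustained(samples: Sequence[bool], min_samples: int) -> bool:
    """True if the most recent `min_samples` samples all report the bridge as down."""
    if min_samples <= 0:
        raise ValueError("min_samples must be positive")
    if len(samples) < min_samples:
        # Not enough history yet to call the outage sustained.
        return False
    return all(not is_up for is_up in samples[-min_samples:])


# A brief outage is observed but not sustained.
brief_outage = [True, False, True]
# An outage covering the last two samples counts as sustained for min_samples=2.
ongoing_outage = [True, False, False]
```

Under these assumptions, a short interruption triggers only the Observed check, while the Sustained check fires once the down state persists across the whole window.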

Sample Template Monitor
Sample template for broker event log monitors Template

This is a generic sample template intended for creating your own monitors.

The sample template monitor is intended for use with the various event and system logs available to you. For the event logs, see Event log descriptions. For the system logs, see System Logs.

To create your own monitors, clone this template and configure it accordingly. See Cloning and Customizing the Template Monitor.

Event log descriptions

The table below lists the various event logs available for you to use with the sample template monitor described above. The logs in the table are organized into the following categories: Client Event Logs, System Event Logs, and VPN Event Logs.

The table below provides the following information about each log type:

Log Name
The name of the log. This is what you enter into the first field of section one when cloning and creating a monitor. See Cloning and Customizing the Template Monitor.
Default Severity Configured
The default severity level differs for each log and is one of Warn, Notice, or Info.
Description
A description of the log, with links to further technical information about it.

You can also use the system logs that are available in your Datadog account when configuring custom monitors. See Cloning and Customizing the Template Monitor.
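When choosing logs for a custom monitor, it can help to group log names by default severity and by category. The sketch below uses a handful of entries copied from the table that follows; the dictionary and helper names are illustrative only, and the category lookup simply assumes the log name prefix (CLIENT, SYSTEM, or VPN) matches the table's category headings.

```python
from typing import List

# A small sample of log names and their default severities from the table.
DEFAULT_SEVERITY = {
    "CLIENT_CLIENT_BIND_FAILED": "Warn",
    "CLIENT_CLIENT_CREATE_ENDPOINT_FAILED": "Notice",
    "CLIENT_AD_EGRESS_FLOWS_HIGH_CLEAR": "Info",
    "SYSTEM_CLIENT_CONNECT_AUTH_FAIL": "Warn",
    "VPN_RDP_RDP_DOWN": "Warn",
    "VPN_RDP_RDP_UP": "Info",
}

CATEGORY_BY_PREFIX = {
    "CLIENT": "Client Event Logs",
    "SYSTEM": "System Event Logs",
    "VPN": "VPN Event Logs",
}


def logs_with_severity(severity: str) -> List[str]:
    """Return the sample log names that default to the given severity, sorted."""
    if severity not in {"Warn", "Notice", "Info"}:
        raise ValueError(f"unknown severity: {severity}")
    return sorted(name for name, sev in DEFAULT_SEVERITY.items() if sev == severity)


def category(log_name: str) -> str:
    """Return the table category for a log name, based on its prefix."""
    prefix = log_name.split("_", 1)[0]
    return CATEGORY_BY_PREFIX[prefix]


info_logs = logs_with_severity("Info")
```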

Log Name Event Level Severity Description
Client Event Logs
CLIENT_CLIENT_BIND_FAILED Warn

This event is sent when a client fails to bind to a message endpoint.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_CLIENT_BIND_FAILED in the Solace PubSub+ Syslog Events Reference.

CLIENT_CLIENT_CREATE_ENDPOINT_FAILED Notice

This event is sent when a message endpoint fails to be created.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_CLIENT_CREATE_ENDPOINT_FAILED in the Solace PubSub+ Syslog Events Reference.

CLIENT_CLIENT_LARGE_MESSAGE Notice

This is a one-shot event sent when the size of an ingress message is larger than that configured through the CLI using the 'message-vpn <> event large-message-threshold' Config CLI command.

The message is still processed by the message broker along with the generation of this event.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_CLIENT_LARGE_MESSAGE in the Solace PubSub+ Syslog Events Reference.

CLIENT_CLIENT_MESSAGE_TOO_BIG Warn

This is a one-shot event sent when a direct or guaranteed message size exceeds what is allowed by the message broker. This event is only raised for clients using the SMF protocol.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_CLIENT_MESSAGE_TOO_BIG in the Solace PubSub+ Syslog Events Reference.

CLIENT_CLIENT_SUBSCRIPTIONS_HIGH Warn

This event is sent when the number of subscriptions, for the specified client, rises above the set threshold value.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_CLIENT_SUBSCRIPTIONS_HIGH in the Solace PubSub+ Syslog Events Reference.

CLIENT_AD_EGRESS_FLOWS_HIGH Warn

This event is sent when the number of egress flows for a client reaches or exceeds the set threshold specified by the associated client profile for that client.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_AD_EGRESS_FLOWS_HIGH in the Solace PubSub+ Syslog Events Reference.

CLIENT_AD_EGRESS_FLOWS_HIGH_CLEAR Info

This event is sent when the number of egress flows, for a client, falls below the clear threshold specified by the associated client profile for that client.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_AD_EGRESS_FLOWS_HIGH_CLEAR in the Solace PubSub+ Syslog Events Reference.

CLIENT_AD_INGRESS_FLOWS_HIGH Warn

This event is sent when the number of ingress flows, for a client, reaches or exceeds the set threshold specified by the associated client profile for that client.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_AD_INGRESS_FLOWS_HIGH in the Solace PubSub+ Syslog Events Reference.

CLIENT_AD_INGRESS_FLOWS_HIGH_CLEAR Info

This event is sent when the number of ingress flows, for a client, falls below the clear threshold specified by the associated client profile for that client.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_AD_INGRESS_FLOWS_HIGH_CLEAR in the Solace PubSub+ Syslog Events Reference.

CLIENT_AD_MAX_EGRESS_FLOWS_EXCEEDED Warn

This event is sent when the number of egress flows for a client exceeds the maximum number permitted for the client's associated profile.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_AD_MAX_EGRESS_FLOWS_EXCEEDED in the Solace PubSub+ Syslog Events Reference.

CLIENT_AD_MAX_INGRESS_FLOWS_EXCEEDED Warn

This event is sent when the number of ingress flows for a client exceeds the maximum number permitted for the client's associated profile.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_AD_MAX_INGRESS_FLOWS_EXCEEDED in the Solace PubSub+ Syslog Events Reference.

CLIENT_AD_TRANSACTED_SESSIONS_EXCEED Warn

This event is sent when the number of transacted sessions, for the specified client, exceeds the total allowed by the associated client profile for that client.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_AD_TRANSACTED_SESSIONS_EXCEED in the Solace PubSub+ Syslog Events Reference.

CLIENT_AD_TRANSACTED_SESSIONS_HIGH Warn

This event is sent when the number of transacted sessions, for the specified client, reaches or exceeds the set threshold value.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_AD_TRANSACTED_SESSIONS_HIGH in the Solace PubSub+ Syslog Events Reference.

CLIENT_AD_TRANSACTED_SESSIONS_HIGH_CLEAR Info

This event is sent when the number of transacted sessions, for the specified client, falls below the clear threshold value.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_AD_TRANSACTED_SESSIONS_HIGH_CLEAR in the Solace PubSub+ Syslog Events Reference.

CLIENT_AD_TRANSACTED_SESSION_FAIL Notice

This event is sent when a transacted client session fails locally.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_AD_TRANSACTED_SESSION_FAIL in the Solace PubSub+ Syslog Events Reference.

CLIENT_AD_TRANSACTIONS_EXCEED Warn

This event is sent when the number of transactions, for the specified client, exceeds the total allowed by the associated client profile for that client.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_AD_TRANSACTIONS_EXCEED in the Solace PubSub+ Syslog Events Reference.

CLIENT_AD_TRANSACTIONS_HIGH Warn

This event is sent when the number of transactions, for the specified client, reaches or exceeds the set threshold value.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_AD_TRANSACTIONS_HIGH in the Solace PubSub+ Syslog Events Reference.

CLIENT_AD_TRANSACTIONS_HIGH_CLEAR Info

This event is sent when the number of transactions, for the specified client, falls below the clear threshold value.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_AD_TRANSACTIONS_HIGH_CLEAR in the Solace PubSub+ Syslog Events Reference.

CLIENT_CLIENT_ACK_NOT_ALLOWED Notice

This event is sent when the client does not have the privileges required to consume messages from an endpoint.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_CLIENT_ACK_NOT_ALLOWED in the Solace PubSub+ Syslog Events Reference.

CLIENT_CLIENT_EGRESS_MSG_DISCARD Info

This event is generated when a message is discarded because it cannot be sent to a client. However, it is only sent out approximately once every 60 seconds for the specified client.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_CLIENT_EGRESS_MSG_DISCARD in the Solace PubSub+ Syslog Events Reference.

CLIENT_CLIENT_NAME_CHANGE_FAILED Notice

This event is generated when a client's attempt to change its client name fails.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_CLIENT_NAME_CHANGE_FAILED in the Solace PubSub+ Syslog Events Reference.

CLIENT_CLIENT_PARSE_ERROR Notice

This is a one-shot event sent when the message broker encounters parsing errors while processing the Solace-specific messaging headers.

Impacts, Recommended Actions, and Message Formatting: See CLIENT_CLIENT_PARSE_ERROR in the Solace PubSub+ Syslog Events Reference.

System Event Logs
SYSTEM_CLIENT_CONNECT_FAIL Info

This event is sent when a client tries to connect to the message broker but the message broker is unable or unwilling to accept the connection.

Impacts, Recommended Actions, and Message Formatting: See SYSTEM_CLIENT_CONNECT_FAIL in the Solace PubSub+ Syslog Events Reference.

SYSTEM_AUTHENTICATION_SESSION_DENIED Notice

A user has unsuccessfully attempted to authenticate for a CLI, SEMP, shell, scp, or sftp session.

Impacts, Recommended Actions, and Message Formatting: See SYSTEM_AUTHENTICATION_SESSION_DENIED in the Solace PubSub+ Syslog Events Reference.

SYSTEM_CLIENT_ACL_CONNECT_DENIAL Info

One or more client connect attempts were denied in the last 60 seconds due to an Access Control List (ACL) client-connect rule.

Impacts, Recommended Actions, and Message Formatting: See SYSTEM_CLIENT_ACL_CONNECT_DENIAL in the Solace PubSub+ Syslog Events Reference.

SYSTEM_CLIENT_ACL_PUBLISH_DENIAL Info

One or more messages published within the last 60 seconds were rejected on ingress because the topic matched a publish-topic ACL rule.

Impacts, Recommended Actions, and Message Formatting: See SYSTEM_CLIENT_ACL_PUBLISH_DENIAL in the Solace PubSub+ Syslog Events Reference.

SYSTEM_CLIENT_ACL_SUBSCRIBE_DENIAL Info

One or more subscriptions, received from clients within the last 60 seconds, were rejected because the topic matched a subscribe-topic ACL rule.

Impacts, Recommended Actions, and Message Formatting: See SYSTEM_CLIENT_ACL_SUBSCRIBE_DENIAL in the Solace PubSub+ Syslog Events Reference.

SYSTEM_CLIENT_CONNECT_AUTH_FAIL Warn

This event is sent when user authentication fails for a client trying to connect to the message broker.

Impacts, Recommended Actions, and Message Formatting: See SYSTEM_CLIENT_CONNECT_AUTH_FAIL in the Solace PubSub+ Syslog Events Reference.

VPN Event Logs
VPN_AD_MSG_SPOOL_REJECT_LOW_PRIORITY_MSG_LIMIT_EXCEED Warn

This event is sent when the number of messages enqueued to a specific endpoint exceeds the configured reject-low-priority-msg-limit. No new low priority messages will be admitted to the endpoint.

Impacts, Recommended Actions, and Message Formatting: See VPN_AD_MSG_SPOOL_REJECT_LOW_PRIORITY_MSG_LIMIT_EXCEED in the Solace PubSub+ Syslog Events Reference.

VPN_AD_MSG_SPOOL_REJECT_LOW_PRIORITY_MSG_LIMIT_HIGH Warn

This event is sent when the number of messages enqueued to a specific endpoint exceeds the set threshold value of the configured reject-low-priority-msg-limit.

Impacts, Recommended Actions, and Message Formatting: See VPN_AD_MSG_SPOOL_REJECT_LOW_PRIORITY_MSG_LIMIT_HIGH in the Solace PubSub+ Syslog Events Reference.

VPN_AD_MSG_SPOOL_REJECT_LOW_PRIORITY_MSG_LIMIT_HIGH_CLEAR Info

This event is sent when the number of messages enqueued to a specific endpoint falls below the clear threshold value of the configured reject-low-priority-msg-limit.

Impacts, Recommended Actions, and Message Formatting: See VPN_AD_MSG_SPOOL_REJECT_LOW_PRIORITY_MSG_LIMIT_HIGH_CLEAR in the Solace PubSub+ Syslog Events Reference.

VPN_AD_REPLAY_STATE_TRANSITION_TO_FAILED Warn

This event is sent when a message replay transitions to the failed state.

Impacts, Recommended Actions, and Message Formatting: See VPN_AD_REPLAY_STATE_TRANSITION_TO_FAILED in the Solace PubSub+ Syslog Events Reference.

VPN_BRIDGING_LINK_TTL_EXCEEDED Warn

This is a one-shot event message sent when a message has been discarded due to it exceeding the allowed number of bridging hops.

Impacts, Recommended Actions, and Message Formatting: See VPN_BRIDGING_LINK_TTL_EXCEEDED in the Solace PubSub+ Syslog Events Reference.

VPN_BRIDGING_LINK_TTL_EXCEEDED_CLEAR Info

This event message is sent when the one-shot event message VPN_BRIDGING_LINK_TTL_EXCEEDED is cleared through the 'bridge <> message-vpn <> clear-event ttl-exceeded' Admin CLI command.

Impacts, Recommended Actions, and Message Formatting: See VPN_BRIDGING_LINK_TTL_EXCEEDED_CLEAR in the Solace PubSub+ Syslog Events Reference.

VPN_CLIENT_USERNAME_CONNECT_FAIL Info

This event message is sent when a client tries to connect to the message broker but the message broker is unable or unwilling to accept the connection.

Impacts, Recommended Actions, and Message Formatting: See VPN_CLIENT_USERNAME_CONNECT_FAIL in the Solace PubSub+ Syslog Events Reference.

VPN_RDP_RC_DOWN Warn

This event message is sent when a REST consumer transitions to the down state. This means all the individual TCP connections to the REST consumer's remote host are down. Thus, the RDP is no longer able to use this REST consumer to send messages to it. The RDP can continue to send messages through its other REST consumers that are up.

Impacts, Recommended Actions, and Message Formatting: See VPN_RDP_RC_DOWN in the Solace PubSub+ Syslog Events Reference.

VPN_RDP_RC_UP Info

This event message is sent when a REST consumer transitions to the up state. This means at least one TCP connection has been established with the REST consumer's remote host.

Impacts, Recommended Actions, and Message Formatting: See VPN_RDP_RC_UP in the Solace PubSub+ Syslog Events Reference.

VPN_RDP_RC_CONN_DOWN Warn

This event is sent when an individual TCP connection of a REST consumer link has failed. The REST consumer may have several such connections.

Impacts, Recommended Actions, and Message Formatting: See VPN_RDP_RC_CONN_DOWN in the Solace PubSub+ Syslog Events Reference.

VPN_RDP_RDP_DOWN Warn

This event message is sent when a REST Delivery Point transitions to the down state. This means all of the RDP's REST Consumers are down or all of the RDP's queue bindings are down.

Impacts, Recommended Actions, and Message Formatting: See VPN_RDP_RDP_DOWN in the Solace PubSub+ Syslog Events Reference.

VPN_RDP_RDP_UP Info

This event message is sent when a REST Delivery Point transitions to the up state. This means at least one of the RDP's REST consumers is up and at least one of the RDP's queue bindings is up.

Impacts, Recommended Actions, and Message Formatting: See VPN_RDP_RDP_UP in the Solace PubSub+ Syslog Events Reference.
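The up and down conditions described for the VPN_RDP_* events above can be summarized as simple boolean rules. The sketch below restates them in code; the function names and the list-of-booleans inputs are hypothetical and only model the descriptions in this table, not the broker's internal implementation.

```python
from typing import Sequence


def rest_consumer_is_up(tcp_connections_up: Sequence[bool]) -> bool:
    """A REST consumer is up when at least one of its TCP connections is established."""
    return any(tcp_connections_up)


def rdp_is_up(rest_consumers_up: Sequence[bool], queue_bindings_up: Sequence[bool]) -> bool:
    """An RDP is up when at least one REST consumer and at least one queue binding are up."""
    return any(rest_consumers_up) and any(queue_bindings_up)


# One consumer with a live connection and one working queue binding keeps the RDP up.
consumers = [rest_consumer_is_up([False, True]), rest_consumer_is_up([False, False])]
rdp_state = rdp_is_up(consumers, queue_bindings_up=[True])
```

Under this model, the RDP goes down when every REST consumer is down or every queue binding is down, which matches the VPN_RDP_RDP_DOWN description.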

System Logs

You can use the wide array of system logs that are collected and available in your Datadog account to configure your own custom monitors. For a list of available system logs and information about them, see the Solace PubSub+ Syslog Events Reference.

For information about creating your own custom monitors, see Cloning and Customizing the Template Monitor.