PubSub+ Insights Overview

PubSub+ Insights is a monitoring service that you subscribe to. If your account is enabled with Insights, you can access it from the PubSub+ Cloud Console; to get access, contact Solace support. Insights provides a single entry point for viewing historical and real-time metrics for your event broker services so that you can better manage them. Insights collects metrics to allow you to:

  • monitor the health and stability of your Message VPNs, endpoints, clients, and VPN Bridges
  • analyze capacity and bottlenecks so that you can better manage your services and plan capacity
  • proactively monitor your services to minimize downtime
  • build an understanding of application behavior

Insights collects the metrics and builds the visualizations for you so that you can monitor your system. Alternatively, if you have your own monitoring system, you can use Syslog Forwarding to send command and event logs to it. Even then, if you have Insights enabled, you may find its visualizations and notification emails complementary to your existing monitoring system. For more information about forwarding logs from your event broker service, see Forwarding Logs to an External System.

Understanding PubSub+ Insights

When you have Insights enabled, the Monitoring tab is populated with dashboard and historical information; if it isn't enabled, contact Solace support to get access. With Insights, you can see summarized information and use a notification service that sends emails, based on metric thresholds, to the email address configured in your PubSub+ Cloud account. The summary information includes many useful visualizations. One of them shows the overall health and status of your service as follows:

  • OK — There have not been any service interruptions for the service.
  • Interrupted — There has been anywhere from one to five minutes of interruption time for the service.
  • Failed — There has been more than five minutes of interruption time for the service.
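
The status rules above can be read as a simple classification on interruption time. The following Python sketch is an illustration only and does not describe how Insights computes the status. The function name `health_status` is hypothetical, and treating interruptions shorter than one minute as Interrupted is an assumption, because the list above does not cover that case.

```python
def health_status(interruption_minutes: float) -> str:
    """Map total interruption time (in minutes) to a health status label."""
    if interruption_minutes < 0:
        raise ValueError("interruption time cannot be negative")
    if interruption_minutes == 0:
        return "OK"
    if interruption_minutes <= 5:
        # Assumption: interruptions under one minute also count as Interrupted.
        return "Interrupted"
    return "Failed"


for minutes in (0, 3, 12):
    print(minutes, health_status(minutes))
```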

For more information, see Managing Notifications and Understanding Metric Collection.

The Monitoring tab provides you with dashboards and graphical summary information. On the Monitoring tab, there are two drop-down menus:

  • From the left-most drop-down menu, you can choose from summaries of dashboards and graphs that include:
    • Summary — dashboard and historical information for your service that includes a health bar, message rates, byte rates, Guaranteed Messages (queue usage), message spool usage, and discard rates.
    • Connections by Protocol — dashboard and historical information of the number of clients connected to the service by API protocol, REST, or via the Web
    • Messages Sent & Received — dashboard and historical information of incoming and outgoing messages sent to clients
    • Message Bytes & Rates — historical information of data sent and average incoming/outgoing data rates over secure (TLS) and non-secure connections, average incoming/outgoing message rates, client data received and sent
    • Guaranteed Messaging — dashboard and historical information of spool usage, incoming flows, outgoing flows, transacted sessions, and transactions
    • Subscriptions — dashboard and historical information of subscriptions, unique subscriptions, remote subscriptions, unique local subscriptions, and export subscriptions complete percentage
  • From the right-most drop-down menu, you can choose the time interval of information as follows:
    • Last Hour — the last 60 minutes populated with data points aggregated at 20-second intervals
    • Last Day — the last 24 hours populated with data points aggregated at 5-minute intervals
    • Last Week — the last 7 days populated with data points aggregated at one-hour intervals
    • Last Month — the last 31 days populated with data points aggregated at 4-hour intervals

    The visualized data is refreshed every five minutes. You can see when it was last refreshed at the bottom of the page. For example, if the Monitoring tab is first viewed at 10:05, the Last Hour time frame covers 9:05 to 10:05. It is refreshed only after five minutes have passed, at which point it displays data from 9:10 to 10:10.
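
The time frames and refresh behavior above can be worked through with a little arithmetic. The following Python sketch is illustrative only; the names `TIME_FRAMES`, `data_points`, and `visible_window` are hypothetical, the calendar date is arbitrary, and the sketch assumes that refreshes happen exactly every five minutes counted from the first view, as in the 10:05 example.

```python
from datetime import datetime, timedelta

# Time frames and aggregation intervals as listed above: (span, interval).
TIME_FRAMES = {
    "Last Hour": (timedelta(minutes=60), timedelta(seconds=20)),
    "Last Day": (timedelta(hours=24), timedelta(minutes=5)),
    "Last Week": (timedelta(days=7), timedelta(hours=1)),
    "Last Month": (timedelta(days=31), timedelta(hours=4)),
}


def data_points(time_frame: str) -> int:
    """Number of aggregation intervals that fit in the time frame."""
    span, interval = TIME_FRAMES[time_frame]
    return span // interval


def visible_window(first_viewed: datetime, now: datetime, time_frame: str):
    """Window displayed at `now`, assuming a refresh every five minutes."""
    refresh = timedelta(minutes=5)
    span, _ = TIME_FRAMES[time_frame]
    completed_refreshes = (now - first_viewed) // refresh
    end = first_viewed + completed_refreshes * refresh
    return end - span, end


for name in TIME_FRAMES:
    print(name, data_points(name))  # 180, 288, 168, and 186 points respectively

first = datetime(2024, 1, 1, 10, 5)  # arbitrary date, first viewed at 10:05
print(visible_window(first, datetime(2024, 1, 1, 10, 9), "Last Hour"))   # 9:05 to 10:05
print(visible_window(first, datetime(2024, 1, 1, 10, 10), "Last Hour"))  # 9:10 to 10:10
```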

Understanding Metric Collection

With PubSub+ Insights, broker metrics are collected to allow dashboards and visuals to be built. Metrics are collected at a high frequency, which produces a large number of data points. To manage the volume of data used in visualizations, metric data is aggregated at fixed intervals. The interval size is selected automatically to best fit the selected time frame.
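
To make fixed-interval aggregation concrete, here is a minimal Python sketch that groups high-frequency samples into buckets and reduces each bucket to one data point. It is not the Insights implementation: the function name `aggregate` and the sample values are hypothetical, and averaging is an assumption, since this page does not state which aggregation function is applied.

```python
from collections import defaultdict
from statistics import mean


def aggregate(samples, interval_seconds):
    """Average (timestamp_seconds, value) samples into fixed-width buckets.

    Returns a list of (bucket_start_seconds, mean_value), sorted by time.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Round the timestamp down to the start of its interval.
        bucket_start = (timestamp // interval_seconds) * interval_seconds
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical samples taken every 5 seconds, aggregated at 20-second intervals.
samples = [(0, 10), (5, 20), (10, 30), (15, 40), (20, 50), (25, 70)]
print(aggregate(samples, 20))
```

Six raw samples collapse into two data points here, which is the kind of reduction that keeps a week-long or month-long graph manageable.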

Understanding Notifications

A PubSub+ Insights subscription provides you with the ability to enable notifications for managing your event broker services. Insights uses Datadog (a third-party software provider) to:

  • collect metrics through the use of monitors
  • send notifications for events that occur on your event broker services
  • configure thresholds

By default, notification emails are not enabled. To enable notifications for your account, see Enabling Notifications for Your Account. Once notifications are enabled, each email includes useful information such as:

  • the name of the monitor
  • the issue
  • a description of the problem
  • recommended action
  • the severity
  • time and date of the problem
  • details of the affected event broker service (service name, organization, current threshold)

The links in the notification email from Datadog are not accessible at this time as access to Datadog isn't available.

Here's an example of the notification email with the severity of Alert when the Queue for Guaranteed Messages rises above 95% of the capacity:

Notification emails have three levels of severity as follows:

  • Alert — A serious error where services are not available or operational. When an Alert notification maps to a threshold level, this notification indicates that 95% of the available capacity has been reached.
  • Warning — An error where the event broker service is experiencing degradation or is about to experience a loss of service if the issue is not resolved. When a Warning notification maps to a threshold level, this notification indicates that 80% of the available capacity has been reached. These notifications often indicate that a more serious problem may occur.
  • Recovery — Recovery notifications always follow the related Alert or Warning notification. For example, if you received an Alert that indicated that the metric level had reached 95%, you'll get a corresponding Recovery notification when the metric level goes back below 95%.
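
The relationship between the three severity levels can be sketched as a small state machine over a series of utilization readings. The following Python example is illustrative only and is not how the Datadog monitors are configured; the names `severity` and `notifications` are hypothetical, and using "greater than or equal to" for the 80% and 95% thresholds is an assumption based on the wording "has been reached".

```python
WARNING_THRESHOLD = 80.0  # percent of available capacity
ALERT_THRESHOLD = 95.0    # percent of available capacity


def severity(percent_used: float):
    """Return 'Alert', 'Warning', or None for a utilization percentage."""
    if percent_used >= ALERT_THRESHOLD:
        return "Alert"
    if percent_used >= WARNING_THRESHOLD:
        return "Warning"
    return None


def notifications(readings):
    """Yield (reading, notification) whenever the severity level changes."""
    rank = {None: 0, "Warning": 1, "Alert": 2}
    previous = None
    for reading in readings:
        current = severity(reading)
        if rank[current] > rank[previous]:
            yield reading, current     # escalated to Warning or Alert
        elif rank[current] < rank[previous]:
            yield reading, "Recovery"  # dropped back below the last threshold
        previous = current


# Hypothetical utilization readings, in percent.
for reading, kind in notifications([70, 85, 96, 90, 60]):
    print(reading, kind)
```

In this run, a Warning is raised at 85, an Alert at 96, and a Recovery follows each time the reading drops back below the threshold that triggered the previous notification.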

The standard notifications are grouped into the following categories:

System Status Monitors

These monitors provide the status of the event brokers.

Monitor Name | Severity | Description | Predicted Impact | Recommended Actions
Guaranteed Message Down | Alert | Message Spooling is enabled for the event broker service, but it might not be operational. For example, the spool status is not in the AD-Active or AD-Standby state. For more information, see Message Spooling. | Guaranteed Messaging may stop functioning and messages from this point are rejected. | Solace will resolve this issue, but you can contact Solace support for details.
Local Activity Down Occurred | Warning | An activity switch has occurred in the High Availability (HA) group of event brokers for the service. The primary broker may have lost activity. Activity has resumed on the primary broker or has switched over to the backup broker. | A brief service interruption may have occurred, but service has been restored. | Solace will resolve this issue, but you can contact Solace support for details.

Message VPN Status Monitors

These monitors provide the status of the Message VPN.

Monitor Name | Severity | Description | Predicted Impact | Recommended Actions
Cache Instance Down or Lost Message State | Warning | The cache instance is down or is in the lost message state. See Solace PubSub+ Cache for more information. | Elevated risk of messaging traffic loss during the caching process. | Check the operational status of the affected cache instance and that its network connectivity is stable.
Message Replication Down or Degraded | Warning for Asynchronous Replication; Alert for Synchronous Replication | The Message VPN replication status is down or degraded in a Disaster Recovery (DR) configuration. | The site's persistent message consistency with the Standby site has been compromised. | Check that the Disaster Recovery configuration is correct. For more information, see Addressing Degraded Service.
Bridge Down | Alert | The Message VPN Bridge is down. See Message VPN Bridge Configuration for more information. | Messages cannot be transmitted between the bridged event broker services. | Check that the service bridge configuration and network connectivity of the bridged event broker services are correct.

Metric Monitors

These monitors notify you when a threshold has been reached for message spools and Message VPN connections. The monitors help you to understand the health of your messaging applications and provide you with information to assess the impact and remedy the issue. The severity level is based on which threshold has been reached.
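
When assessing impact, it can help to know how much headroom remains before a monitor would fire. The following Python sketch estimates that for a resource with a known limit. It is a hypothetical helper, not part of Insights: the name `headroom` and the connection counts are made up for illustration, and the 80% and 95% levels are the Warning and Alert thresholds described under Understanding Notifications.

```python
def headroom(used: int, limit: int, threshold_percent: int) -> int:
    """Units of a resource that can still be consumed before the threshold is reached."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Smallest usage count at or above the threshold (integer ceiling division).
    count_at_threshold = (limit * threshold_percent + 99) // 100
    return max(0, count_at_threshold - used)


# Hypothetical example: 780 of 1000 allowed connections are in use.
print(headroom(780, 1000, 80))  # 20 connections before a Warning
print(headroom(780, 1000, 95))  # 170 connections before an Alert
```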

 

Monitor Name | Severity | Description | Predicted Impact | Recommended Actions
System Message Spool - Message Count | Warning or Alert | System message spool message count utilization threshold breach. | If this resource is utilized to 100% of capacity (resources are exhausted), no new Guaranteed Messages will be accepted by the event broker. | Check the endpoints with the most messages enqueued, and contact the support or development team for the client applications that consume messages from these endpoints.
Message Spool | Warning or Alert | VPN message spool utilization threshold breach. | If the VPN message spool is utilized to 100% of capacity (resources are exhausted), all persistent messages are rejected. | Reduce spool usage as necessary to ease demand, or scale up the class of the event broker service.
Message Spool - Egress Flow | Warning or Alert | Message spool egress flow utilization threshold breach. | If the egress flow is utilized to 100% of capacity (resources are exhausted) for a queue or topic endpoint, consumers will not be able to bind or initiate a flow of messages from the persistent endpoint. | Remove any unused egress flows from the Message VPN; otherwise, reduce egress flow use to ease demand.
Message Spool - Ingress Flow | Warning or Alert | Message spool ingress flow utilization threshold breach. | If the ingress flow is utilized to 100% of capacity (resources are exhausted), any new persistent message publishers will not be able to send Guaranteed Messages. | Remove any unused ingress flows from the Message VPN; otherwise, reduce ingress flow use to ease demand.
Message Spool - Transaction | Warning or Alert | Message spool transaction utilization threshold breach. | If the message spool transaction is utilized to 100% of capacity (resources are exhausted), new transactions cannot be initiated in a transacted session until the current transactions have been committed. | Contact the support or development team for the client applications that establish the transactions to verify that transactions are being committed in a timely fashion. Alternatively, reduce the number of transactions to ease demand.
Message Spool - Transacted Session | Warning or Alert | Message spool transacted session utilization threshold breach. Raised when the number of transacted sessions exceeds the monitor thresholds. | If the message spool transacted session is utilized to 100% of capacity (resources are exhausted), new transacted sessions (or transacted connections to the event broker) cannot be created until the current sessions (connections) are closed. | Contact the support or development team for the client applications that establish the transacted sessions to ensure that transacted sessions are used efficiently. Alternatively, reduce the number of transacted sessions to ease demand.
Message Spool - Endpoint | Warning or Alert | VPN message spool endpoint utilization threshold breach. | If the VPN message spool endpoint is utilized to 100% of capacity (resources are exhausted), new queues or topic endpoints cannot be created. | Remove any unused endpoints from the Message VPN. Alternatively, reduce endpoint use to ease demand.
Message VPN Connections - Total | Warning or Alert | VPN total connection utilization threshold breach. | If the number of connections reaches the maximum number of connections (100% of capacity), new connections of any kind will be rejected. | Scale up the class of the event broker service to allow more connections. If possible, contact the support or development team for the client applications that are connected to the event broker to verify that the number of connections being used by the client applications is reasonable.
Message VPN Connections - SMF | Warning or Alert | VPN SMF connection utilization threshold breach. | If the number of SMF connections reaches the maximum number of connections (100% of capacity), new SMF connections will be rejected. | Scale up the class of the event broker service to allow more connections. If possible, contact the support or development team for the client applications that are connected to the event broker to verify that the number of connections being used by the client applications is reasonable.
Message VPN Connections - Web | Warning or Alert | VPN Web transport connection utilization threshold breach. | If the number of Web connections reaches the maximum number of connections (100% of capacity), new Web connections will be rejected. | Scale up the class of the event broker service to allow more connections. If possible, contact the support or development team for the client applications that are connected to the event broker to verify that the number of connections being used by those client applications is reasonable.
Message VPN Connections - AMQP | Warning or Alert | VPN AMQP connection utilization threshold breach. | If the number of AMQP connections reaches the maximum number of connections (100% of capacity), new AMQP connections will be rejected. | Scale up the class of the event broker service to allow more connections. Otherwise, if possible, contact the maintainers of the client applications connected to the event broker to verify that the number of connections being used by those applications is reasonable.
Message VPN Connections - MQTT | Warning or Alert | VPN MQTT connection utilization threshold breach. | If the number of MQTT connections reaches the maximum number of connections (100% of capacity), new MQTT connections will be rejected. | Scale up the class of the event broker service to allow more connections. If possible, contact the support or development team for the client applications that are connected to the event broker to verify that the number of connections being used by those client applications is reasonable.
Message VPN Connections - REST Incoming | Warning or Alert | VPN REST incoming connection utilization threshold breach. | If the number of incoming REST connections reaches the maximum number of connections (100% of capacity), new incoming REST connections will be rejected. | Scale up the class of the event broker service to allow more connections.
Message VPN Connections - REST Outgoing | Warning or Alert | Message VPN REST outgoing connection utilization threshold breach. | If the number of outgoing REST connections reaches the maximum number of connections (100% of capacity), new outgoing REST connections can no longer be initiated by the event broker. | Scale up the class of the event broker service to allow more connections.
Message VPN Subscriptions | Warning or Alert | Message VPN subscription utilization threshold breach. | If Message VPN subscriptions are utilized to 100% of capacity (resources are exhausted), consumers cannot subscribe to additional topics. | Scale up the class of the event broker service to allow more subscriptions. Alternatively, determine which clients are responsible for the large subscription count and then contact the support or development team for the client applications as required.
Message VPN Cache Instance - Topics | Warning or Alert | Cache Instance topics utilization threshold breach. | If the Message VPN Cache Instance for topics is utilized to 100% of capacity (resources are exhausted), new unique topics cannot be cached by the instance. | The maximum number of cached topics allowed can be increased if a large number of topics is expected. Otherwise, you can do one of the following to reduce the number of topics cached by a given cache instance: (1) reduce the number of topic subscriptions configured in the Cache Cluster; (2) subdivide the topic subscriptions and distribute them between multiple Cache Clusters.
Message VPN Cache Instance - Memory | Warning or Alert | Cache Instance memory utilization threshold breach. | If the Message VPN Cache Instance memory is utilized to 100% of capacity (resources are exhausted), cache operation may be affected and new messages may be rejected or, if configured, the cache instance may transition to the Down state. | The maximum amount of memory allowed for a Cache Instance can be increased if the large memory usage is expected and more memory is available on the host system where the Cache Instance is running. Otherwise, you can do one of the following to reduce the memory used by the Cache Instance: (1) reduce the maximum number of individual topics cached; (2) reduce the number of messages cached per individual topic; (3) reduce the number of topics cached and redistribute the cached topic space amongst other Cache Clusters to ease the request rate experienced by this cache instance.
Message VPN Cache Instance - Request Queue | Warning or Alert | Cache request queue utilization threshold breach. | If the Message VPN Cache Instance's request queue is utilized to 100% of capacity (resources are exhausted), the cache instance may reject new lookup requests. | The maximum queue depth can be increased if large bursts of cache requests are expected and the requesting client applications can tolerate latencies in responses to the requests. Otherwise, reduce the request rate to the Cache Instance from the client applications.
Message VPN Cache Instance - CPU | Warning or Alert | Cache CPU utilization threshold breach. | If the Message VPN Cache Instance CPU is utilized to 100% of capacity (resources are exhausted), the cache throughput of the instance may be affected. | The cache host has reached its maximum processing capacity. Contact Solace support if required.
Message VPN Client Username - Connections - Total | Warning or Alert | Client username total connection utilization threshold breach. | If the total Message VPN connections via a client username reach their limit, new clients cannot connect using the specified client username over any type of connection. | The thresholds can be increased if a large number of clients connected to the event broker is expected. Otherwise, if possible, contact the maintainers of the client applications connected to the event broker to verify that the number of connections being used by those applications is reasonable.
Message VPN Client Username - Connections - SMF | Warning or Alert | Client username SMF connection utilization threshold breach. | If the total Message VPN SMF connections via a client username reach their limit, new clients cannot connect using the specified client username over SMF connections. | The thresholds can be increased if a large number of clients connected to the event broker is expected. Otherwise, if possible, contact the maintainers of the client applications connected to the event broker to verify that the number of connections being used by those applications is reasonable.
Message VPN Client Username - Connections - Web | Warning or Alert | Client username Web connection utilization threshold breach. | If the total Message VPN Web connections via a client username reach their limit, new clients cannot connect using the specified client username over Web connections. | The thresholds can be increased if a large number of clients connected to the event broker is expected. Otherwise, if possible, contact the maintainers of the client applications connected to the event broker to verify that the number of connections being used by those applications is reasonable.
Message VPN Client Username - Message Spool - Endpoint | Warning or Alert | Client username message spool endpoint utilization threshold breach. | If the total Message VPN endpoints via a client username reach their limit, any clients associated with this client username cannot create persistent endpoints. | Remove any unused endpoints associated with the client username; otherwise, reduce endpoint use to ease demand.
Message VPN Queue - Storage | Warning or Alert | Queue spool utilization threshold breach. | If the Message VPN queue storage is utilized to 100% of capacity (resources are exhausted), any new messages published to this queue are discarded by the queue. If configured, the publisher is notified about the rejection (reject-message-to-sender-on-discard). | Reduce spool usage as necessary to ease demand; otherwise, increase the limit on the queue.
Message VPN Topic Endpoint - Storage | Warning or Alert | Topic Endpoint spool utilization threshold breach. | If the Message VPN topic endpoint storage is utilized to 100% of capacity (resources are exhausted), any new messages published to the specified topic may be discarded by the endpoint. If configured, the publisher is notified about the rejection (reject-message-to-sender-on-discard). | Reduce spool usage as necessary to ease demand; otherwise, increase the limit on the topic endpoint.

Event Log Monitors

These monitors provide system alerts for the corresponding log events.

Monitor Name | Severity | Description | Predicted Impacts and Recommended Actions
VPN_SOLCACHE_SUBSCRIBE_FAIL | Warning | The SolCache failed to add a subscription. | See Solace PubSub+ Syslog Events for predicted impacts and recommended actions.
VPN_SOLCACHE_CONFIG_SYNC_FAIL | Warning | The SolCache config sync has failed. | See Solace PubSub+ Syslog Events for predicted impacts and recommended actions.
VPN_SOLCACHE_CLUSTER_SYNC_FAIL | Warning | The SolCache cluster config sync has failed. | See Solace PubSub+ Syslog Events for predicted impacts and recommended actions.
VPN_BRIDGING_SUBSCRIPTION_ADD_FAILED | Warning | The Message VPN bridge failed to add a subscription. | See Solace PubSub+ Syslog Events for predicted impacts and recommended actions.
VPN_BRIDGING_BRIDGE_STALLED | Warning | The Message VPN bridge has stalled. | See Solace PubSub+ Syslog Events for predicted impacts.
VPN_AD_MSG_SPOOL_QUOTA_EXCEED | Warning | The Message VPN or endpoint message spool limit has been reached. | See Solace PubSub+ Syslog Events for predicted impacts and recommended actions.
VPN_AD_MSG_SPOOL_HIGH | Warning | The Message VPN or endpoint message spool threshold has been breached. | See Solace PubSub+ Syslog Events for predicted impacts and recommended actions.
VPN_AD_BIND_COUNT_HIGH | Warning | The persistent endpoint bind count threshold has been breached. | See Solace PubSub+ Syslog Events for predicted impacts and recommended actions.

Managing Notifications

Notification emails are not enabled by default when you first subscribe to PubSub+ Insights. You must enable your account to receive notifications. After you enable notifications, you can disable them or change the email address that notifications are sent to. For more information, see the sections that follow.

Enabling Notifications for Your Account

To start receiving notifications, you must enable them for your account. Notifications are sent to an email address. This could be the email address you used for your PubSub+ Cloud account or a different email address. Follow these steps to start receiving notification emails:

  1. Log in to PubSub+ Cloud.
  2. On the left menu, click User Account, and then click Account Details.
  3. On the Account Details page, select the Monitoring Notifications tab.
  4. In the Mailing list or email for notifications box, type the email address to send notifications to, and then click the Activate Notifications button.

If you require notifications for multiple users, we recommend that you specify a group email address rather than an individual email address. If you need to disable notifications or change the email address that notifications are sent to, see Disabling Notifications for Your Account and Changing the Email for Your Account.

Changing the Email for Your Account

To change the email address that notifications are sent to, follow these steps:

  1. Log in to PubSub+ Cloud.
  2. On the left menu, click User Account, and then click Account Details.
  3. On the Account Details page, select the Monitoring Notifications tab.
  4. In the Mailing list or email for notifications box, type the new email address, and then click the Update Notifications button.

New notification emails are now sent to the email address that you specified.

Disabling Notifications for Your Account

To disable notification emails, follow these steps:

  1. Log in to PubSub+ Cloud.
  2. On the left menu, click User Account, and then click Account Details.
  3. On the Account Details page, select the Monitoring Notifications tab.
  4. Click the Activate Notifications button, and then in the Deactivate Service Notifications confirmation dialog, click the Deactivate button.

Notification emails are no longer sent. To receive notification emails again, you must re-enable them for your account. For more information, see Enabling Notifications for Your Account.