Using Service-Level Dashboards for Event Broker Services

For each event broker service, a service-level dashboard gives you a single entry point in the Cloud Console for viewing historical and real-time metrics at a high level. Insights collects metrics and provides a dashboard for each event broker service that helps you:

  • monitor the health and stability of your Message VPNs, endpoints, clients, and VPN Bridges
  • analyze capacity and bottlenecks so that you can better manage your services and plan capacity
  • proactively monitor your services to minimize downtime
  • build an understanding of application behavior and resource utilization

More in-depth PubSub+ Insights dashboards are available in Datadog. Users who have the Insights Advanced Editor or Insights Advanced Viewer role can access these dashboards directly through the Datadog accounts created for them. For more information, see Configuring an Existing User to Access Datadog.

After you subscribe to Insights, the Monitoring tab is populated with dashboard and historical information. The summary information for service-level dashboards includes many visualizations. One of them shows the overall health and status of your service:

  • OK — The service is running and there are no current interruptions on the event broker service.
  • Interrupted — There has been anywhere from one to five minutes of interruption time for the event broker service.
  • Failed — There has been more than five minutes of interruption time for the event broker service.
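The three states amount to a threshold check on interruption time. The sketch below is hypothetical: `ServiceHealth` and `classify_health` are illustrative names, not part of any Solace API. It assumes interruption time is measured in minutes and that interruptions shorter than one minute still count as OK, which this topic does not specify.

```python
from enum import Enum


class ServiceHealth(Enum):
    OK = "OK"
    INTERRUPTED = "Interrupted"
    FAILED = "Failed"


def classify_health(interruption_minutes: float) -> ServiceHealth:
    """Map accumulated interruption time (in minutes) to a health state.

    Thresholds follow the state descriptions: more than five minutes is
    Failed, one to five minutes is Interrupted. Treating sub-minute
    interruptions as OK is an assumption made for this sketch.
    """
    if interruption_minutes < 0:
        raise ValueError("interruption time cannot be negative")
    if interruption_minutes > 5:
        return ServiceHealth.FAILED
    if interruption_minutes >= 1:
        return ServiceHealth.INTERRUPTED
    return ServiceHealth.OK


for minutes in (0, 3, 12):
    print(minutes, classify_health(minutes).value)  # 0 OK / 3 Interrupted / 12 Failed
```

Exactly five minutes is treated as Interrupted here because the Failed state requires more than five minutes.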

Solace monitors these states and actively investigates interruptions to resolve them. You can contact Solace for status information about your event broker service. For details on how metrics are collected and visualized, see the sections on metric collection and visualizations later in this topic.

For a list of available metrics that can be collected, see the following topic:


The Monitoring tab provides a high-level dashboard and graphical summary information. It is available when you select an event broker service in Cluster Manager, and it has two drop-down menus:

  • From the left-most drop-down menu, you can choose from the following dashboard and graph summaries:
    • Summary — dashboard and historical information for your service that includes a health bar, message rates, byte rates, Guaranteed Messages (queue usage), message spool usage, and discard rates.
    • Connections by Protocol — dashboard and historical information of the number of clients connected to the service by API protocol, REST, or via the Web
    • Messages Sent & Received — dashboard and historical information of incoming and outgoing messages sent to clients
    • Message Bytes & Rates — historical information of data sent and average incoming/outgoing data rates over secure (TLS) and non-secure connections, average incoming/outgoing message rates, client data received and sent
    • Guaranteed Messaging — dashboard and historical information of spool usage, incoming flows, outgoing flows, transacted sessions, and transactions
    • Subscriptions — dashboard and historical information of subscriptions, unique subscriptions, remote subscriptions, unique local subscriptions, and export subscriptions complete percentage
  • From the right-most drop-down menu, you can choose the time interval of information as follows:
    • Last Hour — the last 60 minutes, populated with data points aggregated at 20-second intervals
    • Last Day — the last 24 hours, populated with data points aggregated at 5-minute intervals
    • Last Week — the last 7 days, populated with data points aggregated at one-hour intervals
    • Last Month — the last 31 days, populated with data points aggregated at 4-hour intervals

    The visualized data is refreshed at the Update Interval shown at the bottom of the page (one minute in the following example), along with the time of the last refresh. For example, if you first view the Monitoring tab at 10:05, the Last Hour time frame covers 9:05 to 10:05; after one minute, it refreshes to show data from 9:06 to 10:06.
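The sliding window in the 10:05 example can be reproduced with a short calculation. This is an illustrative sketch: `visible_window` and `window_after_refreshes` are hypothetical helpers, the ranges come from the right-most drop-down list, and it assumes the window simply moves forward by one Update Interval on each refresh.

```python
from datetime import datetime, timedelta

# Relative ranges from the right-most drop-down (Last Month is 31 days).
TIME_RANGES = {
    "Last Hour": timedelta(hours=1),
    "Last Day": timedelta(days=1),
    "Last Week": timedelta(days=7),
    "Last Month": timedelta(days=31),
}


def visible_window(range_name: str, now: datetime) -> tuple[datetime, datetime]:
    """Return the (start, end) of the window shown for a time range at 'now'."""
    return now - TIME_RANGES[range_name], now


def window_after_refreshes(range_name: str, first_viewed: datetime,
                           update_interval: timedelta,
                           refreshes: int) -> tuple[datetime, datetime]:
    """Slide the window forward by one Update Interval per refresh."""
    return visible_window(range_name, first_viewed + refreshes * update_interval)


first_viewed = datetime(2024, 1, 1, 10, 5)  # arbitrary date; only the time matters
for refreshes in (0, 1):
    start, end = window_after_refreshes("Last Hour", first_viewed,
                                        timedelta(minutes=1), refreshes)
    print(start.strftime("%H:%M"), "to", end.strftime("%H:%M"))
# 09:05 to 10:05
# 09:06 to 10:06
```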

Understanding Metric Collection

With PubSub+ Insights, event broker service metrics are collected so that service-level dashboards and visuals can be built. Metrics are collected at a high frequency, which produces a large number of data points. To manage the volume of data used in visualizations, metric data is aggregated at fixed intervals, and the interval size is automatically selected to best fit the selected time frame.

Understanding Visualizations on the Service-Level Dashboard

You can select a time range to see the metrics that have been collected for your service. PubSub+ Cloud sends data to Datadog every ten seconds. The data used in the graphs and charts is cached for the refresh interval shown at the bottom of the Monitoring tab for each time range. If you refresh the page, or switch to a time range that was previously loaded by any user (including yourself), the cached data for that time range is shown. As a result, changes to your service's metrics might not be reflected immediately in the visualizations.
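One way to picture this behavior is a cache keyed by time range, shared by all viewers, whose entries expire after that range's refresh interval. This is a hypothetical sketch for illustration only: `DashboardCache` is an invented name, it does not describe the actual PubSub+ Cloud implementation, and the 60-second interval is taken from the one-minute Update Interval example.

```python
import time
from typing import Callable, Dict, List, Tuple


class DashboardCache:
    """Hypothetical shared cache with one entry per time range.

    An entry is reused until its range's refresh interval has elapsed, so a
    reload by any user inside that interval returns the same cached series.
    """

    def __init__(self, refresh_seconds: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic):
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[int]]] = {}

    def get(self, time_range: str, fetch: Callable[[], List[int]]) -> List[int]:
        now = self._clock()
        entry = self._entries.get(time_range)
        if entry is not None and now - entry[0] < self._refresh_seconds[time_range]:
            return entry[1]  # still fresh: serve the cached data
        data = fetch()  # missing or expired: query the metrics backend
        self._entries[time_range] = (now, data)
        return data


# Usage with a controllable clock instead of real time.
fake_now = [0.0]
cache = DashboardCache({"Last Hour": 60.0}, clock=lambda: fake_now[0])
calls = []


def fetch() -> List[int]:
    calls.append(fake_now[0])
    return [len(calls)]  # stand-in for a freshly loaded series


first = cache.get("Last Hour", fetch)   # miss: fetches
fake_now[0] = 30.0
second = cache.get("Last Hour", fetch)  # within 60 s: cached copy
fake_now[0] = 61.0
third = cache.get("Last Hour", fetch)   # expired: fetches again
print(first, second, third)  # [1] [1] [2]
```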

Data in the visualizations (graphs) on the Monitoring tab also refreshes automatically based on the Update Interval (in minutes) shown at the bottom of the tab.

The Monitoring tab builds its visualizations from data points. The following terms describe how these data points are determined:

now
The point in time the data is first loaded.
ticks
On the x-axis, there are gradations that are automatically calculated by the charting engine depending on the number of data points (periods).
time aggregation
The aggregation type applied per period in a time-series result. This aggregated data becomes a data point.
periodicity
The size of the periods into which data for the selected time frame is aggregated (that is, the aggregation interval). Each period spans from its own start up to the start of the next period, and the aggregation of the values in that span is the period's data point.
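Because each period runs from its start up to (but not including) the start of the next one, finding the period that contains a timestamp comes down to rounding down to a multiple of the periodicity. This is a sketch under a stated assumption: `period_bounds` is a hypothetical helper, and it assumes periods are aligned to whole multiples of the periodicity since the Unix epoch (UTC), which this topic does not confirm.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def period_bounds(ts: datetime, periodicity: timedelta) -> tuple[datetime, datetime]:
    """Return the half-open period [start, next_start) that contains ts.

    ts must be timezone-aware. Alignment to epoch multiples is an
    assumption made for this sketch.
    """
    whole_periods = (ts - EPOCH) // periodicity  # timedelta // timedelta -> int
    start = EPOCH + whole_periods * periodicity
    return start, start + periodicity


four_hours = timedelta(hours=4)
ts = datetime(2024, 1, 1, 21, 30, tzinfo=timezone.utc)
start, end = period_bounds(ts, four_hours)
print(start.strftime("%H:%M"), "to", end.strftime("%H:%M"))  # 20:00 to 00:00
# A timestamp exactly on a boundary belongs to the next period.
print(period_bounds(end, four_hours)[0] == end)  # True
```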

Data points for the visualizations are determined by aggregating the collected data, which is visualized as line graphs (time-series data) and charts (category-based data).

The granularity of the data becomes coarser as the time frame increases. Data within each period is aggregated using time aggregation. For information about aggregation, see Understanding Time Aggregation. The periodicity used depends on the time range you select, as shown in the following table:

Time Range    Relative Range        Periodicity
Last hour     (now - 1h) to now     20 seconds
Last day      (now - 1d) to now     5 minutes
Last week     (now - 7d) to now     1 hour
Last month    (now - 31d) to now    4 hours

The appropriate periodicity is selected based on the time frame selected and the amount of data available.
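Dividing each relative range by its periodicity from the table gives the number of data points a chart shows. The sketch below also estimates how many raw samples feed each data point, assuming that the ten-second rate at which PubSub+ Cloud sends data to Datadog is also the raw collection rate; this topic does not confirm that. `chart_shape` is a hypothetical helper name.

```python
from datetime import timedelta

# (relative range, periodicity) pairs from the periodicity table.
PERIODICITY = {
    "Last hour": (timedelta(hours=1), timedelta(seconds=20)),
    "Last day": (timedelta(days=1), timedelta(minutes=5)),
    "Last week": (timedelta(days=7), timedelta(hours=1)),
    "Last month": (timedelta(days=31), timedelta(hours=4)),
}

# Assumption: raw samples arrive every 10 seconds.
RAW_SAMPLE_INTERVAL = timedelta(seconds=10)


def chart_shape(range_name: str) -> tuple[int, int]:
    """Return (data points per chart, raw samples per data point)."""
    span, period = PERIODICITY[range_name]
    return span // period, period // RAW_SAMPLE_INTERVAL


for name in PERIODICITY:
    points, samples = chart_shape(name)
    print(f"{name}: {points} data points, about {samples} raw samples each")
```

Under these assumptions, the Last hour chart has 180 points of 2 samples each, while the Last month chart has 186 points that each summarize about 1,440 samples, which is why coarser periodicities smooth out short spikes.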

Understanding Time Aggregation

Time aggregation averages the collected data: values are grouped into larger intervals and averaged over the selected periodicity. For example, to display the 8:00 PM data point on a line chart with a four-hour periodicity, all values collected from 8:00 PM to 12:00 AM are averaged together to form one data point. If that four-hour period contains three values (100, 335, and 500), their average (about 311.7, shown as 312) is the time-aggregated value for 8:00 PM. If the next four-hour period contains four values (498, 500, 502, and 500), its average is 500.
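The worked example can be checked with a plain arithmetic mean per period. This is a minimal sketch: `time_aggregate` is a hypothetical name, and rounding to a whole number is applied only for display, to match the 312 in the example.

```python
from statistics import mean


def time_aggregate(values):
    """Average all values collected within one period into a single data point."""
    if not values:
        raise ValueError("a period needs at least one value to aggregate")
    return mean(values)


# Values from the two consecutive four-hour periods in the example.
periods = {
    "8:00 PM": [100, 335, 500],
    "12:00 AM": [498, 500, 502, 500],
}

for label, values in periods.items():
    print(label, round(time_aggregate(values)))
# 8:00 PM 312
# 12:00 AM 500
```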