Collecting Core Dumps from an Event Broker Service in a Pod in a Customer-Controlled Region

PubSub+ Cloud deploys event broker services to containers within pods in Kubernetes clusters. Because pods are stateless, they lose any information kept within them when Kubernetes destroys them.

Core dump files are operating system (OS) level files that contain information that Solace can use to analyze why an application isn’t working as it should. The kernel creates these core dump files at the OS level within pods. When Kubernetes destroys a pod, it also deletes the core dump files within it, which is a problem from a troubleshooting standpoint. To prevent this loss, you can deploy a DaemonSet that saves the core dump files to the node’s storage so that you can download them before the pod is destroyed.

For a Customer-Controlled Region, the procedures in this document configure and deploy a DaemonSet to the nodes in your deployment that saves core dump files created in your pods.

Solace recommends automatic installation using an updated Mission Control Agent. Manual installation requires an understanding of the Kubernetes command-line tool (kubectl) and appropriate cluster permissions.
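
If you plan to install manually, you can confirm beforehand that your credentials allow creating the required objects. As a quick sketch, assuming your kubeconfig already points at the target cluster:

    kubectl auth can-i create daemonsets
    kubectl auth can-i create configmaps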

This document explains:

  • Limitations to Collecting Core Dump Files
  • Permissions Required for Collecting Core Dump Files
  • Configuring Core Dump Collection Management
  • Installing Core Dump Collection
  • Retrieving and Saving Core Dump Files

Limitations to Collecting Core Dump Files

Collecting core dump files from event broker services using a DaemonSet has the following limitations:

  • If a node that holds core dump files is recreated, those files are lost.

  • Rescheduling the event broker service pods to another node does not carry over any existing core dump files.

  • To preserve the node’s local ephemeral storage space, the DaemonSet monitors and cleans up core dump files as required.

  • If a core dump file exceeds the space allocated to it, the DaemonSet deletes the file immediately to keep the node operational.

Permissions Required for Collecting Core Dump Files

You must set specific Kubernetes permissions to allow the DaemonSet to collect core dump files from containers in the pod. The required permissions, illustrated in the sketch after this list, include:

  • privileged: true—The DaemonSet must have privileged access to the host, allowing it to perform actions that may otherwise be restricted, such as accessing certain system resources.

  • hostPID: true—Allows the pods deployed by the DaemonSet to see and interact with the host's process namespace, which is required for operations that interact with host processes, such as entering the host's namespaces with nsenter.

  • hostNetwork: true—Provides the DaemonSet direct access to the host's network interfaces.

  • The container must run as root (with no user specified)—Required to allow core dump collection.

  • The DaemonSet must mount a host path (/dumps/) into the container—Allows the DaemonSet to read and write files on the host file system directly.
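
For reference, the following sketch shows where these settings appear in a DaemonSet pod specification. It is a fragment rather than a complete manifest, and the container name is hypothetical; the full manifest in the installation procedure below is similar, although it relies on the container's default root user and omits hostNetwork:

    apiVersion: apps/v1
    kind: DaemonSet
    spec:
      template:
        spec:
          hostPID: true        # see and interact with the host's process namespace
          hostNetwork: true    # direct access to the host's network interfaces
          containers:
            - name: core-dump-collector    # hypothetical container name
              # No runAsUser is set, so the container runs as root
              securityContext:
                privileged: true           # privileged access to the host
              volumeMounts:
                - name: dump-volume
                  mountPath: /dumps        # host path mounted into the container
          volumes:
            - name: dump-volume
              hostPath:
                path: /dumps/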

Configuring Core Dump Collection Management

The operating system stores core dump files on the node’s local disk (ephemeral storage). Each file ranges in size from 50 MB to 200 MB. Consider your node’s storage size when defining the configuration keys so that your nodes do not run out of storage; for example, with the default maximum of 10 retained files at up to 200 MB each, the dumps folder can consume up to 2 GB per node.

You can change the following configuration keys to ensure the DaemonSet meets your requirements (see the example after this list for how to update them on a running installation):

  • CONFIG_CLEAN_UP_MAX_FILE_TO_KEEP—Specifies the maximum number of core dump files to keep in the dumps folder. If this number is exceeded, the DaemonSet deletes files, starting with the oldest.

  • CONFIG_CLEAN_UP_SLEEP_SECONDS—Specifies the interval, in seconds, between each scan for core dump files.

  • CONFIG_CLEAN_UP_FILE_AGE_DAYS—Specifies the maximum number of days to keep a core dump file. The DaemonSet deletes core dump files older than this.
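
For example, to change a key after installation, you can patch the ConfigMap and then restart the DaemonSet pods, because the pods read these values only at startup. The namespace placeholder below is whichever namespace you installed the DaemonSet into:

    kubectl -n <namespace> patch configmap solace-core-dump \
      --type merge -p '{"data":{"CONFIG_CLEAN_UP_FILE_AGE_DAYS":"14"}}'
    kubectl -n <namespace> rollout restart daemonset solace-core-dump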

By default, the DaemonSet deploys to all nodes. You can use affinity or node selectors to make it run on a specific set of nodes, as shown in the sketch below.
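
For example, a minimal sketch that restricts the DaemonSet to nodes carrying a hypothetical label would add a nodeSelector to the pod template in the manifest shown later in this document:

    spec:
      template:
        spec:
          nodeSelector:
            example.com/run-core-dump-collector: "true"   # hypothetical node label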

Installing Core Dump Collection

You have two options for installing core dump collection. Solace recommends the automatic approach, which uses an updated Mission Control Agent:

Installing the Core Dump Collection Automatically With the Mission Control Agent

Solace recommends deploying core dump collection automatically using an updated Mission Control Agent Helm chart. After the update, the Mission Control Agent deploys the required core dump collection DaemonSet, which removes the need for you to install the DaemonSet with Kubernetes commands.

To install core dump collection automatically, contact Solace.

Installing the Core Dump Collection Manually

If you want to install the core dump DaemonSet manually, perform these steps:

  1. Use kubectl to access your cluster.
  2. Run the following command to apply the manifest and install the DaemonSet:
    kubectl apply -f - <<EOF
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: solace-core-dump
      labels:
        app: core-dump-config
    
    data:
      CONFIG_CLEAN_UP_MAX_FILE_TO_KEEP: "10"
      CONFIG_CLEAN_UP_SLEEP_SECONDS: "3600"
      CONFIG_CLEAN_UP_FILE_AGE_DAYS: "30"
    
      core-dump-config.sh: |
        #!/bin/sh
        # Fall back to defaults if the ConfigMap values are not set
        CLEAN_UP_SLEEP_SECONDS=${CLEAN_UP_SLEEP_SECONDS:-3600}
        CLEAN_UP_FILE_AGE_DAYS=${CLEAN_UP_FILE_AGE_DAYS:-30}
        CLEAN_UP_MAX_FILE_TO_KEEP=${CLEAN_UP_MAX_FILE_TO_KEEP:-10}
        set -e

        # Write a helper that compresses each core dump as the kernel pipes it in.
        # /dumps is a hostPath volume, so the helper is visible to the host kernel.
        cat >/dumps/gen_compress_core.sh <<"EOF1"
        #!/bin/sh
        exec /bin/gzip >"$1"
        EOF1

        chmod +x /dumps/gen_compress_core.sh

        # Point the host kernel's core_pattern at the helper (pipe syntax), so
        # every core dump on the node is written to /dumps/core.<pid>.gz
        nsenter -t 1 -m -- su -c "echo \"|/dumps/gen_compress_core.sh /dumps/core.%p.gz\" > /proc/sys/kernel/core_pattern"
        ulimit -c unlimited
        # Clean-up loop: delete dumps older than the age limit, then trim the
        # oldest files down to the configured number to keep
        while true; do
          nsenter -t 1 -m -- su -c "find /dumps -name 'core.*' -mtime +${CLEAN_UP_FILE_AGE_DAYS} -delete"
          nsenter -t 1 -m -- su -c "ls -1tr /dumps/core.* | head -n -${CLEAN_UP_MAX_FILE_TO_KEEP} | xargs -d '\n' rm -f --"
          sleep ${CLEAN_UP_SLEEP_SECONDS}
        done
    ---            
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: solace-core-dump
    spec:
      selector:
        matchLabels:
          app: node-configurer-core-dump
      template:
        metadata:
          labels:
            app: node-configurer-core-dump
        spec:
          tolerations:
            - operator: "Exists"
          hostPID: true
          containers:
            - image: alpine
              name: node-configurer
              env:
                - name: CLEAN_UP_SLEEP_SECONDS
                  valueFrom:
                    configMapKeyRef:
                      name: solace-core-dump
                      key: CONFIG_CLEAN_UP_SLEEP_SECONDS
                - name: CLEAN_UP_FILE_AGE_DAYS
                  valueFrom:
                    configMapKeyRef:
                      name: solace-core-dump
                      key: CONFIG_CLEAN_UP_FILE_AGE_DAYS
                - name: CLEAN_UP_MAX_FILE_TO_KEEP
                  valueFrom:
                    configMapKeyRef:
                      name: solace-core-dump
                      key: CONFIG_CLEAN_UP_MAX_FILE_TO_KEEP
              resources:
                requests:
                  cpu: 1m
                limits:
                  cpu: 200m
              securityContext:
                privileged: true
              command: ["/bin/sh"]
              args:
                - "-c"
                - "sh /opt/scripts/core-dump-config.sh"
              volumeMounts:
                - mountPath: /opt/scripts
                  name: dumps-scripts
                - name: dump-volume
                  mountPath: /dumps
          volumes:
            - name: dump-volume
              hostPath:
                path: /dumps/
                type: DirectoryOrCreate
            - configMap:
                defaultMode: 365
                name: solace-core-dump
                items:
                  - key: core-dump-config.sh
                    path: core-dump-config.sh
              name: dumps-scripts
    EOF
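
After applying the manifest, you can verify that the DaemonSet scheduled a pod on each expected node. For example:

    # One collector pod per node should reach the Running state
    kubectl get daemonset solace-core-dump
    kubectl get pods -l app=node-configurer-core-dump -o wide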
    

Retrieving and Saving Core Dump Files

To retrieve the saved core dump files, perform these steps:

  1. Use kubectl to access your cluster.
  2. Run the following commands to find the DaemonSet pod containing the core dump files and copy them from the cluster to a permanent local storage location:
    # Find the DaemonSet pod that holds the dump file, by node name
    kubectl -n <namespace> get pod -o wide
    # Find the dump file in the pod. Run `cd /dumps` after exec-ing into the pod
    kubectl -n <namespace> exec -it <pod-name> -- sh
    # Copy the file to local storage with the name dump.tar.gz
    kubectl cp <namespace>/<daemon-set-pod-name>:/dumps/<filename> dump.tar.gz
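
For example, assuming a hypothetical namespace solace-cloud and a DaemonSet pod named solace-core-dump-x7k2p, the sequence might look like this (the collection script names compressed files core.<pid>.gz):

    kubectl -n solace-cloud get pod -o wide
    kubectl -n solace-cloud exec solace-core-dump-x7k2p -- ls -l /dumps
    kubectl cp solace-cloud/solace-core-dump-x7k2p:/dumps/core.4321.gz ./core.4321.gz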