Installing PubSub+ Cloud in Oracle Kubernetes Engine (OKE)

Deploying PubSub+ Cloud to Oracle Kubernetes Engine (OKE) in Oracle Cloud Infrastructure (OCI) is a Controlled Availability (CA) feature. Contact Solace to see if this feature is suitable for your use case and deployment requirements. For the CA release of OKE, Solace tested deployment of PubSub+ Cloud with OKE version 1.32.1. Solace regularly validates new Kubernetes versions as outlined in our Kubernetes Adoption Policy.

Oracle Kubernetes Engine (OKE) is a fully managed, scalable, and highly available Kubernetes service that allows you to provision applications on virtual nodes in Oracle Cloud Infrastructure (OCI). You can deploy PubSub+ Cloud to OKE on OCI, giving your event broker services the benefits of OCI, including fast, low-latency networking through Single Root I/O Virtualization (SR-IOV).

This deployment guide is intended for customers installing PubSub+ Cloud in a Customer-Controlled Region. For a list of deployment options, see PubSub+ Cloud Deployment Ownership Models.

You must perform a number of environment-specific steps to install PubSub+ Cloud.

  • Before you perform the environment-specific steps described below, ensure that you review and fulfill the general requirements listed in Common Kubernetes Prerequisites.

  • Solace does not support event broker service integration with service meshes, including Istio, Cilium, Linkerd, Consul, and others. If you deploy to a cluster that uses a service mesh, you must exclude the event broker services from the mesh.

The following sections describe these environment-specific steps:

Oracle Cloud Infrastructure Prerequisites

Before you create your virtual cloud network (VCN) and Kubernetes cluster and deploy PubSub+ Cloud, ensure that you meet the following prerequisites:

  • You must have an active OCI account with the following permissions:

    • manage permissions on virtual-network-family in target compartment

    • manage permissions on load-balancers in target compartment

    • manage permissions on cluster resource type in target compartment

    • manage permissions on instance-family in target compartment

    • manage permissions on volume-family in target compartment

    • use permissions on object-family in target compartment

  • You must have installed and configured the following software:

    • OCI command line interface (CLI) version 3.4.0 or later

    • kubectl command-line tool version 1.26 or later

    • Helm version 3.8 or later
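
To confirm that the installed tools meet these minimum versions, you can run the following commands:

# Check installed tool versions
oci --version
kubectl version --client
helm version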

Virtual Cloud Network and Networking Requirements

When configuring your OKE cluster, you must first configure a VCN on OCI and then create the required subnets within the VCN, as described in the following sections:

VCN Configuration

Before you create your Kubernetes cluster and deploy PubSub+ Cloud, you must create a VCN with the following components:

  • A classless inter-domain routing (CIDR) block with a minimum size of /24 (for example, 10.0.0.0/24). For more information, see IP Range.

  • An internet gateway (to allow your event broker services to access the public internet) or NAT gateway (for private clusters)

  • A service gateway for OCI service access

  • Route tables with appropriate routes

For more information, see Creating a VCN in the Oracle documentation.
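
As a starting point, the following OCI CLI sketch creates a VCN and an internet gateway. The display name is illustrative, and <compartment-ocid> and <vcn-ocid> are placeholders for your own values:

# Create the VCN with a /24 CIDR block
oci network vcn create \
  --compartment-id <compartment-ocid> \
  --display-name pubsubplus-vcn \
  --cidr-block 10.0.0.0/24

# Create an internet gateway (use a NAT gateway instead for private clusters)
oci network internet-gateway create \
  --compartment-id <compartment-ocid> \
  --vcn-id <vcn-ocid> \
  --is-enabled true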

IP Range

A minimum CIDR size of /24 is required for an OKE cluster that's dedicated to event broker services. A larger block is required if the cluster contains other services. Choose your CIDR block carefully, using a size that accommodates the anticipated growth of your cluster. You can use the Solace-provided, downloadable Excel-based CIDR calculator to calculate your CIDR requirements.

Subnet Configuration

Solace recommends VCN-native pod networking, which provides direct routing capabilities and enhanced connectivity. When creating your VCN-native pod networking resources, you must ensure the resources meet the following requirements:

  • You must configure a minimum of four distinct subnets in the OCI VCN.

  • Each subnet should be regional (not availability domain specific) for high availability.

  • The service CIDR block must not overlap the VCN CIDR ranges.

  • Your load balancer subnets should be separate from your worker node subnets.

For the node subnets, Solace recommends the following configurations:

  • Purpose: Hosts Kubernetes nodes

  • Security: Can be either private (recommended) or public

  • Route Table: Can be either private (recommended) or public

For the load balancer subnets, Solace recommends the following configurations:

  • Purpose: Hosts OCI LoadBalancers

  • Security: Can be either private (for internal access) or public (for external access)

  • Route Table: Can be either private (for internal access) or public (for external access)

For more information, see Creating VCN-Native Pod Network Resources in the Oracle documentation.
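
For example, the following sketch creates a regional private subnet for worker nodes with the OCI CLI. Omitting --availability-domain makes the subnet regional; the CIDR block and display name are illustrative:

# Create a regional private subnet for worker nodes
oci network subnet create \
  --compartment-id <compartment-ocid> \
  --vcn-id <vcn-ocid> \
  --display-name oke-node-subnet \
  --cidr-block 10.0.0.0/26 \
  --prohibit-public-ip-on-vnic true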

After you create your subnets, you can create your Kubernetes cluster and attach it to the VCN and subnets. You can then configure your node pools.
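
The following is a minimal sketch of creating the cluster with VCN-native pod networking using the OCI CLI. The subnet OCIDs are placeholders, and you should confirm the parameters against the current OCI CLI reference:

# Create an OKE cluster that uses VCN-native pod networking
oci ce cluster create \
  --compartment-id <compartment-ocid> \
  --name pubsubplus-oke \
  --vcn-id <vcn-ocid> \
  --kubernetes-version v1.32.1 \
  --endpoint-subnet-id <api-endpoint-subnet-ocid> \
  --service-lb-subnet-ids '["<lb-subnet-ocid>"]' \
  --cluster-pod-network-options '[{"cniType": "OCI_VCN_IP_NATIVE"}]'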

Configuring OKE Node Pools

After you create your VCN and subnets, you can configure your OKE cluster. Solace recommends that you configure your cluster and its node pools as described in the following sections:

When you create a cluster with these specifications, the OKE cluster in OCI has multiple node pools and autoscales when new event broker services are created. Each node pool provides exactly the resources required by its event broker service class, which helps optimize the cluster's utilization.

OKE Cluster Architecture Requirements

For production deployments, Solace recommends the cluster architecture described in Deployment Architecture for Kubernetes.

Configuring High Availability in OKE

OCI uses availability domains and fault domains to achieve high availability.

Availability Domains
Availability domains are physically isolated data centers within a region. Most OCI regions (for example, Phoenix, Ashburn, and Frankfurt) have three availability domains, though some regions have only one. Every OCI availability domain has three fault domains.
Fault Domains
A fault domain is a grouping of hardware and infrastructure within an OCI availability domain. Each OCI availability domain contains three fault domains, allowing you to distribute the nodes of your event broker services across different physical hardware within a single availability domain.

To ensure high availability for your event broker services, deploy them so that the individual nodes of each service are distributed across multiple availability domains (or across fault domains if multiple availability domains are not available). Distributing the nodes this way prevents service interruptions during hardware failures or maintenance events.

For more information, see Regions and Availability Domains in the Oracle documentation.

The following example shows how to place the nodes of an event broker service across multiple availability domains (ADs):

"placementConfigs": [
	 {
          "availability-domain": "pILZ:PHX-AD-1",
          "subnet-id": "ocid1.subnet.oc1.phx.aaa.."
        },
        {
          "availability-domain": "pILZ:PHX-AD-2",
          "subnet-id": "ocid1.subnet.oc1.phx.aaa.."
        },
        {
          "availability-domain": "pILZ:PHX-AD-3",
          "subnet-id": "ocid1.subnet.oc1.phx.aaa.."
        }
  ],

Node Pool Configuration and Resource Recommendations

When you configure the resource requirements for your node pools, Solace recommends using flexible shapes for your worker nodes, based on our suggested configurations for the event broker service class or other system applications the node will host. You should also configure your pods to use Single Root I/O Virtualization (SR-IOV). For more information, see the following sections:

Node Pool Configuration for Event Broker Services

For nodes hosting event broker services, Solace recommends using flexible shapes with configurations based on the event broker service class the node will host (OCPU refers to Oracle Compute Units):

Node Pool Type                                 VM Type             OCPU  Memory (GB)  Boot Block Storage (GB)  Worker Nodes Required for High-Availability Event Broker Services
Monitoring                                     VM.Optimized3.Flex  1     4            50                       One for each event broker service (the sum of all services of all types)
Developer, Enterprise 250, and Enterprise 1K   VM.Optimized3.Flex  1     16           256                      Two for each Enterprise 1K event broker service
Enterprise 5K                                  VM.Optimized3.Flex  2     32           512                      Two for each Enterprise 5K event broker service
Enterprise 10K                                 VM.Optimized3.Flex  2     32           780                      Two for each Enterprise 10K event broker service
Enterprise 50K                                 VM.Optimized3.Flex  4     64           1024                     Two for each Enterprise 50K event broker service
Enterprise 100K                                VM.Optimized3.Flex  8     64           1300                     Two for each Enterprise 100K event broker service
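
As an illustration, a node pool for the Developer, Enterprise 250, and Enterprise 1K classes might be created as follows. The shape configuration matches the table above, placement.json contains a placement configuration like the earlier example, and other parameters (such as the node image) are omitted; confirm the parameter names against the OCI CLI reference:

# Create a node pool sized for Developer/Enterprise 250/Enterprise 1K services
oci ce node-pool create \
  --cluster-id <cluster-ocid> \
  --compartment-id <compartment-ocid> \
  --name prod1k \
  --node-shape VM.Optimized3.Flex \
  --node-shape-config '{"ocpus": 1, "memoryInGBs": 16}' \
  --size 2 \
  --placement-configs file://placement.json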

If you plan to deploy other services, such as Micro-Integrations, the Solace Open Telemetry Receiver pod for Distributed Tracing, or the Cloud-Based Event Management Agent pod, you must provide these services with additional node pools of the same type required by the Developer, Enterprise 250, and Enterprise 1K service classes.

Node Configuration for System Applications

When you deploy PubSub+ Cloud, you must provide a node for the Mission Control Agent pod. You may also want to use a bastion host to provide controlled access to your cluster. The following sections describe the recommended configurations:

Mission Control Agent Node Configuration

You must provide a node for the Mission Control Agent pod. Solace recommends the following configuration for the node hosting the Mission Control Agent:

Resource Type (VM.Optimized3.Flex)    Minimum Resource Required    Minimum VPU Required
OCPU                                  1                            N/A
Memory (GB)                           4                            8
Boot block storage (GB)               50                           N/A
Performance                           500                          10

Bastion Host Node Configuration

If you plan to use a bastion host to provide controlled access to your cluster, Solace recommends the following configuration for the node hosting the bastion host:

Resource Type (VM.Optimized3.Flex)    Minimum Resource Required    Minimum VPU Required
OCPU                                  1                            N/A
Memory (GB)                           1                            N/A
Boot block storage (GB)               50                           N/A
Performance                           500                          10

Configuring Pods with SR-IOV

Single Root I/O Virtualization (SR-IOV) is a hardware virtualization technology that allows a single physical network interface card (NIC) to appear as multiple separate physical devices. When combined with VFIO (Virtual Function I/O), it provides direct hardware access to virtual machines. For more information, see Configuring SR-IOV for Virtual Networking in the Oracle documentation.

Using Taints and Labels in OKE

OCI-managed OKE node pools don't directly support taints. To apply the taints required to ensure proper pod placement, you must use custom cloud-init scripts. For more information, including examples, see Example Usecases for Custom Cloud-Init Scripts in the Oracle documentation.

You must use custom cloud-init scripts to apply labels and taints to the nodes, ensuring proper placement of your event broker services. The following table provides the required labels and taints based on the event broker service class the node will host; a sample cloud-init script follows the table:

Name        Labels                                        Taints
monitoring  nodeType: monitoring                          nodeType:monitoring:NoExecute
prod1k      nodeType: messaging, serviceClass: prod1k     nodeType:messaging:NoExecute, serviceClass:prod1k:NoExecute
prod5k      nodeType: messaging, serviceClass: prod5k     nodeType:messaging:NoExecute, serviceClass:prod5k:NoExecute
prod10k     nodeType: messaging, serviceClass: prod10k    nodeType:messaging:NoExecute, serviceClass:prod10k:NoExecute
prod50k     nodeType: messaging, serviceClass: prod50k    nodeType:messaging:NoExecute, serviceClass:prod50k:NoExecute
prod100k    nodeType: messaging, serviceClass: prod100k   nodeType:messaging:NoExecute, serviceClass:prod100k:NoExecute
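
The following is a minimal cloud-init sketch for a prod1k node pool, based on the default OKE bootstrap pattern shown in the Oracle documentation. The label and taint values come from the table above; adjust them for each node pool:

#!/bin/bash
# Download and run the default OKE bootstrap script, passing extra kubelet
# arguments that register the node with the prod1k labels and taints.
curl --fail -H "Authorization: Bearer Oracle" -L0 http://169.254.169.254/opc/v2/instance/metadata/oke_init_script | base64 --decode >/var/run/oke-init.sh
bash /var/run/oke-init.sh --kubelet-extra-args "--node-labels=nodeType=messaging,serviceClass=prod1k --register-with-taints=nodeType=messaging:NoExecute,serviceClass=prod1k:NoExecute"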

Storage Configuration

Oracle Cloud Infrastructure (OCI) uses the vpusPerGB parameter to specify the number of Volume Performance Units (VPUs) applied to a block or boot volume per gigabyte of storage. The vpusPerGB value determines the volume's elastic performance settings: higher values provide increased input/output operations per second (IOPS) and throughput at increased cost. Solace supports the following vpusPerGB values:

  • "10"—Balanced

  • "20"—Higher Performance

Solace recommends setting your vpusPerGB value to "20" for all configurations. For more information, see Provisioning PVCs on the Block Volume Service in the Oracle documentation.

The following is a sample StorageClass configuration:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: oci-block-volume-high-performance
provisioner: blockvolume.csi.oraclecloud.com
parameters:
  csi.storage.k8s.io/fstype: xfs
  vpusPerGB: "20" 
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
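
For reference, a PersistentVolumeClaim that uses this StorageClass might look like the following sketch; the claim name and requested size are illustrative:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
  namespace: solace-cloud
spec:
  storageClassName: oci-block-volume-high-performance
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi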

Load Balancer Configuration and Annotations

After you create your Kubernetes cluster, you must provide specific annotations so that your Kubernetes services create your load balancers correctly. Based on the access requirements for your event broker services, you can configure internal or external load balancers, as described in the following sections:

For more information about load balancers in OKE in general, see Summary of Annotations for Load Balancers and Network Load Balancers in the Oracle documentation.

Internal Load Balancer Annotations

We recommend configuring your internal load balancer with at least the following annotations:

internalServiceAnnotations:
  oci.oraclecloud.com/load-balancer-type: "nlb"
  oci-network-load-balancer.oraclecloud.com/is-preserve-source: "true"
  oci-network-load-balancer.oraclecloud.com/internal: "true"

These annotations configure:

  • A Layer 4 network load balancer (instead of a Layer 7 load balancer)

  • Source IP preservation for better client tracking

  • Internal-only access (not accessible from the internet)
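
When applied to a Kubernetes Service of type LoadBalancer, these annotations appear in the Service metadata. The following minimal manifest is illustrative; the service name, selector, and port are placeholders:

apiVersion: v1
kind: Service
metadata:
  name: example-internal-lb
  namespace: solace-cloud
  annotations:
    oci.oraclecloud.com/load-balancer-type: "nlb"
    oci-network-load-balancer.oraclecloud.com/is-preserve-source: "true"
    oci-network-load-balancer.oraclecloud.com/internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: example
  ports:
    - port: 55555
      targetPort: 55555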

External Load Balancer Annotations

We recommend configuring your external load balancer with at least the following annotations:

externalServiceAnnotations:
  oci.oraclecloud.com/load-balancer-type: "nlb"
  oci-network-load-balancer.oraclecloud.com/is-preserve-source: "true"

These annotations provide the following configuration for your load balancer:

  • A Layer 4 network load balancer

  • Source IP preservation for better client tracking

  • Public access (accessible from the internet)

Cluster Autoscaler Configuration

You must enable the Kubernetes Cluster Autoscaler as an Oracle Kubernetes Engine (OKE) add-on.

Your Kubernetes Cluster Autoscaler configuration should set the minimum and maximum size of each node pool and define how you want autoscaling to behave. For more information, see Using the Kubernetes Cluster Autoscaler in the Oracle documentation.
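
As a sketch, assuming you install the add-on with the OCI CLI and that the nodes configuration takes the form min:max:node-pool-OCID (confirm both against the Oracle documentation):

# Enable the Cluster Autoscaler add-on with per-node-pool size limits
oci ce cluster install-addon \
  --cluster-id <cluster-ocid> \
  --addon-name ClusterAutoscaler \
  --configurations '[{"key": "nodes", "value": "2:6:<node-pool-ocid>"}]'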

Troubleshooting

If any issues prevent you from deploying PubSub+ Cloud to OKE on OCI successfully, you can troubleshoot aspects of your OKE cluster or contact Solace.

The following sections describe some potential issues and possible troubleshooting options:

Pod Scheduling Issues

If you cannot schedule pods due to resource constraints or node selector issues, you can try the following kubectl commands:

  • Check the node resources:

    kubectl describe nodes
  • Verify the node labels:

    kubectl get nodes --show-labels
  • Check the pod events:

    kubectl describe pod <pod-name> -n solace-cloud

Storage Provisioning Issues

If your PersistentVolumeClaims remain in a pending state, you can try the following kubectl commands:

  • Check the storage class:

    kubectl describe storageclass oci-block-volume-high-performance
  • Verify the PVC status:

    kubectl describe pvc <pvc-name> -n solace-cloud
  • Check the OCI Block Volume provisioner logs:

    kubectl logs -l app=oci-csi-node -n kube-system

LoadBalancer Service Issues

If your LoadBalancer service does not receive an external IP address or is inaccessible, you can try the following kubectl commands and verify the security list:

  • Check the LoadBalancer service status:

    kubectl get svc -n solace-cloud
  • Verify the LoadBalancer annotations:

    kubectl describe svc <service-name> -n solace-cloud
  • Verify that the security list allows traffic to and from the LoadBalancer
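
To review the security lists from the CLI, you can list them for the VCN (a simple sketch; <compartment-ocid> and <vcn-ocid> are placeholders):

# List the security lists in the VCN to confirm LoadBalancer traffic rules
oci network security-list list \
  --compartment-id <compartment-ocid> \
  --vcn-id <vcn-ocid>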

SR-IOV Configuration Issues

If your Single Root I/O Virtualization (SR-IOV) networking is not working properly, you can try the following kubectl commands:

  • Verify the node is using an SR-IOV-enabled image:

    kubectl get nodes -o custom-columns=NAME:.metadata.name,IMAGE:.status.nodeInfo.osImage
  • List the node's network interfaces:

    kubectl debug node/<selected node> -it --image=busybox -- chroot /host ip link show

  • Check which driver each interface uses and verify that it supports SR-IOV:

    kubectl debug node/<selected node> -it --image=busybox -- chroot /host ethtool -i <interface>

    Look for a driver that supports SR-IOV in the response. Manufacturer drivers that support SR-IOV include, but are not limited to:

    • Intel network interface cards (NIC): ixgbe, i40e, or ice

    • Mellanox NICs: mlx5_core

    • Broadcom NICs: bnxt_en

    The following example response shows Intel's ixgbe driver:

    driver: ixgbe
    version: 5.1.0-k
    firmware-version: 0x800008e2, 1.2345.0
    bus-info: 0000:00:03.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: yes
    supports-register-dump: yes
    supports-priv-flags: yes

    If you use VFIO (Virtual Function I/O), also verify that the VFIO kernel modules are loaded on the node.

Diagnostic Commands

You can use the following commands to gather diagnostic information to share with Solace when troubleshooting issues with your deployment to OKE on OCI.

Cluster Health Check

# Check node status
kubectl get nodes

# Check pod status
kubectl get pods --all-namespaces

# Check events
kubectl get events --sort-by=.metadata.creationTimestamp

Network Diagnostics

# Test connectivity between pods
kubectl exec -it <pod-name> -n solace-cloud -- ping <target-ip>

# Check DNS resolution
kubectl exec -it <pod-name> -n solace-cloud -- nslookup kubernetes.default

# Check service endpoints
kubectl get endpoints -n solace-cloud

Storage Diagnostics

# Check PVCs
kubectl get pvc -n solace-cloud

# Check PVs
kubectl get pv

# Check storage class
kubectl describe storageclass oci-block-volume-high-performance

Mission Control Agent Diagnostics

# Check MCA pod status
kubectl get pods -l app=solace-cloud-ca -n solace-cloud -o wide

# Check MCA logs
kubectl logs -l app=solace-cloud-ca -n solace-cloud

# Check MCA configuration
kubectl get configmap -n solace-cloud