Installing PubSub+ Cloud in Google Kubernetes Engine (GKE)

Google Kubernetes Engine (GKE) is a secure and fully managed Kubernetes service that maximizes your operational efficiency. Google takes care of the underlying infrastructure of your entire cluster, including nodes. For more information about GKE, see the Google Kubernetes Engine documentation.

This deployment guide is intended for customers installing PubSub+ Cloud in a Customer-Controlled Region. For a list of deployment options, see PubSub+ Cloud Deployment Ownership Models.

There are a number of environment-specific steps that you must perform to install PubSub+ Cloud.

Before you perform the environment-specific steps described below, ensure that you review and fulfill the general requirements listed in Common Kubernetes Prerequisites.

Solace does not support event broker service integration with service meshes. Service meshes include Istio, Cilium, Linkerd, Consul, and others. If deploying to a cluster with a service mesh, you must:

  • exclude the target-namespace used by PubSub+ Cloud services from the service mesh (for example, see the Istio sketch after this list).
  • set up connectivity to event broker services in the cluster using LoadBalancer or NodePort. See Exposing Event Broker Services to External Traffic for more information.
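For example, if your cluster runs Istio with automatic sidecar injection, labeling the namespace is one way to exclude it from the mesh. The namespace name below is a placeholder, and other meshes (Cilium, Linkerd, Consul) have their own exclusion mechanisms:

    # Example for Istio only: disable automatic sidecar injection for the namespace
    # used by PubSub+ Cloud services. Replace <target-namespace> with your namespace.
    kubectl label namespace <target-namespace> istio-injection=disabled --overwrite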

Solace provides reference Terraform projects for deploying a Kubernetes cluster to AKS, EKS, and GKE. These Terraform projects have the recommended configuration settings, such as worker node sizes, resource configurations, taints, and labels optimized to install PubSub+ Cloud.

You can download the reference Terraform projects from our GitHub repository: https://github.com/SolaceLabs/customer-controlled-region-reference-architectures

Note that all sample scripts, Terraform modules, and examples are provided as-is. You can modify the files as required, and you are responsible for maintaining the modified files for your Kubernetes cluster.

The steps that you (the customer) perform are as follows:

  1. Create the cluster as described in GKE Cluster Specifications.

GKE Cluster Specifications

Before you (the customer) install the Mission Control Agent, you must configure the GKE cluster with the technical specifications listed in the sections that follow.

When created with these specifications, the GKE cluster has multiple node pools, and is designed to be auto-scaled when new event broker services are created. Each node pool provides the exact resources required by each plan to help optimize the cluster's utilization.

If you have configured your GKE cluster to use Dataplane V2, you must use event broker service version 10.3 or later. Dataplane V2 is an optimized dataplane for GKE clusters that offers advantages such as scalability, built-in security and logging, and consistency across clusters. For more information, see About Dataplane V2 in the Google Kubernetes Engine (GKE) documentation. Solace does not support Dataplane V2 with event broker services earlier than version 10.3.
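If you are unsure which dataplane an existing cluster uses, one way to check is to read the cluster's datapath provider. The cluster name and region below are placeholders; a value of ADVANCED_DATAPATH indicates Dataplane V2:

    # Check the datapath provider of an existing cluster (field name per the GKE API).
    gcloud container clusters describe my-cluster \
      --region=us-central1 \
      --format="value(networkConfig.datapathProvider)"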

Networking

You have two choices for your virtual private cloud (VPC):

  • VPC-native: A VPC that uses alias IP ranges. This is the option recommended by Google Cloud and the only option available if your cluster will have private worker nodes. Our reference Terraform uses a VPC-native cluster.

  • Route-based: A VPC using Google Cloud routes.

The cluster requires a VPC with a subnet. We recommend a private cluster with public access to the control plane. In this configuration, Solace suggests using a NAT Gateway so that the worker nodes have Internet access to communicate with PubSub+ Cloud and Datadog.

For subnet IP address range sizing, you require four CIDR ranges in the subnet to support the GKE cluster:

  • A network CIDR range, used by the worker nodes in the cluster.

  • A services CIDR range, used for Kubernetes services.

  • A pods CIDR range, used by the system (default) node pool.

  • A messaging pods CIDR range, used by the worker nodes for your event broker services.

For more information, see Creating a VPC-native cluster in the Google Cloud documentation.

CIDR requirements for your cluster are outlined below:

  • Network
    • Routable: Yes
    • Used for: All virtual machines (VMs) in the VPC, including worker nodes, as well as internal load balancers.
    • Minimum CIDR range: /28

  • Services
    • Routable: No
    • Used for: Kubernetes services. The IP addresses in this range are not exposed outside the cluster.
    • Minimum CIDR range: /16

  • Pods
    • Routable: No
    • Used for: Assigning CIDR ranges on the cluster for the system (default) node pool.
    • Minimum CIDR range: /24

  • Messaging Pods
    • Routable: Yes
    • Used for: Assigning CIDR ranges and IP addresses to the worker nodes hosting the pods that contain your event broker services. Solace suggests limiting the pods per node to 8 or 16 because PubSub+ Cloud event broker services require a fairly low pod count per node. For more information about pod requirements, see the Cluster Configuration section of our reference Terraform.
    • Minimum CIDR range: /24

We outline the specifications for the CIDR ranges below.

Provided there is sufficient space, all address ranges can be increased in size at any time if more capacity is required.

Network CIDR Range

Determine the total number of enterprise and developer event broker services that you expect to create in the cluster. You require:

  • Three nodes for the system pool.

  • Three nodes per enterprise (HA) event broker service.

  • One node per developer event broker service.

You can use the formula below to determine the total count:

node count = 3 * enterprise + 1 * developer + 3 (for the system node pool)

For example, if you require 10 enterprise event broker services and 10 developer event broker services, then you need 43 nodes in total (3 × 10 + 1 × 10 + 3 = 43).
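The same calculation, expressed as shell arithmetic using the example values above:

    # Compute the worker node count for the planned services.
    ENTERPRISE=10   # planned enterprise (HA) event broker services
    DEVELOPER=10    # planned developer event broker services
    echo $(( 3 * ENTERPRISE + 1 * DEVELOPER + 3 ))   # prints 43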

Once you determine the required node count, you can look up the required size of the primary IP range using the Subnet primary IP range table in the GKE VPC-native clusters documentation.

Note that PubSub+ Cloud requires a minimum range of /28. We recommend making the IP range larger than what you calculate to allow for any unplanned future expansion.

Services CIDR Range

This calculation is required only if the secondary subnet IP ranges are not auto-generated by GCP.

This secondary IP range is for Kubernetes services. Each messaging service (enterprise or developer) creates two Kubernetes services, which means two IP addresses per messaging service. The total number of Kubernetes services required is therefore twice the number of messaging services. The IP range can be determined from this table in the Google documentation.

Again, Solace suggests adding some padding to this range to allow for unplanned future expansion. There are at least 10 internal Kubernetes services that need IP addresses.

Pod and Messaging Pod CIDR Ranges

This calculation is required only if the secondary subnet IP ranges are not auto-generated by GCP.

The pod CIDR range is for the system (default) node pool. The messaging pod CIDR range is for worker pods hosting your event broker services.

The three system-node-pool nodes each need a /24 range for the default maximum of 110 pods per node, which means a minimum secondary range of /22 for the pod CIDR range.

The CIDR range for messaging pods is based on the maximum pods per node and the number of nodes in the cluster. If limiting the size of the IP range is a concern, Solace suggests limiting the pods per node in the node pools (except for the system node pool) to 16 (from 110), because PubSub+ Cloud event broker services require a fairly low pod count per node.

A full description of the calculations required to determine the pod secondary IP range can be found in the Google documentation.

With 16 pods per node, the minimum /24 messaging pod range supports up to 8 worker nodes for event broker services, which allows up to 2 HA services or 8 developer services (or a combination of the two). A more practical range is /20, which provides up to 104 worker nodes for messaging services.

VPC and Subnet

Once any IP ranges have been calculated, the VPC and subnet can be created.

Create a VPC network, then add a subnet to it. You can name both whatever you choose. For the subnet, provide the primary IP range and, if required, the secondary IP ranges calculated above. The subnet's region must be the region in which you want the cluster.
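If you prefer the gcloud CLI to the console (or to the reference Terraform, which creates these resources for you), the commands below sketch the VPC and subnet creation. All names, the region, and the CIDR values are placeholders that you should replace with the ranges you calculated:

    # Create a custom-mode VPC and a subnet with the primary (network) range plus
    # secondary ranges for services, pods, and messaging pods.
    gcloud compute networks create solace-vpc --subnet-mode=custom

    gcloud compute networks subnets create solace-subnet \
      --network=solace-vpc \
      --region=us-central1 \
      --range=10.10.0.0/24 \
      --secondary-range=services=10.20.0.0/16,pods=10.30.0.0/22,messaging-pods=10.40.0.0/20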

NAT Gateway

If the cluster is going to be a private cluster (which is our recommendation), then a NAT gateway must be set up to allow pods on the worker nodes to access the internet. This is required so that the Mission Control Agent can communicate with the PubSub+ Home Cloud and our monitoring solution can ship metrics and logs. To do this, use Cloud NAT. For more information about Cloud NAT, see the Google documentation.

After the VPC and subnet are set up, go to Cloud NAT and create a NAT Gateway, selecting the VPC and the region that the subnet is in.
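As a rough gcloud sketch, this involves a Cloud Router plus a Cloud NAT configuration in the subnet's region; the names and region below are placeholders:

    # Create a Cloud Router and a Cloud NAT gateway in the subnet's region.
    gcloud compute routers create solace-router \
      --network=solace-vpc \
      --region=us-central1

    gcloud compute routers nats create solace-nat \
      --router=solace-router \
      --region=us-central1 \
      --auto-allocate-nat-external-ips \
      --nat-all-subnet-ip-ranges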

Cluster

Once the VPC, subnet, and a NAT Gateway are configured, the cluster can be created.

The settings for the cluster itself are listed below; an example gcloud command that reflects these settings follows the list.

  • Basics
    • Location type: Regional (with 3 zones)
    • Release channel: Stable
  • Networking
    • Private cluster: enabled
    • Access master using its external IP address: enabled
    • Master IP Range: a /28 network that does not overlap with any range in the VPC
    • Network: the network created above
    • Node subnet: the subnet created above
    • Enable network policy: enabled
    • Enable HTTP Load Balancing: enabled
    • Enable master authorized networks: disabled
  • Security
    • Enable Shielded GKE nodes: enabled
    • Enable Workload Identity: enabled
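If you script the cluster creation instead of using the console, a gcloud command along the following lines reflects the settings above. The cluster name, network and subnet names, secondary range names, project ID, region, and master CIDR are placeholders, and the reference Terraform remains the recommended starting point:

    # Sketch only: a regional, VPC-native private cluster that mirrors the settings above.
    # The default node pool created here serves as system-node-pool (1 node per zone).
    # HTTP load balancing is enabled by default on new clusters.
    gcloud container clusters create solace-cluster \
      --region=us-central1 \
      --release-channel=stable \
      --enable-ip-alias \
      --network=solace-vpc \
      --subnetwork=solace-subnet \
      --cluster-secondary-range-name=pods \
      --services-secondary-range-name=services \
      --enable-private-nodes \
      --master-ipv4-cidr=172.16.0.0/28 \
      --no-enable-master-authorized-networks \
      --enable-network-policy \
      --enable-shielded-nodes \
      --workload-pool=<project-id>.svc.id.goog \
      --num-nodes=1 \
      --machine-type=n1-standard-4 \
      --image-type=COS_CONTAINERD \
      --disk-size=100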

Node Pools

The following are the required settings for the node pools, which you should configure at the same time as the cluster. An example gcloud command for creating one of the pools is shown after the list.

system-node-pool
  • General
    • Number of nodes (per zone): 1
    • Autoscaling: disabled
  • Nodes
    • Machine Type: n1-standard-4
    • Image Type: Container-Optimized OS with containerd (cos_containerd)
    • Boot Disk Size: 100GB
monitoring-node-pool
  • General
    • Number of nodes (per zone): 0
    • Autoscaling: enabled
      • Minimum nodes: 0
      • Maximum nodes (per zone): 1000
  • Nodes
    • Machine Type: n1-standard-2
    • Image Type: Ubuntu with containerd (ubuntu_containerd)
    • Boot Disk Size: 100GB
    • Maximum Pods per node: 16
  • Metadata
    • Labels:
      • nodeType: monitoring
    • Taints:
      • nodeType=monitoring:NoExecute
prod1k-node-pool
  • General
    • Number of nodes (per zone): 0
    • Autoscaling: enabled
      • Minimum nodes: 0
      • Maximum nodes (per zone): 1000
  • Nodes
    • Machine Type: n1-standard-4
    • Image Type: Ubuntu with containerd (ubuntu_containerd)
    • Boot Disk Size: 100GB
    • Maximum Pods per node: 16
  • Metadata
    • Labels:
      • nodeType: messaging
      • serviceClass: prod1k
    • Taints:
      • nodeType=messaging:NoExecute
      • serviceClass=prod1k:NoExecute
prod10k-node-pool
  • General
    • Number of nodes (per zone): 0
    • Autoscaling: enabled
      • Minimum nodes: 0
      • Maximum nodes (per zone): 1000
  • Nodes
    • Machine Type: n1-standard-8
    • Image Type: Ubuntu with containerd (ubuntu_containerd)
    • Boot Disk Size: 100GB
    • Maximum Pods per node: 16
  • Metadata
    • Labels:
      • nodeType: messaging
      • serviceClass: prod10k
    • Taints:
      • nodeType=messaging:NoExecute
      • serviceClass=prod10k:NoExecute
prod100k-node-pool
  • General
    • Number of nodes (per zone): 0
    • Autoscaling: enabled
      • Minimum nodes: 0
      • Maximum nodes (per zone): 1000
  • Nodes
    • Machine Type: n1-standard-16
    • Image Type: Ubuntu with containerd (ubuntu_containerd)
    • Boot Disk Size: 100GB
    • Maximum Pods per node: 16
  • Metadata
    • Labels:
      • nodeType: messaging
      • serviceClass: prod100k
    • Taints:
      • nodeType=messaging:NoExecute
      • serviceClass=prod100k:NoExecute
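As an example of scripting one of these pools, the following gcloud command sketches prod1k-node-pool. The cluster name, region, and messaging-pods secondary range name are placeholders; the other pools follow the same pattern with their own machine types, labels, and taints:

    # Create the prod1k messaging node pool with the labels, taints, autoscaling,
    # and pod-density settings listed above.
    gcloud container node-pools create prod1k-node-pool \
      --cluster=solace-cluster \
      --region=us-central1 \
      --machine-type=n1-standard-4 \
      --image-type=UBUNTU_CONTAINERD \
      --disk-size=100 \
      --max-pods-per-node=16 \
      --pod-ipv4-range=messaging-pods \
      --num-nodes=0 \
      --enable-autoscaling \
      --min-nodes=0 \
      --max-nodes=1000 \
      --node-labels=nodeType=messaging,serviceClass=prod1k \
      --node-taints=nodeType=messaging:NoExecute,serviceClass=prod1k:NoExecute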

Storage Class

The cluster requires a storage class that can create SSD-based Persistent Volume Claims (PVCs) with XFS as the file system type. To support scale-up, the StorageClass must include the allowVolumeExpansion property set to true.

After creating the cluster, create the storage class using the reference storage class yaml found in the reference GKE Terraform available on GitHub.

For more information about recommended storage classes, see Supported Storage Solutions.
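The reference YAML in the Terraform project is the source of truth; as an approximation, a storage class that meets these requirements looks roughly like the following (the class name is a placeholder):

    # Approximate equivalent of the reference storage class: SSD-backed, XFS-formatted,
    # with volume expansion enabled.
    kubectl apply -f - <<EOF
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: ssd-xfs
    provisioner: pd.csi.storage.gke.io
    parameters:
      type: pd-ssd
      csi.storage.k8s.io/fstype: xfs
    allowVolumeExpansion: true
    # WaitForFirstConsumer delays binding until a pod is scheduled, a common choice
    # for zonal disks; confirm against the reference YAML.
    volumeBindingMode: WaitForFirstConsumer
    EOF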

Autoscaling

Your cluster requires autoscaling to provide the appropriate level of available resources for your event broker services as their demands change. Solace recommends using the Kubernetes Cluster Autoscaler, which you can find in the Kubernetes GitHub repository at: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler.

See the Autoscaling a cluster documentation on the Google Kubernetes Engine (GKE) documentation site for information about implementing a Cluster Autoscaler.
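For reference, autoscaling on an existing node pool can also be enabled or adjusted after cluster creation with gcloud. The cluster, pool, region, and node limits below are placeholders (limits are per zone for a regional cluster):

    # Enable or adjust autoscaling on an existing node pool.
    gcloud container clusters update solace-cluster \
      --region=us-central1 \
      --node-pool=prod1k-node-pool \
      --enable-autoscaling \
      --min-nodes=0 \
      --max-nodes=1000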