Installing PubSub+ Cloud in Google Kubernetes Engine (GKE)

Google Kubernetes Engine (GKE) is a secure and fully managed Kubernetes service that maximizes your operational efficiency. Google takes care of the underlying infrastructure of your entire cluster, including nodes. For more information about GKE, see the Google Kubernetes Engine documentation.

This deployment guide is intended for customers installing PubSub+ Cloud in a Customer-Controlled Region. For a list of deployment options, see PubSub+ Cloud Deployment Ownership Models.

There are a number of environment-specific steps that you must perform to install PubSub+ Cloud.

Before you perform the environment-specific steps described below, ensure that you review and fulfill the general requirements listed in Common Kubernetes Prerequisites.

Solace does not support event broker service integration with service meshes. Service meshes include Istio, Cilium, Linkerd, Consul, and others. If deploying to a cluster with a service mesh, you must:

  • exclude the target-namespace used by PubSub+ Cloud services from the service mesh (a minimal example follows this list).
  • set up connectivity to event broker services in the cluster using LoadBalancer or NodePort. See Exposing Event Broker Services to External Traffic for more information.
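
For example, if the cluster runs Istio with automatic sidecar injection, the namespace can be excluded by labeling it with istio-injection=disabled. The following is a minimal sketch using the Kubernetes Python client; the namespace name is a placeholder, and other meshes (Cilium, Linkerd, Consul) have their own exclusion mechanisms, so consult the documentation for your mesh.

```python
# Sketch: exclude the PubSub+ Cloud namespace from Istio sidecar injection.
# Assumes the Kubernetes Python client (pip install kubernetes) and a kubeconfig
# with permission to patch namespaces. The namespace name is a placeholder.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config()

namespace = "target-namespace"  # replace with the namespace used by PubSub+ Cloud services
patch = {"metadata": {"labels": {"istio-injection": "disabled"}}}

client.CoreV1Api().patch_namespace(name=namespace, body=patch)
print(f"Labeled {namespace} to disable Istio sidecar injection")
```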

Solace provides reference Terraform projects for deploying a Kubernetes cluster to AKS, EKS, and GKE. These Terraform projects have the recommended configuration settings, such as worker node sizes, resource configurations, taints, and labels optimized to install PubSub+ Cloud.

You can download the reference Terraform projects from our GitHub repository: https://github.com/SolaceLabs/customer-controlled-region-reference-architectures

Be aware that all sample scripts, Terraform modules, and examples are provided as-is. You can modify the files as required and are responsible for maintaining the modified files for your Kubernetes cluster.

The steps that you (the customer) perform are as follows:

  1. Create the cluster as described in GKE Cluster Specifications.

GKE Cluster Specifications

Before you (the customer) install the Mission Control Agent, you must configure the GKE cluster with the technical specifications listed in the sections that follow.

When created with these specifications, the GKE cluster has multiple node pools, and is designed to be auto-scaled when new event broker services are created. Each node pool provides the exact resources required by each plan to help optimize the cluster's utilization.

If you have configured your GKE cluster to use Dataplane v2, you must use event broker service versions 10.3 and later. Dataplane v2 is an optimized dataplane for GKE clusters that offers advantages such as scalability, built-in security and logging, and consistency across clusters. For more information, see About Dataplane v2 in the Google Kubernetes Engine (GKE) documentation. Solace does not support Dataplane v2 with event broker services prior to version 10.3.

Networking

The cluster requires a VPC with a subnet. We recommend a private cluster with public access to the master, which also requires a NAT gateway (created using Cloud NAT) so that pods on the worker nodes can access the internet to communicate with PubSub+ Cloud and Datadog.

For subnet IP address range sizing, there are three IP address ranges that must exist in the subnet to support a GKE cluster: the primary address range, which is used by the worker nodes in the cluster; a secondary range that is used for pods; and another secondary range that is used for services. For more information, see Creating a VPC-native cluster in the Google Cloud documentation.

There are two options here:

  • If the VPC will not be peered with any other VPC, then only the primary IP range must be provided for the subnet and the secondary IP ranges will be auto-configured by GCP.
  • If the VPC will be peered, then the ranges must not conflict with the peered VPC(s), so it's best to provide the primary and both secondary ranges manually.

The specifications for the three IP address ranges are listed below.

Provided there is sufficient space, all address ranges can be increased in size at any time if more capacity is required.

Primary IP Range

To determine the required size of the primary IP address range, first determine the total number of enterprise and developer services that you expect to create in the cluster. Each enterprise (HA) service requires 3 nodes, each developer service requires 1 node, and the system node pool requires another 3 nodes. So:

node count = 3 * enterprise + 1 * developer + 3 (for the system node pool)

If 10 enterprise services and 10 developer services are required, then 43 total nodes are required (3 × 10 + 1 × 10 + 3 = 43).

Once the required node count is determined, you can look up the required size of the primary IP range using this table in the Google documentation.

We always recommend that you err on the side of caution and make the IP range larger than the calculated size to allow for unplanned future expansion.
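
As a rough, illustrative sketch of this sizing (it assumes the standard GCP behavior of four reserved addresses in the subnet's primary range, which is what the Google sizing table reflects; always confirm against the current table):

```python
# Sketch: estimate the node count and a candidate primary IP range size for the
# GKE subnet. Assumption: GCP reserves 4 addresses in the subnet's primary range
# (per the Google sizing table); confirm against the current Google documentation.
import math

def node_count(enterprise: int, developer: int, system_nodes: int = 3) -> int:
    """node count = 3 * enterprise + 1 * developer + system node pool"""
    return 3 * enterprise + developer + system_nodes

def primary_range_prefix(nodes: int, reserved: int = 4) -> int:
    """Smallest prefix length whose addresses cover the nodes plus reserved IPs."""
    needed = nodes + reserved
    return 32 - math.ceil(math.log2(needed))

nodes = node_count(enterprise=10, developer=10)    # 43 nodes
print(nodes, f"/{primary_range_prefix(nodes)}")    # 43 /26
```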

Secondary IP Range (Pods)

This calculation is required only if the secondary subnet IP ranges are not auto-generated by GCP.

The secondary IP address range for pods is based on the maximum pods per node, and the number of nodes in the cluster. If limiting the size of the IP range is a concern, we suggest limiting the pods per node in the node pools (except for the system node pool) to 16 (from 110), because PubSub+ Cloud event broker services require a fairly low pod count per node.

A full description of the calculations required to determine the pod secondary IP range can be found in the Google documentation.

The three system-node-pool nodes each need a /24 range for the default 110 pods per node, which means a minimum secondary range of /22; with 16 pods per node on the other pools, this supports up to 8 service worker nodes. This configuration allows up to 2 HA services or 8 developer services (or a combination of the two).

A more reasonable range is /20, which provides up to 104 worker nodes for messaging services.
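
The following sketch shows the arithmetic behind those numbers. It assumes GKE allocates each node the smallest CIDR block containing at least twice its maximum pods (so 110 pods per node maps to a /24 per node and 16 pods per node maps to a /27 per node); confirm the rule against the Google documentation.

```python
# Sketch: estimate the secondary IP range (pods) for the cluster.
# Assumption: GKE gives each node the smallest power-of-two block with at least
# 2 x max-pods addresses (110 pods -> /24 per node, 16 pods -> /27 per node).
import math

def per_node_block(max_pods: int) -> int:
    """Addresses reserved per node for its pod range."""
    return 2 ** math.ceil(math.log2(2 * max_pods))

def pods_range_prefix(system_nodes: int, worker_nodes: int,
                      system_max_pods: int = 110, worker_max_pods: int = 16) -> int:
    total = (system_nodes * per_node_block(system_max_pods)
             + worker_nodes * per_node_block(worker_max_pods))
    return 32 - math.ceil(math.log2(total))

# 3 system nodes plus 8 worker nodes fit in a /22; 104 worker nodes fit in a /20.
print(pods_range_prefix(system_nodes=3, worker_nodes=8))    # 22
print(pods_range_prefix(system_nodes=3, worker_nodes=104))  # 20
```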

Secondary IP Range (Services)

This calculation is required only if the secondary subnet IP ranges are not auto-generated by GCP.

This secondary IP range is for Kubernetes services. Each messaging service (enterprise or developer) creates two Kubernetes services, which means two IP addresses per messaging service.

Therefore, the total required number of Kubernetes services is simply twice the number of messaging services. The IP range can be determined from this table in the Google documentation.

Again, we suggest adding some padding to this range to allow for unplanned future expansion. At least 10 internal Kubernetes services also need IP addresses.
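
A minimal sketch of this sizing is shown below; the count of roughly 10 internal Kubernetes services is the assumption noted above, and the examples table in the next section adds further headroom beyond this minimum.

```python
# Sketch: estimate a minimum secondary IP range (services) for the cluster.
# Each messaging service creates 2 Kubernetes services; the ~10 internal
# Kubernetes services are an assumption. The examples table in the next section
# recommends larger ranges to leave room to grow.
import math

def services_range_prefix(messaging_services: int, internal_services: int = 10) -> int:
    needed = 2 * messaging_services + internal_services
    return 32 - math.ceil(math.log2(needed))

# 10 HA + 10 developer services -> 50 addresses minimum -> /26; the examples
# table recommends /24 for this case for extra headroom.
print(services_range_prefix(20))   # 26
```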

Primary and Secondary IP Range Examples

The table below shows example IP range options for several combinations of services expected in the cluster. The calculations assume a maximum of 110 pods per node for the system pool and 16 pods per node for the auto-scaling pools. These values are the minimum subnet ranges; they can be larger. These ranges can also be increased after they're created (but they cannot be shrunk).

Enterprise (HA) Services | Developer Services | Node Count | Primary IP Range | Secondary IP Range (Pods) | Secondary IP Range (Services)
10                       | 0                  | 33         | /26              | /21                       | /24
0                        | 10                 | 13         | /27              | /21                       | /24
10                       | 10                 | 43         | /26              | /20                       | /24
20                       | 0                  | 63         | /25              | /20                       | /24
50                       | 20                 | 173        | /24              | /19                       | /23

The secondary IP ranges are a concern only if the VPC will be peered with other VPCs. If that is not the case, we recommend that you use a Primary IP Range of /20 and allow GCP to auto-create the secondary IP ranges.

VPC and Subnet

Once any IP ranges have been calculated, the VPC and subnet can be created.

Create a VPC Network, then add a subnet to it. Both can be named whatever you choose. For the subnet, provide the primary IP range and, if required, the two secondary IP ranges calculated above. The subnet's region must be the region you want the cluster in.
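
If you script this step instead of using the Google Cloud console, a sketch along the following lines is possible with the google-cloud-compute Python client. All names and CIDR ranges are placeholders, the field names mirror the Compute Engine API and should be verified against your client library version, and the reference Terraform project remains the supported approach.

```python
# Sketch: create a custom-mode VPC and a subnet with secondary ranges using the
# google-cloud-compute client (pip install google-cloud-compute). All names and
# CIDRs are placeholders; verify the field names against your client library.
from google.cloud import compute_v1

project, region = "my-project", "us-central1"   # placeholders

network = compute_v1.Network(name="pubsubplus-vpc", auto_create_subnetworks=False)
compute_v1.NetworksClient().insert(project=project, network_resource=network).result()

subnet = compute_v1.Subnetwork(
    name="pubsubplus-subnet",
    region=region,
    network=f"projects/{project}/global/networks/pubsubplus-vpc",
    ip_cidr_range="10.10.0.0/24",            # primary range (nodes)
    secondary_ip_ranges=[
        compute_v1.SubnetworkSecondaryRange(range_name="pods", ip_cidr_range="10.20.0.0/20"),
        compute_v1.SubnetworkSecondaryRange(range_name="services", ip_cidr_range="10.30.0.0/24"),
    ],
)
compute_v1.SubnetworksClient().insert(
    project=project, region=region, subnetwork_resource=subnet
).result()
```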

NAT Gateway

If the cluster is going to be a private cluster (which is our recommendation) then a NAT gateway must be set up to allow pods on the worker nodes access to the internet. This is required so the Mission Control Agent can communicate with the PubSub+ Home Cloud and our monitoring solution can ship metrics and logs. To do this, you use Cloud NAT. For more information about Cloud NAT, see the Google documentation.

After the VPC and subnet are set up, go to Cloud NAT and create a NAT Gateway, selecting the VPC and the region that the subnet is in.
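
A similarly hedged sketch of this step with the google-cloud-compute Python client is shown below; names are placeholders, and the field names, which mirror the Compute Engine Routers API, should be verified against your client library version.

```python
# Sketch: create a Cloud Router with a Cloud NAT configuration that covers all
# subnet ranges in the region. Names are placeholders; verify field names
# against your version of google-cloud-compute.
from google.cloud import compute_v1

project, region = "my-project", "us-central1"   # placeholders

router = compute_v1.Router(
    name="pubsubplus-router",
    network=f"projects/{project}/global/networks/pubsubplus-vpc",
    nats=[
        compute_v1.RouterNat(
            name="pubsubplus-nat",
            nat_ip_allocate_option="AUTO_ONLY",
            source_subnetwork_ip_ranges_to_nat="ALL_SUBNETWORKS_ALL_IP_RANGES",
        )
    ],
)
compute_v1.RoutersClient().insert(
    project=project, region=region, router_resource=router
).result()
```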

Cluster

Once the VPC, subnet, and a NAT Gateway are configured, the cluster can be created.

Below are the settings for the cluster itself.

  • Basics
    • Location type: Regional (with 3 zones)
    • Release channel: Stable
  • Networking
    • Private cluster: enabled
    • Access master using its external IP address: enabled
    • Master IP Range: a /28 network that does not overlap with any range in the VPC
    • Network: the network created above
    • Node subnet: the subnet created above
    • Enable network policy: enabled
    • Enable HTTP Load Balancing: enabled
    • Enable master authorized networks: disabled
  • Security
    • Enable Shielded GKE nodes: enabled
    • Enable Workload Identity: enabled

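If you create the cluster programmatically instead of through the console, a hedged sketch with the google-cloud-container Python client might look like the following. All names and CIDR ranges are placeholders, the field names mirror the GKE API and should be verified against your client library version, and the reference Terraform project already encodes these settings.

```python
# Sketch: create a regional private cluster with the settings listed above,
# using google-cloud-container (pip install google-cloud-container). Names and
# CIDRs are placeholders; verify field names against your client library.
from google.cloud import container_v1

project, region = "my-project", "us-central1"   # placeholders

cluster = container_v1.Cluster(
    name="pubsubplus-cluster",
    network="pubsubplus-vpc",
    subnetwork="pubsubplus-subnet",
    release_channel=container_v1.ReleaseChannel(
        channel=container_v1.ReleaseChannel.Channel.STABLE),
    private_cluster_config=container_v1.PrivateClusterConfig(
        enable_private_nodes=True,           # private cluster
        enable_private_endpoint=False,       # access master using its external IP
        master_ipv4_cidr_block="172.16.0.0/28"),
    ip_allocation_policy=container_v1.IPAllocationPolicy(
        use_ip_aliases=True,
        cluster_secondary_range_name="pods",
        services_secondary_range_name="services"),
    network_policy=container_v1.NetworkPolicy(enabled=True),
    addons_config=container_v1.AddonsConfig(
        http_load_balancing=container_v1.HttpLoadBalancing(disabled=False)),
    shielded_nodes=container_v1.ShieldedNodes(enabled=True),
    workload_identity_config=container_v1.WorkloadIdentityConfig(
        workload_pool=f"{project}.svc.id.goog"),
    # The API requires at least one node pool at creation; the system pool is
    # included here, and the remaining pools are created as shown below.
    node_pools=[container_v1.NodePool(
        name="system-node-pool",
        initial_node_count=1,                # per zone in a regional cluster
        config=container_v1.NodeConfig(
            machine_type="n1-standard-4",
            image_type="COS_CONTAINERD",
            disk_size_gb=100))],
)

container_v1.ClusterManagerClient().create_cluster(
    parent=f"projects/{project}/locations/{region}", cluster=cluster)
```
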
Node Pools

The following are the required settings for the node pools. These should be configured at the same time as the cluster.

system-node-pool
  • General
    • Number of nodes (per zone): 1
    • Autoscaling: disabled
  • Nodes
    • Machine Type: n1-standard-4
    • Image Type: Container-Optimized OS with containerd (cos_containerd)
    • Boot Disk Size: 100GB
monitoring-node-pool
  • General
    • Number of nodes (per zone): 0
    • Autoscaling: enabled
      • Minimum nodes: 0
      • Maximum nodes (per zone): 1000
  • Nodes
    • Machine Type: n1-standard-2
    • Image Type: Ubuntu with containerd (ubuntu_containerd)
    • Boot Disk Size: 100GB
    • Maximum Pods per node: 16
  • Metadata
    • Labels:
      • nodeType: monitoring
    • Taints:
      • nodeType=monitoring:NoExecute
prod1k-node-pool
  • General
    • Number of nodes (per zone): 0
    • Autoscaling: enabled
      • Minimum nodes: 0
      • Maximum nodes (per zone): 1000
  • Nodes
    • Machine Type: n1-standard-4
    • Image Type: Ubuntu with containerd (ubuntu_containerd)
    • Boot Disk Size: 100GB
    • Maximum Pods per node: 16
  • Metadata
    • Labels:
      • nodeType: messaging
      • serviceClass: prod1k
    • Taints:
      • nodeType=messaging:NoExecute
      • serviceClass=prod1k:NoExecute
prod10k-node-pool
  • General
    • Number of nodes (per zone): 0
    • Autoscaling: enabled
      • Minimum nodes: 0
      • Maximum nodes (per zone): 1000
  • Nodes
    • Machine Type: n1-standard-8
    • Image Type: Ubuntu with containerd (ubuntu_containerd)
    • Boot Disk Size: 100GB
    • Maximum Pods per node: 16
  • Metadata
    • Labels:
      • nodeType: messaging
      • serviceClass: prod10k
    • Taints:
      • nodeType=messaging:NoExecute
      • serviceClass=prod10k:NoExecute
prod100k-node-pool
  • General
    • Number of nodes (per zone): 0
    • Autoscaling: enabled
      • Minimum nodes: 0
      • Maximum nodes (per zone): 1000
  • Nodes
    • Machine Type: n1-standard-16
    • Image Type: Ubuntu with containerd (ubuntu_containerd)
    • Boot Disk Size: 100GB
    • Maximum Pods per node: 16
  • Metadata
    • Labels:
      • nodeType: messaging
      • serviceClass: prod100k
    • Taints:
      • nodeType=messaging:NoExecute
      • serviceClass=prod100k:NoExecute
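
If you create the node pools programmatically, a sketch for the prod1k-node-pool with the google-cloud-container Python client is shown below; the other pools differ only in machine type, labels, and taints. Field names mirror the GKE API and should be verified against your client library version, and the reference Terraform project already encodes these settings.

```python
# Sketch: create the prod1k node pool with the labels, taints, autoscaling, and
# max-pods settings listed above. Names are placeholders; verify field names
# against your version of google-cloud-container.
from google.cloud import container_v1

project, region, cluster = "my-project", "us-central1", "pubsubplus-cluster"  # placeholders

node_pool = container_v1.NodePool(
    name="prod1k-node-pool",
    initial_node_count=0,
    config=container_v1.NodeConfig(
        machine_type="n1-standard-4",
        image_type="UBUNTU_CONTAINERD",
        disk_size_gb=100,
        labels={"nodeType": "messaging", "serviceClass": "prod1k"},
        taints=[
            container_v1.NodeTaint(key="nodeType", value="messaging",
                                   effect=container_v1.NodeTaint.Effect.NO_EXECUTE),
            container_v1.NodeTaint(key="serviceClass", value="prod1k",
                                   effect=container_v1.NodeTaint.Effect.NO_EXECUTE),
        ],
    ),
    autoscaling=container_v1.NodePoolAutoscaling(
        enabled=True, min_node_count=0, max_node_count=1000),
    max_pods_constraint=container_v1.MaxPodsConstraint(max_pods_per_node=16),
)

container_v1.ClusterManagerClient().create_node_pool(
    parent=f"projects/{project}/locations/{region}/clusters/{cluster}",
    node_pool=node_pool,
)
```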

Storage Class

The cluster requires a storage class that can create SSD-based Persistent Volume Claims (PVCs) with XFS as the file system type. To support scale-up, the StorageClass must have the allowVolumeExpansion property set to true.

After creating the cluster, create the storage class using the reference storage class YAML found in the reference GKE Terraform project available on GitHub.
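
For illustration only, the following sketch creates an equivalent StorageClass with the Kubernetes Python client. The provisioner and parameter names assume the GCE Persistent Disk CSI driver; the reference YAML in the GitHub repository is the source of truth.

```python
# Sketch: a StorageClass for SSD-backed, XFS-formatted volumes with expansion
# enabled. The provisioner and parameter names assume the GCE Persistent Disk
# CSI driver; confirm against the reference storage class YAML on GitHub.
from kubernetes import client, config

config.load_kube_config()

storage_class = client.V1StorageClass(
    api_version="storage.k8s.io/v1",
    kind="StorageClass",
    metadata=client.V1ObjectMeta(name="ssd-xfs"),          # placeholder name
    provisioner="pd.csi.storage.gke.io",
    parameters={"type": "pd-ssd", "csi.storage.k8s.io/fstype": "xfs"},
    allow_volume_expansion=True,                            # required for scale-up
    volume_binding_mode="WaitForFirstConsumer",
    reclaim_policy="Delete",
)

client.StorageV1Api().create_storage_class(storage_class)
```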

For more information about recommended storage classes, see Supported Storage Solutions.

Autoscaling

Your cluster requires autoscaling to provide the appropriate level of available resources for your event broker services as their demands change. Solace recommends using the Kubernetes Cluster Autoscaler, which you can find in the Kubernetes GitHub repository at: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler.

See the Autoscaling a cluster documentation on the Google Kubernetes Engine (GKE) documentation site for information about implementing a Cluster Autoscaler.