Installing PubSub+ Cloud in Amazon Elastic Kubernetes Service (EKS)

Amazon Elastic Kubernetes Service (Amazon EKS) gives you (the customer) the flexibility to start, run, and scale Kubernetes applications in the AWS cloud or on-premises. Amazon EKS helps you provide highly-available and secure clusters and automates key tasks such as patching, node provisioning, and updates. For more information about EKS, see the Amazon EKS documentation.

Depending on the Kubernetes distribution you choose, there are a number of environment-specific steps that you must perform to install PubSub+ Cloud.

Before you perform the environment-specific steps described below, ensure that you review and fulfill the general requirements listed in Common Kubernetes Prerequisites.

  1. Create a Kubernetes cluster. For customer-owned deployments, you are responsible for setting up, maintaining, and operating the Kubernetes cluster. The following information can help you understand the requirements of the Kubernetes cluster that you create:

    Sample scripts and Terraform modules are available from Solace that you can use as a reference example to understand what is required in the Kubernetes cluster. The example is provided as-is. You (the customer) can modify the files as required to create your Kubernetes cluster. If you choose to do so, you are responsible for maintaining and modifying the files for your deployment. Contact Solace for more information.

Amazon EKS Prerequisites

The following are the technical prerequisites for deploying event broker services in Amazon EKS:

VPC Security Group
Before you begin, you must open an AWS support ticket and request an increase of the Rules per VPC Security Group quota to 200 for the region in which you intend to deploy your EKS cluster. This increase is required for event broker services to support the default protocols, which are as follows (see the example request after this list):
  • SSH
  • Web messaging
  • SEMP
  • AMQP TLS
  • MQTT Web TLS
  • MQTT TLS
  • REST
  • SMF TLS
  • Load Balancer
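
As an alternative to a support ticket, the increase can sometimes be requested through AWS Service Quotas, as in the following sketch. The quota code shown is an assumption; verify the code for Rules per VPC Security Group in the Service Quotas console before using it:

  # The quota code below is an assumption; verify the code for
  # "Rules per VPC Security Group" in the Service Quotas console.
  aws service-quotas request-service-quota-increase \
    --service-code vpc \
    --quota-code L-0EA8095F \
    --desired-value 200
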
Permissions
You require an AWS account with the following permissions. These permissions are required only by the individual performing the deployment when it is done using a Terraform module:
  • All the permissions that are required to create and manage the EKS cluster (eksClusterRole).
  • Permission to create IAM roles and IAM policies in the EKS cluster. These permissions are required by the Terraform module. The example module creates the IAM roles and policies that are used by the EKS cluster, along with the following permissions to create and manage resources in the EKS cluster:
    IAM Role
    Gives permissions to the following:
    • EKS cluster control plane
    • EKS cluster worker nodes
    • EKS Load balancer controller
    • EKS auto-scaler
    IAM Policy
    Creates the set of permissions that is required by the following in the deployment:
    All EC2 resources
    The Kubernetes and Terraform modules require this permission to access the tags and metadata of resources. The autoscaler and Load Balancer provisioning look at the security groups each instance has, and modify those security groups to add rules for the Load Balancer services.
    VPC
    The Terraform module requires this permission to create the VPC. The Kubernetes module also requires this permission to scan the VPC and subnets to retrieve networking parameters.
    Elastic Load Balancers
    Permission is required for the Kubernetes module to create ELBs for the Load Balancer services. The Load Balancer services require rules that are added to the Security Groups mentioned below. Additional permissions are required by the AWS Load Balancer Controller and must be updated to match those specified at https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.2.0/docs/install/iam_policy.json.
    EBS
    Permission is required for Kubernetes dynamic PVCs, which require access to EBS to dynamically create volumes and attach them to the correct host. The host also requires EC2 access as described above.
    Security Groups
    The Kubernetes module requires this permission to create security groups. The Load Balancer services also require rules to be added to these security groups.
    Routing tables
    The Terraform module requires this permission to create routing tables.
    Internet Gateways
    The Terraform module requires this permission to create an Internet gateway.
    Elastic IPs
    The Terraform module requires this permission to attach Elastic IP addresses (EIPs) to the NAT gateway.
    NAT Gateways
    The Terraform module requires this permission to create the NAT gateway and attach EIPs to the NAT gateway.
    OIDC Provider
    The Terraform module requires permission to create an OIDC provider that is used to authenticate Kubernetes modules against IAM roles. The Cluster Autoscaler and AWS Load Balancer Controller are two of the Kubernetes modules that use OIDC to authenticate against IAM roles.
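    For reference, if you create the cluster with a tool such as eksctl rather than the Terraform module, the OIDC provider can be associated with a command like the following sketch, where <cluster-name> is a placeholder:

      eksctl utils associate-iam-oidc-provider --cluster <cluster-name> --approve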
Networking
You must create an Elastic IP address (EIP) for each NAT Gateway that you intend to use, with the following considerations (see the allocation example after this list):
  • The EIPs for the NAT gateways must be created upfront.
  • Solace recommends that you create two EIPs. A minimum of one EIP allocation ID is required.
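
The following sketch allocates an EIP with the AWS CLI; run it once per NAT gateway and record each AllocationId:

  # Allocates a VPC Elastic IP and prints its allocation ID.
  aws ec2 allocate-address --domain vpc --query AllocationId --output text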

Considerations for Deploying PubSub+ Cloud on an Amazon EKS Cluster

You (the customer) should consider the following limitations and recommendations regarding your Amazon EKS cluster for a private data center deployment of PubSub+ Cloud:

  • A minimum size of /22 is required for an EKS cluster that's dedicated to event broker services. A larger size is required if the cluster contains other services. For more information, see IP Range.

  • Solace recommends that you have two NAT Gateways and two EIPs for redundancy. You can choose to use one NAT Gateway with one EIP, with the consideration that all event broker service features [e.g., VPN bridges, Dynamic Message Routing (DMR), Disaster Recovery (DR), REST Delivery Points (RDPs)] that rely on external connections will fail to function if the Availability Zone containing the NAT Gateway fails.
  • Solace recommends that you have at least two Bastion hosts, and that these hosts are spread evenly over the three Availability Zones. This recommendation allows you to remotely access the deployment should a zone fail. It's important to note that remote access to your deployment during a zone failure is not a requirement; you can choose to use the minimum of one Bastion host instead.
  • We recommend that you use the version of the AWS Load Balancer Controller from Solace for your EKS cluster. Solace's version of the AWS Load Balancer Controller optimizes the ingress rules. The default (non-optimized) ingress rules in the default AWS Load Balancer Controller limit a deployment to eleven event broker services.

Considerations for Deployments in China Regions

Additional considerations are required if the private region you deploy to is within China.

  • You must have a separate Amazon Web Services (China) Account, which is a set of credentials that are distinct and separate from the Amazon Web Services Global Accounts. To register an account, go to http://www.amazonaws.cn.
  • Customer-controlled deployments in China require that you provide a custom domain name registered in China (for example, *.mycompany.cn) with certificates issued for that domain. Event broker services are accessed using the custom domain name rather than *.messaging.solace.cloud. For more information, see Deployments in China.

  • Instead of the GCP registry, you must use the Azure China registry. The Azure China registry uses a different secret that's required when you later deploy the Mission Control Agent. For example, run the following command, where <username> and <password> are provided by Solace.

    kubectl create secret docker-registry cn-reg-secret --namespace kube-system \
      --docker-server=solacecloud.azurecr.cn --docker-username=<username> --docker-password=<password>

All other EKS cluster specifications are the same as described in EKS Cluster Specifications.

EKS Cluster Specifications

Before you (the customer) install the Mission Control Agent, you must configure the EKS cluster with the technical specifications listed in the following sections.

For more detailed information about using Amazon EKS, see the User Guide on the Amazon EKS documentation site.

Node Groups

The Kubernetes Cluster Autoscaler must use two node groups. The node groups can be configured to start from zero instances, which means the cluster can have a full set of node groups for each scaling tier without requiring instances to run in each.

The node groups must be configured to start at 0. Hints should be provided to the autoscaler as to which labels and taints are set on the worker nodes by using tags on the auto-scaling group (ASG). There is no mechanism available to tag the ASG when you create the node groups, but this can be accomplished as described in Managed Nodes Scale to Zero and Cluster Autoscaler does not start new nodes when Taints and NodeSelector are used in EKS. The sketch below shows how such tags might be applied.
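
As an illustration, the following applies autoscaler hint tags to an existing ASG using the AWS CLI. The nodeType label and taint names and the messaging value are hypothetical placeholders; use the labels and taints your node groups actually set:

# Hint the autoscaler about a node label set on the workers (hypothetical label).
aws autoscaling create-or-update-tags --tags \
  "ResourceId=<asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/label/nodeType,Value=messaging,PropagateAtLaunch=true"

# Hint the autoscaler about a taint set on the workers (hypothetical taint).
aws autoscaling create-or-update-tags --tags \
  "ResourceId=<asg-name>,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/taint/nodeType,Value=messaging:NoExecute,PropagateAtLaunch=true"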

Instance Type Requirements

Because of the additional resources required to run Kubernetes, the instance types required for some of the scaling tiers are larger than their instance-based counterparts. The following are the instance type requirements for an EKS deployment. For details about the core and RAM requirements for each scaling tier, see Resource Requirements for Kubernetes.

Scaling Tier     Instance Type Required
Monitor          t3.medium
Developer        r5.large
Enterprise 250   r5.large
Enterprise 1K    r5.large
Enterprise 5K    r5.xlarge
Enterprise 10K   r5.xlarge
Enterprise 50K   r5.2xlarge
Enterprise 100K  r5.2xlarge
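
The following eksctl ClusterConfig fragment sketches how a node group might pin one of these instance types while starting at zero; the cluster name, region, node group name, and nodeType label are hypothetical placeholders:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-eks-cluster        # placeholder
  region: us-east-1           # placeholder
managedNodeGroups:
  - name: enterprise-1k       # hypothetical node group for Enterprise 1K services
    instanceType: r5.large    # from the table above
    minSize: 0                # start at zero, as described in Node Groups
    desiredCapacity: 0
    maxSize: 9
    labels:
      nodeType: messaging     # hypothetical label; match your autoscaler hint tags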

Bastion Host

Solace recommends that you have at least two Bastion hosts, and that these hosts are spread evenly over the three Availability Zones. This allows you to remotely access your deployment should a zone fail. If remote access during a zone failure is not a requirement, you can choose to use the minimum of one Bastion host instead to manage the EKS cluster. The Bastion host is deployed using a minimal VM that runs only an SSH server.

Storage Class

The EKS storage class (type) can use either GP2 or GP3. Consider the following when choosing which storage class to use:

  • With GP2, performance improves as the size of the disk increases. In cases where your disk size is greater than 1 TB, using GP2 is a better option than GP3.

  • With GP3, the default performance is 3k IOPS regardless of disk size. GP3 is recommended for disk sizes less than 1 TB. For disk sizes greater than 1 TB, GP2 is the better choice unless you configure the GP3 storage class with a provisioned IOPS higher than 3k. It's important to note that if you increase the IOPS on the storage class, all event broker services get the same IOPS. For example, if you increase to 6k IOPS, all event broker services get 6k IOPS disks, and you may incur additional costs for the extra 3k IOPS as a result.

It's important to remember that the disk size (the size of the EBS volume) is twice the Message Spool size specified when you create an event broker service. For example, if you configure an event broker service to use a Message Spool of 500 GB, you require a 1 TB disk, and this should be considered as part of your planning when you create the Kubernetes cluster.

To deploy PubSub+ Cloud, the StorageClass in EKS must be configured to use the WaitForFirstConsumer binding mode (volumeBindingMode). To support scale-up, the StorageClass must contain the allowVolumeExpansion property set to "true". If an optional AWS KMS (CMK) key is used, that key must be provided to Solace. The KMS key can be applied to the cluster during or after creation of the Kubernetes cluster. You should always use XFS as the filesystem type (fsType).

The properties of your StorageClass yaml should be similar to the following:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: gp2
  selfLink: /apis/storage.k8s.io/v1/storageclasses/gp2
parameters:
  encrypted: "true"
  fsType: xfs
  type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

For more information about creating a storage class for EKS, see Supported Storage Solutions on the Amazon EKS User Guide site.
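
If you choose GP3 with provisioned IOPS as discussed above, a StorageClass similar to the following sketch could be used; it assumes the EBS CSI driver (provisioner ebs.csi.aws.com) is installed in the cluster:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "6000"                     # every event broker service volume gets this IOPS
  encrypted: "true"
  csi.storage.k8s.io/fstype: xfs   # always use XFS
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true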

Networking

To spread an event broker service (or rather, its event brokers) over the Availability Zones (AZs), pod anti-affinity should be used. When the region has multiple availability zones, the topologyKey should be set to topology.kubernetes.io/zone; when multiple AZs are not available, set it to kubernetes.io/hostname.
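
The following is a minimal sketch of such a pod anti-affinity rule; the app: event-broker pod label is hypothetical:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: event-broker                      # hypothetical pod label
        topologyKey: topology.kubernetes.io/zone   # kubernetes.io/hostname for single-AZ regions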

NAT Gateway

You require one Elastic IP (EIP) for each NAT gateway for your cluster. Solace recommends that you have two Elastic IPs (and two NAT gateways) for a production system.

You can have up to three EIPs and NAT gateways, which allows for multi-AZ NAT redundancy. These NAT EIPs must be created upfront. If you use a Terraform module, ensure that you use these EIPs.

Load Balancer

Configure Solace's version of the AWS Load Balancer Controller v2.4.1 for NLB, which has optimized ingress. Though NLB-IP (the AWS load balancer) can be used, fail-over times when using NLB-IP aren't acceptable for high-availability, production environments. The permissions in the role used by the AWS Load Balancer Controller must be updated to match those specified at https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.2.0/docs/install/iam_policy.json.

We recommend that you use Solace's version of the AWS Load Balancer Controller so that your deployment is not limited to eleven event broker services. To use Solace's version of the AWS Load Balancer Controller, contact Solace when you are ready to set up your Kubernetes cluster (helm install) to get the necessary repository information and credentials.
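
A hypothetical installation could look like the following; the repository URL, chart name, and credentials are placeholders provided by Solace:

# Repository details and credentials come from Solace; all values are placeholders.
helm repo add solace-lb <repository-url-from-solace> --username <username> --password <password>
helm install aws-load-balancer-controller solace-lb/aws-load-balancer-controller \
  --namespace kube-system \
  --set clusterName=<cluster-name> \
  --set image.tag=v2.4.1-nlb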

Solace has released several versions of the AWS Load Balancer Controller with each version having compatibility with specific Kubernetes releases as follows:

Solace-specific Release of AWS Load Balancer Controller (tag)   Compatibility with Kubernetes Versions   Link to Permissions for Role
v2.2.4-nlb   1.19, 1.20, 1.21   https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.2.4/docs/install/iam_policy.json
v2.4.1-nlb   1.20 and later     https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.4.1/docs/install/iam_policy.json

We also recommend that you use the following service annotations for the AWS Load Balancer Controller when you deploy it:

serviceAnnotations:
- service.beta.kubernetes.io/aws-load-balancer-type: "external"
- service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "instance"
- service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing" # remove this one for internal (private) services

IP Range

A minimum size of /22 is required for an EKS cluster that's dedicated to event broker services. A larger size is required if the cluster contains other services.

Native CNI allocates IP directly from the VPC’s subnet. Because EKS uses native CNI, the EKS infrastructure allocates the blocks of IP addresses. The number of blocks that are allocated is directly proportional to the number of worker nodes, which in turn is also proportional to the number of event broker services.

These formulas can be used to determine the primary IP subnet size (or IP address usage) for a cluster, where:

  • number_of_broker_services is the number of event broker services
  • number_broker_services_with_internal_lb is the number of event broker services with an internal load balancer
  • number_broker_services_with_public_lb is the number of event broker services with a public load balancer

The number of IP addresses used in private subnets is:

6 + 20 x number_of_broker_services + number_broker_services_with_internal_lb

The number of IP addresses used in public subnets is:

8 + number_broker_services_with_public_lb

In the formula, the value of 20 IP addresses per event broker service accounts for the IP addresses used for the worker nodes and the overhead required for the underlying default infrastructure. The event broker service itself has three event brokers in an HA configuration (primary, backup, and monitor), where each event broker must run on its own worker node.

Currently, the limit for event broker services in the cluster is eleven (due to the VPC rules limit), which means private subnets that are dedicated to event broker services require 237 IP addresses at most (i.e., 6 + 20 x 11 + 11 = 237), and for public subnets, the upper limit is 19 IP addresses (i.e., 8 + 11 = 19).

For this reason, the minimum CIDR size of /22 is recommended to accommodate the number of IP addresses required to support the maximum number of event broker services. The /22 CIDR can be separated into four /24 blocks to support both private and public subnets. For example, you could use the first three /24 blocks of the /22 for private subnets, while the last /24 can contain /26 blocks for public subnets, such as:

PrivateSubnet1 -> 10.0.0.0/24
PrivateSubnet2 -> 10.0.1.0/24
PrivateSubnet3 -> 10.0.2.0/24
PublicSubnet1 -> 10.0.3.0/26
PublicSubnet2 -> 10.0.3.64/26
PublicSubnet3 -> 10.0.3.128/26

Autoscaling

Your cluster requires autoscaling to provide the appropriate level of available resources for your event broker services as their demands change. Solace recommends using the Kubernetes Cluster Autoscaler, which you can find in the Kubernetes GitHub repository at: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler.

See the Autoscaling documentation on the Amazon EKS documentation site for information about implementing a Cluster Autoscaler.
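
For reference, a minimal sketch of installing the Cluster Autoscaler with its public Helm chart follows; the cluster name and region are placeholders, and chart values can vary between versions:

helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=<cluster-name> \
  --set awsRegion=<region>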