Installing PubSub+ Cloud in Amazon Elastic Kubernetes Service (EKS)

Amazon Elastic Kubernetes Service (Amazon EKS) gives you (the customer) the flexibility to start, run, and scale Kubernetes applications in the AWS cloud or on-premises. Amazon EKS provides highly available, secure clusters and automates key tasks such as patching, node provisioning, and updates. For more information, see the Amazon EKS documentation.

This deployment guide is intended for customers installing PubSub+ Cloud in a Customer-Controlled Region. For a list of deployment options, see PubSub+ Cloud Deployment Ownership Models.

There are a number of environment-specific steps that you must perform to install PubSub+ Cloud.

  • Before you perform the environment-specific steps described below, ensure that you review and fulfill the general requirements listed in Common Kubernetes Prerequisites.

  • Solace does not support event broker service integration with service meshes, including Istio, Cilium, Linkerd, Consul, and others.

  1. Create a Kubernetes cluster. For customer-owned deployments, you are responsible for setting up, maintaining, and operating the Kubernetes cluster. The following information can help you understand the requirements of the Kubernetes cluster that you create.
  2. Solace provides reference Terraform projects for deploying a Kubernetes cluster to AKS, EKS, and GKE. These Terraform projects include the recommended configuration settings, such as worker node sizes, resource configurations, taints, and labels, optimized for installing PubSub+ Cloud.

    You can download the reference Terraform projects from our GitHub repository: https://github.com/SolaceLabs/customer-controlled-region-reference-architectures

    Note that all sample scripts, Terraform modules, and examples are provided as-is. You can modify the files as required, and you are responsible for maintaining the modified files for your Kubernetes cluster.

Amazon EKS Prerequisites

Deploying event broker services to Amazon EKS has the following technical prerequisites:

VPC Security Group

Before you begin, you must open an AWS support ticket and request an increase to the Rules per VPC Security Group to 200 for the EKS region you intend to deploy to. This increase is required for event broker services to support the default protocols, which are as follows:

  • SSH
  • Web messaging
  • SEMP
  • AMQP TLS
  • MQTT WebSocket TLS
  • MQTT TLS
  • REST
  • SMF TLS
  • Load Balancer

Permissions

You require certain permissions when deploying your EKS Kubernetes cluster. These permissions fall into two categories, based on how they are assigned when an individual uses the Solace reference Terraform:

The user-required permissions must be held by the user executing the reference Terraform to configure the EKS cluster. The auto-assigned permissions are granted automatically to the resources created by the reference Terraform. If you are not using the reference Terraform, you must ensure that equivalent permissions are assigned accordingly.

User-Required Permissions

These permissions must be assigned to the IAM user or role that executes the reference Terraform:

IAM Permissions
  • iam:CreateRole
  • iam:DeleteRole
  • iam:AttachRolePolicy
  • iam:DetachRolePolicy
  • iam:CreateInstanceProfile
  • iam:AddRoleToInstanceProfile
  • iam:RemoveRoleFromInstanceProfile
  • iam:DeleteInstanceProfile
  • iam:GetRole
  • iam:PassRole (particularly for EKS cluster role and node role)
EKS Creation Permissions
  • eks:CreateCluster
  • eks:CreateNodegroup
  • eks:DescribeCluster
  • eks:CreateAddon
  • eks:CreateAccessEntry
  • eks:AssociateAccessPolicy
  • eks:CreatePodIdentityAssociation
  • eks:DeletePodIdentityAssociation
Network Resource Permissions
  • ec2:CreateVpc
  • ec2:CreateSubnet
  • ec2:CreateInternetGateway
  • ec2:AttachInternetGateway
  • ec2:CreateNatGateway
  • ec2:CreateRouteTable
  • ec2:CreateRoute
  • ec2:AssociateRouteTable
  • ec2:AllocateAddress (for NAT Gateway EIP)
  • ec2:DescribeAvailabilityZones
  • ec2:CreateSecurityGroup
  • ec2:AuthorizeSecurityGroupIngress
  • ec2:AuthorizeSecurityGroupEgress
KMS Permissions
  • kms:CreateKey
  • kms:ScheduleKeyDeletion
  • kms:TagResource
  • kms:CreateAlias
EC2 Instance Permissions
  • ec2:RunInstances
  • ec2:CreateLaunchTemplate
  • ec2:CreateKeyPair
  • autoscaling:CreateAutoScalingGroup
  • autoscaling:CreateOrUpdateTags
CloudWatch Permissions
  • logs:CreateLogGroup
  • logs:CreateLogStream
  • logs:PutLogEvents
  • logs:DescribeLogGroups
SSM Permissions
  • ssm:GetParameter (for retrieving EKS AMI information)
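As an illustration, the user-required permissions listed above are typically attached to the executing IAM identity as a policy document. The fragment below is a sketch showing only a subset of the actions (the Sid values are placeholders, and Resource is left broad for brevity; scope it down for production use):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EksClusterProvisioning",
      "Effect": "Allow",
      "Action": [
        "eks:CreateCluster",
        "eks:CreateNodegroup",
        "eks:DescribeCluster",
        "ssm:GetParameter"
      ],
      "Resource": "*"
    },
    {
      "Sid": "IamRolesForCluster",
      "Effect": "Allow",
      "Action": [
        "iam:CreateRole",
        "iam:AttachRolePolicy",
        "iam:PassRole"
      ],
      "Resource": "*"
    }
  ]
}
```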

Auto-Assigned Permissions

These permissions are automatically handled by the reference Terraform by creating and assigning appropriate IAM roles to the AWS resources:

EKS Cluster Role

Created by Terraform and assigned to the EKS cluster:

  • AmazonEKSClusterPolicy
  • AmazonEKSServicePolicy
Worker Node Role

Created by Terraform and assigned to the worker nodes:

  • AmazonEC2ContainerRegistryReadOnly
  • AmazonEKSWorkerNodePolicy
  • AmazonSSMManagedInstanceCore
Bastion Host Role

Created by Terraform and assigned to the bastion host:

  • AmazonSSMManagedInstanceCore
Pod Identity Permissions

Created by Terraform and used by Kubernetes components:

Cluster Autoscaler
  • autoscaling:DescribeAutoScalingGroups
  • autoscaling:DescribeAutoScalingInstances
  • autoscaling:DescribeLaunchConfigurations
  • autoscaling:DescribeTags
  • autoscaling:SetDesiredCapacity
  • autoscaling:TerminateInstanceInAutoScalingGroup
  • ec2:DescribeLaunchTemplateVersions
AWS Load Balancer Controller
  • elasticloadbalancing:*
  • ec2:CreateSecurityGroup
  • ec2:DescribeSecurityGroups
  • ec2:DescribeInstances
  • ec2:DescribeSubnets
  • ec2:DescribeVpcs
  • iam:CreateServiceLinkedRole
  • iam:GetServerCertificate
  • iam:ListServerCertificates
  • acm:DescribeCertificate
  • acm:ListCertificates
  For the most current and complete list of AWS Load Balancer Controller permissions, see the latest AWS Load Balancer Controller installation documentation.

EBS CSI Driver
  • ec2:CreateSnapshot
  • ec2:DeleteSnapshot
  • ec2:AttachVolume
  • ec2:DetachVolume
  • ec2:CreateVolume
  • ec2:DeleteVolume
  • ec2:DescribeVolumes
  • ec2:DescribeSnapshots
  • ec2:DescribeVolumesModifications
  • ec2:ModifyVolume
VPC CNI
  • ec2:AssignPrivateIpAddresses
  • ec2:AttachNetworkInterface
  • ec2:CreateNetworkInterface
  • ec2:DeleteNetworkInterface
  • ec2:DescribeInstances
  • ec2:DescribeNetworkInterfaces
  • ec2:DetachNetworkInterface
  • ec2:ModifyNetworkInterfaceAttribute
  • ec2:UnassignPrivateIpAddresses
  • ec2:DescribeSubnets
  • ec2:DescribeVpcs
KMS Permissions (Auto-assigned)

Created by Terraform and used by EKS and CloudWatch:

  • kms:GenerateDataKey
  • kms:Encrypt
  • kms:Decrypt
  • kms:ReEncrypt*
  • kms:DescribeKey

Networking

The Solace reference Terraform creates three NAT gateways, one per availability zone. The Terraform also creates three elastic IP (EIP) addresses for the NAT gateways.

Using EIPs is optional. If you don't want your event broker service to communicate with the public internet, or if you want to route traffic only over your on-premises network, you may not need EIPs.

If you intend to use the reference Terraform and don't want to use EIPs, or want to use existing EIPs you have already created, you should modify the Terraform to meet your requirements.

Considerations for Deploying PubSub+ Cloud on an Amazon EKS Cluster

You (the customer) should consider the following limitations and recommendations regarding your Amazon EKS cluster for a Customer-Controlled Region deployment of PubSub+ Cloud:

  • A minimum size of /24 is required for an EKS cluster that's dedicated to event broker services. A larger size is required if the cluster contains other services. For more information, see IP Range.

  • Solace recommends three NAT Gateways and three EIPs for redundancy, and the reference Terraform creates them by default. You can choose to use one NAT Gateway with one EIP, with the caveat that all event broker service features that rely on external connections [e.g., VPN bridges, Dynamic Message Routing (DMR), Disaster Recovery (DR), REST Delivery Points (RDP)] fail to function if the availability zone hosting that NAT Gateway fails.

Considerations for Deployments in China

There are additional considerations for deployments to Kubernetes clusters in China. For more information, see Deployments in China.

EKS Cluster Specifications

Before you (the customer) install the Mission Control Agent, you must configure the EKS cluster with the technical specifications listed in the following sections:

For more detailed information about using Amazon EKS, see the User Guide on the Amazon EKS documentation site.

Node Groups

The Kubernetes Cluster Autoscaler must use two node groups. The node groups can be configured to start from zero instances, which means the cluster has a full set of node groups for each scaling tier without requiring instances to run in each.

The node groups must be configured to start at 0. Hints about which labels and taints are set on the worker nodes should be provided to the autoscaler by using tags on the auto-scaling group (ASG). There is no mechanism to tag the ASG when you create the node groups, but this can be accomplished as described in Managed Nodes Scale to Zero and Cluster Autoscaler does not start new nodes when Taints and NodeSelector are used in EKS.
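These autoscaler hints follow the Cluster Autoscaler's node-template tag convention. The Terraform sketch below tags an existing ASG after creation; the ASG name and the nodeType label/taint key-value pairs are illustrative placeholders, not the names used by the reference Terraform:

```hcl
# Scale-from-zero hints for the Cluster Autoscaler. The ASG name and the
# "nodeType=messaging" label/taint are hypothetical examples.
resource "aws_autoscaling_group_tag" "node_template_label" {
  autoscaling_group_name = "eks-messaging-nodegroup-asg" # placeholder

  tag {
    key                 = "k8s.io/cluster-autoscaler/node-template/label/nodeType"
    value               = "messaging"
    propagate_at_launch = false
  }
}

resource "aws_autoscaling_group_tag" "node_template_taint" {
  autoscaling_group_name = "eks-messaging-nodegroup-asg" # placeholder

  tag {
    key                 = "k8s.io/cluster-autoscaler/node-template/taint/nodeType"
    value               = "messaging:NoExecute"
    propagate_at_launch = false
  }
}
```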

Instance Type Specifications

The following table lists the instances Solace uses for Dedicated Regions and recommends for Customer-Controlled Regions. Using less performant instances can reduce the performance and stability of your event broker service. Solace recommends a one-to-one relationship between your event broker service pods and worker nodes to provide sufficient resources and networking bandwidth for optimal event broker service performance.

Scaling Tier      Instance Type
Monitor           t3.medium
System            m5.large
Developer         r5.large
Enterprise 250    r5.large
Enterprise 1K     r5.large
Enterprise 5K     r5.xlarge
Enterprise 10K    r6in.xlarge (if your deployment region does not offer r6in.xlarge, you must use r5.xlarge)
Enterprise 50K    r6in.2xlarge (if your deployment region does not offer r6in.2xlarge, you must use r5.2xlarge)
Enterprise 100K   r6in.4xlarge (if your deployment region does not offer r6in.4xlarge, you must use r5.4xlarge)

Storage Class

The EKS storage class can use either the gp2 or gp3 volume type. Consider the following when choosing which storage class to use:

  • With gp2, performance improves as the size of the disk increases. In cases where your disk size is greater than 1 TB, gp2 is a better option than gp3.

  • With gp3, the default performance is 3,000 IOPS regardless of disk size. gp3 is recommended for disk sizes less than 1 TB. For disks larger than 1 TB, gp2 is the better choice unless you configure the gp3 storage class with a provisioned IOPS higher than 3,000. It's important to note that if you increase the IOPS on the storage class, all event broker services get the same IOPS. For example, if you increase to 6,000 IOPS, then all event broker services get 6,000-IOPS disks, and you may incur extra costs for the additional 3,000 IOPS as a result.
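The 1 TB crossover follows from the published EBS baselines: gp2 provides 3 IOPS per GiB (minimum 100, maximum 16,000), while gp3 provides a flat 3,000 IOPS regardless of size. A quick sketch of the comparison, assuming those baselines:

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, floor of 100, cap of 16,000."""
    return max(100, min(3 * size_gib, 16_000))

GP3_BASELINE_IOPS = 3_000  # flat baseline, regardless of disk size

def better_default_type(size_gib: int) -> str:
    """Which type gives more baseline IOPS without extra provisioning."""
    return "gp2" if gp2_baseline_iops(size_gib) > GP3_BASELINE_IOPS else "gp3"

print(better_default_type(500))   # 1,500 IOPS on gp2 -> prints "gp3"
print(better_default_type(2048))  # 6,144 IOPS on gp2 -> prints "gp2"
```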

It's important to remember that the disk size (the size of the EBS volume) is larger than the message spool size:
  • For event broker services version 10.6.1 and earlier, the disk requirement is twice the message spool size specified when you create an event broker service. For example, if you configure an event broker service to use a message spool of 500 GiB, you require a 1 TB disk.
  • For event broker services version 10.7.1 and later, the disk size requirement is 30% greater than the message spool size for the event broker service class. For example, the Enterprise 250 class has a message spool size of 50 GiB, requiring a 65 GiB disk.
You must consider this disk space overhead when planning your Kubernetes cluster. See Volume Size for High-Availability Event Broker Services for a list of disk size requirements for all event broker service versions.
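The two sizing rules above can be sketched as a small calculation (the function name and version cutoffs are illustrative encodings of the rules stated here):

```python
import math

def required_disk_gib(spool_gib: float, broker_version: tuple) -> float:
    """Disk (EBS volume) size needed for a given message spool size.

    Brokers 10.6.1 and earlier need twice the spool size;
    brokers 10.7.1 and later need 30% more than the spool size.
    """
    if broker_version <= (10, 6, 1):
        return 2 * spool_gib
    return math.ceil(spool_gib * 1.3)

print(required_disk_gib(500, (10, 6, 1)))  # 1000 GiB (~1 TB)
print(required_disk_gib(50, (10, 7, 1)))   # 65 GiB (Enterprise 250 example)
```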

To deploy PubSub+ Cloud, you must configure the StorageClass in EKS to use the WaitForFirstConsumer binding mode (volumeBindingMode). To support scale-up, the StorageClass must have the allowVolumeExpansion property set to "true". You should always use XFS as the filesystem type (fsType).

After creating the cluster, create the storage class using the reference storage class YAML examples in the reference EKS Terraform on GitHub.
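A minimal StorageClass meeting these requirements might look like the following. This is a sketch based on the requirements above (the metadata name is a placeholder, and gp3 is one of the two supported types), not a copy of the reference examples:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: solace-gp3               # placeholder name
provisioner: ebs.csi.aws.com     # EBS CSI driver, deployed in the cluster
parameters:
  type: gp3
  csi.storage.k8s.io/fstype: xfs # XFS filesystem, as required
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true       # required to support scale-up
```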

Networking

To spread an event broker service, or rather its event brokers, over the Availability Zones (AZs), pod anti-affinity should be used. When the region has multiple availability zones, set the topologyKey to topology.kubernetes.io/zone; when multiple AZs are not available, set it to kubernetes.io/hostname.
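As an illustration of this pod anti-affinity (the pod label is a hypothetical placeholder, not the label the Mission Control Agent applies):

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: example-event-broker    # hypothetical pod label
        # When the region has a single AZ, use kubernetes.io/hostname instead.
        topologyKey: topology.kubernetes.io/zone
```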

NAT Gateway

If you intend to allow your event broker services to access the public internet, Solace recommends at least one Elastic IP (EIP) for each NAT gateway for your cluster. The Solace reference Terraform recommends that you have three Elastic IPs (and three NAT gateways) for a production system. If you don't want to use NAT Gateways, you must provide an alternative for outbound communication for the Operational Connectivity required by your PubSub+ Cloud deployment.

Three EIPs and NAT gateways provide multi-AZ NAT redundancy. If you use the reference Terraform, these EIPs are created for you.

Load Balancer

PubSub+ Cloud requires the deployment of the AWS Load Balancer Controller to create Network Load Balancers (NLBs) to front event broker services. You can find instructions for deploying the AWS Load Balancer Controller at https://kubernetes-sigs.github.io/aws-load-balancer-controller

Solace configures the NLBs used in PubSub+ Cloud with IP targets and cross-zone enabled. This results in the fastest possible failover times.

Solace uses the following service annotations to configure the NLB:

service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing # this one is removed for internal (private) services
service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=true
service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: '2'
service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: '2'
service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: '5550'
service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: http
service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: /health-check/guaranteed-active
service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: '6'
service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: '10'

 

If you configured your EKS cluster using Solace's custom AWS Load Balancer configurations and require information about the custom AWS Load Balancer, see Custom AWS Load Balancer for PubSub+ Cloud.

IP Range

A minimum size of /24 is required for an EKS cluster that's dedicated to event broker services. A larger size is required if the cluster contains other services.

Amazon's VPC Container Network Interface (CNI) allocates IPs directly from the VPC’s subnets. The number of IPs that are allocated is directly proportional to the number of worker nodes, which is also proportional to the number of event broker services.

The calculations below are based on custom settings for WARM_IP_TARGET and WARM_ENI_TARGET:

kubectl set env ds aws-node -n kube-system WARM_IP_TARGET=1     
kubectl set env ds aws-node -n kube-system WARM_ENI_TARGET=0

Details about these settings are available in the Amazon Kubernetes VPC CNI documentation on GitHub.
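As a rough sketch of how worker node count drives IP consumption under the WARM_IP_TARGET=1 setting above (the per-node accounting here is a simplified, hypothetical model, not the CNI's exact allocation logic):

```python
import ipaddress

# A /24 dedicated to event broker services provides 256 addresses.
subnet = ipaddress.ip_network("10.0.0.0/24")
print(subnet.num_addresses)  # 256

def ips_needed(worker_nodes: int, pods_per_node: int) -> int:
    """Hypothetical estimate: each worker node consumes one IP for the
    node itself, one per pod, and one warm spare (WARM_IP_TARGET=1)."""
    return worker_nodes * (1 + pods_per_node + 1)

print(ips_needed(30, 4))  # 180 IPs -> fits within a /24
```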

CIDR Calculator

You can use the Solace-provided, downloadable, Excel-based CIDR calculator to calculate your CIDR requirements.

Limitations

Currently, a cluster using the standard aws-load-balancer-controller is limited to 11 event broker services due to security group rule limits. You can avoid this limitation by using a modified aws-load-balancer-controller provided by Solace.

Autoscaling

Your cluster requires autoscaling to provide the appropriate level of available resources for your event broker services as their demands change. Solace recommends using the Kubernetes Cluster Autoscaler, which you can find in the Kubernetes GitHub repository at: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler.

See the Autoscaling documentation on the Amazon EKS documentation site for information about implementing a Cluster Autoscaler.