Installing PubSub+ Cloud in Azure Kubernetes Service (AKS)
Azure Kubernetes Service (AKS) simplifies deploying a managed Kubernetes cluster in Azure by offloading the operational overhead to Azure. As a hosted Kubernetes service, Azure handles critical tasks, like health monitoring and maintenance. For more information about AKS, see the Azure Kubernetes Service documentation.
This deployment guide is intended for customers installing PubSub+ Cloud in a Customer-Controlled Region. For a list of deployment options, see PubSub+ Cloud Deployment Ownership Models.
There are a number of environment-specific steps that you must perform to install PubSub+ Cloud in AKS.
Before you perform the environment-specific steps described below, ensure that you review and fulfill the general requirements listed in Common Kubernetes Prerequisites.
Solace does not support event broker service integration with service meshes. Service meshes include Istio, Cilium, Linkerd, Consul, and others. If deploying to a cluster with a service mesh, you must:
- Exclude the `target-namespace` used by PubSub+ Cloud services from the service mesh.
- Set up connectivity to event broker services in the cluster using LoadBalancer or NodePort. See Exposing Event Broker Services to External Traffic for more information.
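As one hedged example for Istio specifically, you can exclude a namespace from automatic sidecar injection with a namespace label. The namespace name below is a placeholder, and other service meshes have their own exclusion mechanisms:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: target-namespace       # placeholder: the namespace used by PubSub+ Cloud services
  labels:
    istio-injection: disabled  # tells Istio not to inject sidecars into this namespace
```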
Solace provides reference Terraform projects that you can use as a starting point for your deployment. You can download the reference Terraform projects from GitHub.
Be aware that all sample scripts, Terraform modules, and examples are provided as-is. You can modify the files as required, and you are responsible for maintaining the modified files for your Kubernetes cluster.
AKS Cluster Prerequisites
The following are the technical prerequisites for an AKS cluster that deploys event broker services:
- Worker Nodes
- Worker nodes that you use for PubSub+ Cloud components must use ephemeral disks for the OS. Solace recommends ephemeral disks because Azure's premium managed disks don't provide the performance required for Kubernetes unless high-cost disks are used. Ephemeral OS disks give the virtual machines a storage solution that is both performant and cost-effective. Because worker nodes don't persist critical information on the OS disk, there's no requirement to use non-ephemeral disks.
- Permissions
- The following permissions are required for you to deploy the Terraform module:
- All the permissions that are required to create and manage the AKS cluster. These permissions can be delegated by the Terraform module you use.
- Permission to create Service Principals and assign the Contributor role over the whole resource group and the Network Contributor role over the private subnets.
These permissions are given to the AKS cluster to create load balancers and configure them. In addition, these permissions allow the CNI to interact with subnets and route tables.
- The AKS-managed service (called AzureContainerService) is assigned the Network Contributor role over the entire resource group. The Terraform module requires this permission to read the NAT gateway configuration when it creates a cluster, and to configure networking as required by Azure CNI.
- An Azure account. The Terraform module you use requires permissions to create and manage the following resources:
- All Virtual Machine resources
- The Terraform module requires access to the VM resources to create the service principal, the AD application, and the AKS cluster. Note that in the available example, the AKS cluster is created using the `data center-name` value from the `vars.tf` file, following the convention `<data center-name>-aks`.
- VNet
- The Terraform module requires this permission to configure the Virtual Network (VNet).
- Standard Load Balancers
- The Terraform module requires this permission to create and configure the load balancers.
- Premium LRS-managed Disks
- The Terraform module requires this permission to access LRS disks.
- Subnets
- The Terraform module requires this permission to set up the subnets.
- Security Groups
- The Terraform module requires this permission to set up the necessary security groups.
- Routing Tables
- The Terraform module requires this permission to create the appropriate gateways.
- Public IPs
- The Terraform module requires this permission to attach public IP addresses to the NAT gateway.
AKS Cluster Specifications
Before you (the customer) install the Mission Control Agent, you must configure the AKS cluster with the technical specifications listed in the sections that follow.
Node Pool Requirements
For high-availability event broker services, the cluster requires 12 node pools, split into four sets of three. Each node pool must be locked to a single availability zone; locking a node pool to an availability zone allows the cluster autoscaler to function properly. Solace uses pod anti-affinity against the node pools' zone label to ensure that each pod in a high-availability event broker service is placed in a separate availability zone.
For high-availability event broker services, the default (system) node pool spans all three availability zones.
The node pools must also meet the following requirements:
- Configure the node pool settings so that the OS disk type is ephemeral and the OS disk size is 48 GB.
- AKS worker nodes for monitoring must be a minimum of Standard_D2s_v3. The following table shows the minimum worker node size required based on the largest plan that's supported for your deployment.
| Node Pool Type | Recommended Minimum VM Size | Number of Worker Nodes Required |
| --- | --- | --- |
| Monitoring | Standard_D2s_v3 | One for each service (the sum of all services of all types) |
| Up to Enterprise 1K (Kilo) | Standard_E2s_v3 | Two for each Enterprise 1K service |
| Up to Enterprise 10K (Giga) | Standard_E4s_v3 | Two for each Enterprise 10K service |
| Up to Enterprise 100K (Tera 100k) | Standard_E8s_v3 | Two for each Enterprise 100K service |
Storage Class
For AKS, an autoscaler is included, and the deployment script creates a storage class named `managed-premium-zoned`. The storage class works with PubSub+ Cloud to provide the following:
- Locally Redundant Storage (LRS) redundancy. Solace requires an LRS disk because other types of redundancy are too slow. The PubSub+ Cloud Enterprise plans use high-availability services that replicate data across two LRS disks.
- Solace requires block device-based storage because regular filesystem-based storage won't work with the event broker service. As such, Solace requires managed volumes instead of azurefile volumes.
- The event broker services are designed to use the XFS file system. The `fsType` setting must be set to `xfs` to ensure the event broker services meet their required performance levels.
- To support scale-up, the `StorageClass` must contain the `allowVolumeExpansion` property, and have it set to `true`.
- To deploy PubSub+ Cloud, a custom `StorageClass` in AKS is required so that the Persistent Volume Claim (PVC) process creates the volume in the same AZ where the pods are scheduled. This storage class uses Managed Premium LRS disks and has the `WaitForFirstConsumer` binding mode, which instructs the PVC to wait for a pod to be scheduled before deciding which zone the disk is in. This storage class should have properties similar to the following example:
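The following is a minimal sketch of such a storage class, assuming the in-tree `kubernetes.io/azure-disk` provisioner; clusters that use the Azure Disk CSI driver would use `disk.csi.azure.com` and its parameter names instead. Refer to the reference Terraform on GitHub for the authoritative version.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: managed-premium-zoned
provisioner: kubernetes.io/azure-disk      # assumption: in-tree provisioner
parameters:
  kind: Managed                            # use Azure managed disks
  storageaccounttype: Premium_LRS          # Premium LRS disks, as required by Solace
  fsType: xfs                              # event broker services require the XFS file system
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer    # create the disk in the zone where the pod is scheduled
allowVolumeExpansion: true                 # required to support scale-up
```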
After creating the cluster, create the storage class using the reference storage class yaml found in the reference AKS Terraform available on GitHub.
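For illustration, a volume claim that consumes this storage class might look like the following sketch; the claim name and requested size are hypothetical, as the Mission Control Agent creates the actual PVCs:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-broker-data    # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce            # block storage attached to a single node
  storageClassName: managed-premium-zoned
  resources:
    requests:
      storage: 64Gi            # illustrative size
```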
Load Balancer
When using AKS, event broker services are exposed through a single public network load balancer. The source IP address for outgoing connections to internet hosts is a static IP address. This static IP address is used as the front-end public IP associated with the AKS public Standard Load Balancer.
You must use a Standard Load Balancer SKU instead of a Basic load balancer SKU. The Standard Load Balancer is required to act as a NAT solution and avoids the requirement to segregate the AKS cluster into zonal stacks and the requirement for a separate NAT. The Standard Load Balancer also allows you to deploy the AKS cluster to a single, private subnet using a single route table, and simplifies the deployment and planning of CIDRs when using VNet peering technologies, such as Hub-spoke network topology in Azure.
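For illustration only, a Kubernetes Service of type LoadBalancer that exposes an event broker service on the Standard Load Balancer might look like the following sketch; the name, label, and port are assumptions, and the Mission Control Agent creates the actual Services:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-broker-lb      # hypothetical name
spec:
  type: LoadBalancer           # provisions a front end on the AKS public Standard Load Balancer
  selector:
    app: event-broker-ha       # hypothetical pod label
  ports:
    - name: smf
      port: 55555              # SMF messaging port shown as an illustration
      targetPort: 55555
```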
Networking
- To spread the pods of an event broker service over three different Availability Zones (AZs), pod anti-affinity is used. For regions that support AZs, set the `topologyKey` to `topology.kubernetes.io/zone`; otherwise, for AKS clusters that do not have AZs, set it to `kubernetes.io/hostname`. See the sketch following this list.
- Configure how IP addresses are assigned to pods: use `kubenet` to have addresses assigned by the cluster, or `azure` to have addresses assigned from the VNet subnet. When you select `azure`, each worker node is pre-allocated 30 IP addresses from the VNet. Using `kubenet` does not pre-allocate IP addresses.
- Determine the number of SNAT (outgoing) ports for each VM in the cluster. The number you choose determines the number of worker nodes available; it must be between 0 and 64000. Choosing a lower number gives you more worker nodes, but fewer outgoing connections per node. For more information about SNAT ports, see Outbound Rules Azure Load Balancer and Scenarios with outbound rules on the Microsoft website.
- The requirements are one pod for each Developer service and three pods for each other event broker service. The subnets must be large enough to contain all the IP addresses used for the pods. For more information, see Configure Azure CNI networking in Azure Kubernetes Service (AKS) on the Microsoft website.
For information about using Azure Kubernetes Service, see the Azure documentation site.
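As a sketch of the pod anti-affinity described above (the `app` label is hypothetical; the Mission Control Agent manages the actual pod specifications):

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: event-broker-ha          # hypothetical label on the broker pods
        # Regions with AZs; use kubernetes.io/hostname where AZs are unavailable
        topologyKey: topology.kubernetes.io/zone
```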
IP Range
There are two networking options in Azure: Kubenet and Azure CNI. For Customer-Controlled Regions, Solace recommends using Kubenet. Kubenet offers the most efficient CIDR requirements for the VNet containing the cluster.
CIDR requirements for Dedicated Regions depend on your need to access event broker services privately through peering. If this is necessary, Solace requires a CIDR that is compatible with your network plan.
You should carefully consider future expansion requirements when estimating the CIDR size required for your AKS cluster. Once deployed, you cannot change the CIDR, and expanding the size of your VNet is not simple. For more information, see AKS Cluster Specifications. You can also use the Solace-provided reference Terraform projects as a starting point.
Deployments in Regions With No Availability Zones
Some regions do not have availability zones (AZs). You can deploy to these regions, but the IaaS has a reduced fault tolerance without AZs.
To deploy to regions that don't have AZs available, set the `topologyKey` in the pod anti-affinity to `kubernetes.io/hostname`, whereas when AZs are available, it is set to `topology.kubernetes.io/zone`.
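For example, the anti-affinity sketch from the Networking section above changes only its `topologyKey` in a region without AZs (the `app` label remains hypothetical):

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: event-broker-ha              # hypothetical label
        topologyKey: kubernetes.io/hostname   # spread pods across nodes instead of zones
```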
Autoscaling
Your cluster requires autoscaling to provide the appropriate level of available resources for your event broker services as their demands change. Solace recommends using the Kubernetes Cluster Autoscaler, which you can find in the Kubernetes GitHub repository at: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler.
See the Automatically scale a cluster to meet application demands on Azure Kubernetes Service (AKS) documentation on the Microsoft Azure Kubernetes Service (AKS) documentation site for information about implementing a Cluster Autoscaler.