Partition Rebalancing

Partition rebalancing is the process by which the event broker updates the mapping of partitions to flows for a partitioned queue. Rebalancing reassigns partition-to-flow mappings so that partitions are distributed evenly across all flows. Each partition is assigned to a single flow, but a flow can be assigned more than one partition.

A partitioned queue is considered balanced if all active consumer flows are assigned the same number of partitions, plus or minus one. In other words, the partition counts of any two active flows differ by at most one.
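
The balance condition can be expressed as a short check. The following is a minimal sketch, not broker code: the function name and the dictionary shape (flow name mapped to its number of assigned partitions) are assumptions made for illustration, and "plus or minus one" is read as "largest and smallest counts differ by at most one".

```python
def is_balanced(partitions_per_flow):
    """Return True if the active flows' partition counts differ by at most one.

    partitions_per_flow: dict mapping a (hypothetical) flow name to the
    number of partitions currently assigned to that flow.
    """
    counts = list(partitions_per_flow.values())
    if not counts:
        # Assumption: a queue with no active flows has nothing to balance.
        return True
    return max(counts) - min(counts) <= 1


# Seven partitions over three flows (3, 2, 2) is balanced.
print(is_balanced({"flow1": 3, "flow2": 2, "flow3": 2}))
# A newly bound flow with no partitions makes the queue unbalanced.
print(is_balanced({"flow1": 3, "flow2": 2, "flow3": 2, "flow4": 0}))
```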

The following changes trigger rebalancing:

  • The queue's operating partition count changes. This happens if the partition count increases or decreases, or if the parent queue access type changes from exclusive to non-exclusive. The broker triggers rebalancing after it has finished the Partition Scaling process.

  • A new consumer binds to a partitioned queue that has fewer consumers than partitions. If there are already excess consumers, a new consumer bind does not trigger rebalancing.

  • A consumer unbinds from a partitioned queue that does not have excess consumers. If there are excess consumers, one of them becomes active immediately and takes over the load from the recently unbound consumer.

    No distinction is made between an unbind signaled by the client application and an unbind resulting from a disconnect.
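
The bind and unbind rules above can be summarized as two predicates. This is an illustrative sketch only: the function names are hypothetical, and "excess consumers" is assumed to mean that more consumers are bound than there are partitions.

```python
def bind_triggers_rebalance(consumers_before_bind, partition_count):
    """A new bind triggers rebalancing only if the queue had fewer
    consumers than partitions before the bind."""
    return consumers_before_bind < partition_count


def unbind_triggers_rebalance(consumers_before_unbind, partition_count):
    """An unbind triggers rebalancing only if the queue had no excess
    consumers (assumed: consumers <= partitions) before the unbind.
    Otherwise an excess consumer takes over and no rebalance is needed."""
    return consumers_before_unbind <= partition_count


# Queue with 7 partitions:
print(bind_triggers_rebalance(3, 7))    # fewer consumers than partitions
print(bind_triggers_rebalance(7, 7))    # already one consumer per partition
print(unbind_triggers_rebalance(8, 7))  # an excess consumer can take over
```

Both predicates treat a client-initiated unbind and a disconnect identically, matching the note above.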

A partitioned queue has the following properties:

  • partition count—the number of partitions. Partitions are numbered from 0 to N-1, where N is the number of partitions (a queue with four partitions has partitions numbered from 0 to 3).
  • rebalance delay—the time, in seconds, that the broker waits after a rebalance is triggered before starting it, so that the number of consumers can stabilize.

    If the number of consumers returns to its previous state before the delay timer expires, no partition handoffs occur.

    In an auto-scaling scenario, this specifies the maximum delay between the first and last consumer instance change.

    If the partition count changes, the rebalance is started immediately.

  • rebalance max-handoff-time—the maximum number of seconds the broker waits to allow consumers to acknowledge outstanding messages during Partition Handoff.
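
To make the three properties concrete, the sketch below models them as a small Python record together with the two timing rules described for the rebalance delay. All names and the example values are hypothetical; they are not broker configuration keys or defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionedQueueSettings:
    partition_count: int               # partitions are numbered 0 .. partition_count - 1
    rebalance_delay_s: int             # wait after a trigger before rebalancing starts
    rebalance_max_handoff_time_s: int  # max wait for acknowledgments during handoff


def partition_numbers(settings):
    """List the partition numbers, 0 to N-1."""
    return list(range(settings.partition_count))


def delay_before_rebalance(settings, partition_count_changed):
    """A partition count change starts the rebalance immediately;
    any other trigger waits for the rebalance delay."""
    return 0 if partition_count_changed else settings.rebalance_delay_s


def handoffs_can_occur(consumers_before_trigger, consumers_at_expiry):
    """If the consumer count is back to its previous value when the
    delay expires, no partition handoffs occur."""
    return consumers_at_expiry != consumers_before_trigger


settings = PartitionedQueueSettings(4, 5, 3)  # illustrative values only
print(partition_numbers(settings))
print(delay_before_rebalance(settings, partition_count_changed=True))
print(handoffs_can_occur(consumers_before_trigger=3, consumers_at_expiry=3))
```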

The rebalancing process resolves discrepancies among the following properties of the partitioned queue:

  • the partition count configured for the queue
  • the current set of partitions
  • the set of active consumer flows, each of which has one or more partitions assigned to it

When rebalancing starts, some mappings might be incomplete:

  • a partition might not be assigned to a flow (if new partitions have been created due to an increase in partition count)
  • one or more flows might be temporarily mapped to an invalid partition (if partitions were just deleted due to a decrease in partition count, or if a consumer has just disconnected), or not mapped to a partition at all (if new consumers were just added)
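
The incomplete mappings listed above can be detected with a simple scan. The following sketch assumes a hypothetical data model in which each flow name maps to the set of partition numbers assigned to it; it illustrates the three cases and does not reflect how the broker stores its state.

```python
def find_incomplete_mappings(partition_count, flow_partitions):
    """Classify incomplete mappings for a partitioned queue.

    flow_partitions: dict mapping a (hypothetical) flow name to an
    iterable of partition numbers assigned to that flow.
    Returns (unassigned_partitions, invalid_by_flow, unmapped_flows).
    """
    valid = set(range(partition_count))
    assigned = set()
    invalid_by_flow = {}
    unmapped_flows = []
    for flow, parts in flow_partitions.items():
        parts = set(parts)
        stale = sorted(parts - valid)  # partitions that no longer exist
        if stale:
            invalid_by_flow[flow] = stale
        if not parts:
            unmapped_flows.append(flow)  # e.g. a consumer that just bound
        assigned |= parts & valid
    unassigned = sorted(valid - assigned)  # e.g. newly created partitions
    return unassigned, invalid_by_flow, unmapped_flows


# Four partitions (0-3): flow1 still references deleted partition 4,
# partition 3 has no flow, and flow3 has just bound.
print(find_incomplete_mappings(4, {"flow1": {0, 1, 4}, "flow2": {2}, "flow3": set()}))
```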

Rebalancing does not take into account queue depth or any other queue or flow metric.

To balance the queue's partitions, the event broker performs the following steps:

  1. Handle any change in the number of partitions (delete or create partitions, as required).

  2. Map any new partitions to consumer flows, preferring the flows that have the fewest partitions mapped already.

  3. Send FlowChangeUpdate (remove the ActiveFlow indicator) to flows whose partitions have all been removed (flows remain active if only some of their assigned partitions are removed).

  4. Check for balance. Balance has been achieved if all flows have the same number of partitions assigned (plus or minus one).

  5. If the partitions are not balanced, hand off one partition from the flow with the greatest number of assigned partitions to the flow with the least.

  6. Repeat Step 1 through Step 5 until balance is achieved.
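
The steps above can be sketched as a small simulation. This is a simplified illustration under stated assumptions, not the broker's implementation: the function name and data model (flow name mapped to partition numbers) are hypothetical, the FlowChangeUpdate signaling in Step 3 is omitted, the handoff timing is ignored, and the choice of which partition to hand off (the highest-numbered one) is arbitrary. Because the partition count is fixed for the duration of the call, only the balance check and handoff are repeated in the loop.

```python
def rebalance(partition_count, flow_partitions):
    """Return (new_mapping, handoffs) for a partitioned queue.

    flow_partitions: dict of flow name -> iterable of partition numbers.
    handoffs: list of (partition, from_flow, to_flow) tuples.
    """
    if not flow_partitions:
        return {}, []
    valid = set(range(partition_count))

    # Step 1: drop mappings to partitions that no longer exist.
    flows = {flow: set(parts) & valid for flow, parts in flow_partitions.items()}

    # Step 2: map unassigned partitions to the flow with the fewest partitions.
    assigned = set().union(*flows.values())
    for partition in sorted(valid - assigned):
        target = min(flows, key=lambda f: len(flows[f]))
        flows[target].add(partition)

    # Steps 4-5, repeated: hand off one partition at a time from the
    # most-loaded flow to the least-loaded flow until balanced.
    handoffs = []
    while True:
        most = max(flows, key=lambda f: len(flows[f]))
        least = min(flows, key=lambda f: len(flows[f]))
        if len(flows[most]) - len(flows[least]) <= 1:
            break
        partition = max(flows[most])  # arbitrary choice for this sketch
        flows[most].remove(partition)
        flows[least].add(partition)
        handoffs.append((partition, most, least))

    return {flow: sorted(parts) for flow, parts in flows.items()}, handoffs


# Seven partitions, three loaded flows, and one newly bound flow.
mapping, handoffs = rebalance(
    7, {"flow1": [0, 1, 2], "flow2": [3, 4], "flow3": [5, 6], "flow4": []}
)
print(mapping)
print(handoffs)
```

Note that the sketch looks only at partition counts, consistent with the statement that rebalancing does not take queue depth or any other metric into account.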

As an example, consider the following diagram. We start with a partitioned queue with seven partitions and three bound flows.

[Diagram: a partitioned queue with seven partitions and three bound consumer flows]

In a balanced state, each consumer flow has either two or three mapped partitions.

Suppose another consumer comes online, as shown in the next diagram:

[Diagram: the same queue after a fourth consumer flow binds, before rebalancing]

This queue now has seven partitions and four consumer flows. It is unbalanced, because not all flows have the same number (plus or minus one) of partitions assigned to them. Here, the new flow has no partitions, whereas the other flows have either two or three.

After rebalancing, the resulting queue has three flows with two partitions, and one flow with one partition, as illustrated below.

[Diagram: the queue after rebalancing, with three flows holding two partitions each and one flow holding one partition]
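
The partition counts in this example follow from integer division. As a quick check, the sketch below (with a hypothetical helper name) computes how many partitions each flow holds in a balanced state; it says nothing about which specific partitions move.

```python
def balanced_counts(partition_count, flow_count):
    """Per-flow partition counts in a balanced state, largest first."""
    base, extra = divmod(partition_count, flow_count)
    # 'extra' flows hold one more partition than the rest.
    return [base + 1] * extra + [base] * (flow_count - extra)


print(balanced_counts(7, 3))  # [3, 2, 2]
print(balanced_counts(7, 4))  # [2, 2, 2, 1]
```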

Partition Rebalancing Status

A partitioned queue can have one of the following rebalancing statuses:

  • Ready—No rebalancing activity has been triggered.
  • Holddown—Rebalancing activity will start after the delay specified by the rebalance delay property of the queue.
  • Rebalancing—Rebalance logic is underway to adjust partition-to-flow mappings.
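
The three statuses can be modeled as a simple enumeration. The sketch below is an assumption-laden illustration: the enum and function names are hypothetical, and the mapping from conditions to statuses is inferred from the descriptions above rather than taken from a documented state machine.

```python
from enum import Enum


class RebalanceStatus(Enum):
    READY = "Ready"
    HOLDDOWN = "Holddown"
    REBALANCING = "Rebalancing"


def status_for(triggered, delay_elapsed):
    """Infer the status from whether a rebalance has been triggered and
    whether the rebalance delay has elapsed (assumed model)."""
    if not triggered:
        return RebalanceStatus.READY
    if not delay_elapsed:
        return RebalanceStatus.HOLDDOWN
    return RebalanceStatus.REBALANCING


print(status_for(triggered=False, delay_elapsed=False).value)
print(status_for(triggered=True, delay_elapsed=False).value)
print(status_for(triggered=True, delay_elapsed=True).value)
```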