AWS MSK Instance resizing failure

We have been trying to resize our AWS MSK cluster brokers from kafka.t3.small → kafka.m7g.large. Each attempt stays in the updating state for around 2-3 hours and then fails with the error code InternalServerError.Unknown and the error message "The last operation failed because of a service issue. Retry the operation." We have tried 3 times, with the same result every time. Any idea why this is failing and how we can resolve it?

2 Answers
It's worth recalling how the broker-size update takes place.

During the rolling update, one broker goes down at a time, and the remaining brokers stay available to serve produce and consume operations, provided you are following the MSK high-availability best practices.

When updating your cluster broker size from kafka.t3.small to kafka.m7g.large, be aware that migrating to a larger broker size can increase performance but may cost more. The broker-size update happens in a rolling fashion while the cluster is up and running. This means that Amazon MSK takes down one broker at a time to perform the broker-size update.

AWS recommends a maximum number of partitions (including leader and follower replicas) per broker based on the broker size. If the number of partitions per broker exceeds the recommended value, your cluster might become overloaded, which can prevent certain operations, including updating the cluster to a different broker size.
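As a rough way to check this, here is a minimal sketch that estimates partition replicas per broker and compares the result with a per-broker ceiling. The ceilings in ASSUMED_MAX_PARTITIONS are my assumption of the values on the MSK best-practices page, so please confirm them against the current documentation. The function names are hypothetical, and the estimate assumes replicas are spread evenly across brokers.

```python
import math

# Assumed per-broker ceilings (leader + follower replicas).
# Illustrative only; verify against the current MSK best-practices page.
ASSUMED_MAX_PARTITIONS = {
    "kafka.t3.small": 300,
    "kafka.m7g.large": 1000,
}


def partitions_per_broker(topics, broker_count):
    """Estimate replicas hosted by each broker.

    topics: iterable of (partition_count, replication_factor) pairs.
    Assumes replicas are spread evenly across all brokers.
    """
    if broker_count <= 0:
        raise ValueError("broker_count must be positive")
    total_replicas = sum(parts * rf for parts, rf in topics)
    return math.ceil(total_replicas / broker_count)


def is_overloaded(topics, broker_count, broker_type,
                  limits=ASSUMED_MAX_PARTITIONS):
    """True if the estimated per-broker count exceeds the assumed ceiling."""
    return partitions_per_broker(topics, broker_count) > limits[broker_type]


if __name__ == "__main__":
    # Hypothetical cluster: one topic, 400 partitions, replication factor 3.
    topics = [(400, 3)]
    print(partitions_per_broker(topics, 3))
    print(is_overloaded(topics, 3, "kafka.t3.small"))
```

If the check reports an overload on the current broker type, reducing partitions or adding brokers before retrying the resize would bring the per-broker count down.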

I would suggest, though, increasing the number of brokers and checking their connection limits.

EXPERT
answered a month ago
I suspect your issue is because you are changing architectures from kafka.t3.small (x86_64) → kafka.m7g.large (ARM64).

Try updating from kafka.t3.small → kafka.m7i.large -- note m7i (Intel) vs m7g (Graviton/ARM).

I apologize, I was mistaken: AWS has announced that upgrading from x86 (M5, T3) to Graviton3 (M7g) instances is now supported.
https://aws.amazon.com/about-aws/whats-new/2024/06/amazon-msk-upgrades-m5-t3-instance-graviton3-m7g/

I would suggest opening a support case since this is a very new feature.
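If it helps when filing the case, here is a small sketch for collecting the failed operations and their error details to attach. It assumes boto3 is installed and credentials are configured, and it uses the Kafka client's list_cluster_operations call. The helper names and the sample ARN are hypothetical, and only the first page of results is read.

```python
def failed_operations(operations):
    """Pick out operations that carry ErrorInfo.

    operations: list of dicts shaped like the entries of
    ClusterOperationInfoList returned by list_cluster_operations.
    Returns (operation_arn, operation_type, error_code, error_string) tuples.
    """
    rows = []
    for op in operations:
        err = op.get("ErrorInfo")
        if err:
            rows.append((
                op.get("OperationArn"),
                op.get("OperationType"),
                err.get("ErrorCode"),
                err.get("ErrorString"),
            ))
    return rows


def fetch_operations(cluster_arn):
    """Fetch the first page of cluster operations (no pagination here)."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("kafka")
    response = client.list_cluster_operations(ClusterArn=cluster_arn)
    return response["ClusterOperationInfoList"]


if __name__ == "__main__":
    # Hypothetical sample data in place of a live call to fetch_operations().
    sample = [
        {"OperationArn": "arn:example:op-1", "OperationType": "UPDATE_BROKER_TYPE",
         "ErrorInfo": {"ErrorCode": "InternalServerError.Unknown",
                       "ErrorString": "The last operation failed because of a service issue. Retry the operation."}},
        {"OperationArn": "arn:example:op-2", "OperationType": "UPDATE_BROKER_TYPE"},
    ]
    for row in failed_operations(sample):
        print(row)
```

Including the operation ARNs and timestamps of the three failed attempts should let support trace each one on their side.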

Hope this helps!

AWS
EXPERT
iBehr
answered a month ago