AWS Bedrock Claude 3.5 Sonnet throttled randomly: "Too many tokens, please wait before trying again."

0

Hi, I had been using Claude 3.5 Sonnet regularly when, all of a sudden, my requests stopped going through and I got this error:

botocore.errorfactory.ThrottlingException: An error occurred (ThrottlingException) when calling the ConverseStream operation (reached max retries: 4): Too many tokens, please wait before trying again.

I've tried the model page on the website, and it seems that only Claude 3.5 Sonnet is unavailable. I'm aware that On-Demand requests are served from a shared compute pool, but since Provisioned Throughput isn't available for Claude 3.5 Sonnet yet, On-Demand is the only option. Do I just have to weather these service disruptions every single time?

cpthdk
asked a month ago, 206 views
2 Answers
1

Hello.

"I've tried the model page on the website, and it seems that only Claude 3.5 Sonnet is unavailable. I'm aware that On-Demand requests are served from a shared compute pool, but since Provisioned Throughput isn't available for Claude 3.5 Sonnet yet, On-Demand is the only option. Do I just have to weather these service disruptions every single time?"

I think you're right.
As of July 2024, Provisioned Throughput cannot be purchased for Claude 3.5 Sonnet, so when a throttling error occurs, you either have to accept the error or retry after a while.
https://docs.aws.amazon.com/bedrock/latest/userguide/pt-supported.html

EXPERT
answered a month ago
  • If this is the case, how do I deploy anything I build with Claude 3.5 Sonnet on AWS Bedrock to production?

  • In a production environment, I think it would be a good idea to wait for a while and then retry whenever a throttling error occurs.
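
A minimal sketch of the "wait, then retry" idea, assuming exponential backoff with jitter (my choice of strategy, not something stated in this thread). The names `is_throttling_error` and `call_with_backoff` and all the default values are hypothetical. The only thing taken from botocore is that a `ClientError` exposes the error code under `response["Error"]["Code"]`.

```python
import random
import time


def is_throttling_error(exc):
    """Return True if exc looks like a botocore ClientError for throttling."""
    # botocore's ClientError keeps the service error code in
    # exc.response["Error"]["Code"]; other exceptions have no .response dict.
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") == "ThrottlingException"


def call_with_backoff(fn, max_attempts=6, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call fn(), retrying throttling errors with exponential backoff and jitter.

    max_attempts, base_delay and max_delay are illustrative defaults,
    not values recommended by AWS.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Re-raise anything that is not throttling, and the final failure.
            if not is_throttling_error(exc) or attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

You would wrap your own call, for example `call_with_backoff(lambda: client.converse_stream(**request))`, where `client` and `request` stand for your own boto3 `bedrock-runtime` client and request arguments. Note that botocore already retries internally (the "reached max retries: 4" in the error above), so this adds a second, slower layer on top of that.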

0

Hi,

According to the documentation (https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html), the current inference quota for Claude 3.5 Sonnet is 50 requests and 400,000 tokens per minute (not adjustable).

So yes, beyond those limits, expect throttling exceptions that you have to handle with retries. I do exactly that in an application I'm currently working on.
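
To hit the quota less often in the first place, one option is to pace requests on the client side. Below is a rough sketch, assuming a sliding 60-second window and using the 50 requests / 400,000 tokens figures above as defaults. The class `MinuteQuotaLimiter` is hypothetical, the token count per request is your own estimate, and it only tracks traffic from a single process, so other callers in the same account are not accounted for.

```python
import time
from collections import deque


class MinuteQuotaLimiter:
    """Track requests and tokens over a sliding window (60 seconds by default)."""

    def __init__(self, max_requests=50, max_tokens=400_000, window=60.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window = window
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) for each recorded request
        self.tokens_in_window = 0

    def _expire(self, now):
        # Drop recorded requests that have fallen out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def wait_time(self, tokens):
        """Seconds to wait until a request of this size fits; 0.0 if it fits now."""
        if tokens > self.max_tokens:
            raise ValueError("request exceeds the per-window token budget")
        now = self.clock()
        self._expire(now)
        requests = len(self.events)
        used = self.tokens_in_window
        wait = 0.0
        # Walk the oldest requests until enough capacity would be freed.
        for stamp, event_tokens in self.events:
            if requests < self.max_requests and used + tokens <= self.max_tokens:
                break
            requests -= 1
            used -= event_tokens
            wait = stamp + self.window - now
        return wait

    def record(self, tokens):
        """Record a request that was just sent, with its estimated token count."""
        self.events.append((self.clock(), tokens))
        self.tokens_in_window += tokens
```

Before each call you would sleep for `limiter.wait_time(estimated_tokens)` seconds and then call `limiter.record(estimated_tokens)`. This does not replace retries, since the service can still throttle, but it should make them rarer.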

Best,

Didier

AWS
EXPERT
answered a month ago