You could consider using AWS Lambda's response payload streaming feature, which allows functions to progressively stream response payloads back to clients. This can be particularly useful when working with AI models that support streaming. If you're working with Python, you may need to create a custom runtime for AWS Lambda, as response streaming is not natively supported by the managed Python runtime.
Here is the documentation for the API: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_InvokeAgent.html
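For reference, the `completion` field of the InvokeAgent response is an event stream of chunk events. A minimal sketch of consuming it with boto3 might look like the following (the agent/alias/session IDs are placeholders, and a call requires valid AWS credentials and a deployed agent):

```python
def join_chunks(events):
    """Concatenate the decoded 'bytes' of each chunk event into the full
    response text, skipping non-chunk events such as trace events."""
    parts = []
    for event in events:
        chunk = event.get("chunk")
        if chunk and "bytes" in chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)

def ask_agent(agent_id, alias_id, session_id, text):
    """Sketch: invoke a Bedrock agent and collect its streamed completion."""
    import boto3  # requires AWS credentials configured in the environment

    client = boto3.client("bedrock-agent-runtime")
    response = client.invoke_agent(
        agentId=agent_id,          # placeholder ID
        agentAliasId=alias_id,     # placeholder ID
        sessionId=session_id,
        inputText=text,
    )
    # The completion arrives as an event stream, but as discussed below,
    # the chunks cover only the agent's final generation.
    return join_chunks(response["completion"])
```

Note that even though the response is delivered as an event stream, the text you receive corresponds to the agent's final answer rather than its intermediate steps.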
An important thing to consider about the agent pattern is that the final response is typically a product of multiple LLM calls, often chained together so that the output of one is used in the input to the next.
This substantially reduces the value of response streaming for agents compared with the plain LLM-calling ConverseStream and InvokeModelWithResponseStream APIs: only the last generation in the agent flow can be meaningfully streamed, so the client still waits with no output through the intermediate steps.
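To illustrate the point with a toy simulation (not the Bedrock API): if an agent chains several non-streaming LLM calls and only the final one streams, the client sees nothing until every intermediate step has fully finished. The step functions below are stand-ins for real model calls:

```python
def run_agent(intermediate_steps, final_step):
    """Toy agent loop: each intermediate 'LLM call' must complete in full
    (nothing reaches the client), then only the final generation is
    yielded incrementally."""
    context = ""
    for step in intermediate_steps:
        # Intermediate calls: output feeds the next prompt, nothing streams.
        context = step(context)
    # Only the last generation in the chain can be streamed to the client.
    for token in final_step(context):
        yield token

# Stand-in "model calls" for the sketch:
plan = lambda ctx: ctx + "[plan]"            # step 1: produce a plan
lookup = lambda ctx: ctx + "[facts]"         # step 2: gather facts
answer = lambda ctx: iter(["The", "final", "answer"])  # step 3: streamed reply

tokens = list(run_agent([plan, lookup], answer))
# Every token arrives only after plan() and lookup() have both returned.
```

The first streamed token cannot arrive any earlier than the sum of the intermediate steps' latencies, which is why streaming helps agents far less than single-call APIs.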
I can't really comment on roadmap or timelines, but for potential alternatives with the current API I'd suggest maybe:
- Testing faster models, or optimizing or removing prompt steps in the agent, to try to reduce response latency subject to your quality requirements (an automated testing framework like AWSLabs agent-evaluation might help you validate these optimizations against a suite of example conversations)
- Making more basic, UI-side changes to your application to reassure users that the model is working on a response, such as a typing/thinking bubble, a progress wheel, disabling the send button, etc.
Again, even if/when a streaming feature becomes available in this API, I'd avoid assuming it will be a massive change in perceived latency for your users - unless your agent often outputs very long messages, where even streaming just the final generation in the chain would help.
I don't think adding a layer in front of Bedrock that receives the whole response and then streams it out would really help: it only adds latency if the layer doesn't already exist, and if a Lambda is already present in the architecture, Bedrock should account for the clear majority of the overall latency. I wouldn't expect the transmission from Lambda to the client to be significant.