
Conversation

@luyuzhe111

What does this PR do?

This PR implements a fully decoupled and auto-scaled rollout engine using AWS Bedrock AgentCore Runtime, making veRL highly agnostic to the diverse agentic use cases that often require custom scaffolding, multiple tools, and complex environments.

At a high level, we propose a design where developers run their whole agentic application with whatever customization they desire in a separate container managed by AgentCore on the cloud, instead of in the same environment as veRL on the training cluster. The design is illustrated by the following architectural diagram.

AgentCore integration

The agent application hosted on AgentCore Runtime communicates with veRL in two ways:

  • The agent invokes the proxy address (SGLang Router) in veRL to get responses from the model (hosted by multiple vLLM/SGLang servers), just as it would invoke the Bedrock/OpenAI/Anthropic API.
  • The agent sends the rollout and reward (implemented by developers) back to veRL for model updates.

Essentially, veRL sends a prompt to the rollout engine powered by AgentCore and gets back a rollout and the corresponding reward. The entire rollout process (tool use, environment interaction, etc.) happens on the cloud. This means developers don't have to migrate whatever agent application they've built to veRL to start training, while veRL doesn't have to anticipate all kinds of agentic use cases to accommodate in its design.
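
From the agent's perspective, calling the policy model through the router looks like any OpenAI-compatible API call. The sketch below is illustrative only (the router URL, model name, and the idea of injecting them as deployment-time values are assumptions, not the actual veRL wire-up); it builds the request the agent would send to the proxy:

```python
import json

# Hypothetical sketch: an agent hosted on AgentCore calling the model through
# veRL's SGLang Router proxy. The router URL and model name are assumed to be
# injected into the container when it is deployed (see the design notes below).
def build_chat_request(router_url: str, model_name: str, messages: list) -> tuple:
    """Return the (url, body) for an OpenAI-compatible chat completion call."""
    url = f"{router_url}/v1/chat/completions"
    body = json.dumps({"model": model_name, "messages": messages}).encode()
    return url, body

url, body = build_chat_request(
    "http://10.0.0.1:30000",  # illustrative router address inside the VPC
    "my-policy-model",         # illustrative model name
    [{"role": "user", "content": "hello"}],
)
```

The agent can hand this URL to any HTTP or OpenAI-compatible client; from its point of view, swapping Bedrock/OpenAI for the training-time policy is just a base-URL change.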

In addition to simplifying the developer experience and veRL architecture, AgentCore Runtime itself is also a perfect solution for generating rollouts. It will

  • create a separate sandboxed environment for each request, and
  • provide auto scaling so that one can submit a burst of requests without ever managing any infra.

AgentCore Runtime was originally designed as a deployment service for agent applications, and is repurposed in our design to generate rollouts scalably for RL training. We were also happy to learn recently that Cursor's Composer training adopts a similar design, per the Ray Summit talk from @srush, where they leveraged the Cursor Cloud agent to generate rollouts for their large-scale RL training.

We think the solution in this PR can benefit both research projects and production scenarios. Under this paradigm, researchers and developers can focus on building their agentic applications with arbitrary frameworks, tools, and environments, whether for establishing a baseline or creating a deployable solution. Once they have a working agent and are ready for training, all they need to do on the veRL side is provide a few extra configs (container URI, S3 bucket, etc.). They will still need to return the rollout and define the reward in their agent app, but we will release a sample repo with various agent examples soon to demonstrate how straightforward this process is. And when training is done, the agent can be deployed with the exact harness and setup in the app, so there is no mismatch between the training and inference stages.

Co-authors of this PR: @luyuzhe111, @lyzustc, @hellodanylo.

Test

Unit tests are implemented in tests/experimental/agentcore_loop/test_basic_agentcore_loop.py. E2E training was tested for GRPO. vLLM was used as the inference engine.

API and Usage Example

Additional config args to the training script for any agent:

```bash
# subnets and security_groups are for the training cluster VPC
actor_rollout_ref.rollout.agentcore.agent_name=xxx \
actor_rollout_ref.rollout.agentcore.subnets='["subnet-xxx"]' \
actor_rollout_ref.rollout.agentcore.security_groups='["sg-xxx","sg-xxx"]' \
actor_rollout_ref.rollout.agentcore.container_uri=xxx.dkr.ecr.xxx.amazonaws.com/xxx:tag \
actor_rollout_ref.rollout.agentcore.role_arn=xxx \
actor_rollout_ref.rollout.agentcore.s3_bucket=xxx
```

We will release concrete training examples for various agentic use cases soon!

Design & Code Changes

We implement the proposed rollout engine by adding a separate AgentCoreLoopManager in verl/experimental/agent_loop/agentcore_loop.py. Almost all code changes reside in this file.

  • AgentCoreLoopManager initializes the inference servers similar to AgentLoopManager and registers them to the SGLang Router.
  • AgentCoreLoopManager passes the SGLang router address and model name to AgentCore Runtime when the container is first deployed, so that the agent knows where to get model response.
  • When the rollout batch arrives, RequestDispatcher in AgentCoreLoopManager submits all requests to the AgentCore Runtime endpoint asynchronously.
  • Once all the requests have been submitted, RolloutBuffer polls SQS for rollout completion messages and downloads rollouts from S3 as they finish. Saving the rollout to S3 and notifying SQS is handled on the agent-app side within AgentCore. We will be open-sourcing a wrapper for agent apps soon to demonstrate that developers won't have to worry about these services at all.
  • When all rollouts have been collected or a time limit has been exceeded, AgentCoreLoopManager will return the available rollouts and terminate all sessions. The current design follows the synchronous RL paradigm but we plan to extend to async RL in the near future as AgentCore Runtime is naturally compatible.
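
The collection loop described above can be sketched roughly as follows. This is a hedged, minimal illustration, not the actual RolloutBuffer implementation: the SQS message schema (`bucket`/`key` fields) and function names are assumptions, and the boto3 clients are passed in explicitly so the logic is testable without AWS.

```python
import json

def parse_completion_message(body: str) -> tuple:
    """Extract the (bucket, key) of a finished rollout from an SQS message body.

    The JSON schema here is illustrative; the real wire format may differ.
    """
    msg = json.loads(body)
    return msg["bucket"], msg["key"]

def drain_rollouts(sqs, s3, queue_url: str, expected: int, poll_wait: int = 10):
    """Collect `expected` rollouts by long-polling SQS and fetching from S3."""
    rollouts = []
    while len(rollouts) < expected:
        resp = sqs.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=poll_wait
        )
        for m in resp.get("Messages", []):
            bucket, key = parse_completion_message(m["Body"])
            obj = s3.get_object(Bucket=bucket, Key=key)
            rollouts.append(json.loads(obj["Body"].read()))
            # Acknowledge so the completion message is not redelivered
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=m["ReceiptHandle"])
    return rollouts
```

In practice the real loop also enforces the time limit mentioned above, returning whatever rollouts are available when it expires.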

Checklist Before Submitting

Co-authored-by: Youzhi Luo <yzluo@amazon.com>
Co-authored-by: Danylo Vashchilenko <vdanylo@amazon.com>

CLAassistant commented Nov 20, 2025

CLA assistant check
All committers have signed the CLA.

@gemini-code-assist bot left a comment

Code Review

This PR introduces a significant and well-designed feature to decouple the rollout engine using AWS Bedrock AgentCore. The architecture using S3 and SQS is robust, and the implementation is comprehensive, including extensive testing. My feedback focuses on improving robustness and maintainability. I've identified a couple of areas where the code could be made more resilient to external changes and another where a refactoring could simplify the main training loop's logic, especially for future extensions. Overall, this is a high-quality contribution.

Comment on lines +42 to +43
```python
# seconds - AgentCore new session cold start time under 25 TPS for container deployment (2025-11)
SESSION_START_TIME = 10
```

high

The constant SESSION_START_TIME is hardcoded. The comment suggests this value is environment-specific and crucial for performance tuning (as it determines max_inflight_requests in RequestDispatcher). Hardcoding such parameters reduces flexibility. It would be better to make this a configurable parameter within the agentcore section of the rollout configuration. This would involve adding it to AgentCoreConfig and the corresponding YAML files, then reading it from the config where it's used.
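
As a rough illustration of the suggestion (field names other than `session_start_time` simply mirror the CLI args shown in the PR description; this is not the actual `AgentCoreConfig` in the codebase):

```python
from dataclasses import dataclass

# Hypothetical sketch of an AgentCoreConfig with the cold-start time made
# configurable instead of hardcoded as a module-level constant.
@dataclass
class AgentCoreConfig:
    agent_name: str = ""
    container_uri: str = ""
    role_arn: str = ""
    s3_bucket: str = ""
    # seconds; AgentCore new-session cold-start time, now tunable per environment
    session_start_time: int = 10

cfg = AgentCoreConfig(agent_name="demo", session_start_time=15)
```

The dispatcher would then read `cfg.session_start_time` when computing `max_inflight_requests`, and the value could be overridden from the YAML/CLI like any other rollout config.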

Comment on lines +428 to +430
```python
# When timed out, the response is an error string instead of the actual endpoint arn
if self.agent_arn not in endpoint_response:
    raise TimeoutError(endpoint_response)
```

high

The check if self.agent_arn not in endpoint_response: to detect a timeout is brittle because it relies on the specific string content of the error message returned by wait_for_agent_endpoint_ready. If the error message format changes in a future version of the bedrock-agentcore-starter-toolkit, this check will fail. A more robust approach would be to check the type of the response. Based on the comment, a timeout returns a string, while success returns a dictionary-like object.

Suggested change

```diff
 # When timed out, the response is an error string instead of the actual endpoint arn
-if self.agent_arn not in endpoint_response:
+if isinstance(endpoint_response, str):
     raise TimeoutError(endpoint_response)
```

Comment on lines +1068 to +1073
```python
if self.async_rollout_mode:
    gen_batch_output = self.async_rollout_manager.generate_sequences(gen_batch_output)
elif self.agentcore_rollout_mode:
    gen_batch_output = self.agentcore_rollout_manager.generate_sequences(gen_batch_output)
else:
    gen_batch_output = self.actor_rollout_wg.generate_sequences(gen_batch_output)
```

high

This if/elif/else block for handling different rollout modes (async_rollout_mode, agentcore_rollout_mode, etc.) is repeated in several places within fit() and _validate(). This pattern makes the code harder to read and maintain. Adding a new rollout mode would require modifying all these blocks.

Consider refactoring this logic using the Strategy design pattern. You could define a RolloutStrategy interface and create concrete implementations for each mode (AsyncRolloutStrategy, AgentCoreRolloutStrategy, SyncRolloutStrategy). The RayPPOTrainer would then hold a single strategy object and delegate the mode-specific operations to it, cleaning up the control flow in fit() and _validate().
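
A minimal sketch of what that refactor could look like (class and method names are illustrative; only `generate_sequences` mirrors the trainer code quoted above):

```python
from abc import ABC, abstractmethod

class RolloutStrategy(ABC):
    """One implementation per rollout mode; the trainer holds exactly one."""
    @abstractmethod
    def generate_sequences(self, gen_batch):
        ...

class SyncRolloutStrategy(RolloutStrategy):
    def __init__(self, actor_rollout_wg):
        self.wg = actor_rollout_wg
    def generate_sequences(self, gen_batch):
        return self.wg.generate_sequences(gen_batch)

class AgentCoreRolloutStrategy(RolloutStrategy):
    def __init__(self, manager):
        self.manager = manager
    def generate_sequences(self, gen_batch):
        return self.manager.generate_sequences(gen_batch)

# In the trainer, the repeated if/elif/else then collapses to:
#     gen_batch_output = self.rollout_strategy.generate_sequences(gen_batch_output)
```

Adding a new rollout mode then means adding one strategy class, with no changes to `fit()` or `_validate()`.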

```python
# modularity & easier organization.
# Relevant configs can be passed in via command line args too. Using an env file here
# to avoid hardcoded values.
agentcore_envs = dotenv_values("agentcore.env")
```

Looks like we need to provide an example agentcore.env file with a publicly usable AWS account here.
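
For illustration, an example agentcore.env might look like the following. The variable names and values here are purely hypothetical placeholders (the actual keys the loader expects are defined by the AgentCore loop code, and real ARNs/URIs must come from your own AWS account):

```
# agentcore.env -- illustrative placeholders only
AGENT_NAME=my-agent
CONTAINER_URI=<account-id>.dkr.ecr.<region>.amazonaws.com/my-agent:latest
ROLE_ARN=arn:aws:iam::<account-id>:role/<agentcore-role>
S3_BUCKET=my-rollout-bucket
SUBNETS=["subnet-<id>"]
SECURITY_GROUPS=["sg-<id>"]
```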

luyuzhe111 and others added 2 commits November 20, 2025 18:09
* implement reward and baseline computation for AgentCore mode in remax

* fix indention error