update readme

2026-05-04 09:52:50 +00:00 · 2025-11-07 19:19:54 +08:00
parent 40b6a914f3
commit 535eba85e2
2 changed files with 74 additions and 0 deletions
--- a/applications/ColossalChat/coati/distributed/README.md
+++ b/applications/ColossalChat/coati/distributed/README.md
@@ -14,6 +14,7 @@ This repository implements a distributed Reinforcement Learning (RL) training fr
 * **Rollout and Policy Decoupling**: Efficient generation and consumption of data through parallel inferencer-trainer architecture.
 * **Evaluation Integration**: Easily plug in task-specific eval datasets.
 * **Checkpoints and Logging**: Configurable intervals and directories.
+* **[New] Zero Bubble**: A training framework that supports GRPO and DAPO. [(read more)](./zero_bubble/README.md)

 ---

--- a/applications/ColossalChat/coati/distributed/zero_bubble/README.md
+++ b/applications/ColossalChat/coati/distributed/zero_bubble/README.md
@@ -0,0 +1,73 @@
+# Zero Bubble Distributed RL Framework for Language Model Fine-Tuning
+
+This folder contains code for the Zero Bubble distributed RL framework. It currently supports **GRPO** and **DAPO**. See the [main README](../README.md) for general installation instructions and usage.
+
+**Note:** This project is under active development — expect changes.
+
+## 🛠 Installation
+
+1. Follow the general installation guide in the [main README](../README.md).
+2. Install [pygloo](https://github.com/ray-project/pygloo) by building it from source for Ray, following the instructions in its repository README.
+
+## Design idea
+
+We aim to reduce the *“bubble”* — the idle time that occurs between rollouts and training steps (illustrated in Fig. 1).
+
+<div align="center">
+  <p align="center">
+    <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/all_sync.png" width=700/>
+  </p>
+</div>
+
+**Fig. 1** - In an all-sync online RL framework, rollout workers wait for the trainer to finish training and synchronize weights, and the trainer in turn waits for rollouts. Both sides therefore spend substantial time with idle GPUs.
+
+<div align="center">
+  <p align="center">
+    <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/zero_bubble.png" width=700/>
+  </p>
+</div>
+
+**Fig. 2** - Our Zero Bubble pipeline follows a producer–consumer pattern:
+
+* A global **data buffer** temporarily stores rollouts produced by inference workers.
+* A **weights distributor** buffers updated model weights and distributes them to inference workers.
+* When the data buffer has enough data, the trainer continuously consumes from it and pushes updated weights to the weights distributor.
+* After finishing a mini-batch, each inference worker checks the weights distributor and synchronizes to a newer weight version if available.
+
+Under ideal conditions (inference workers produce data at the same rate as the trainer consumes it), the pipeline eliminates idle time. We call it *zero bubble* because, with an unlimited data buffer, inference and training can run indefinitely without waiting on each other. In practice, we bound the buffer size to avoid wasted compute and stale (off-policy) data, so inference workers wait briefly whenever the buffer is full.
+
+## Usage
+
+In addition to the general parameters (see the main README), the Zero Bubble pipeline introduces one additional parameter:
+
+* **`data_actor_buffer_size_limit`** - Maximum number of rollout batches the data buffer may hold. Defaults to **twice** the trainer’s mini-batch size. Avoid setting this too large: rollouts that sit in a large buffer were generated with older weights, so training becomes more off-policy. For DAPO, only effective prompts count toward training, so you may need to raise `data_actor_buffer_size_limit` when many samples are filtered out.
+
+Example: RL training on 8 GPUs with Zero Bubble and ZeRO stage 2:
+
+```bash
+python rl_example_zero_bubble.py \
+  --dataset /path/to/your/dataset.jsonl \
+  --model /path/to/your/model \
+  -t 4 -i 4 -b vllm -a DAPO \
+  -imbs 8 -ibs 8 -tbs 8 -e 2 -rt boxed \
+  -si 25 -s "Please reason step by step, and put your final answer within \\boxed{}." \
+  -tMbs 2 -tmbs 2 -p Rebase_Experiments -zero 2 -mpt 512 -mnt 3584
+```
+
+## Performance
+
+<div align="center">
+  <p align="center">
+    <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/zero_bubble_gpu_util.png" width=700/>
+  </p>
+</div>
+
+**Fig. 3** - Performance of the Zero Bubble pipeline tested with an unlimited buffer size.
+
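
The producer-consumer flow described under "Design idea" can be sketched in plain Python. This is a minimal single-process illustration using threads, not the framework's actual implementation: `WeightsDistributor`, `inference_worker`, `trainer`, and `run_pipeline` are hypothetical names, the trainer consumes one batch at a time for simplicity, and `buffer_size_limit` only plays the role that `data_actor_buffer_size_limit` plays in the README.

```python
import queue
import threading


class WeightsDistributor:
    """Hypothetical helper: remembers the newest weights version the trainer published."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0

    def publish(self, version):
        with self._lock:
            self._version = max(self._version, version)

    def latest(self):
        with self._lock:
            return self._version


def inference_worker(buffer, distributor, num_batches):
    """Produce rollout batches, then pick up newer weights if any exist."""
    local_version = 0
    for step in range(num_batches):
        # put() blocks while the bounded buffer is full, so the worker
        # waits briefly instead of piling up stale rollouts.
        buffer.put({"step": step, "weights_version": local_version})
        # After each mini-batch, sync to a newer weights version if available.
        newest = distributor.latest()
        if newest > local_version:
            local_version = newest


def trainer(buffer, distributor, num_batches, consumed):
    """Consume rollout batches and publish a new weights version after each one."""
    for step in range(num_batches):
        batch = buffer.get()  # blocks until a rollout batch is available
        consumed.append(batch)
        distributor.publish(step + 1)


def run_pipeline(num_workers=2, batches_per_worker=4, buffer_size_limit=2):
    """Run the toy pipeline and return (consumed batches, final weights version)."""
    buffer = queue.Queue(maxsize=buffer_size_limit)  # bounded data buffer
    distributor = WeightsDistributor()
    consumed = []
    total = num_workers * batches_per_worker

    threads = [
        threading.Thread(
            target=inference_worker,
            args=(buffer, distributor, batches_per_worker),
        )
        for _ in range(num_workers)
    ]
    threads.append(
        threading.Thread(target=trainer, args=(buffer, distributor, total, consumed))
    )
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed, distributor.latest()


if __name__ == "__main__":
    batches, final_version = run_pipeline()
    stale = sum(1 for b in batches if b["weights_version"] < final_version)
    print(f"consumed {len(batches)} batches; {stale} came from older weights")
```

In this sketch, raising `buffer_size_limit` lets workers run further ahead of the trainer, which tends to increase the number of batches generated with older weights. That is the off-policy trade-off the parameter description warns about.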