Commit Graph

  • b1915d2889 (main) Merge pull request #6391 from hpcaitech/grpo-zero-bubble-rebase YeAnbang 2025-11-13 09:54:34 +08:00
  • eb158eb201 (grpo-zero-bubble-rebase) fix ci; remove test cases that failed on 3080 (those with tps), can pass locally YeAnbang 2025-11-12 18:35:34 +08:00
  • 7f91b7e6f5 fix ci; specify flash-attn version YeAnbang 2025-11-11 15:38:41 +08:00
  • 79fd50d289 (pre-commit-ci-update-config) [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2025-11-10 17:42:23 +00:00
  • 8c18be0b1c [pre-commit.ci] pre-commit autoupdate pre-commit-ci[bot] 2025-11-10 17:41:10 +00:00
  • 1b65963c02 fix readme YeAnbang 2025-11-10 15:47:18 +08:00
  • 4c53210aaf Merge branch 'grpo-zero-bubble-rebase' of https://github.com/hpcaitech/ColossalAI into grpo-zero-bubble-rebase YeAnbang 2025-11-07 19:22:31 +08:00
  • 535eba85e2 update readme YeAnbang 2025-11-07 19:19:54 +08:00
  • 6f7e8595fc [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2025-11-07 08:18:23 +00:00
  • 40b6a914f3 all tests passed YeAnbang 2025-11-07 16:08:36 +08:00
  • c865de32a5 cherry pick zero bubble RL YeAnbang 2025-11-06 15:12:51 +08:00
  • 2336d7f6d6 fix racing condition YeAnbang 2025-07-21 17:21:07 +08:00
  • ddda79c36f add entropy YeAnbang 2025-07-16 16:44:23 +08:00
  • dba0c0c4ed fix code evaluation YeAnbang 2025-07-14 16:25:03 +08:00
  • b47b610d98 add code for zero-bubble implementation YeAnbang 2025-07-09 11:21:43 +08:00
  • e5fdefa6cf update B200 info/img/benchmark (#6385) Yanjia0 2025-09-26 14:54:08 +08:00
  • f10f707c58 (grpo-agentic-dev) [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2025-09-23 02:49:42 +00:00
  • 8ca76fe935 fix vllm configuration and load balancing YeAnbang 2025-09-23 10:47:44 +08:00
  • c095ec35da tested and fix style issue YeAnbang 2025-09-22 09:36:47 +08:00
  • 8745e8f4d1 test asyncllm producer and other settings YeAnbang 2025-09-19 17:34:47 +08:00
  • 2b46ab1401 simplify _run_agentic_pipeline; fix old_log_probs YeAnbang 2025-09-18 18:28:36 +08:00
  • d47c56356b fix rollout, action mask, attention mask bugs YeAnbang 2025-09-18 16:45:37 +08:00
  • b6391bd720 remove qwen agent producer YeAnbang 2025-09-16 16:29:13 +08:00
  • edcef9edaf add custom agentic producer YeAnbang 2025-09-16 16:23:46 +08:00
  • 62f82a75ae add langgraph agent, still buggy YeAnbang 2025-09-08 11:26:33 +08:00
  • f3155409b5 support agentic with asyncllm YeAnbang 2025-09-03 15:12:46 +08:00
  • ae51e5b244 support asyncllm YeAnbang 2025-08-18 17:40:55 +08:00
  • 083766d54c Add new implementations of RL algorithms (#6383) sglucas 2025-09-03 13:48:06 +08:00
  • 84723e8bed (grpo-latest-ascend) [feat][merge] Support one-behind to reduce bubble time. Add profiling code. (#6355) xysheng-colossal 2025-09-02 17:05:15 +08:00
  • 48a673dcb0 [Ring Attention] Add more detailed references (#6294) Wenxuan Tan 2025-08-26 21:51:16 +08:00
  • 4ac2227488 Merge pull request #6378 from hpcaitech/grpo-latest-rebase-fix-resume YeAnbang 2025-08-18 17:09:53 +08:00
  • b38248d35f Merge pull request #6376 from hpcaitech/grpo-latest-rebase-main Hanks 2025-08-15 17:24:47 +08:00
  • fe1f429574 (grpo-latest-rebase-main) Merge branch 'grpo-latest-rebase-main' of https://github.com/hpcaitech/ColossalAI into grpo-latest-rebase-main YeAnbang 2025-08-15 10:16:49 +08:00
  • 4152c0b30f fix dist log prob test YeAnbang 2025-08-15 10:11:54 +08:00
  • 73bdfd8891 [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2025-08-14 11:05:40 +00:00
  • 99ba48fc40 Merge branch 'grpo-latest-rebase-main' of https://github.com/hpcaitech/ColossalAI into grpo-latest-rebase-main YeAnbang 2025-08-14 19:03:04 +08:00
  • 762150cf51 fix ci YeAnbang 2025-08-14 19:00:30 +08:00
  • bbc5fb4ed8 fix ci YeAnbang 2025-08-14 18:59:54 +08:00
  • 94e972fda6 Update timeout Hanks 2025-08-14 09:42:21 +08:00
  • c83dc66645 Update timeout Hanks 2025-08-14 09:39:49 +08:00
  • 9db9892f63 reduce memory consumption Hanks 2025-08-13 16:45:43 +08:00
  • b6a5f678cd reduce memory consumption Hanks 2025-08-13 16:37:49 +08:00
  • e589ec505e support resume training YeAnbang 2025-08-12 08:10:56 +00:00
  • 08a1244ef1 [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2025-08-06 06:16:37 +00:00
  • 32b2148670 tested after rebasing, fix importance sampling bug YeAnbang 2025-08-06 06:15:15 +00:00
  • 3746f73854 fix missing or wrong file during rebase YeAnbang 2025-08-05 14:41:12 +08:00
  • 118a66fd46 [Fix] Add L2 Regularization (#6372) YeAnbang 2025-07-29 16:56:52 +08:00
  • c7829769e9 hotfix entropy calculation (#6364) YeAnbang 2025-07-22 10:02:02 +08:00
  • 3d9dd34973 add entropy (#6363) YeAnbang 2025-07-17 15:05:10 +08:00
  • eafbc89b1b fix style YeAnbang 2025-07-14 18:23:39 +08:00
  • 352a8e0430 fix code evaluation YeAnbang 2025-07-14 16:25:03 +08:00
  • 594c2c6522 [feat] Support one-behind to reduce bubble time. Add profiling code (#6353) YeAnbang 2025-06-30 13:21:08 +08:00
  • 685e0bd8da add dp rank for multi-dp (#6351) Tong Li 2025-06-19 14:02:08 +08:00
  • b314da19f4 fix small bug YeAnbang 2025-06-19 01:37:52 +00:00
  • 245c8c2fbc implement memory efficient logprob YeAnbang 2025-06-18 10:24:48 +00:00
  • a960990f1e optimize pp log_softmax OOM YeAnbang 2025-06-13 18:21:54 +08:00
  • 0f71c79760 fix num_update_per_episode YeAnbang 2025-06-12 15:06:01 +08:00
  • 73384bea19 Update README.md YeAnbang 2025-06-12 11:21:31 +08:00
  • 80c576f5ea add ray timeout handling instruction YeAnbang 2025-06-10 18:21:42 +08:00
  • 79a7b99fe6 update readme YeAnbang 2025-06-10 17:17:41 +08:00
  • 6a0b809fd1 modify readme YeAnbang 2025-06-10 17:00:35 +08:00
  • 3b3c48d9a8 Manually schedule resources and support auto master address assigning YeAnbang 2025-06-10 15:00:48 +08:00
  • 3a4681fdd9 fix pp memory issue (#6344) Tong Li 2025-06-11 17:54:18 +08:00
  • 6ae54a6dce move out evaluation func (#6343) Tong Li 2025-06-10 13:53:19 +08:00
  • 72b2d989df [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2025-06-09 01:48:19 +00:00
  • 9dbb0ff89f remove debug code YeAnbang 2025-06-09 09:42:58 +08:00
  • de40c736d0 fix bug, tested YeAnbang 2025-06-09 09:37:28 +08:00
  • 177144794b support code generation tasks YeAnbang 2025-06-05 17:56:42 +08:00
  • a9a3f374e5 fix typo and parameter description YeAnbang 2025-06-05 15:41:14 +08:00
  • 8d52441f6d [pre-commit.ci] auto fixes from pre-commit.com hooks pre-commit-ci[bot] 2025-05-29 10:16:55 +00:00
  • a246bf25c3 add overlength sample count (#6332) Tong Li 2025-05-28 19:18:09 +08:00
  • 60510010d1 address conversation YeAnbang 2025-05-29 10:25:59 +08:00
  • 382307a62c fix default eval setting (#6321) Tong Li 2025-05-22 11:52:41 +08:00
  • 2a39d3afd9 address conversation YeAnbang 2025-05-28 17:34:11 +08:00
  • 4b1c515f52 fix missing tags parameter YeAnbang 2025-05-21 10:51:32 +08:00
  • 5bbfe1567f fix empty tensor (#6319) Tong Li 2025-05-20 17:41:44 +08:00
  • 70c3daa4ee add uuid to rollout log YeAnbang 2025-05-20 09:45:56 +08:00
  • 06cfbe313b fix metric calculation YeAnbang 2025-05-20 18:14:05 +08:00
  • c7c73df60a fix logging rollouts YeAnbang 2025-05-17 21:12:58 +08:00
  • 9cbc5dd924 upgrade reward functions YeAnbang 2025-05-16 18:04:38 +08:00
  • 6095274be6 support logging rollouts to wandb YeAnbang 2025-05-16 15:56:03 +08:00
  • 654aefc3c3 address conversation YeAnbang 2025-05-16 14:15:35 +08:00
  • e7f61be51a fix evaluation YeAnbang 2025-05-16 09:42:35 +08:00
  • 6ebd813b5f handle empty index Tong Li 2025-05-15 18:30:27 +08:00
  • 88f49ddc5e remove redundant code and fix bugs YeAnbang 2025-05-16 14:08:23 +08:00
  • d19f1f21b6 move prompt-level-filtering to buffer side YeAnbang 2025-05-15 18:30:32 +08:00
  • f79dbdb2df move prompt-level-filtering to buffer side YeAnbang 2025-05-15 18:16:50 +08:00
  • 0d0fef771f disable wandb tb syncing YeAnbang 2025-05-15 16:52:31 +08:00
  • 280aa0b830 use consumer global step YeAnbang 2025-05-15 14:15:40 +08:00
  • 5a6e4a6d75 [feat] Support prompt level dynamic (#6300) Tong Li 2025-05-14 16:40:35 +08:00
  • 3416a4fc9c move logging to producer YeAnbang 2025-05-14 18:10:57 +08:00
  • af4366f0cb Support evaluation during training YeAnbang 2025-04-30 18:13:40 +08:00
  • 4ac7d065a6 update pad seq (#6303) Tong Li 2025-05-13 16:51:27 +08:00
  • 9544c51a74 [fix] revert reward update and evaluation (#6295) YeAnbang 2025-05-07 10:56:47 +08:00
  • 06b892bf4d rewrite reward fn YeAnbang 2025-05-01 11:28:05 +08:00
  • 9642b75581 upgrade reward math verification YeAnbang 2025-04-30 22:59:54 +08:00
  • 1be993de3e fix bug YeAnbang 2025-04-30 22:53:12 +08:00
  • de0c267f5a reuse comm-group YeAnbang 2025-04-30 21:36:11 +08:00
  • 16600f3509 Support evaluation during training YeAnbang 2025-04-30 18:13:40 +08:00
  • 6a1bd833e0 [feat] Sync shard model (#6289) Tong Li 2025-04-30 14:47:01 +08:00