Commit Graph

18 Commits

Author SHA1 Message Date
Jiarui Fang
7675366fce [polish] rename col_attr -> colo_attr (#558) 2022-03-31 12:25:45 +08:00
Liang Bowen
2c45efc398 html refactor (#555) 2022-03-31 11:36:56 +08:00
Jiarui Fang
107b99ddb1 [zero] dump memory stats for sharded model (#548) 2022-03-30 09:38:44 +08:00
Jiarui Fang
53b1b6e340 [zero] non model data tracing (#545) 2022-03-29 15:45:48 +08:00
Jie Zhu
73d36618a6 [profiler] add MemProfiler (#356)
* add memory trainer hook
* fix bug
* add memory trainer hook
* fix import bug
* fix import bug
* add trainer hook
* fix #370 git log bug
* modify `to_tensorboard` function to support better output
* remove useless output
* change the name of `MemProfiler`
* complete memory profiler
* replace error with warning
* finish trainer hook
* modify interface of MemProfiler
* modify `__init__.py` in profiler
* remove unnecessary pass statement
* add usage to doc string
* add usage to trainer hook
* new location to store temp data file
2022-03-29 12:48:34 +08:00
Jiarui Fang
c11ff81b15 [zero] get memory usage of sharded optim v2. (#542) 2022-03-29 09:08:18 +08:00
Jiarui Fang
705f56107c [zero] refactor model data tracing (#537) 2022-03-28 16:38:18 +08:00
Jiarui Fang
8d8c5407c0 [zero] refactor model data tracing (#522) 2022-03-25 18:03:32 +08:00
Jiarui Fang
0bebda6ea5 [zero] fix init device bug in zero init context unittest (#516) 2022-03-25 12:24:18 +08:00
Jiarui Fang
7ef3507ace [zero] show model data cuda memory usage after zero context init. (#515) 2022-03-25 11:23:35 +08:00
Jiarui Fang
9330be0f3c [memory] set cuda mem frac (#506) 2022-03-24 16:57:13 +08:00
Jiarui Fang
0035b7be07 [memory] add model data tensor moving api (#503) 2022-03-24 14:29:41 +08:00
Jiarui Fang
a445e118cf [polish] polish singleton and global context (#500) 2022-03-23 18:03:39 +08:00
Frank Lee
b03b3ae99c fixed mem monitor device (#433)
fixed mem monitor device
2022-03-16 15:25:02 +08:00
Jiarui Fang
56bb412e72 [polish] use GLOBAL_MODEL_DATA_TRACER (#417) 2022-03-15 11:29:46 +08:00
Jiarui Fang
21dc54e019 [zero] memtracer to record cuda memory usage of model data and overall system (#395) 2022-03-14 22:05:30 +08:00
Jiarui Fang
ea2872073f [zero] global model data memory tracer (#360) 2022-03-11 15:50:28 +08:00
Jiarui Fang
10e2826426 move async memory to an individual directory (#345) 2022-03-11 15:50:28 +08:00