mirror of
https://github.com/hpcaitech/ColossalAI.git
synced 2025-08-11 04:41:54 +00:00
* [gptq] add gptq kernel (#4416)
* add gptq
* refactor code
* fix tests
* replace auto-gptq
* rename inference/quant
* refactor test
* add auto-gptq as an option
* reset requirements
* change assert and check auto-gptq
* add import warnings
* change test flash attn version
* remove example
* change requirements of flash_attn
* modify tests
* [skip ci] change requirements-test
* [gptq] faster gptq cuda kernel (#4494)
* [skip ci] add cuda kernels
* add license
* [skip ci] fix max_input_len
* format files & change test size
* [skip ci]
* [gptq] add gptq tensor parallel (#4538)
* add gptq tensor parallel
* add gptq tp
* delete print
* add test gptq check
* add test auto gptq check
* [gptq] combine gptq and kv cache manager (#4706)
* combine gptq and kv cache manager
* add init bits
* delete useless code
* add model path
* delete useless print and update test
* delete useless import
* move option gptq to shard config
* change replace linear to shardformer
* update bloom policy
* delete useless code
* fix import bug and delete useless code
* change colossalai/gptq to colossalai/quant/gptq
* update import linear for tests
* delete useless code and mv gptq_kernel to kernel directory
* fix triton kernel
* add triton import
34 lines
699 B
Plaintext
// Adapted from turboderp exllama: https://github.com/turboderp/exllama

#ifndef _util_cuh
#define _util_cuh

#include <cuda_runtime.h>
#include <cuda_fp16.h>
#include <cstdint>
#include <cstdio>

#if defined(USE_ROCM)
#define cudaUnspecified hipErrorUnknown
#else
#define cudaUnspecified cudaErrorApiFailureBase
#endif

// React to failure on return code != cudaSuccess
// The caller must declare `cudaError_t _cuda_err` and provide a `_cuda_fail:` label.

#define _cuda_check(fn) \
do { \
    _cuda_err = (fn); \
    if (_cuda_err != cudaSuccess) goto _cuda_fail; \
} while(false)

// React to failure on return code == 0

#define _alloc_check(fn) \
do { \
    if (!(fn)) { _cuda_err = cudaUnspecified; goto _cuda_fail; } \
    else _cuda_err = cudaSuccess; \
} while(false)

#endif
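Both macros expect the calling function to declare a local `_cuda_err` and to provide a `_cuda_fail:` label for cleanup. The following is a minimal host-only sketch of that calling pattern, not code from this repository. `cudaError_t`, `fake_device_alloc`, and `prepare_buffers` are hypothetical stand-ins so the sketch builds without the CUDA toolkit, and the two macros are restated (with the argument parenthesized) so the block is self-contained.

```cpp
#include <cstddef>
#include <cstdlib>

// Stand-in for the CUDA error enum (assumption: only the values used here).
enum cudaError_t {
    cudaSuccess = 0,
    cudaErrorMemoryAllocation = 2,
    cudaErrorApiFailureBase = 10000
};
#define cudaUnspecified cudaErrorApiFailureBase

// Restated from the header so this sketch compiles on its own.
#define _cuda_check(fn) \
do { \
    _cuda_err = (fn); \
    if (_cuda_err != cudaSuccess) goto _cuda_fail; \
} while(false)

#define _alloc_check(fn) \
do { \
    if (!(fn)) { _cuda_err = cudaUnspecified; goto _cuda_fail; } \
    else _cuda_err = cudaSuccess; \
} while(false)

// Hypothetical stand-in for a runtime call such as cudaMalloc:
// returns an error code and writes the pointer through an out-parameter.
static cudaError_t fake_device_alloc(void** ptr, std::size_t bytes, bool fail) {
    if (fail) { *ptr = nullptr; return cudaErrorMemoryAllocation; }
    *ptr = std::malloc(bytes);
    return *ptr ? cudaSuccess : cudaErrorMemoryAllocation;
}

// Hypothetical caller. All locals are declared before the first macro use,
// because `goto` may not jump over an initialized declaration in C++.
cudaError_t prepare_buffers(std::size_t bytes, bool fail_device) {
    cudaError_t _cuda_err = cudaSuccess;
    void* host_buf = nullptr;
    void* dev_buf = nullptr;

    // Pointer-returning call: a null result becomes cudaUnspecified.
    _alloc_check(host_buf = std::malloc(bytes));
    // Status-returning call: any code other than cudaSuccess jumps to cleanup.
    _cuda_check(fake_device_alloc(&dev_buf, bytes, fail_device));

_cuda_fail:
    // Reached on both success and failure; free(nullptr) is a no-op.
    std::free(dev_buf);
    std::free(host_buf);
    return _cuda_err;
}
```

In this sketch the success path falls through into the same cleanup label, so the function has a single exit point and returns whichever error code was recorded last.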