
## To Run the Inference Server

```
docker run --gpus=1 --rm --net=host -v ${PWD}/model_store:/model_store nvcr.io/nvidia/tritonserver:23.01-py3 tritonserver --model-repository=/model_store
```

```
python client.py --model=<model_name>
```
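
If you want to call the server directly rather than going through `client.py`, the sketch below shows a minimal HTTP request using the `tritonclient` library. The tensor names `prompt` and `completion`, the single-string input shape, and the model name are assumptions; check `triton_config.pbtxt` and `client.py` for the names this model actually uses.

```python
# Minimal sketch of querying the Triton server over HTTP.
# Tensor names "prompt"/"completion" are placeholders (assumptions),
# not confirmed names from this repo's config.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Triton expects string tensors as numpy object arrays of bytes.
prompts = np.array([b"What is the capital of France?"], dtype=object).reshape(1, 1)

infer_input = httpclient.InferInput("prompt", list(prompts.shape), "BYTES")
infer_input.set_data_from_numpy(prompts)

result = client.infer(
    model_name="nomic-ai--gpt4all-j",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("completion")],
)
print(result.as_numpy("completion"))
```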

## Dynamic Batching

We still need to figure out how to configure the model so that Triton can batch requests dynamically. We're currently getting about 1.3 infer/sec, which seems slow; a sketch of the relevant config block is shown below.
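
For reference, dynamic batching in Triton is normally enabled through the model configuration. The block below is a hypothetical sketch of what could go into `triton_config.pbtxt`; the preferred batch sizes and queue delay are illustrative values, not tuned settings from this repo.

```
# Hypothetical sketch -- batch sizes and queue delay are illustrative only.
max_batch_size: 8
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```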

To test performance with perf_analyzer:

```
perf_analyzer -m nomic-ai--gpt4all-j --input-data test_data.json --measurement-interval 25000 --request-rate-range=10 -b 8
```
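
For context, `perf_analyzer --input-data` files are JSON with a list of request entries keyed by input tensor name. The snippet below only illustrates that general format with a placeholder input name (`prompt`); it is not the contents of the `test_data.json` checked into this directory.

```json
{
  "data": [
    {
      "prompt": {
        "content": ["What is a good name for a dog?"],
        "shape": [1]
      }
    }
  ]
}
```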