GPU Inference Server (#1112)

* feat: local inference server
* fix: source to use bash + vars
* chore: isort and black
* fix: make file + inference mode
* chore: logging
* refactor: remove old links
* fix: add new env vars
* feat: hf inference server
* refactor: remove old links
* test: batch and single response
* chore: black + isort
* separate gpu and cpu dockerfiles
* moved gpu to separate dockerfile
* Fixed test endpoints
* Edits to API. server won't start due to failed instantiation error
* Method signature
* fix: gpu_infer
* tests: fix tests

---------

Co-authored-by: Andriy Mulyar <andriy.mulyar@gmail.com>
2025-10-26 11:04:39 +00:00 · 2023-07-21 14:13:29 -05:00
parent 58f0fcab57
commit 8aba2c9009
14 changed files with 271 additions and 112 deletions
--- a/gpt4all-api/gpt4all_api/app/api_v1/events.py
+++ b/gpt4all-api/gpt4all_api/app/api_v1/events.py
@@ -1,8 +1,10 @@
 import logging
+
+from api_v1.settings import settings
 from fastapi import HTTPException
 from fastapi.responses import JSONResponse
 from starlette.requests import Request
-from api_v1.settings import settings
+
 log = logging.getLogger(__name__)


@@ -19,8 +21,9 @@ async def on_startup(app):
    startup_msg = startup_msg_fmt.format(settings=settings)
    log.info(startup_msg)

+
 def startup_event_handler(app):
    async def start_app() -> None:
        await on_startup(app)

-    return start_app
+    return start_app