Separate local mode into llms-llama-cpp and embeddings-huggingface for clarity

imartinez 2024-02-29 16:40:11 +01:00
parent 85276893a3
commit c3fe36e070
21 changed files with 186 additions and 106 deletions


@ -25,6 +25,6 @@ runs:
python-version: ${{ inputs.python_version }} python-version: ${{ inputs.python_version }}
cache: "poetry" cache: "poetry"
- name: Install Dependencies - name: Install Dependencies
run: poetry install --extras "ui qdrant" --no-root run: poetry install --extras "ui vector-stores-qdrant" --no-root
shell: bash shell: bash


@ -14,7 +14,7 @@ FROM base as dependencies
WORKDIR /home/worker/app WORKDIR /home/worker/app
COPY pyproject.toml poetry.lock ./ COPY pyproject.toml poetry.lock ./
RUN poetry install --extras "ui qdrant" RUN poetry install --extras "ui vector-stores-qdrant"
FROM base as app FROM base as app


@ -24,7 +24,7 @@ FROM base as dependencies
WORKDIR /home/worker/app WORKDIR /home/worker/app
COPY pyproject.toml poetry.lock ./ COPY pyproject.toml poetry.lock ./
RUN poetry install --extras "ui local qdrant" RUN poetry install --extras "ui embeddings-huggingface llms-llama-cpp vector-stores-qdrant"
FROM base as app FROM base as app


@ -17,7 +17,10 @@ There is an extra component that can be enabled or disabled: the UI. It is a Gra
### Setups and Dependencies ### Setups and Dependencies
Your setup will be the combination of the different options available. You'll find recommended setups in the [installation](/installation) section. Your setup will be the combination of the different options available. You'll find recommended setups in the [installation](/installation) section.
PrivateGPT uses poetry to manage its dependencies. You can install the dependencies for the different setups by running `poetry install --extras "<extra1> <extra2>..."`. PrivateGPT uses poetry to manage its dependencies. You can install the dependencies for the different setups by running `poetry install --extras "<extra1> <extra2>..."`.
Extras are the different options available for each component. For example, to install the dependencies for a local setup with UI and qdrant as vector database, you would run `poetry install --extras "ui local qdrant"`. Extras are the different options available for each component. For example, to install the dependencies for a local setup with UI and qdrant as vector database, Ollama as LLM and HuggingFace as local embeddings, you would run
`poetry install --extras "ui vector-stores-qdrant llms-ollama embeddings-huggingface"`.
Refer to the [installation](/installation) section for more details. Refer to the [installation](/installation) section for more details.
### Setups and Configuration ### Setups and Configuration
@ -35,9 +38,9 @@ will load the configuration from `settings.yaml` and `settings-ollama.yaml`.
## About Fully Local Setups ## About Fully Local Setups
In order to run PrivateGPT in a fully local setup, you will need to run the LLM, Embeddings and Vector Store locally. In order to run PrivateGPT in a fully local setup, you will need to run the LLM, Embeddings and Vector Store locally.
### Vector stores ### Vector stores
The 3 vector stores supported (Qdrant, ChromaDB and Postgres) run locally by default. The vector stores supported (Qdrant, ChromaDB and Postgres) run locally by default.
### Embeddings ### Embeddings
For local embeddings you need to install the 'local' extra dependencies. It will use Huggingface Embeddings. For local embeddings you need to install the 'embeddings-huggingface' extra dependencies. It will use Huggingface Embeddings.
Note: Ollama will support Embeddings in the short term for easier installation, but it doesn't as of today. Note: Ollama will support Embeddings in the short term for easier installation, but it doesn't as of today.
@ -48,7 +51,7 @@ poetry run python scripts/setup
### LLM ### LLM
For local LLM there are two options: For local LLM there are two options:
* (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local LLMs. * (Recommended) You can use the 'ollama' option in PrivateGPT, which will connect to your local Ollama instance. Ollama greatly simplifies the installation of local LLMs.
* You can use the 'local' option in PrivateGPT, which will use LlamaCPP. It works great on Mac with Metal most of the time (leverages Metal GPU), but it can be tricky in certain Linux and Windows distributions, depending on the GPU. In the installation document you'll find guides and troubleshooting. * You can use the 'llms-llama-cpp' option in PrivateGPT, which will use LlamaCPP. It works great on Mac with Metal most of the time (leverages Metal GPU), but it can be tricky in certain Linux and Windows distributions, depending on the GPU. In the installation document you'll find guides and troubleshooting.
In order for local LLM to work (the second option), you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script: In order for local LLM to work (the second option), you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
```bash ```bash


@ -30,8 +30,8 @@ pyenv local 3.11
PrivateGPT lets you customize the setup, from fully local to cloud-based, by choosing which modules to use. PrivateGPT lets you customize the setup, from fully local to cloud-based, by choosing which modules to use.
Here are the different options available: Here are the different options available:
- LLM: "local" (uses LlamaCPP), "ollama", "sagemaker", "openai", "openailike" - LLM: "llama-cpp", "ollama", "sagemaker", "openai", "openailike"
- Embeddings: "local" (uses HuggingFace embeddings), "openai", "sagemaker" - Embeddings: "huggingface", "openai", "sagemaker"
- Vector stores: "qdrant", "chroma", "postgres" - Vector stores: "qdrant", "chroma", "postgres"
- UI: whether or not to enable UI (Gradio) or just go with the API - UI: whether or not to enable UI (Gradio) or just go with the API
@ -44,14 +44,17 @@ poetry install --extras "<extra1> <extra2>..."
Where `<extra>` can be any of the following: Where `<extra>` can be any of the following:
- ui: adds support for UI using Gradio - ui: adds support for UI using Gradio
- local: adds support for local LLM and Embeddings using LlamaCPP - expect a messy installation process on some platforms - llms-ollama: adds support for Ollama LLM, the easiest way to get a local LLM running
- openai: adds support for OpenAI LLM and Embeddings, requires OpenAI API key - llms-llama-cpp: adds support for local LLM using LlamaCPP - expect a messy installation process on some platforms
- sagemaker: adds support for Amazon Sagemaker LLM and Embeddings, requires Sagemaker endpoints - llms-sagemaker: adds support for Amazon Sagemaker LLM, requires Sagemaker inference endpoints
- ollama: adds support for Ollama LLM, the easiest way to get a local LLM running - llms-openai: adds support for OpenAI LLM, requires OpenAI API key
- openai-like: adds support for 3rd party LLM providers that are compatible with OpenAI's API - llms-openai-like: adds support for 3rd party LLM providers that are compatible with OpenAI's API
- qdrant: adds support for Qdrant vector store - embeddings-huggingface: adds support for local Embeddings using HuggingFace
- chroma: adds support for Chroma DB vector store - embeddings-sagemaker: adds support for Amazon Sagemaker Embeddings, requires Sagemaker inference endpoints
- postgres: adds support for Postgres vector store - embeddings-openai: adds support for OpenAI Embeddings, requires OpenAI API key
- vector-stores-qdrant: adds support for Qdrant vector store
- vector-stores-chroma: adds support for Chroma DB vector store
- vector-stores-postgres: adds support for Postgres vector store
## Recommended Setups ## Recommended Setups
@ -66,10 +69,10 @@ Go to [ollama.ai](https://ollama.ai/) and follow the instructions to install Oll
Once done, you can install PrivateGPT with the following command: Once done, you can install PrivateGPT with the following command:
```bash ```bash
poetry install --extras "ui local ollama qdrant" poetry install --extras "ui llms-ollama embeddings-huggingface vector-stores-qdrant"
``` ```
We are installing the "local" dependency to support local embeddings, because Ollama doesn't support embeddings just yet. But they are working on it! We are installing the "embeddings-huggingface" dependency to support local embeddings, because Ollama doesn't support embeddings just yet. But they are working on it!
In order for local embeddings to work, you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script: In order for local embeddings to work, you need to download the embeddings model to the `models` folder. You can do so by running the `setup` script:
```bash ```bash
poetry run python scripts/setup poetry run python scripts/setup
@ -95,7 +98,7 @@ Edit the `settings-sagemaker.yaml` file to include the correct Sagemaker endpoin
Then, install PrivateGPT with the following command: Then, install PrivateGPT with the following command:
```bash ```bash
poetry install --extras "ui sagemaker qdrant" poetry install --extras "ui llms-sagemaker embeddings-sagemaker vector-stores-qdrant"
``` ```
Once installed, you can run PrivateGPT. Make sure you have a working Ollama running locally before running the following command. Once installed, you can run PrivateGPT. Make sure you have a working Ollama running locally before running the following command.
@ -113,7 +116,7 @@ The UI will be available at http://localhost:8001
If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command: If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command:
```bash ```bash
poetry install --extras "ui local qdrant" poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"
``` ```
In order for local LLM and embeddings to work, you need to download the models to the `models` folder. You can do so by running the `setup` script: In order for local LLM and embeddings to work, you need to download the models to the `models` folder. You can do so by running the `setup` script:
@ -127,7 +130,53 @@ Once installed, you can run PrivateGPT with the following command:
PGPT_PROFILES=local make run PGPT_PROFILES=local make run
``` ```
PrivateGPT will load the already existing `settings-local.yaml` file, which is already configured to use LlamaCPP and Qdrant. PrivateGPT will load the already existing `settings-local.yaml` file, which is already configured to use LlamaCPP LLM, HuggingFace embeddings and Qdrant.
The UI will be available at http://localhost:8001
### Non-Private, OpenAI-powered test setup
If you want to test PrivateGPT with OpenAI's LLM and Embeddings (keep in mind that your data will be sent to OpenAI!), follow the steps below.
You need an OpenAI API key to run this setup.
Edit the `settings-openai.yaml` file to include the correct API key. Never commit it! It's a secret! As an alternative to editing `settings-openai.yaml`, you can just set the `OPENAI_API_KEY` environment variable.
Then, install PrivateGPT with the following command:
```bash
poetry install --extras "ui llms-openai embeddings-openai vector-stores-qdrant"
```
Once installed, you can run PrivateGPT.
```bash
PGPT_PROFILES=openai make run
```
PrivateGPT will use the already existing `settings-openai.yaml` settings file, which is already configured to use OpenAI LLM and Embeddings endpoints, and Qdrant.
The UI will be available at http://localhost:8001
### Local, Llama-CPP powered setup
If you want to run PrivateGPT fully locally without relying on Ollama, you can run the following command:
```bash
poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant"
```
In order for local LLM and embeddings to work, you need to download the models to the `models` folder. You can do so by running the `setup` script:
```bash
poetry run python scripts/setup
```
Once installed, you can run PrivateGPT with the following command:
```bash
PGPT_PROFILES=local make run
```
PrivateGPT will load the already existing `settings-local.yaml` file, which is already configured to use LlamaCPP LLM, HuggingFace embeddings and Qdrant.
The UI will be available at http://localhost:8001 The UI will be available at http://localhost:8001
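
The extras documented above follow a `<component>-<option>` naming scheme (`llms-*`, `embeddings-*`, `vector-stores-*`, plus `ui`). As a rough sketch of how an install command could be assembled from one choice per component: the helper name `build_install_command` is hypothetical (it is not part of PrivateGPT), and the option sets only cover the extras listed in this document.

```python
# Hypothetical helper, not part of PrivateGPT: builds the install command
# from one option per component, using only the extras named in the docs above.
LLM_OPTIONS = {"ollama", "llama-cpp", "sagemaker", "openai", "openai-like"}
EMBEDDING_OPTIONS = {"huggingface", "sagemaker", "openai"}
VECTOR_STORE_OPTIONS = {"qdrant", "chroma", "postgres"}


def build_install_command(
    llm: str, embeddings: str, vector_store: str, ui: bool = True
) -> str:
    """Return a `poetry install --extras "..."` command for the chosen setup."""
    if llm not in LLM_OPTIONS:
        raise ValueError(f"Unknown LLM option: {llm}")
    if embeddings not in EMBEDDING_OPTIONS:
        raise ValueError(f"Unknown embeddings option: {embeddings}")
    if vector_store not in VECTOR_STORE_OPTIONS:
        raise ValueError(f"Unknown vector store option: {vector_store}")

    extras = ["ui"] if ui else []
    extras += [
        f"llms-{llm}",
        f"embeddings-{embeddings}",
        f"vector-stores-{vector_store}",
    ]
    joined = " ".join(extras)
    return f'poetry install --extras "{joined}"'


print(build_install_command("ollama", "huggingface", "qdrant"))
```

With the Ollama, HuggingFace and Qdrant choices, this prints the same command as the recommended Ollama-powered setup.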

poetry.lock

@ -5895,17 +5895,20 @@ docs = ["furo", "jaraco.packaging (>=9.3)", "jaraco.tidelift (>=1.4)", "rst.link
testing = ["big-O", "jaraco.functools", "jaraco.itertools", "more-itertools", "pytest (>=6)", "pytest-black (>=0.3.7)", "pytest-checkdocs (>=2.4)", "pytest-cov", "pytest-enabler (>=2.2)", "pytest-ignore-flaky", "pytest-mypy (>=0.9.1)", "pytest-ruff"] testing = ["big-O", "jaraco.functools", "jaraco.itertools", "more-itertools", "pytest (>=6)", "pytest-black (>=0.3.7)", "pytest-checkdocs (>=2.4)", "pytest-cov", "pytest-enabler (>=2.2)", "pytest-ignore-flaky", "pytest-mypy (>=0.9.1)", "pytest-ruff"]
[extras] [extras]
chroma = ["llama-index-vector-stores-chroma"] embeddings-huggingface = ["llama-index-embeddings-huggingface"]
local = ["llama-index-embeddings-huggingface", "llama-index-llms-llama-cpp"] embeddings-openai = ["llama-index-embeddings-openai"]
ollama = ["llama-index-llms-ollama"] embeddings-sagemaker = ["boto3"]
openai = ["llama-index-embeddings-openai", "llama-index-llms-openai"] llms-llama-cpp = ["llama-index-llms-llama-cpp"]
openai-like = ["llama-index-llms-openai-like"] llms-ollama = ["llama-index-llms-ollama"]
postgres = ["llama-index-vector-stores-postgres"] llms-openai = ["llama-index-llms-openai"]
qdrant = ["llama-index-vector-stores-qdrant"] llms-openai-like = ["llama-index-llms-openai-like"]
sagemaker = ["boto3"] llms-sagemaker = ["boto3"]
ui = ["gradio"] ui = ["gradio"]
vector-stores-chroma = ["llama-index-vector-stores-chroma"]
vector-stores-postgres = ["llama-index-vector-stores-postgres"]
vector-stores-qdrant = ["llama-index-vector-stores-qdrant"]
[metadata] [metadata]
lock-version = "2.0" lock-version = "2.0"
python-versions = ">=3.11,<3.12" python-versions = ">=3.11,<3.12"
content-hash = "c2e3a4c948a9a49cc11ea085a20dc82b73dc0516fefa11ee941389b6f1ca1f3e" content-hash = "0249c25c783180d0c483c533d9102e3885e4a4f5261dc331a41323bd79d446f3"


@ -18,18 +18,18 @@ class EmbeddingComponent:
embedding_mode = settings.embedding.mode embedding_mode = settings.embedding.mode
logger.info("Initializing the embedding model in mode=%s", embedding_mode) logger.info("Initializing the embedding model in mode=%s", embedding_mode)
match embedding_mode: match embedding_mode:
case "local": case "huggingface":
try: try:
from llama_index.embeddings.huggingface import ( # type: ignore from llama_index.embeddings.huggingface import ( # type: ignore
HuggingFaceEmbedding, HuggingFaceEmbedding,
) )
except ImportError as e: except ImportError as e:
raise ImportError( raise ImportError(
"Local dependencies not found, install with `poetry install --extras local`" "Local dependencies not found, install with `poetry install --extras embeddings-huggingface`"
) from e ) from e
self.embedding_model = HuggingFaceEmbedding( self.embedding_model = HuggingFaceEmbedding(
model_name=settings.local.embedding_hf_model_name, model_name=settings.huggingface.embedding_hf_model_name,
cache_folder=str(models_cache_path), cache_folder=str(models_cache_path),
) )
case "sagemaker": case "sagemaker":
@ -39,7 +39,7 @@ class EmbeddingComponent:
) )
except ImportError as e: except ImportError as e:
raise ImportError( raise ImportError(
"Sagemaker dependencies not found, install with `poetry install --extras sagemaker`" "Sagemaker dependencies not found, install with `poetry install --extras embeddings-sagemaker`"
) from e ) from e
self.embedding_model = SagemakerEmbedding( self.embedding_model = SagemakerEmbedding(
@ -52,7 +52,7 @@ class EmbeddingComponent:
) )
except ImportError as e: except ImportError as e:
raise ImportError( raise ImportError(
"OpenAI dependencies not found, install with `poetry install --extras openai`" "OpenAI dependencies not found, install with `poetry install --extras embeddings-openai`"
) from e ) from e
openai_settings = settings.openai.api_key openai_settings = settings.openai.api_key
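
Each branch in this component repeats the same pattern: import the optional dependency lazily, and on failure re-raise an `ImportError` that names the extra to install. A minimal sketch of that pattern as a reusable function, assuming a hypothetical helper `import_optional` that does not exist in the commit:

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or fail with an install hint.

    `import_optional` is an illustrative name, not a PrivateGPT function.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # Chain the original error so the real cause stays visible.
        raise ImportError(
            f"{module_name} not found, "
            f"install with `poetry install --extras {extra}`"
        ) from e
```

A call such as `import_optional("llama_index.embeddings.huggingface", "embeddings-huggingface")` would then stand in for one `try`/`except` block. The commit keeps the explicit blocks, which lets each branch import specific names with `# type: ignore`.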


@ -7,26 +7,20 @@ import logging
from typing import TYPE_CHECKING, Any from typing import TYPE_CHECKING, Any
import boto3 # type: ignore import boto3 # type: ignore
from llama_index.bridge.pydantic import Field from llama_index.core.base.llms.generic_utils import (
from llama_index.llms import ( completion_response_to_chat_response,
stream_completion_response_to_chat_response,
)
from llama_index.core.bridge.pydantic import Field
from llama_index.core.llms import (
CompletionResponse, CompletionResponse,
CustomLLM, CustomLLM,
LLMMetadata, LLMMetadata,
) )
from llama_index.llms.base import ( from llama_index.core.llms.callbacks import (
llm_chat_callback, llm_chat_callback,
llm_completion_callback, llm_completion_callback,
) )
from llama_index.llms.generic_utils import (
completion_response_to_chat_response,
stream_completion_response_to_chat_response,
)
from llama_index.llms.llama_utils import (
completion_to_prompt as generic_completion_to_prompt,
)
from llama_index.llms.llama_utils import (
messages_to_prompt as generic_messages_to_prompt,
)
if TYPE_CHECKING: if TYPE_CHECKING:
from collections.abc import Sequence from collections.abc import Sequence
@ -161,8 +155,8 @@ class SagemakerLLM(CustomLLM):
model_kwargs = model_kwargs or {} model_kwargs = model_kwargs or {}
model_kwargs.update({"n_ctx": context_window, "verbose": verbose}) model_kwargs.update({"n_ctx": context_window, "verbose": verbose})
messages_to_prompt = messages_to_prompt or generic_messages_to_prompt messages_to_prompt = messages_to_prompt or {}
completion_to_prompt = completion_to_prompt or generic_completion_to_prompt completion_to_prompt = completion_to_prompt or {}
generate_kwargs = generate_kwargs or {} generate_kwargs = generate_kwargs or {}
generate_kwargs.update( generate_kwargs.update(


@ -30,18 +30,18 @@ class LLMComponent:
logger.info("Initializing the LLM in mode=%s", llm_mode) logger.info("Initializing the LLM in mode=%s", llm_mode)
match settings.llm.mode: match settings.llm.mode:
case "local": case "llamacpp":
try: try:
from llama_index.llms.llama_cpp import LlamaCPP # type: ignore from llama_index.llms.llama_cpp import LlamaCPP # type: ignore
except ImportError as e: except ImportError as e:
raise ImportError( raise ImportError(
"Local dependencies not found, install with `poetry install --extras local`" "Local dependencies not found, install with `poetry install --extras llms-llama-cpp`"
) from e ) from e
prompt_style = get_prompt_style(settings.local.prompt_style) prompt_style = get_prompt_style(settings.llamacpp.prompt_style)
self.llm = LlamaCPP( self.llm = LlamaCPP(
model_path=str(models_path / settings.local.llm_hf_model_file), model_path=str(models_path / settings.llamacpp.llm_hf_model_file),
temperature=0.1, temperature=0.1,
max_new_tokens=settings.llm.max_new_tokens, max_new_tokens=settings.llm.max_new_tokens,
context_window=settings.llm.context_window, context_window=settings.llm.context_window,
@ -60,7 +60,7 @@ class LLMComponent:
from private_gpt.components.llm.custom.sagemaker import SagemakerLLM from private_gpt.components.llm.custom.sagemaker import SagemakerLLM
except ImportError as e: except ImportError as e:
raise ImportError( raise ImportError(
"Sagemaker dependencies not found, install with `poetry install --extras sagemaker`" "Sagemaker dependencies not found, install with `poetry install --extras llms-sagemaker`"
) from e ) from e
self.llm = SagemakerLLM( self.llm = SagemakerLLM(
@ -73,7 +73,7 @@ class LLMComponent:
from llama_index.llms.openai import OpenAI # type: ignore from llama_index.llms.openai import OpenAI # type: ignore
except ImportError as e: except ImportError as e:
raise ImportError( raise ImportError(
"OpenAI dependencies not found, install with `poetry install --extras openai`" "OpenAI dependencies not found, install with `poetry install --extras llms-openai`"
) from e ) from e
openai_settings = settings.openai openai_settings = settings.openai
@ -87,7 +87,7 @@ class LLMComponent:
from llama_index.llms.openai_like import OpenAILike # type: ignore from llama_index.llms.openai_like import OpenAILike # type: ignore
except ImportError as e: except ImportError as e:
raise ImportError( raise ImportError(
"OpenAILike dependencies not found, install with `poetry install --extras openailike`" "OpenAILike dependencies not found, install with `poetry install --extras llms-openai-like`"
) from e ) from e
openai_settings = settings.openai openai_settings = settings.openai
@ -104,7 +104,7 @@ class LLMComponent:
from llama_index.llms.ollama import Ollama # type: ignore from llama_index.llms.ollama import Ollama # type: ignore
except ImportError as e: except ImportError as e:
raise ImportError( raise ImportError(
"Ollama dependencies not found, install with `poetry install --extras ollama`" "Ollama dependencies not found, install with `poetry install --extras llms-ollama`"
) from e ) from e
ollama_settings = settings.ollama ollama_settings = settings.ollama
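
The `llm.mode` values and the poetry extras use slightly different spellings (for example, mode `llamacpp` versus extra `llms-llama-cpp`). The lookup below is a sketch of that correspondence, built from the names in `LLMSettings` and `pyproject.toml` in this commit. `LLM_MODE_TO_EXTRA` and `install_hint` are hypothetical names, and treating `mock` as needing no extra is an assumption based on no extra being listed for it.

```python
from typing import Optional

# Settings mode -> poetry extra, using the names that appear in this commit.
LLM_MODE_TO_EXTRA = {
    "llamacpp": "llms-llama-cpp",
    "openai": "llms-openai",
    "openailike": "llms-openai-like",
    "sagemaker": "llms-sagemaker",
    "ollama": "llms-ollama",
}


def install_hint(mode: str) -> Optional[str]:
    """Return the install command for an LLM mode, or None if no extra is known."""
    extra = LLM_MODE_TO_EXTRA.get(mode)
    if extra is None:
        # Assumption: modes such as "mock" need no extra dependencies.
        return None
    return f"poetry install --extras {extra}"


print(install_hint("openailike"))
```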


@ -45,7 +45,7 @@ class VectorStoreComponent:
) )
except ImportError as e: except ImportError as e:
raise ImportError( raise ImportError(
"Postgres dependencies not found, install with `poetry install --extras postgres`" "Postgres dependencies not found, install with `poetry install --extras vector-stores-postgres`"
) from e ) from e
if settings.pgvector is None: if settings.pgvector is None:
@ -72,7 +72,7 @@ class VectorStoreComponent:
) )
except ImportError as e: except ImportError as e:
raise ImportError( raise ImportError(
"ChromaDB dependencies not found, install with `poetry install --extras chroma`" "ChromaDB dependencies not found, install with `poetry install --extras vector-stores-chroma`"
) from e ) from e
chroma_settings = ChromaSettings(anonymized_telemetry=False) chroma_settings = ChromaSettings(anonymized_telemetry=False)
@ -99,7 +99,7 @@ class VectorStoreComponent:
from qdrant_client import QdrantClient from qdrant_client import QdrantClient
except ImportError as e: except ImportError as e:
raise ImportError( raise ImportError(
"Qdrant dependencies not found, install with `poetry install --extras qdrant`" "Qdrant dependencies not found, install with `poetry install --extras vector-stores-qdrant`"
) from e ) from e
if settings.qdrant is None: if settings.qdrant is None:


@ -81,7 +81,7 @@ class DataSettings(BaseModel):
class LLMSettings(BaseModel): class LLMSettings(BaseModel):
mode: Literal["local", "openai", "openailike", "sagemaker", "mock", "ollama"] mode: Literal["llamacpp", "openai", "openailike", "sagemaker", "mock", "ollama"]
max_new_tokens: int = Field( max_new_tokens: int = Field(
256, 256,
description="The maximum number of tokens that the LLM is authorized to generate in one completion.", description="The maximum number of tokens that the LLM is authorized to generate in one completion.",
@ -104,12 +104,9 @@ class VectorstoreSettings(BaseModel):
database: Literal["chroma", "qdrant", "pgvector"] database: Literal["chroma", "qdrant", "pgvector"]
class LocalSettings(BaseModel): class LlamaCPPSettings(BaseModel):
llm_hf_repo_id: str llm_hf_repo_id: str
llm_hf_model_file: str llm_hf_model_file: str
embedding_hf_model_name: str = Field(
description="Name of the HuggingFace model to use for embeddings"
)
prompt_style: Literal["default", "llama2", "tag", "mistral", "chatml"] = Field( prompt_style: Literal["default", "llama2", "tag", "mistral", "chatml"] = Field(
"llama2", "llama2",
description=( description=(
@ -123,8 +120,14 @@ class LocalSettings(BaseModel):
) )
class HuggingFaceSettings(BaseModel):
embedding_hf_model_name: str = Field(
description="Name of the HuggingFace model to use for embeddings"
)
class EmbeddingSettings(BaseModel): class EmbeddingSettings(BaseModel):
mode: Literal["local", "openai", "sagemaker", "mock"] mode: Literal["huggingface", "openai", "sagemaker", "mock"]
ingest_mode: Literal["simple", "batch", "parallel"] = Field( ingest_mode: Literal["simple", "batch", "parallel"] = Field(
"simple", "simple",
description=( description=(
@ -292,7 +295,8 @@ class Settings(BaseModel):
ui: UISettings ui: UISettings
llm: LLMSettings llm: LLMSettings
embedding: EmbeddingSettings embedding: EmbeddingSettings
local: LocalSettings llamacpp: LlamaCPPSettings
huggingface: HuggingFaceSettings
sagemaker: SagemakerSettings sagemaker: SagemakerSettings
openai: OpenAISettings openai: OpenAISettings
ollama: OllamaSettings ollama: OllamaSettings
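
Existing settings files that still use the `local` section and `mode: local` need to be adapted to the split shown above. The following is a minimal sketch of that migration on plain dictionaries (for instance, the result of loading a YAML file). `migrate_settings` is a hypothetical helper, not something this commit provides.

```python
import copy


def migrate_settings(old: dict) -> dict:
    """Split a legacy `local` section into `llamacpp` and `huggingface`.

    Hypothetical helper for illustration; works on a copy of the input.
    """
    new = copy.deepcopy(old)

    local = new.pop("local", None)
    if local is not None:
        # The embedding model name moves to the new `huggingface` section.
        embedding_model = local.pop("embedding_hf_model_name", None)
        if embedding_model is not None:
            new.setdefault("huggingface", {})[
                "embedding_hf_model_name"
            ] = embedding_model
        # Everything else (prompt style, repo id, model file) goes to `llamacpp`.
        if local:
            new.setdefault("llamacpp", {}).update(local)

    # The mode names change as well: `local` no longer exists.
    if new.get("llm", {}).get("mode") == "local":
        new["llm"]["mode"] = "llamacpp"
    if new.get("embedding", {}).get("mode") == "local":
        new["embedding"]["mode"] = "huggingface"
    return new
```

Note that a legacy `local` block carrying only `prompt_style` would become a partial `llamacpp` block here, so the result may still need manual review.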


@ -33,14 +33,17 @@ gradio = {version ="^4.19.2", optional = true}
[tool.poetry.extras] [tool.poetry.extras]
ui = ["gradio"] ui = ["gradio"]
local = ["llama-index-llms-llama-cpp", "llama-index-embeddings-huggingface"] llms-llama-cpp = ["llama-index-llms-llama-cpp"]
openai = ["llama-index-llms-openai", "llama-index-embeddings-openai"] llms-openai = ["llama-index-llms-openai"]
openai-like = ["llama-index-llms-openai-like"] llms-openai-like = ["llama-index-llms-openai-like"]
ollama = ["llama-index-llms-ollama"] llms-ollama = ["llama-index-llms-ollama"]
sagemaker = ["boto3"] llms-sagemaker = ["boto3"]
qdrant = ["llama-index-vector-stores-qdrant"] embeddings-huggingface = ["llama-index-embeddings-huggingface"]
chroma = ["llama-index-vector-stores-chroma"] embeddings-openai = ["llama-index-embeddings-openai"]
postgres = ["llama-index-vector-stores-postgres"] embeddings-sagemaker = ["boto3"]
vector-stores-qdrant = ["llama-index-vector-stores-qdrant"]
vector-stores-chroma = ["llama-index-vector-stores-chroma"]
vector-stores-postgres = ["llama-index-vector-stores-postgres"]
[tool.poetry.group.dev.dependencies] [tool.poetry.group.dev.dependencies]
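
Scripts and CI jobs that call `poetry install --extras` with the old names will fail after this change. The mapping below is derived from the old and new `[tool.poetry.extras]` tables in this diff. `migrate_extras` is a hypothetical helper, and the translation is purely mechanical.

```python
# Old extra -> new extras, read off the pyproject.toml diff above.
OLD_TO_NEW_EXTRAS = {
    "ui": ["ui"],
    "local": ["llms-llama-cpp", "embeddings-huggingface"],
    "openai": ["llms-openai", "embeddings-openai"],
    "openai-like": ["llms-openai-like"],
    "ollama": ["llms-ollama"],
    "sagemaker": ["llms-sagemaker", "embeddings-sagemaker"],
    "qdrant": ["vector-stores-qdrant"],
    "chroma": ["vector-stores-chroma"],
    "postgres": ["vector-stores-postgres"],
}


def migrate_extras(old_extras: str) -> str:
    """Translate a space-separated list of old extras to the new names."""
    new_extras: list[str] = []
    for name in old_extras.split():
        # Unknown names raise KeyError so typos are not silently dropped.
        for replacement in OLD_TO_NEW_EXTRAS[name]:
            if replacement not in new_extras:
                new_extras.append(replacement)
    return " ".join(new_extras)


print(migrate_extras("ui local qdrant"))
```

Because the mapping is mechanical, it can install more than needed: the old Ollama setup `ui local ollama qdrant` translates to a list that still includes `llms-llama-cpp`, whereas the updated docs install only `llms-ollama` and `embeddings-huggingface`.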


@ -8,9 +8,11 @@ llm:
embedding: embedding:
mode: ${PGPT_MODE:sagemaker} mode: ${PGPT_MODE:sagemaker}
local: llamacpp:
llm_hf_repo_id: ${PGPT_HF_REPO_ID:TheBloke/Mistral-7B-Instruct-v0.1-GGUF} llm_hf_repo_id: ${PGPT_HF_REPO_ID:TheBloke/Mistral-7B-Instruct-v0.1-GGUF}
llm_hf_model_file: ${PGPT_HF_MODEL_FILE:mistral-7b-instruct-v0.1.Q4_K_M.gguf} llm_hf_model_file: ${PGPT_HF_MODEL_FILE:mistral-7b-instruct-v0.1.Q4_K_M.gguf}
huggingface:
embedding_hf_model_name: ${PGPT_EMBEDDING_HF_MODEL_NAME:BAAI/bge-small-en-v1.5} embedding_hf_model_name: ${PGPT_EMBEDDING_HF_MODEL_NAME:BAAI/bge-small-en-v1.5}
sagemaker: sagemaker:


@ -2,23 +2,25 @@ server:
env_name: ${APP_ENV:local} env_name: ${APP_ENV:local}
llm: llm:
mode: local mode: llamacpp
# Should be matching the selected model # Should be matching the selected model
max_new_tokens: 512 max_new_tokens: 512
context_window: 3900 context_window: 3900
tokenizer: mistralai/Mistral-7B-Instruct-v0.2 tokenizer: mistralai/Mistral-7B-Instruct-v0.2
llamacpp:
prompt_style: "mistral"
llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
embedding: embedding:
mode: local mode: huggingface
huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5
vectorstore: vectorstore:
database: qdrant database: qdrant
qdrant: qdrant:
path: local_data/private_gpt/qdrant path: local_data/private_gpt/qdrant
local:
prompt_style: "mistral"
llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
embedding_hf_model_name: BAAI/bge-small-en-v1.5


@ -4,5 +4,6 @@ server:
# This configuration allows you to use GPU for creating embeddings while avoiding loading LLM into vRAM # This configuration allows you to use GPU for creating embeddings while avoiding loading LLM into vRAM
llm: llm:
mode: mock mode: mock
embedding: embedding:
mode: local mode: huggingface


@ -11,7 +11,10 @@ ollama:
api_base: http://localhost:11434 api_base: http://localhost:11434
embedding: embedding:
mode: local mode: huggingface
huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5
vectorstore: vectorstore:
database: qdrant database: qdrant
@ -19,6 +22,3 @@ vectorstore:
qdrant: qdrant:
path: local_data/private_gpt/qdrant path: local_data/private_gpt/qdrant
local:
prompt_style: "llama2"
embedding_hf_model_name: BAAI/bge-small-en-v1.5

settings-openai.yaml

@ -0,0 +1,12 @@
server:
env_name: ${APP_ENV:openai}
llm:
mode: openai
embedding:
mode: openai
openai:
api_key: ${OPENAI_API_KEY:}
model: gpt-3.5-turbo


@ -1,5 +1,5 @@
server: server:
env_name: ${APP_ENV:prod} env_name: ${APP_ENV:sagemaker}
port: ${PORT:8001} port: ${PORT:8001}
ui: ui:
@ -13,5 +13,5 @@ embedding:
mode: sagemaker mode: sagemaker
sagemaker: sagemaker:
llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140 llm_endpoint_name: llm
embedding_endpoint_name: huggingface-pytorch-inference-2023-11-03-07-41-36-479 embedding_endpoint_name: embedding

View File

@ -14,5 +14,8 @@ qdrant:
llm: llm:
mode: mock mode: mock
embedding:
mode: mock
ui: ui:
enabled: false enabled: false


@ -1,11 +1,14 @@
server:
env_name: ${APP_ENV:vllm}
llm: llm:
mode: openailike mode: openailike
embedding: embedding:
mode: local mode: huggingface
ingest_mode: simple ingest_mode: simple
local: huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5 embedding_hf_model_name: BAAI/bge-small-en-v1.5
openai: openai:


@ -35,17 +35,24 @@ ui:
delete_all_files_button_enabled: true delete_all_files_button_enabled: true
llm: llm:
mode: local mode: llamacpp
# Should be matching the selected model # Should be matching the selected model
max_new_tokens: 512 max_new_tokens: 512
context_window: 3900 context_window: 3900
tokenizer: mistralai/Mistral-7B-Instruct-v0.2
llamacpp:
prompt_style: "mistral"
llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
embedding: embedding:
# Should be matching the value above in most cases # Should be matching the value above in most cases
mode: local mode: huggingface
ingest_mode: simple ingest_mode: simple
huggingface:
embedding_hf_model_name: BAAI/bge-small-en-v1.5
vectorstore: vectorstore:
database: qdrant database: qdrant
@ -62,12 +69,6 @@ pgvector:
schema_name: private_gpt schema_name: private_gpt
table_name: embeddings table_name: embeddings
local:
prompt_style: "mistral"
llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
embedding_hf_model_name: BAAI/bge-small-en-v1.5
sagemaker: sagemaker:
llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140 llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
embedding_endpoint_name: huggingface-pytorch-inference-2023-11-03-07-41-36-479 embedding_endpoint_name: huggingface-pytorch-inference-2023-11-03-07-41-36-479