Update README.md

fix: 503 when private gpt gets ollama service (#2104)

When running private gpt with external ollama API, ollama service returns 503 on startup because ollama service (traefik) might not be ready.

- Add healthcheck to ollama service to test for connection to external ollama
- private-gpt-ollama service depends on ollama being service_healthy

Co-authored-by: Koh Meng Hui <kohmh@duck.com>
2025-05-15 11:49:27 +00:00 · 2024-11-13 20:29:56 +01:00 · 2024-10-17 12:44:28 +02:00 · 2024-09-26 16:29:52 +02:00 · 2024-09-25 12:00:03 +02:00 · 2024-09-24 08:33:02 +02:00
33 changed files with 3035 additions and 2464 deletions
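
The startup race described in the commit message (a client reaching the proxy before the backend answers) can be illustrated outside Docker. The following is a minimal Python sketch, not code from this PR: `wait_until_healthy` and `http_probe` are hypothetical helpers that mimic a poll-until-ready loop in the spirit of the healthcheck's `interval`, `retries`, and `timeout` settings. Docker's own healthcheck state machine is more involved than this.

```python
import time
import urllib.request
from collections.abc import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status.

    urlopen raises HTTPError (an OSError subclass) for 4xx/5xx responses,
    so a 503 from the proxy surfaces as an exception, not a return value.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return 200 <= response.status < 400


def wait_until_healthy(
    probe: Callable[[], bool],
    retries: int = 3,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() up to `retries` times, pausing `interval` seconds between attempts."""
    for attempt in range(1, retries + 1):
        try:
            if probe():
                return True
        except OSError:
            # Connection refused, timeouts, and HTTP errors mean "not ready yet".
            pass
        if attempt < retries:
            sleep(interval)
    return False
```

Assuming the same address as the healthcheck in `docker-compose.yaml`, a caller could gate startup with `wait_until_healthy(lambda: http_probe("http://ollama:11434"))` and only proceed when it returns `True`.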
--- a/.github/release_please/.release-please-config.json
+++ b/.github/release_please/.release-please-config.json
@@ -0,0 +1,19 @@
+{
+    "$schema": "https://raw.githubusercontent.com/googleapis/release-please/main/schemas/config.json",
+    "release-type": "simple",
+    "version-file": "version.txt",
+    "extra-files": [
+      {
+        "type": "toml",
+        "path": "pyproject.toml",
+        "jsonpath": "$.tool.poetry.version"
+      },
+      {
+        "type": "generic",
+        "path": "docker-compose.yaml"
+      }
+    ],
+    "packages": {
+      ".": {}
+    }
+  }
--- a/.github/release_please/.release-please-manifest.json
+++ b/.github/release_please/.release-please-manifest.json
@@ -0,0 +1,3 @@
+{
+  ".": "0.6.2"
+}
--- a/.github/workflows/generate-release.yml
+++ b/.github/workflows/generate-release.yml
@@ -7,7 +7,7 @@ on:

 env:
  REGISTRY: docker.io
-  IMAGE_NAME: ${{ github.repository }}
+  IMAGE_NAME: zylonai/private-gpt
  platforms: linux/amd64,linux/arm64
  DEFAULT_TYPE: "ollama"

--- a/.github/workflows/release-please.yml
+++ b/.github/workflows/release-please.yml
@@ -13,7 +13,8 @@ jobs:
  release-please:
    runs-on: ubuntu-latest
    steps:
-      - uses: google-github-actions/release-please-action@v3
+      - uses: google-github-actions/release-please-action@v4
+        id: release
        with:
-          release-type: simple
-          version-file: version.txt
+          config-file: .github/release_please/.release-please-config.json
+          manifest-file: .github/release_please/.release-please-manifest.json
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -14,7 +14,7 @@ jobs:
  setup:
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
      - uses: ./.github/workflows/actions/install_dependencies

  checks:
@@ -28,7 +28,7 @@ jobs:
          - ruff
          - mypy
    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
      - uses: ./.github/workflows/actions/install_dependencies
      - name: run ${{ matrix.quality-command }}
        run: make ${{ matrix.quality-command }}
@@ -38,7 +38,7 @@ jobs:
    runs-on: ubuntu-latest
    name: test
    steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
      - uses: ./.github/workflows/actions/install_dependencies
      - name: run test
        run: make test-coverage
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,15 @@
 # Changelog

+## [0.6.2](https://github.com/zylon-ai/private-gpt/compare/v0.6.1...v0.6.2) (2024-08-08)
+
+
+### Bug Fixes
+
+* add numpy issue to troubleshooting ([#2048](https://github.com/zylon-ai/private-gpt/issues/2048)) ([4ca6d0c](https://github.com/zylon-ai/private-gpt/commit/4ca6d0cb556be7a598f7d3e3b00d2a29214ee1e8))
+* auto-update version ([#2052](https://github.com/zylon-ai/private-gpt/issues/2052)) ([7fefe40](https://github.com/zylon-ai/private-gpt/commit/7fefe408b4267684c6e3c1a43c5dc2b73ec61fe4))
+* publish image name ([#2043](https://github.com/zylon-ai/private-gpt/issues/2043)) ([b1acf9d](https://github.com/zylon-ai/private-gpt/commit/b1acf9dc2cbca2047cd0087f13254ff5cda6e570))
+* update matplotlib to 3.9.1-post1 to fix win install ([b16abbe](https://github.com/zylon-ai/private-gpt/commit/b16abbefe49527ac038d235659854b98345d5387))
+
 ## [0.6.1](https://github.com/zylon-ai/private-gpt/compare/v0.6.0...v0.6.1) (2024-08-05)


--- a/Dockerfile.llamacpp-cpu
+++ b/Dockerfile.llamacpp-cpu
@@ -1,6 +1,6 @@
 ### IMPORTANT, THIS IMAGE CAN ONLY BE RUN IN LINUX DOCKER
 ### You will run into a segfault in mac
-FROM python:3.11.6-slim-bookworm as base
+FROM python:3.11.6-slim-bookworm AS base

 # Install poetry
 RUN pip install pipx
@@ -20,14 +20,14 @@ RUN apt update && apt install -y \
 # https://python-poetry.org/docs/configuration/#virtualenvsin-project
 ENV POETRY_VIRTUALENVS_IN_PROJECT=true

-FROM base as dependencies
+FROM base AS dependencies
 WORKDIR /home/worker/app
 COPY pyproject.toml poetry.lock ./

 ARG POETRY_EXTRAS="ui embeddings-huggingface llms-llama-cpp vector-stores-qdrant"
 RUN poetry install --no-root --extras "${POETRY_EXTRAS}"

-FROM base as app
+FROM base AS app

 ENV PYTHONUNBUFFERED=1
 ENV PORT=8080
--- a/Dockerfile.ollama
+++ b/Dockerfile.ollama
@@ -1,4 +1,4 @@
-FROM python:3.11.6-slim-bookworm as base
+FROM python:3.11.6-slim-bookworm AS base

 # Install poetry
 RUN pip install pipx
@@ -10,14 +10,14 @@ ENV PATH=".venv/bin/:$PATH"
 # https://python-poetry.org/docs/configuration/#virtualenvsin-project
 ENV POETRY_VIRTUALENVS_IN_PROJECT=true

-FROM base as dependencies
+FROM base AS dependencies
 WORKDIR /home/worker/app
 COPY pyproject.toml poetry.lock ./

 ARG POETRY_EXTRAS="ui vector-stores-qdrant llms-ollama embeddings-ollama"
 RUN poetry install --no-root --extras "${POETRY_EXTRAS}"

-FROM base as app
+FROM base AS app
 ENV PYTHONUNBUFFERED=1
 ENV PORT=8080
 ENV APP_ENV=prod
--- a/README.md
+++ b/README.md
@@ -1,4 +1,6 @@
-# 🔒 PrivateGPT 📑
+# PrivateGPT 
+
+<a href="https://trendshift.io/repositories/2601" target="_blank"><img src="https://trendshift.io/api/badge/repositories/2601" alt="imartinez%2FprivateGPT | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>

 [![Tests](https://github.com/zylon-ai/private-gpt/actions/workflows/tests.yml/badge.svg)](https://github.com/zylon-ai/private-gpt/actions/workflows/tests.yml?query=branch%3Amain)
 [![Website](https://img.shields.io/website?up_message=check%20it&down_message=down&url=https%3A%2F%2Fdocs.privategpt.dev%2F&label=Documentation)](https://docs.privategpt.dev/)
--- a/docker-compose.yaml
+++ b/docker-compose.yaml
@@ -7,12 +7,13 @@ services:
  # Private-GPT service for the Ollama CPU and GPU modes
  # This service builds from an external Dockerfile and runs the Ollama mode.
  private-gpt-ollama:
-    image: ${PGPT_IMAGE:-zylonai/private-gpt}${PGPT_TAG:-0.6.1}-ollama
+    image: ${PGPT_IMAGE:-zylonai/private-gpt}:${PGPT_TAG:-0.6.2}-ollama  # x-release-please-version
+    user: root
    build:
      context: .
      dockerfile: Dockerfile.ollama
    volumes:
-      - ./local_data/:/home/worker/app/local_data
+      - ./local_data:/home/worker/app/local_data
    ports:
      - "8001:8001"
    environment:
@@ -27,11 +28,15 @@ services:
      - ollama-cpu
      - ollama-cuda
      - ollama-api
+    depends_on:
+      ollama:
+        condition: service_healthy

  # Private-GPT service for the local mode
  # This service builds from a local Dockerfile and runs the application in local mode.
  private-gpt-llamacpp-cpu:
-    image: ${PGPT_IMAGE:-zylonai/private-gpt}${PGPT_TAG:-0.6.1}-llamacpp-cpu
+    image: ${PGPT_IMAGE:-zylonai/private-gpt}:${PGPT_TAG:-0.6.2}-llamacpp-cpu # x-release-please-version
+    user: root
    build:
      context: .
      dockerfile: Dockerfile.llamacpp-cpu
@@ -44,7 +49,7 @@ services:
    environment:
      PORT: 8001
      PGPT_PROFILES: local
-      HF_TOKEN: ${HF_TOKEN}
+      HF_TOKEN: ${HF_TOKEN:-}
    profiles:
      - llamacpp-cpu

@@ -56,9 +61,14 @@ services:
  # This will route requests to the Ollama service based on the profile.
  ollama:
    image: traefik:v2.10
+    healthcheck:
+      test: ["CMD", "sh", "-c", "wget -q --spider http://ollama:11434 || exit 1"]
+      interval: 10s
+      retries: 3
+      start_period: 5s
+      timeout: 5s
    ports:
-      - "11435:11434"
-      - "8081:8080"
+      - "8080:8080"
    command:
      - "--providers.file.filename=/etc/router.yml"
      - "--log.level=ERROR"
@@ -80,15 +90,19 @@ services:
  # Ollama service for the CPU mode
  ollama-cpu:
    image: ollama/ollama:latest
+    ports:
+      - "11434:11434"
    volumes:
      - ./models:/root/.ollama
    profiles:
      - ""
-      - ollama
+      - ollama-cpu

  # Ollama service for the CUDA mode
  ollama-cuda:
    image: ollama/ollama:latest
+    ports:
+      - "11434:11434"
    volumes:
      - ./models:/root/.ollama
    deploy:
@@ -99,4 +113,4 @@ services:
              count: 1
              capabilities: [gpu]
    profiles:
-      - ollama-cuda
+      - ollama-cuda
--- a/fern/docs/pages/installation/installation.mdx
+++ b/fern/docs/pages/installation/installation.mdx
@@ -307,11 +307,12 @@ If you have all required dependencies properly configured running the
 following powershell command should succeed.

 ```powershell
-$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
+$env:CMAKE_ARGS='-DLLAMA_CUBLAS=on'; poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python numpy==1.26.0
 ```

 If your installation was correct, you should see a message similar to the following next
-time you start the server `BLAS = 1`.
+time you start the server `BLAS = 1`. If there is some issue, please refer to the
+[troubleshooting](/installation/getting-started/troubleshooting#building-llama-cpp-with-nvidia-gpu-support) section.

 ```console
 llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
@@ -339,11 +340,12 @@ Some tips:
 After that running the following command in the repository will install llama.cpp with GPU support:

 ```bash
-CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
+CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python numpy==1.26.0
 ```

 If your installation was correct, you should see a message similar to the following next
-time you start the server `BLAS = 1`.
+time you start the server `BLAS = 1`. If there is some issue, please refer to the
+[troubleshooting](/installation/getting-started/troubleshooting#building-llama-cpp-with-nvidia-gpu-support) section.

 ```
 llama_new_context_with_model: total VRAM used: 4857.93 MB (model: 4095.05 MB, context: 762.87 MB)
--- a/fern/docs/pages/installation/troubleshooting.mdx
+++ b/fern/docs/pages/installation/troubleshooting.mdx
@@ -46,4 +46,19 @@ huggingface:
 embedding:
  embed_dim: 384
 ```
-</Callout>
+</Callout>
+
+# Building Llama-cpp with NVIDIA GPU support
+
+## Out-of-memory error
+
+If you encounter an out-of-memory error while running `llama-cpp` with CUDA, you can try the following steps to resolve the issue:
+1. **Set the following environment variable:**
+    ```bash
+    TOKENIZERS_PARALLELISM=true
+    ```
+2. **Run PrivateGPT:**
+    ```bash
+    poetry run python -m privategpt
+    ```
+Thanks to [MarioRossiGithub](https://github.com/MarioRossiGithub) for providing this solution.
--- a/poetry.lock
+++ b/poetry.lock
--- a/private_gpt/components/embedding/embedding_component.py
+++ b/private_gpt/components/embedding/embedding_component.py
@@ -144,6 +144,23 @@ class EmbeddingComponent:
                    api_key=settings.gemini.api_key,
                    model_name=settings.gemini.embedding_model,
                )
+            case "mistralai":
+                try:
+                    from llama_index.embeddings.mistralai import (  # type: ignore
+                        MistralAIEmbedding,
+                    )
+                except ImportError as e:
+                    raise ImportError(
+                        "Mistral dependencies not found, install with `poetry install --extras embeddings-mistral`"
+                    ) from e
+
+                api_key = settings.openai.api_key
+                model = settings.openai.embedding_model
+
+                self.embedding_model = MistralAIEmbedding(
+                    api_key=api_key,
+                    model=model,
+                )
            case "mock":
                # Not a random number, is the dimensionality used by
                # the default embedding model
--- a/private_gpt/components/ingest/ingest_component.py
+++ b/private_gpt/components/ingest/ingest_component.py
@@ -403,7 +403,7 @@ class PipelineIngestComponent(BaseIngestComponentWithIndex):
                self.transformations,
                show_progress=self.show_progress,
            )
-            self.node_q.put(("process", file_name, documents, nodes))
+            self.node_q.put(("process", file_name, documents, list(nodes)))
        finally:
            self.doc_semaphore.release()
            self.doc_q.task_done()  # unblock Q joins
--- a/private_gpt/components/ingest/ingest_helper.py
+++ b/private_gpt/components/ingest/ingest_helper.py
@@ -92,7 +92,13 @@ class IngestionHelper:
            return string_reader.load_data([file_data.read_text()])

        logger.debug("Specific reader found for extension=%s", extension)
-        return reader_cls().load_data(file_data)
+        documents = reader_cls().load_data(file_data)
+
+        # Sanitize NUL bytes in text which can't be stored in Postgres
+        for i in range(len(documents)):
+            documents[i].text = documents[i].text.replace("\u0000", "")
+
+        return documents

    @staticmethod
    def _exclude_metadata(documents: list[Document]) -> None:
--- a/private_gpt/components/llm/llm_component.py
+++ b/private_gpt/components/llm/llm_component.py
@@ -120,7 +120,6 @@ class LLMComponent:
                    api_version="",
                    temperature=settings.llm.temperature,
                    context_window=settings.llm.context_window,
-                    max_new_tokens=settings.llm.max_new_tokens,
                    messages_to_prompt=prompt_style.messages_to_prompt,
                    completion_to_prompt=prompt_style.completion_to_prompt,
                    tokenizer=settings.llm.tokenizer,
@@ -184,10 +183,10 @@ class LLMComponent:

                        return wrapper

-                    Ollama.chat = add_keep_alive(Ollama.chat)
-                    Ollama.stream_chat = add_keep_alive(Ollama.stream_chat)
-                    Ollama.complete = add_keep_alive(Ollama.complete)
-                    Ollama.stream_complete = add_keep_alive(Ollama.stream_complete)
+                    Ollama.chat = add_keep_alive(Ollama.chat)  # type: ignore
+                    Ollama.stream_chat = add_keep_alive(Ollama.stream_chat)  # type: ignore
+                    Ollama.complete = add_keep_alive(Ollama.complete)  # type: ignore
+                    Ollama.stream_complete = add_keep_alive(Ollama.stream_complete)  # type: ignore

                self.llm = llm

--- a/private_gpt/components/llm/prompt_helper.py
+++ b/private_gpt/components/llm/prompt_helper.py
@@ -40,7 +40,8 @@ class AbstractPromptStyle(abc.ABC):
        logger.debug("Got for messages='%s' the prompt='%s'", messages, prompt)
        return prompt

-    def completion_to_prompt(self, completion: str) -> str:
+    def completion_to_prompt(self, prompt: str) -> str:
+        completion = prompt  # Fix: Llama-index parameter has to be named as prompt
        prompt = self._completion_to_prompt(completion)
        logger.debug("Got for completion='%s' the prompt='%s'", completion, prompt)
        return prompt
@@ -285,8 +286,9 @@ class ChatMLPromptStyle(AbstractPromptStyle):


 def get_prompt_style(
-    prompt_style: Literal["default", "llama2", "llama3", "tag", "mistral", "chatml"]
-    | None
+    prompt_style: (
+        Literal["default", "llama2", "llama3", "tag", "mistral", "chatml"] | None
+    )
 ) -> AbstractPromptStyle:
    """Get the prompt style to use from the given string.

--- a/private_gpt/components/node_store/node_store_component.py
+++ b/private_gpt/components/node_store/node_store_component.py
@@ -38,10 +38,10 @@ class NodeStoreComponent:

            case "postgres":
                try:
-                    from llama_index.core.storage.docstore.postgres_docstore import (
+                    from llama_index.storage.docstore.postgres import (  # type: ignore
                        PostgresDocumentStore,
                    )
-                    from llama_index.core.storage.index_store.postgres_index_store import (
+                    from llama_index.storage.index_store.postgres import (  # type: ignore
                        PostgresIndexStore,
                    )
                except ImportError:
@@ -55,6 +55,7 @@ class NodeStoreComponent:
                self.index_store = PostgresIndexStore.from_params(
                    **settings.postgres.model_dump(exclude_none=True)
                )
+
                self.doc_store = PostgresDocumentStore.from_params(
                    **settings.postgres.model_dump(exclude_none=True)
                )
--- a/private_gpt/components/vector_store/batched_chroma.py
+++ b/private_gpt/components/vector_store/batched_chroma.py
@@ -1,14 +1,17 @@
-from collections.abc import Generator
-from typing import Any
+from collections.abc import Generator, Sequence
+from typing import TYPE_CHECKING, Any

 from llama_index.core.schema import BaseNode, MetadataMode
 from llama_index.core.vector_stores.utils import node_to_metadata_dict
 from llama_index.vector_stores.chroma import ChromaVectorStore  # type: ignore

+if TYPE_CHECKING:
+    from collections.abc import Mapping
+

 def chunk_list(
-    lst: list[BaseNode], max_chunk_size: int
-) -> Generator[list[BaseNode], None, None]:
+    lst: Sequence[BaseNode], max_chunk_size: int
+) -> Generator[Sequence[BaseNode], None, None]:
    """Yield successive max_chunk_size-sized chunks from lst.

    Args:
@@ -60,7 +63,7 @@ class BatchedChromaVectorStore(ChromaVectorStore):  # type: ignore
        )
        self.chroma_client = chroma_client

-    def add(self, nodes: list[BaseNode], **add_kwargs: Any) -> list[str]:
+    def add(self, nodes: Sequence[BaseNode], **add_kwargs: Any) -> list[str]:
        """Add nodes to index, batching the insertion to avoid issues.

        Args:
@@ -78,8 +81,8 @@ class BatchedChromaVectorStore(ChromaVectorStore):  # type: ignore

        all_ids = []
        for node_chunk in node_chunks:
-            embeddings = []
-            metadatas = []
+            embeddings: list[Sequence[float]] = []
+            metadatas: list[Mapping[str, Any]] = []
            ids = []
            documents = []
            for node in node_chunk:
--- a/private_gpt/server/chat/chat_service.py
+++ b/private_gpt/server/chat/chat_service.py
@@ -1,4 +1,5 @@
 from dataclasses import dataclass
+from typing import TYPE_CHECKING

 from injector import inject, singleton
 from llama_index.core.chat_engine import ContextChatEngine, SimpleChatEngine
@@ -26,6 +27,9 @@ from private_gpt.open_ai.extensions.context_filter import ContextFilter
 from private_gpt.server.chunks.chunks_service import Chunk
 from private_gpt.settings.settings import Settings

+if TYPE_CHECKING:
+    from llama_index.core.postprocessor.types import BaseNodePostprocessor
+

 class Completion(BaseModel):
    response: str
@@ -114,12 +118,15 @@ class ChatService:
                context_filter=context_filter,
                similarity_top_k=self.settings.rag.similarity_top_k,
            )
-            node_postprocessors = [
+            node_postprocessors: list[BaseNodePostprocessor] = [
                MetadataReplacementPostProcessor(target_metadata_key="window"),
-                SimilarityPostprocessor(
-                    similarity_cutoff=settings.rag.similarity_value
-                ),
            ]
+            if settings.rag.similarity_value:
+                node_postprocessors.append(
+                    SimilarityPostprocessor(
+                        similarity_cutoff=settings.rag.similarity_value
+                    )
+                )

            if settings.rag.rerank.enabled:
                rerank_postprocessor = SentenceTransformerRerank(
--- a/private_gpt/server/recipes/summarize/summarize_service.py
+++ b/private_gpt/server/recipes/summarize/summarize_service.py
@@ -90,9 +90,9 @@ class SummarizeService:
        # Add context documents to summarize
        if use_context:
            # 1. Recover all ref docs
-            ref_docs: dict[
-                str, RefDocInfo
-            ] | None = self.storage_context.docstore.get_all_ref_doc_info()
+            ref_docs: dict[str, RefDocInfo] | None = (
+                self.storage_context.docstore.get_all_ref_doc_info()
+            )
            if ref_docs is None:
                raise ValueError("No documents have been ingested yet.")

--- a/private_gpt/settings/settings.py
+++ b/private_gpt/settings/settings.py
@@ -136,19 +136,19 @@ class LLMSettings(BaseModel):
        0.1,
        description="The temperature of the model. Increasing the temperature will make the model answer more creatively. A value of 0.1 would be more factual.",
    )
-    prompt_style: Literal[
-        "default", "llama2", "llama3", "tag", "mistral", "chatml"
-    ] = Field(
-        "llama2",
-        description=(
-            "The prompt style to use for the chat engine. "
-            "If `default` - use the default prompt style from the llama_index. It should look like `role: message`.\n"
-            "If `llama2` - use the llama2 prompt style from the llama_index. Based on `<s>`, `[INST]` and `<<SYS>>`.\n"
-            "If `llama3` - use the llama3 prompt style from the llama_index."
-            "If `tag` - use the `tag` prompt style. It should look like `<|role|>: message`. \n"
-            "If `mistral` - use the `mistral prompt style. It shoudl look like <s>[INST] {System Prompt} [/INST]</s>[INST] { UserInstructions } [/INST]"
-            "`llama2` is the historic behaviour. `default` might work better with your custom models."
-        ),
+    prompt_style: Literal["default", "llama2", "llama3", "tag", "mistral", "chatml"] = (
+        Field(
+            "llama2",
+            description=(
+                "The prompt style to use for the chat engine. "
+                "If `default` - use the default prompt style from the llama_index. It should look like `role: message`.\n"
+                "If `llama2` - use the llama2 prompt style from the llama_index. Based on `<s>`, `[INST]` and `<<SYS>>`.\n"
+                "If `llama3` - use the llama3 prompt style from the llama_index."
+                "If `tag` - use the `tag` prompt style. It should look like `<|role|>: message`. \n"
+                "If `mistral` - use the `mistral prompt style. It shoudl look like <s>[INST] {System Prompt} [/INST]</s>[INST] { UserInstructions } [/INST]"
+                "`llama2` is the historic behaviour. `default` might work better with your custom models."
+            ),
+        )
    )


@@ -197,7 +197,14 @@ class HuggingFaceSettings(BaseModel):

 class EmbeddingSettings(BaseModel):
    mode: Literal[
-        "huggingface", "openai", "azopenai", "sagemaker", "ollama", "mock", "gemini"
+        "huggingface",
+        "openai",
+        "azopenai",
+        "sagemaker",
+        "ollama",
+        "mock",
+        "gemini",
+        "mistralai",
    ]
    ingest_mode: Literal["simple", "batch", "parallel", "pipeline"] = Field(
        "simple",
@@ -350,6 +357,10 @@ class AzureOpenAISettings(BaseModel):
 class UISettings(BaseModel):
    enabled: bool
    path: str
+    default_mode: Literal["RAG", "Search", "Basic", "Summarize"] = Field(
+        "RAG",
+        description="The default mode.",
+    )
    default_chat_system_prompt: str = Field(
        None,
        description="The default system prompt to use for the chat mode.",
--- a/private_gpt/ui/ui.py
+++ b/private_gpt/ui/ui.py
@@ -1,4 +1,5 @@
 """This file should be imported if and only if you want to run the UI locally."""
+
 import base64
 import logging
 import time
@@ -99,8 +100,11 @@ class PrivateGptUi:
        self._selected_filename = None

        # Initialize system prompt based on default mode
-        self.mode = MODES[0]
-        self._system_prompt = self._get_default_system_prompt(self.mode)
+        default_mode_map = {mode.value: mode for mode in Modes}
+        self._default_mode = default_mode_map.get(
+            settings().ui.default_mode, Modes.RAG_MODE
+        )
+        self._system_prompt = self._get_default_system_prompt(self._default_mode)

    def _chat(
        self, message: str, history: list[list[str]], mode: Modes, *_: Any
@@ -390,7 +394,7 @@ class PrivateGptUi:

            with gr.Row(equal_height=False):
                with gr.Column(scale=3):
-                    default_mode = MODES[0]
+                    default_mode = self._default_mode
                    mode = gr.Radio(
                        [mode.value for mode in MODES],
                        label="Mode",
--- a/private_gpt/utils/ollama.py
+++ b/private_gpt/utils/ollama.py
@@ -3,10 +3,13 @@ from collections import deque
 from collections.abc import Iterator, Mapping
 from typing import Any

+from httpx import ConnectError
 from tqdm import tqdm  # type: ignore

+from private_gpt.utils.retry import retry
+
 try:
-    from ollama import Client  # type: ignore
+    from ollama import Client, ResponseError  # type: ignore
 except ImportError as e:
    raise ImportError(
        "Ollama dependencies not found, install with `poetry install --extras llms-ollama or embeddings-ollama`"
@@ -14,13 +17,25 @@ except ImportError as e:

 logger = logging.getLogger(__name__)

+_MAX_RETRIES = 5
+_JITTER = (3.0, 10.0)

+
+@retry(
+    is_async=False,
+    exceptions=(ConnectError, ResponseError),
+    tries=_MAX_RETRIES,
+    jitter=_JITTER,
+    logger=logger,
+)
 def check_connection(client: Client) -> bool:
    try:
        client.list()
        return True
+    except (ConnectError, ResponseError) as e:
+        raise e
    except Exception as e:
-        logger.error(f"Failed to connect to Ollama: {e!s}")
+        logger.error(f"Failed to connect to Ollama: {type(e).__name__}: {e!s}")
        return False


--- a/private_gpt/utils/retry.py
+++ b/private_gpt/utils/retry.py
@@ -0,0 +1,31 @@
+import logging
+from collections.abc import Callable
+from typing import Any
+
+from retry_async import retry as retry_untyped  # type: ignore
+
+retry_logger = logging.getLogger(__name__)
+
+
+def retry(
+    exceptions: Any = Exception,
+    *,
+    is_async: bool = False,
+    tries: int = -1,
+    delay: float = 0,
+    max_delay: float | None = None,
+    backoff: float = 1,
+    jitter: float | tuple[float, float] = 0,
+    logger: logging.Logger = retry_logger,
+) -> Callable[..., Any]:
+    wrapped = retry_untyped(
+        exceptions=exceptions,
+        is_async=is_async,
+        tries=tries,
+        delay=delay,
+        max_delay=max_delay,
+        backoff=backoff,
+        jitter=jitter,
+        logger=logger,
+    )
+    return wrapped  # type: ignore
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,88 +1,81 @@
 [tool.poetry]
 name = "private-gpt"
-version = "0.6.0"
+version = "0.6.2"
 description = "Private GPT"
 authors = ["Zylon <hi@zylon.ai>"]

 [tool.poetry.dependencies]
 python = ">=3.11,<3.12"
 # PrivateGPT
-fastapi = { extras = ["all"], version = "^0.111.0" }
-python-multipart = "^0.0.9"
-injector = "^0.21.0"
-pyyaml = "^6.0.1"
+fastapi = { extras = ["all"], version = "^0.115.0" }
+python-multipart = "^0.0.10"
+injector = "^0.22.0"
+pyyaml = "^6.0.2"
 watchdog = "^4.0.1"
-transformers = "^4.42.3"
+transformers = "^4.44.2"
 docx2txt = "^0.8"
 cryptography = "^3.1"
 # LlamaIndex core libs
-llama-index-core = "^0.10.52"
-llama-index-readers-file = "^0.1.27"
+llama-index-core = ">=0.11.2,<0.12.0"
+llama-index-readers-file = "*"
 # Optional LlamaIndex integration libs
-llama-index-llms-llama-cpp = {version = "^0.1.4", optional = true}
-llama-index-llms-openai = {version = "^0.1.25", optional = true}
-llama-index-llms-openai-like = {version ="^0.1.3", optional = true}
-llama-index-llms-ollama = {version ="^0.2.2", optional = true}
-llama-index-llms-azure-openai = {version ="^0.1.8", optional = true}
-llama-index-llms-gemini = {version ="^0.1.11", optional = true}
-llama-index-embeddings-ollama = {version ="^0.1.2", optional = true}
-llama-index-embeddings-huggingface = {version ="^0.2.2", optional = true}
-llama-index-embeddings-openai = {version ="^0.1.10", optional = true}
-llama-index-embeddings-azure-openai = {version ="^0.1.10", optional = true}
-llama-index-embeddings-gemini = {version ="^0.1.8", optional = true}
-llama-index-vector-stores-qdrant = {version ="^0.2.10", optional = true}
-llama-index-vector-stores-milvus = {version ="^0.1.20", optional = true}
-llama-index-vector-stores-chroma = {version ="^0.1.10", optional = true}
-llama-index-vector-stores-postgres = {version ="^0.1.11", optional = true}
-llama-index-vector-stores-clickhouse = {version ="^0.1.3", optional = true}
-llama-index-storage-docstore-postgres = {version ="^0.1.3", optional = true}
-llama-index-storage-index-store-postgres = {version ="^0.1.4", optional = true}
+llama-index-llms-llama-cpp = {version = "*", optional = true}
+llama-index-llms-openai = {version ="*", optional = true}
+llama-index-llms-openai-like = {version ="*", optional = true}
+llama-index-llms-ollama = {version ="*", optional = true}
+llama-index-llms-azure-openai = {version ="*", optional = true}
+llama-index-llms-gemini = {version ="*", optional = true}
+llama-index-embeddings-ollama = {version ="*", optional = true}
+llama-index-embeddings-huggingface = {version ="*", optional = true}
+llama-index-embeddings-openai = {version ="*", optional = true}
+llama-index-embeddings-azure-openai = {version ="*", optional = true}
+llama-index-embeddings-gemini = {version ="*", optional = true}
+llama-index-embeddings-mistralai = {version ="*", optional = true}
+llama-index-vector-stores-qdrant = {version ="*", optional = true}
+llama-index-vector-stores-milvus = {version ="*", optional = true}
+llama-index-vector-stores-chroma = {version ="*", optional = true}
+llama-index-vector-stores-postgres = {version ="*", optional = true}
+llama-index-vector-stores-clickhouse = {version ="*", optional = true}
+llama-index-storage-docstore-postgres = {version ="*", optional = true}
+llama-index-storage-index-store-postgres = {version ="*", optional = true}
 # Postgres
 psycopg2-binary = {version ="^2.9.9", optional = true}
 asyncpg = {version="^0.29.0", optional = true}

 # ClickHouse
-clickhouse-connect = {version = "^0.7.15", optional = true}
+clickhouse-connect = {version = "^0.7.19", optional = true}

 # Optional Sagemaker dependency
-boto3 = {version ="^1.34.139", optional = true}
-
-# Optional Qdrant client
-qdrant-client = {version ="^1.9.0", optional = true}
+boto3 = {version ="^1.35.26", optional = true}

 # Optional Reranker dependencies
-torch = {version ="^2.3.1", optional = true}
-sentence-transformers = {version ="^3.0.1", optional = true}
+torch = {version ="^2.4.1", optional = true}
+sentence-transformers = {version ="^3.1.1", optional = true}

 # Optional UI
-gradio = {version ="^4.37.2", optional = true}
-# Fix: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/16289#issuecomment-2255106490
-ffmpy = {git = "https://github.com/EuDs63/ffmpy.git", rev = "333a19ee4d21f32537c0508aa1942ef1aa7afe24", optional = true}
-
-# Optional Google Gemini dependency
-google-generativeai = {version ="^0.5.4", optional = true}
-
-# Optional Ollama client
-ollama = {version ="^0.3.0", optional = true}
+gradio = {version ="^4.44.0", optional = true}
+ffmpy = {version ="^0.4.0", optional = true}

 # Optional HF Transformers
 einops = {version = "^0.8.0", optional = true}
+retry-async = "^0.1.4"

 [tool.poetry.extras]
 ui = ["gradio", "ffmpy"]
 llms-llama-cpp = ["llama-index-llms-llama-cpp"]
 llms-openai = ["llama-index-llms-openai"]
 llms-openai-like = ["llama-index-llms-openai-like"]
-llms-ollama = ["llama-index-llms-ollama", "ollama"]
+llms-ollama = ["llama-index-llms-ollama"]
 llms-sagemaker = ["boto3"]
 llms-azopenai = ["llama-index-llms-azure-openai"]
-llms-gemini = ["llama-index-llms-gemini", "google-generativeai"]
-embeddings-ollama = ["llama-index-embeddings-ollama", "ollama"]
+llms-gemini = ["llama-index-llms-gemini"]
+embeddings-ollama = ["llama-index-embeddings-ollama"]
 embeddings-huggingface = ["llama-index-embeddings-huggingface", "einops"]
 embeddings-openai = ["llama-index-embeddings-openai"]
 embeddings-sagemaker = ["boto3"]
 embeddings-azopenai = ["llama-index-embeddings-azure-openai"]
 embeddings-gemini = ["llama-index-embeddings-gemini"]
+embeddings-mistral = ["llama-index-embeddings-mistralai"]
 vector-stores-qdrant = ["llama-index-vector-stores-qdrant"]
 vector-stores-clickhouse = ["llama-index-vector-stores-clickhouse", "clickhouse_connect"]
 vector-stores-chroma = ["llama-index-vector-stores-chroma"]
@ -92,14 +85,14 @@ storage-nodestore-postgres = ["llama-index-storage-docstore-postgres","llama-ind
 rerank-sentence-transformers = ["torch", "sentence-transformers"]

 [tool.poetry.group.dev.dependencies]
-black = "^22"
-mypy = "^1.2"
-pre-commit = "^2"
-pytest = "^7"
-pytest-cov = "^3"
+black = "^24"
+mypy = "^1.11"
+pre-commit = "^3"
+pytest = "^8"
+pytest-cov = "^5"
 ruff = "^0"
-pytest-asyncio = "^0.21.1"
-types-pyyaml = "^6.0.12.12"
+pytest-asyncio = "^0.24.0"
+types-pyyaml = "^6.0.12.20240917"

 [build-system]
 requires = ["poetry-core>=1.0.0"]
--- a/settings.yaml
+++ b/settings.yaml
@ -25,21 +25,23 @@ data:
 ui:
  enabled: true
  path: /
+  # "RAG", "Search", "Basic", or "Summarize"
+  default_mode: "RAG"
  default_chat_system_prompt: >
-    You are a helpful, respectful and honest assistant. 
+    You are a helpful, respectful and honest assistant.
    Always answer as helpfully as possible and follow ALL given instructions.
    Do not speculate or make up information.
    Do not reference any given instructions or context.
  default_query_system_prompt: >
-    You can only answer questions about the provided context. 
-    If you know the answer but it is not based in the provided context, don't provide 
+    You can only answer questions about the provided context.
+    If you know the answer but it is not based in the provided context, don't provide
    the answer, just state the answer is not in the context provided.
  default_summarization_system_prompt: >
-    Provide a comprehensive summary of the provided context information. 
+    Provide a comprehensive summary of the provided context information.
    The summary should cover all the key points and main ideas presented in
-    the original text, while also condensing the information into a concise 
+    the original text, while also condensing the information into a concise
    and easy-to-understand format. Please ensure that the summary includes
-    relevant details and examples that support the main ideas, while avoiding 
+    relevant details and examples that support the main ideas, while avoiding
    any unnecessary information or repetition.
  delete_file_button_enabled: true
  delete_all_files_button_enabled: true
--- a/tests/fixtures/fast_api_test_client.py
+++ b/tests/fixtures/fast_api_test_client.py
@ -5,7 +5,7 @@ from private_gpt.launcher import create_app
 from tests.fixtures.mock_injector import MockInjector


-@pytest.fixture()
+@pytest.fixture
 def test_client(request: pytest.FixtureRequest, injector: MockInjector) -> TestClient:
    if request is not None and hasattr(request, "param"):
        injector.bind_settings(request.param or {})
--- a/tests/fixtures/ingest_helper.py
+++ b/tests/fixtures/ingest_helper.py
@ -19,6 +19,6 @@ class IngestHelper:
        return ingest_result


-@pytest.fixture()
+@pytest.fixture
 def ingest_helper(test_client: TestClient) -> IngestHelper:
    return IngestHelper(test_client)
--- a/tests/fixtures/mock_injector.py
+++ b/tests/fixtures/mock_injector.py
@ -37,6 +37,6 @@ class MockInjector:
        return self.test_injector.get(interface)


-@pytest.fixture()
+@pytest.fixture
 def injector() -> MockInjector:
    return MockInjector()
--- a/tests/server/ingest/test_local_ingest.py
+++ b/tests/server/ingest/test_local_ingest.py
@ -6,7 +6,7 @@ import pytest
 from fastapi.testclient import TestClient


-@pytest.fixture()
+@pytest.fixture
 def file_path() -> str:
    return "test.txt"

--- a/version.txt
+++ b/version.txt
@ -1 +1 @@
-0.6.1
+0.6.2
Author	SHA1	Message	Date
Iván Martínez	b7ee43788d	Update README.md	2024-11-13 20:29:56 +01:00
meng-hui	940bdd49af	fix: 503 when private gpt gets ollama service (#2104) When running private gpt with external ollama API, ollama service returns 503 on startup because ollama service (traefik) might not be ready. - Add healthcheck to ollama service to test for connection to external ollama - private-gpt-ollama service depends on ollama being service_healthy Co-authored-by: Koh Meng Hui <kohmh@duck.com>	2024-10-17 12:44:28 +02:00
Javier Martinez	5851b02378	feat: update llama-index + dependencies (#2092) * chore: update libraries * fix: mypy * chore: more updates * fix: mypy/black * chore: fix docker warnings * fix: mypy * fix: black	2024-09-26 16:29:52 +02:00
Dmitri Qiu	5fbb402477	fix: Sanitize null bytes before ingestion (#2090) * Sanitize null bytes before ingestion * Added comments	2024-09-25 12:00:03 +02:00
J	fa3c30661d	fix: Add default mode option to settings (#2078) * Add default mode option to settings * Revise default_mode to Literal (enum) and add to settings.yaml * Revise to pass make check/test * Default mode: RAG --------- Co-authored-by: Jason <jason@sowinsight.solutions>	2024-09-24 08:33:02 +02:00
Liam Dowd	f9182b3a86	feat: Adding MistralAI mode (#2065) * Adding MistralAI mode * Update embedding_component.py * Update ui.py * Update settings.py * Update embedding_component.py * Update settings.py * Update settings.py * Update settings-mistral.yaml * Update llm_component.py * Update settings-mistral.yaml * Update settings.py * Update settings.py * Update ui.py * Update embedding_component.py * Delete settings-mistral.yaml --------- Co-authored-by: SkiingIsFun123 <101684827+SkiingIsFun123@users.noreply.github.com> Co-authored-by: Javier Martinez <javiermartinezalvarez98@gmail.com>	2024-09-24 08:31:30 +02:00
Javier Martinez	8c12c6830b	fix: docker permissions (#2059) * fix: missing depends_on * chore: update copy permissions * chore: update entrypoint * Revert "chore: update entrypoint" This reverts commit `f73a36af2f`. * Revert "chore: update copy permissions" This reverts commit `fabc3f66bb`. * style: fix docker warning * fix: multiples fixes * fix: user permissions writing local_data folder	2024-09-24 08:30:58 +02:00
Javier Martinez	77461b96cf	feat: add retry connection to ollama (#2084) * feat: add retry connection to ollama When Ollama is running in the docker-compose, traefik is not ready sometimes to route the request, and it fails * fix: mypy	2024-09-16 16:43:05 +02:00
Trivikram Kamat	42628596b2	ci: bump actions/checkout to v4 (#2077)	2024-09-09 08:53:13 +02:00
Artur Martins	7603b3627d	fix: Rectify ffmpy poetry config; update version from 0.3.2 to 0.4.0 (#2062) * Fix: Rectify ffmpy 0.3.2 poetry config * keep optional set to false for ffmpy * Updating ffmpy to version 0.4.0 * Remove comment about a fix	2024-08-21 10:39:58 +02:00
Javier Martinez	89477ea9d3	fix: naming image and ollama-cpu (#2056)	2024-08-12 08:23:16 +02:00
github-actions[bot]	22904ca8ad	chore(main): release 0.6.2 (#2049) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2024-08-08 18:16:41 +02:00
Javier Martinez	7fefe408b4	fix: auto-update version (#2052)	2024-08-08 16:50:42 +02:00
Javier Martinez	b1acf9dc2c	fix: publish image name (#2043)	2024-08-07 17:39:32 +02:00
Javier Martinez	4ca6d0cb55	fix: add numpy issue to troubleshooting (#2048) * docs: add numpy issue to troubleshooting * fix: troubleshooting link ...	2024-08-07 12:16:03 +02:00
Javier Martinez	b16abbefe4	fix: update matplotlib to 3.9.1-post1 to fix win install * chore: block matplotlib to fix installation in window machines * chore: remove workaround, just update poetry.lock * fix: update matplotlib to last version	2024-08-07 11:26:42 +02:00
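
Two of the commits listed above (77461b96cf and 940bdd49af) address the same startup race: traefik may not yet be routing to Ollama when private-gpt first connects, so the first request fails. The retry side of that fix can be sketched roughly as below, using only the standard library. `wait_until_ready`, its parameters, and the `probe` callable are hypothetical names chosen for illustration; this is not the project's implementation, and it does not use the `retry-async` package added in the pyproject.toml hunk.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it returns True, backing off exponentially between tries.

    `probe` is a hypothetical readiness check supplied by the caller.
    Returns False if the service never became ready within `attempts` tries.
    """
    for attempt in range(attempts):
        try:
            if probe():
                return True
        except ConnectionError:
            # A refused or unavailable connection is treated as "not ready yet".
            pass
        # Do not sleep after the final attempt; there is nothing left to wait for.
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return False
```

In practice `probe` would send a request to the Ollama endpoint and report whether it answered successfully. The `sleep` parameter is injectable only so the backoff schedule can be checked without real waiting.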